I want to be able to use $\LaTeX{}$ in most of the documents I create, in order to produce formatted mathematics1. This is particularly true when I’m teaching, where I want to produce typed-up lecture notes, problems sheets, and solutions sheets, all of which require this kind of formatting. Those who work on topics which require this kind of mathematical content are therefore used to producing many or most of their documents as $\LaTeX{}$ documents, probably compiled to pdf using pdflatex2.

Accessibility

Unfortunately, despite being a common format, pdf is not very accessible. By this, I mean that a user cannot make the document easier to read for themselves by changing the font size or type, the contrast of the document, or any other thing that they may wish to change. A pdf can be zoomed, but this doesn’t change how many words appear in any line, and so the user then has to also manage horizontal scrolling as well as vertical scrolling. Anyone who’s ever tried to read a double column document on their phone likely knows how frustrating a user experience this is.

Moreover, to the best of my understanding, screenreaders are usually unable to handle pdf documents. This might make pdfs simply inaccessible to some members of your audience.

Here’s an example document, which was written in $\LaTeX{}$, with the pdf being produced by pdflatex. If you’re viewing this on a computer, try resizing your browser window to a variety of sizes, and think at which sizes this would be an easily legible document for you. The source tex for this file can be found here 3; thanks to Anne Taormina for the tex file.

The best way to make sure your content is accessible to all is to publish it as an html document. A user can then change the font sizes to their needs, and the document can easily be made responsive, so that the content is sized appropriately based on the size of the device being used to access it. With a little more effort, html can also support keyboard navigation to assist those with reduced mobility, as well as adjustable contrast and font settings.

Screenreaders can also much more easily parse the structure of an html document, and so html is far more accessible to those that rely on such devices.

Pandoc

Here’s the same example document, now in html format. Again, if you’re reading this on a computer, try resizing your browser window to see how it behaves at different sizes. You can also change the text size using your browser (possibly using a setting in your browsers ‘view’ menu). This version of the document should also be far more accessible to a screenreader4.

This html file was produced using the same tex file as was used to produce the pdf. Instead of running pdflatex against the tex file, we instead run a tool called pandoc, with the tex file as the source file and asking pandoc to produce html output5.

If you install pandoc, you can then use pandoc from the command-line as follows6. Using the file Fourier.tex available from this github repo, open your command-line tool and change directory to the directory containing Fourier.tex. You can then run the following command to use pandoc:

pandoc Fourier.tex -s --mathjax -o Fourier.html

Notice that we use the -s option to tell pandoc to produce a standalone file, which is ready to be opened by a web browser. If this option isn’t used, pandoc produces a snippet — this is like having the tex commands for a file, without putting in \documentclass{article}, \begin{document}, etc. We also use the --mathjax option to tell pandoc that we’d like the html file to load the mathjax library, which is used to make the equations in your document display nicely in the browser.

Pandoc + tex2html

The default output of pandoc can be improved with a few simple additions - expanding macros defined in your tex, loading a stylesheet in the output html to make this output look better (and be more responsive on mobile devices), and support for some additional latex environments.

To make this as simple as possible, I’ve created a project on GitHub which contains the required configuration changes and additional scripts to use with pandoc. These files are available in this github repo. To use these additions, you can download a zip of the files from the previous link (click the green ‘Code’ button, then ‘download ZIP’, or use git to clone the repository if you already have git installed). Once you download and unzip the files from github, you should have a folder called tex2html-master (or just tex2html if using git clone). These files need to be moved from this folder and placed alongside Fourier.tex.

Pandoc can then be run using the following command:

pandoc Fourier.tex -d tex2html -o Fourier.html

The -d option tells pandoc to use the tex2html.yaml configuration file, which in turn loads some additional scripts and template files. The output from this command should now match this demonstration file, and (at least in my opinion) looks much better than the default pandoc output.

Summary

Using pandoc + tex2html, the tex source file for your notes or worksheets can be converted to an html file, which is more accessible than the pdf. Pandoc does have limitations, and isn’t capable of converting everything you can put in a valid tex file. The simpler the source tex file, the more likely it is to convert without any issues7.

The additions to pandoc included in tex2html were written during the summer of 2020, at a time when there were many other things to get sorted for the upcoming academic year. There are therefore likely to still be some small bugs, which I’ll try to fix as soon as I’m made aware of them. However, I have used pandoc + tex2html throughout the last year in both courses I taught, and a number of staff at Durham have also been using this successfully during the last year.

In the next post, I’m going to go into more detail about how pandoc works, what the various parts of my tex2html package do, as well as how to make further modifications to the pandoc output.

  1. For instance, how many solutions are there to $a^n + b^n = c^n$, for $a,b,c,n \in \mathbb{Z}$? This footnote isn’t big enough to give the full answer! 

  2. For a beginners introduction to $\LaTeX{}$, see this post, or the overleaf documentation

  3. Though I modified the tex ever so slightly to produce the document for this post. 

  4. I’d be very happy for any feedback on how easy this is to read with a screenreader, from anyone who has experience using one. 

  5. Pandoc is capable of converting to and from a wide variety of file types, as detailed on their website. For the rest of this post I’ll focus entirely on converting from tex to html. 

  6. There are graphical interfaces which use pandoc behind the scenes, such as PanWriter, but I haven’t tested these, and don’t know how easy it is to specify the necessary additional options. 

  7. Boxes are a common source of problems, so look out for \mbox, \fbox etc.