The process of writing a LaTeX document can involve many manual steps, resulting in a patchwork document that is neither exercisable nor complete. This makes it impossible to reproduce the document from code and data. In this post we will create a pipeline for compiling a LaTeX document that works both locally and in GitLab CI. This is part of a series to create the perfect open science
When writing a document in LaTeX I like to use `git` for version control, even if I am working alone on a project. This allows me to track my progress, have a backup, and make sure the document is completely reproducible from raw data. The principle of Reproducible Research (Buckheit & Donoho, 1995; Claerbout & Karrenbach, 1992; Association for Computing Machinery (ACM), n.d.) is to make data and computer code available for others to analyze and criticize.
A good open source repository is exercisable and complete (Monperrus, 2018; Association for Computing Machinery (ACM), n.d.). This means that it must be possible to fully reproduce the document, down to the last pixel, from running a single script in the repository.
In this post we will look at the practicalities of writing a reproducible LaTeX document using a GitLab CI pipeline to ensure that we meet these requirements.
This post is part of a series and follows Publication ready figures. For more on the requirements for open source repositories, see Reproducibility aspects of the Swedish COVID–19 estimate report.
- We define three phases of document compilation: compiling the figures, compiling the main document, and testing the compiled document against a set of known requirements.
- We construct a local compilation pipeline based on `make` and Docker (Merkel, 2014).
- We construct a GitLab CI pipeline that automatically compiles the document when we push new code to the remote repository.
What we will need
In this post we will use `git` and target the GitLab CI pipeline framework, so you will need a repository on GitLab.
I recommend adding a `.gitignore` based on the GitLab TeX `.gitignore` template.
The local build system
We will use `latexmk` to build our LaTeX document. Other LaTeX build tools, such as `latexrun`, could also be used, but `latexmk` has the advantage of being robust and already installed in the Docker image we are using.
We will use GNU `make` to trigger the `latexmk` build, either locally or in the GitLab runner. The entry points are slightly different in the two cases.
Here we assume that running `make figures` is very time-consuming, so we would like to avoid running it more often than necessary.
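`make` avoids rerunning a slow step by comparing modification times: a target is rebuilt only if it is missing or older than one of its prerequisites. Assuming the figures `Makefile` lists each plotting script and its raw data as prerequisites of the corresponding `.tex` file, figures whose inputs have not changed are not regenerated. The rule can be sketched in Python as follows; the function `needs_rebuild` is hypothetical and not part of the original `Makefile`.

```python
from pathlib import Path
from typing import Iterable


def needs_rebuild(target: Path, prerequisites: Iterable[Path]) -> bool:
    """Mimic make's timestamp rule for a single target.

    The target is out of date if it does not exist, or if any prerequisite
    was modified more recently than the target.
    """
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    # A missing prerequisite raises FileNotFoundError, much like make
    # failing with "No rule to make target".
    return any(p.stat().st_mtime > target_mtime for p in prerequisites)
```

For example, with the hypothetical files `figures/plot_results.py`, `figures/results.csv` and `figures/plot_results.tex`, the figure is only regenerated after the script or the data has been edited.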
The command that we run from the command line to compile the LaTeX document is `make`. This first starts a Docker container, mounts the working directory as `/data` and runs `make pdf` inside the container. Since the working directory is mounted, the PDF file remains after the Docker container has been shut down and removed.
The figures reside in the subdirectory `figures`, which contains a `Makefile`. We can compile the figures locally with `make -C figures` or in a Docker container with

```shell
docker run --rm -w /data/ -v "$(pwd)":/data python:3.8 make -C /data/figures
```
Each figure is generated from raw data and plotted using a Python script. Each script generates a figure in TikZ format with the same base name as the script, but with the extension `.tex`.
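To give an idea of what such a script might contain, here is a minimal sketch that reads a two-column CSV file and writes a pgfplots `tikzpicture` to a `.tex` file named after the script. The CSV layout (a header row with columns `x` and `y`), the use of pgfplots and the function names are assumptions for illustration; the scripts in the original repository may generate their TikZ output differently, for example through a plotting library.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def read_rows(csv_path: Path, x_col: str = "x", y_col: str = "y") -> List[Tuple[float, float]]:
    """Read (x, y) pairs from a CSV file with a header row."""
    with csv_path.open(newline="") as f:
        return [(float(r[x_col]), float(r[y_col])) for r in csv.DictReader(f)]


def to_tikz(rows: List[Tuple[float, float]]) -> str:
    """Render the points as a pgfplots line plot inside a tikzpicture."""
    coordinates = " ".join(f"({x:g},{y:g})" for x, y in rows)
    return "\n".join([
        r"\begin{tikzpicture}",
        r"  \begin{axis}[xlabel={$x$}, ylabel={$y$}]",
        rf"    \addplot coordinates {{{coordinates}}};",
        r"  \end{axis}",
        r"\end{tikzpicture}",
        "",
    ])


def write_figure(script_path: Path, csv_path: Path) -> Path:
    """Write the figure to <script base name>.tex next to the script."""
    out_path = script_path.with_suffix(".tex")  # same base name, ".tex" extension
    out_path.write_text(to_tikz(read_rows(csv_path)))
    return out_path
```

Inside a hypothetical `figures/plot_results.py`, calling `write_figure(Path(__file__), Path("figures/results.csv"))` would produce `figures/plot_results.tex`, which the main document can then `\input`.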
Compiling the document
Our document can be compiled using `latexmk` inside a Docker container by running `make`. This is the same as running

```shell
docker run --rm -w /data/ -v "$(pwd)":/data martisak/texlive2020 make pdf
```

The document will be compiled inside the container using

```shell
latexmk -bibtex -pdf -pdflatex="pdflatex -interaction=nonstopmode" main.tex
```
The container we are using is based on a Docker image that has TeX Live 2020 installed on top of an Ubuntu base image.
Running unit tests
The test cases, written in Ruby, can either be run locally with

```shell
rspec spec/pdf_spec.rb
```

which is the same as `make check`, or in a Docker container with
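```shell
# Wrap both commands in "sh -c" so that they run inside the container;
# an unquoted ";" would make the host shell run "make check" outside it.
docker run --rm -w /data/ -v "$(pwd)":/data ruby:2.7.1 sh -c "bundle update --bundler && make check"
```

This is the same as running `make check_docker`. For a more in-depth guide to LaTeX document unit testing, see How to beat publisher PDF checks with LaTeX document unit testing.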
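The Ruby tests inspect the PDF itself. A complementary and much cruder check, which is not part of the original setup, is to scan the LaTeX log for warnings that usually indicate a broken document. A minimal Python sketch, assuming the log is written to `main.log` and that undefined references, undefined citations and overfull boxes should fail the check; the names `find_problems` and `check_log` are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import List

# Log messages that usually indicate a problem in the compiled document.
PATTERNS = {
    "undefined reference": re.compile(r"LaTeX Warning: Reference .* undefined"),
    "undefined citation": re.compile(r"LaTeX Warning: Citation .* undefined"),
    "overfull box": re.compile(r"Overfull \\[hv]box"),
}


def find_problems(log_text: str) -> List[str]:
    """Return one message for every log line that matches a pattern."""
    problems = []
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                problems.append(f"{name}: {line.strip()}")
    return problems


def check_log(log_path: Path = Path("main.log")) -> int:
    """Print the problems found in the log and return a shell-style exit code."""
    problems = find_problems(log_path.read_text(errors="replace"))
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Since TeX wraps long log lines at a fixed width, a very long warning may be split across lines and missed, so this only complements proper tests on the PDF.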
LaTeX development environment
When writing a paper we would of course like to see the results of our changes in near real time, and not have to commit our changes to `git` in order to compile the document.
We can tweak the `make` target a bit so that `latexmk` is run with the `-pvc` flag (Wienke, 2018). This puts `latexmk` into preview and continuously update mode.

```shell
make clean render LATEXMK_OPTIONS_EXTRA=-pvc
```
This means we can run this command once and just edit our document in our favorite text editor.
The GitLab CI pipeline
GitLab CI/CD lets us run a pipeline for each commit. For this project we have defined three stages: the first stage, `figures`, creates the plots in Python; the second, `build`, compiles the LaTeX document; and the third, `test`, runs unit tests on the compiled PDF document. Since `figures` is not one of GitLab's default stages, the stage order has to be declared in `.gitlab-ci.yml`, for example with `stages: [figures, build, test]`.
The complete `.gitlab-ci.yml` file can be found in the GitLab repository.
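By default GitLab runs the stages in order and only starts a stage once the jobs in the previous stage have succeeded. The same behaviour can be mimicked locally; below is a minimal Python sketch in which the stage commands are assumptions based on the `make` targets described earlier, and `run_stages` is a hypothetical helper rather than part of the original repository.

```python
import subprocess
from typing import List, Sequence, Tuple

# Hypothetical local equivalents of the three pipeline stages.
STAGES: List[Tuple[str, List[str]]] = [
    ("figures", ["make", "-C", "figures"]),
    ("build", ["make", "pdf"]),
    ("test", ["make", "check"]),
]


def run_stages(stages: Sequence[Tuple[str, Sequence[str]]]) -> bool:
    """Run each stage command in order and stop at the first failure.

    Returns True only if every stage exited with status 0, mirroring
    GitLab's default of skipping later stages after a failed job.
    """
    for name, command in stages:
        print(f"--- stage: {name}")
        result = subprocess.run(list(command))
        if result.returncode != 0:
            print(f"--- stage {name} failed with exit code {result.returncode}")
            return False
    return True
```

Calling `run_stages(STAGES)` from the repository root would then regenerate the figures, compile the document and run the tests, stopping as soon as one step fails.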
Our first pipeline stage will compile figures according to Publication ready figures. For this we use the official `python:3.8` Docker image. Any job artifacts created in this step will be carried over to the next stage.

```yaml
figures:
  image: python:3.8
  stage: figures
  script:
    - make -C figures
  artifacts:
    untracked: true
    expire_in: 1 week
```
The reason for putting this step in a stage of its own is that we assume generating figures can take a very long time, for example if a machine learning model is trained in this step. This also lets us keep it separate when running locally, so that we don't have to regenerate the figures every time we want to compile the LaTeX document.
Speeding up the build with caching
The `figures` stage can take a very long time, since we need to download and install packages every time the stage runs. To avoid this we can use the example from Cache dependencies in GitLab CI/CD, so that the `figures` job becomes

```yaml
variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  key: "$CI_JOB_STAGE-$CI_COMMIT_REF_SLUG"
  paths:
    - .cache/pip
    - venv/

figures:
  image: python:3.8
  stage: figures
  before_script:
    - python -V
    - pip install virtualenv
    - virtualenv venv
    - source venv/bin/activate
  script:
    - make -C figures
  artifacts:
    untracked: true
    expire_in: 1 week
```
We use a `virtualenv` (Gabor, 2020) so that the installed packages can be cached as well.
Care has to be taken here: the cache can become too big for GitLab to handle.
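To notice early when the cache is growing too large, one option is to print the size of the cached paths at the end of the job, for example from an `after_script`. A minimal Python sketch; the paths `.cache/pip` and `venv` come from the configuration above, while the functions `directory_size` and `report` are hypothetical additions.

```python
import os
from pathlib import Path
from typing import Iterable


def directory_size(path: Path) -> int:
    """Return the total size in bytes of the regular files below `path`.

    Symbolic links (such as the interpreter links inside a virtualenv)
    are skipped; a missing directory counts as 0 bytes.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if file_path.is_file() and not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def report(paths: Iterable[str] = (".cache/pip", "venv")) -> None:
    """Print the size of each cached path in MiB."""
    for p in paths:
        size_mib = directory_size(Path(p)) / 2**20
        print(f"{p}: {size_mib:.1f} MiB")
```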
Compiling the LaTeX document
The second stage in the pipeline compiles the actual LaTeX document. Here we need a Docker image that has LaTeX and all the needed packages installed. The image we use is `martisak/texlive2020`, which is based on TeX Live 2020.
The job artifact of interest is of course the compiled PDF document, but we include all untracked files so that log files and other generated files are kept as well.

```yaml
compile:
  image: martisak/texlive2020
  stage: build
  script:
    - make pdf
  dependencies:
    - figures
  artifacts:
    untracked: true
    expire_in: 1 week
    when: on_success
```
Running unit tests
The final stage of the pipeline runs unit tests on the generated PDF file. This is useful, for example, to make sure that the number of pages is as expected, that the fonts are embedded properly and that the metadata is set correctly. We will cover these tests in detail in a later post; for now it is enough to say that they are written in Ruby, so we use an appropriate Docker image.

```yaml
test:
  image: ruby:2.7.1
  stage: test
  dependencies:
    - compile
  script:
    - bundle install
    - make check
  when: on_success
```
Adding a “Download PDF” button
Now that we have gone through all of this, we would like to share the final document with others. I like using a GitLab badge for this.
Since we named our document `main.pdf` and the compilation job is named `compile`, we can find our document at
Of course, we need a fancy image to go with it, and we can generate one using shields.io.
You can add this badge either to your `README.md` or in your GitLab project settings under Settings > General > Badges.
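Both URLs follow documented patterns: GitLab serves a single file from the latest successful artifacts of a job on a branch at `/-/jobs/artifacts/<ref>/raw/<path>?job=<job name>`, and shields.io renders a static badge from `/badge/<label>-<message>-<color>`. A minimal Python sketch that assembles the Markdown for such a badge; the host, project path, branch name and badge text are hypothetical placeholders, and simple branch names without slashes are assumed.

```python
from urllib.parse import quote


def artifact_url(host: str, project: str, ref: str, path: str, job: str) -> str:
    """URL of a single file in the latest successful artifacts of `job` on `ref`."""
    return f"https://{host}/{project}/-/jobs/artifacts/{ref}/raw/{path}?job={quote(job, safe='')}"


def shields_badge_url(label: str, message: str, color: str) -> str:
    """Static shields.io badge URL.

    In the badge path, '-' and '_' must be doubled and spaces become '_'.
    """
    def escape(text: str) -> str:
        text = text.replace("-", "--").replace("_", "__").replace(" ", "_")
        return quote(text, safe="")

    return f"https://img.shields.io/badge/{escape(label)}-{escape(message)}-{escape(color)}"


def badge_markdown(host: str, project: str, ref: str = "master",
                   path: str = "main.pdf", job: str = "compile") -> str:
    """Markdown for a clickable "Download PDF" badge."""
    link = artifact_url(host, project, ref, path, job)
    image = shields_badge_url("Download", "PDF", "blue")
    return f"[![Download PDF]({image})]({link})"
```

For instance, `badge_markdown("gitlab.com", "group/paper")` returns a line that can be pasted directly into `README.md`.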
A common way of writing LaTeX documents together with others is to use Overleaf. All authors can edit the document in real time and compilation is very fast. However, the online version doesn't allow us to run arbitrary code or test cases on our document, and the version control is hidden from us. Overleaf has a few ways of letting us share the work, but in my work some of the content is proprietary and can be sensitive until the document is reviewed, which means I am not able to use cloud solutions to write my documents. Overleaf does, however, provide a Docker image that can be deployed locally.
Many authors have looked into using GitLab CI for building LaTeX documents, for example (Manik, 2019; Lühr, 2018; Khan, 2018; Ergus, 2016). Ajayakumar (2020) wrote a very nice and complete guide, and used GitLab Pages to deploy the compiled document.
In this post we extend this work and make a complete pipeline that can also be run locally. Our pipeline consists of three stages, `figures`, `build` and `test`, each responsible for a separate part of the build process.
We have constructed a simple pipeline for compiling LaTeX documents in a Docker container. This fulfills the requirements that our repository shall be complete and exercisable (Monperrus, 2018; Association for Computing Machinery (ACM), n.d.).
In upcoming posts we will further look into defining test cases for documents, complicating the build with Pandoc and other tricks to annoy your co-authors.
- Buckheit, J. B., & Donoho, D. L. (1995). Wavelab and reproducible research. In Wavelets and statistics (pp. 55–81). Springer.
- Claerbout, J. F., & Karrenbach, M. (1992). Electronic documents give reproducible research a new meaning. In SEG Technical Program Expanded Abstracts 1992 (pp. 601–604). Society of Exploration Geophysicists. https://doi.org/10.1190/1.1822162
- Association for Computing Machinery (ACM). (n.d.). Artifact Review and Badging. https://www.acm.org/publications/policies/artifact-review-badging
- Monperrus, M. (2018). How to make a good open-science repository? https://researchdata.springernature.com/users/336958-martin-monperrus/posts/57389-how-to-make-a-good-open-science-repository
- Merkel, D. (2014). Docker: Lightweight Linux Containers for Consistent Development and Deployment. Linux J., 2014(239). http://dl.acm.org/citation.cfm?id=2600239.2600241
- Wienke, J. (2018). LaTeX Best Practices: Lessons Learned from Writing a PhD Thesis. https://www.semipol.de/2018/06/12/latex-best-practices.html
- Gabor, B. (2020). virtualenv. https://virtualenv.pypa.io/
- Manik, D. (2019). GitLab pipelines for every need: testing, documentation, and writing a paper. In deRSE 2019 - Konferenz für ForschungssoftwareentwicklerInnen in Deutschland. https://doi.org/10.5446/42490
- Lühr, L. (2018). Automate Awesome CV with XeLaTeX and GitLab CI. https://ayeks.de/post/2018-01-25-awesome-cv-cicd/
- Khan, S. (2018). Setting up GitLab to automatically generate PDFs from committed LaTeX files. https://sayantangkhan.github.io/latex-gitlab-ci.html
- Ergus, A. (2016). Using GitLab CI for Building LaTeX. https://github.com/aufenthaltsraum/stuff/wiki/Using-GitLab-CI-for-Building-LaTeX
- Ajayakumar, V. (2020). Continuous Integration of LaTeX projects with GitLab Pages. https://www.vipinajayakumar.com/continuous-integration-of-latex-projects-with-gitlab-pages.html