Mediawiki2latex
Introduction
Mediawiki2latex (also called wb2pdf) is a tool created by Dirk Huenniger that exports MediaWiki pages and article collections to LaTeX, PDF, EPUB and ODF.
It can be used (1) to create print documents on demand and (2) to export wiki pages as draft documents for book projects.
See also:
- Mediawiki collection extension installation. That system is currently somewhat broken on this wiki: it has trouble retrieving and using all the images.
Disclaimer: This is not an official documentation page. Also, before Feb 10, 2019, this toolset was designed to work only with standard installations, i.e. not with MediaWikis configured like ours. A quick fix now makes it possible to create PDFs from book collections on demand. Other functionality may be implemented at a later stage.
This page explains how to use mediawiki2latex and how to install it on an Ubuntu system.
Using
Mediawiki2latex works best with Wikipedia. As of Feb 13, certain functionality does not work with this wiki, but it may work with others. Below we introduce the options you have for using this tool. The command line is probably the most productive option.
Official online server
The official online server processes books only within certain limits, so we recommend installing your own platform if you have a Debian/Ubuntu machine.
- http://mediawiki2latex.wmflabs.org/: max. 200 pages, time limit 1 hour
- http://mediawiki2latex-large.wmflabs.org/: max. 800 pages, time limit 6 hours
Using your own server is faster and takes load off the official server.
Your own local server
You can run your own server, either as a public or a local server:
mediawiki2latex -s PORT_NUMBER
e.g.
mediawiki2latex -s 8080
Command line
Again, some of these options may not work with your wiki. Some combinations of parameters do not work, e.g. one cannot combine "bookmode" and "user templates".
See also the official manual, which includes more information.
- -V, -?, -v, --version, --help: show version number
- -o FILE, --output=FILE: output FILE (REQUIRED)
- -f START:END, --featured=START:END: run selftest on featured article numbers from START to END
- -x CONFIG, --hex=CONFIG: hex encoded full configuration for run
- -s PORT, --server=PORT: run in server mode, listen on the given port
- -t FILE, --templates=FILE: user template map FILE
- -r INTEGER, --resolution=INTEGER: maximum image resolution in dpi INTEGER
- -u URL, --url=URL: input URL (REQUIRED)
- -p PAPER, --paper=PAPER: paper size, one of A4, A5, B5, letter, legal, executive
- -m, --mediawiki: use MediaWiki to expand templates
- -h, --html: use MediaWiki generated HTML as input (default)
- -e, --tableslatex: use LaTeX to generate tables
- -n, --noparent: only include URLs which are children of the start URL
- -k, --bookmode: use book-namespace mode for expansion
- -z, --zip: output zip archive of LaTeX source
- -b, --epub: output EPUB file
- -d, --odt: output ODT file
- -g, --vector: keep vector graphics in vector form
- -i, --internal: use internal template definitions
- -l DIRECTORY, --headers=DIRECTORY: use user supplied LaTeX headers
- -c DIRECTORY, --copy=DIRECTORY: copy LaTeX tree to DIRECTORY
Example code for books (replace the URL with your own)
- Generate a PDF from a wiki book (Collection extension), starting from the HTML code
mediawiki2latex -o book.pdf -k -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Book_title
- with some more parameters
mediawiki2latex -o book.pdf -k -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Broderie_num%C3%A9rique -c livrelatex -t /usr/share/mediawiki2latex/latex/templates.user -r 250 -p A4
- Generate a PDF from a wiki book, starting from the wiki code and using templates.
mediawiki2latex -o book.pdf -m -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Broderie_num%C3%A9rique -c livrelatex -t /usr/share/mediawiki2latex/latex/templates.user
- Create a LibreOffice document from a collection (see the LibreOffice comments below)
mediawiki2latex -o book.odf -k -d -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/XXX_YYY -c .
- Create a zip file with latex and assets
mediawiki2latex -o book.zip -k -z -u https://edutechwiki.unige.ch/fr/BookNS:Books/Book_title
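The book examples above differ mainly in the output-format flag. As a sketch, the hypothetical shell function m2l_book below picks that flag from the output file extension (-d for .odt/.odf, -b for .epub, -z for .zip, none for .pdf). It uses only options from the list above and always adds -k (book mode). The function name and the DRY_RUN variable are our own assumptions, not part of mediawiki2latex.

```shell
# Hypothetical wrapper (not part of mediawiki2latex): choose the output flag
# from the file extension and run the book-mode export.
# Usage: m2l_book OUTPUT_FILE BOOK_URL   (set DRY_RUN=1 to only print the command)
m2l_book() (
  out=$1
  url=$2
  case $out in
    *.pdf)       fmt_flag= ;;    # PDF is produced when no format flag is given
    *.odt|*.odf) fmt_flag=-d ;;  # -d / --odt
    *.epub)      fmt_flag=-b ;;  # -b / --epub
    *.zip)       fmt_flag=-z ;;  # -z / --zip (LaTeX source archive)
    *) echo "m2l_book: unsupported output file: $out" >&2; exit 1 ;;
  esac
  # $fmt_flag is deliberately unquoted: when empty it expands to nothing.
  if [ -n "${DRY_RUN:-}" ]; then
    echo mediawiki2latex -o "$out" -k $fmt_flag -u "$url"
  else
    mediawiki2latex -o "$out" -k $fmt_flag -u "$url"
  fi
)
```

For example, `DRY_RUN=1 m2l_book book.odt https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/XXX_YYY` prints the command with -d added. Remove DRY_RUN to actually run it.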
Example code for articles
- Create a page using wiki expansion (not working as of Feb 11 2019 in this wiki)
mediawiki2latex -o article.pdf -m -u "https://edutechwiki.unige.ch/fr/STIC:STIC_III_(2018)/Prototypes_de_physicalisation_-_broderie_machine"
- Create a page using a user template map (the -t option specifies the template file)
mediawiki2latex -o article.pdf -u "https://edutechwiki.unige.ch/fr/STIC:STIC_III_(2018)/Prototypes_de_physicalisation_-_broderie_machine" -t /usr/share/mediawiki2latex/latex/templates.user
Template Tweaking
The easiest way is to use HTML mode, since templates are then expanded. However, you may get unwanted content. Alternatively, you can retrieve pages in wiki mode, but then you have to define LaTeX templates. See the official documentation.
If you set $wgDefaultUserOptions['numberheadings'] = 1; in LocalSettings.php, remove it temporarily while mediawiki2latex downloads the articles. Alternatively, use wiki mode if it works in your wiki.
Reduce image size to 400px. (I have to test if this works with thumbnails).
Wrapping of images. There seems to be a model that could be used (to do).
Exclude all templates you don't want by editing /usr/share/mediawiki2latex/latex/templates.user and passing the file with the "-t" option. The templates.user file is not read automatically by the system.
E.g. add
["tutorial","LaTeXNullTemplate"], ["tutoriel","LaTeXNullTemplate"], ["syllabus","LaTeXNullTemplate"]
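For long lists of templates to ignore, a small helper can generate such entries from template names and help avoid typos. The function below is hypothetical and only reproduces the entry shape shown above. You still have to paste its output into your copy of templates.user, and the last entry in that file must not be followed by a comma.

```shell
# Hypothetical helper: print one ["name","LaTeXNullTemplate"] entry per
# template name, separated by ", ", ready to paste into templates.user.
null_template_entries() (
  sep=
  for name in "$@"; do
    printf '%s["%s","LaTeXNullTemplate"]' "$sep" "$name"
    sep=', '
  done
  printf '\n'
)
```

Running `null_template_entries tutorial tutoriel syllabus` prints exactly the three entries given in the example above.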
Copyright information / header
You can define your own headers by modifying and recompiling
document/headers/options.tex
Otherwise, use the --headers option (not tested so far).
Creating wiki books
Using transclusion
You can create a wiki page that transcludes other articles. However, there are processing limits: if you include dozens of pages, processing may slow down or exceed the maximum number of templates allowed. Still, as of April 2019, you must use this approach to create books with template expansion (-m or -t flag).
= title 1 =
{{:MyPageOne}}
= title 2 =
{{:MyPageTwo}}
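When a book has many chapters, the wikitext of such a page can be generated rather than typed. The following is a minimal sketch. The function name and the "Title|Page" argument format are our own assumptions. It only produces the "= title =" / "{{:Page}}" pattern shown above, which you then paste into a wiki page.

```shell
# Hypothetical helper: build the wikitext of a page that transcludes other
# pages. Each argument has the form "Chapter title|Page name".
transclusion_page() (
  for entry in "$@"; do
    title=${entry%%"|"*}   # text before the first "|"
    page=${entry#*"|"}     # text after the first "|"
    printf '= %s =\n{{:%s}}\n' "$title" "$page"
  done
)
```

For example, `transclusion_page 'title 1|MyPageOne' 'title 2|MyPageTwo'` reproduces the four lines above.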
Example using template expansion
mediawiki2latex -o book.pdf -u https://edutechwiki.unige.ch/fr/Daniel_K._Schneider/My_Book -t /usr/share/mediawiki2latex/latex/templates.user
Make sure that included pages start their section headings at level "==" and not "=".
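To check this before building the book, you can search the wikitext of each included page for level-1 headings. Below is a minimal sketch that works on a local file. One way to obtain the wikitext is MediaWiki's action=raw (e.g. index.php?title=MyPageOne&action=raw). The exact index.php path depends on the wiki, so treat that URL as an assumption.

```shell
# Hypothetical check: print (with line numbers) the level-1 headings
# ("= Title =") in a wikitext file. These should become "==" before the
# page is transcluded. Returns non-zero when no such heading is found.
level1_headings() {
  # Exactly one "=" at the start, exactly one "=" at the end (trailing spaces allowed)
  grep -nE '^=[^=](.*[^=])?=[[:space:]]*$' "$1"
}
```

Example: `curl -s 'https://example.org/index.php?title=MyPageOne&action=raw' > MyPageOne.wiki && level1_headings MyPageOne.wiki`. Here example.org is a placeholder.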
Using collection extension
The recommended solution is to use wiki books defined by the collection extension, i.e. use a feature from the alternative PediaPress technology.
As of Feb 2019, this works with our wikis using default mode (html-based). It fails using "wiki" template expansion.
Installation for Ubuntu
sudo apt-get install mediawiki2latex
This worked for Ubuntu 24.04.2 LTS (better than the manual installation below)
Manual install under Ubuntu
Installation of Latex
Your current LaTeX installation is likely not good enough, so we suggest installing the latest version of TeX Live manually. It can be installed alongside an existing system installation. TeX Live should include important packages such as XeTeX.
cd /src   # working directory of your choice
- Download:
wget https://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz
(or see https://www.tug.org/texlive/acquire-netinstall.html)
zcat < install-tl-unx.tar.gz | tar xf -   # note the final - on that command line
cd install-tl-2*
perl ./install-tl --no-interaction   # as root or with a writable destination; may take several hours to run
- Finally, prepend /usr/local/texlive/YYYY/bin/PLATFORM to your PATH, e.g. /usr/local/texlive/2025/bin/x86_64-linux. That is, edit ~/.profile and set:
PATH=/usr/local/texlive/2025/bin/x86_64-linux/:$PATH
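Because the year in the path changes with every TeX Live release, it can help to look up the newest installed year instead of hard-coding 2025. The sketch below assumes the default layout /usr/local/texlive/YYYY/bin/PLATFORM used above. The function name is hypothetical.

```shell
# Hypothetical helper: print the bin directory of the most recent TeX Live
# year under ROOT (default /usr/local/texlive) for PLATFORM (default x86_64-linux).
texlive_bin() (
  root=${1:-/usr/local/texlive}
  platform=${2:-x86_64-linux}
  latest=
  for dir in "$root"/[0-9][0-9][0-9][0-9]; do
    # The glob expands in sorted order, so the last match is the newest year.
    if [ -d "$dir/bin/$platform" ]; then latest=$dir; fi
  done
  if [ -z "$latest" ]; then
    echo "texlive_bin: no TeX Live for $platform under $root" >&2
    exit 1
  fi
  echo "$latest/bin/$platform"
)
```

Run it once, then put the printed directory in ~/.profile, e.g. `PATH="$(texlive_bin):$PATH"; export PATH` while the function is defined.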
Test the installation:
latex small2e
Alternatively, you could just install the system's package. However, depending on the age of your OS, it may be outdated.
sudo apt-get install texlive-xetex
You could also consider installing a LaTeX editor.
Installation of Mediawiki2LaTeX
See the official links below first. Below we just wrote down what worked in Feb 2019 for Ubuntu 18.x LTS, somewhat updated in June 2025 for Ubuntu 24.04 LTS and version 8.28.
(1) Install the (probably old) default version, which will also install lots of run time dependencies (compatible with your current Ubuntu system).
sudo apt-get install mediawiki2latex
(2) Then install the build-time dependencies (as root), as explained on Dirk Hünniger's installation page (Benutzer:Dirk Hünniger/wb2pdf/installing). This amounts to about 10 apt-get commands covering some 30 packages:
apt-get install ghc libghc-x509-dev libghc-pem-dev chromium chromium-sandbox
apt-get install libghc-regex-compat-dev libghc-http-dev cabal-install libghc-hxt-dev
apt-get install libghc-split-dev libghc-blaze-html-dev libghc-file-embed-dev
apt-get install libghc-hxt-http-dev
apt-get install libghc-temporary-dev libghc-url-dev libghc-utf8-string-dev
apt-get install libghc-utility-ht-dev libghc-http-client-tls-dev libghc-happstack-server-dev
apt-get install libghc-directory-tree-dev libghc-zip-archive-dev libghc-strict-dev
apt-get install libghc-network-uri-dev libghc-tagsoup-dev libghc-word8-dev
apt-get install ghostscript make latex2rtf libreoffice curl texlive-extra-utils
apt-get install pdftk libimage-exiftool-perl
(3) Then install the new version from the git repository
git clone https://git.code.sf.net/p/wb2pdf/git wb2pdf-git
cd wb2pdf-git
sudo make
sudo make install
To update:
- cd into the wb2pdf-git directory
sudo git pull
sudo make install
Important: Do this each time you update your LaTeX installation.
Mediawiki2Latex configuration
(1) Add a list of templates you want the system to ignore
- Edit the config file:
/usr/share/mediawiki2latex/latex/templates.user
- Or copy it and then use it with the "-t" option.
For example, I had to add the following. Make sure to respect the syntax, e.g. no comma after the last entry, or else the program will just stop without a clear error message.
.....
["tutoriel","LaTeXNullTemplate"],
["brouillon","LaTeXNullTemplate"],
["ebauche","LaTeXNullTemplate"],
["incomplet","LaTeXNullTemplate"],
["citation","LaTeXZeroBoxTemplate","1"],
["lien","LaTeXZeroBoxTemplate","1"]
]
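Since a stray comma makes the program stop silently, a quick check before running mediawiki2latex can save time. The sketch below is a heuristic, not a parser of the templates.user format. After removing all whitespace, it reports any "," that is directly followed by "]". The function name is hypothetical.

```shell
# Heuristic check for a templates.user copy: fail if, after removing all
# whitespace, a comma directly precedes "]" (e.g. a comma after the last entry).
check_templates_user() {
  if tr -d '[:space:]' < "$1" | grep -q ',]'; then
    echo "$1: comma before a closing ]" >&2
    return 1
  fi
}
```

For example: `check_templates_user ~/templates.user && mediawiki2latex -t ~/templates.user ...` (the file location is an example).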
(2) Add fonts if necessary
- Download the package from https://packages.debian.org/trixie/all/fonts-unifont/download
sudo dpkg -i fonts-unifont_15.1.01-1_all.deb
- Install the following fonts in the system
sudo apt-get install fonts-cmu
(3) Install ImageMagick if it is not already installed
sudo apt install imagemagick
Then adapt permissions and processing parameters as root in /etc/ImageMagick-6/policy.xml
<policy domain="coder" rights="read|write" pattern="PS" />
<policy domain="coder" rights="read|write" pattern="PS2" />
<policy domain="coder" rights="read|write" pattern="PS3" />
<policy domain="coder" rights="read|write" pattern="EPS" />
<policy domain="coder" rights="read|write" pattern="PDF" />
<policy domain="coder" rights="read|write" pattern="XPS" />
<policy domain="resource" name="memory" value="8GiB"/>
<policy domain="resource" name="map" value="8GiB"/>
<policy domain="resource" name="width" value="100KP"/>
<policy domain="resource" name="height" value="100KP"/>
<policy domain="resource" name="area" value="10GP"/>
<policy domain="resource" name="disk" value="20GiB"/>
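The coder lines can also be changed with sed instead of by hand. The following is a sketch under an assumption: each policy element sits on one line, with domain="coder" followed by rights and pattern as in the lines above. It keeps a .bak copy and only touches the PS, PS2, PS3, EPS, PDF and XPS coders. The resource limits still have to be edited manually. The function name is hypothetical.

```shell
# Hypothetical helper: set rights="read|write" for the PS, PS2, PS3, EPS,
# PDF and XPS coder policies in an ImageMagick policy.xml (run as root for
# the system file). The original is kept as FILE.bak.
enable_im_coders() (
  file=${1:-/etc/ImageMagick-6/policy.xml}
  cp "$file" "$file.bak"
  sed -E 's/(domain="coder" rights=")[^"]*(" pattern="(PS|PS2|PS3|EPS|PDF|XPS)")/\1read|write\2/' \
    "$file.bak" > "$file"
)
```

Other coder policies (e.g. MVG) are left unchanged.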
(4) Increase LaTeX buffer size
It is very difficult to find correct information on how to do this.
- Find the config file:
kpsewhich texmf.cnf
- Edit this file and add
buf_size=10000000
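If you edit texmf.cnf repeatedly (for example, to add the memory settings listed further below), an idempotent setter avoids duplicate assignments. This is a minimal sketch and the function name is hypothetical. It replaces an existing uncommented "KEY = ..." line or appends one, and compares keys as plain strings, so main_memory and main_memory.xetex are kept apart.

```shell
# Hypothetical helper: set KEY = VALUE in a texmf.cnf-style file, replacing
# existing uncommented assignments of exactly KEY or appending a new line.
set_cnf_value() (
  file=$1 key=$2 value=$3
  tmp=$(mktemp)
  awk -v k="$key" -v v="$value" '
    { line = $0; sub(/^[ \t]+/, "", line) }            # ignore leading blanks
    index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/ {
      print k " = " v; found = 1; next                  # replace the assignment
    }
    { print }
    END { if (!found) print k " = " v }                 # append if missing
  ' "$file" > "$tmp" && cat "$tmp" > "$file"           # cat keeps owner/permissions
  rm -f "$tmp"
)
```

Example (as root): `set_cnf_value "$(kpsewhich texmf.cnf)" buf_size 10000000`.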
Troubleshooting
Missing pictures will stop the program
If a page links to a nonexistent picture, the program crashes (as of June 2025).
Here is an example: the first line is OK, but the program stopped after the second line.
MediaWiki2LaTeX-tmp-d95fda37057d5e71/document/images/321.jpg JPEG 2252x2316 2252x2316+0+0 8-bit sRGB 1.2829MiB 0.050u 0:00.062
mediawiki2latex: /tmp/MediaWiki2LaTeXImages-a2e64d0a3634b499/322: withBinaryFile: does not exist (No such file or directory)
To figure out which picture is missing:
- Open /tmp/MediaWiki2LaTeXImages-a2e64d0a3634b499/321 and /tmp/MediaWiki2LaTeXImages-a2e64d0a3634b499/323. The missing picture is the one in between (322).
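The neighbouring file names can be derived directly from the error line. The sketch below assumes that the message has the exact shape shown above ("mediawiki2latex: PATH: withBinaryFile: ..."). It prints the two files to open. The function name is hypothetical.

```shell
# Hypothetical helper: from a "withBinaryFile: does not exist" error line,
# print the images just before and after the missing one (N-1 and N+1).
missing_image_neighbours() (
  path=$(printf '%s\n' "$1" |
    sed -n 's/.*mediawiki2latex: \([^:]*\): withBinaryFile.*/\1/p')
  if [ -z "$path" ]; then
    echo "missing_image_neighbours: no missing-file path found" >&2
    exit 1
  fi
  dir=${path%/*}   # e.g. /tmp/MediaWiki2LaTeXImages-a2e64d0a3634b499
  n=${path##*/}    # e.g. 322
  echo "$dir/$((n - 1))"
  echo "$dir/$((n + 1))"
)
```

For example, pass it the last line of a saved run log: `missing_image_neighbours "$(grep withBinaryFile run.log)"` (run.log is an example name).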
Debugging
You can ask mediawiki2latex to copy all the LaTeX files into a directory and then look at the LaTeX code:
- Use the -c option
To compile the TeX file manually, try cd document/main, then (if we are correct):
lualatex main.tex
or
xelatex main.tex
The instructions in the official mediawiki2latex installation manual seem to be outdated as of June 2025.
I also added the following to texmf.cnf:
buf_size=10000000
main_memory=20000000
pool_size=20000000
main_memory.xetex = 20000000
extra_mem_top.xetex = 1000000
extra_mem_bot.xetex = 1000000
main_memory.luatex = 20000000
extra_mem_top.luatex = 1000000
extra_mem_bot.luatex = 1000000
However, memory errors are probably caused by mistakes in the LaTeX code.
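To find such mistakes, you can list the errors that TeX wrote to the log after a manual run. TeX reports errors on lines starting with "! " and usually follows them with an "l.NNN" line giving the source line number. The sketch below assumes the log is main.log, next to main.tex. The function name is hypothetical.

```shell
# Hypothetical helper: print each LaTeX error line ("! ...") from a .log file
# together with the next "l.NNN" line, which points into the .tex source.
latex_errors() {
  awk '/^! / { print; want = 1; next }
       want && /^l\.[0-9]+/ { print; want = 0 }' "${1:-main.log}"
}
```

This also shows "! TeX capacity exceeded" messages, which helps to distinguish a real memory limit from a coding error near the reported line.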
LibreOffice installation and creation tips
As of Jan 7 2020:
(1) Get the latest LibreOffice (see https://wiki.ubuntu.com/LibreOffice):
sudo apt install python-software-properties
sudo apt-add-repository ppa:libreoffice/ppa
sudo apt update
sudo apt install libreoffice
$ libreoffice --version
LibreOffice 6.3.4.2 30(Build:2)
(2) Make sure that ImageMagick has permission to transform PS and PDF files to PNG.
In /etc/ImageMagick-6/policy.xml:
<policy domain="coder" rights="read|write" pattern="PS" />
<policy domain="coder" rights="none|write" pattern="PS2" />
<policy domain="coder" rights="none|write" pattern="PS3" />
<policy domain="coder" rights="none|write" pattern="EPS" />
<policy domain="coder" rights="read|write" pattern="PDF" />
<policy domain="coder" rights="read|write" pattern="XPS" />
(3) (Fixed) In versions older than Jan 13, 2020, the ODF file could not find the image files; this is fixed now. As a workaround, I did the following: activate "copy to LaTeX" (-c), then make sure that LibreOffice can find the images and formulas directories it is looking for, e.g. if you start from a PediaPress book definition:
mkdir somedirectory
cd somedirectory
mediawiki2latex -o ct.odf -d -u https://edutechwiki.unige.ch/fr/EduTech_Wiki:Livres/Initiation_%C3%A0_la_pens%C3%A9e_computationnelle_avec_JavaScript -k -c .
then
mv document/images/ .
mv document/formulas/ .