ConvertDoc User's Guide

Synopsis

ConvertDoc [options] [input-file]

Description

ConvertDoc is a command-line application for converting from one markup format to another.

ConvertDoc can convert between numerous markup and word processing formats, including, but not limited to, various flavors of Markdown, HTML, LaTeX, and Word docx. For the full lists of input and output formats, see the --from and --to options below. ConvertDoc can also produce PDF output: see Creating a PDF, below.

ConvertDoc's enhanced version of Markdown includes syntax for tables, definition lists, metadata blocks, footnotes, citations, math, and much more. See below under ConvertDoc's Markdown.

ConvertDoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document (an abstract syntax tree or AST), and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer. Users can also run custom ConvertDoc filters to modify the intermediate AST.

Because ConvertDoc's intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. ConvertDoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into ConvertDoc's simple document model. While conversions from ConvertDoc's Markdown to all formats aspire to be perfect, conversions from formats more expressive than ConvertDoc's Markdown can be expected to be lossy.

Using ConvertDoc

If no input-files are specified, input is read from stdin. Output goes to stdout by default. For output to a file, use the -o option:

ConvertDoc -o output.html input.txt

By default, ConvertDoc produces a document fragment. To produce a standalone document (e.g. a valid HTML file including <head> and <body>), use the -s or --standalone flag:

ConvertDoc -s -o output.html input.txt

For more information on how standalone documents are produced, see Templates below.

If multiple input files are given, ConvertDoc will concatenate them all (with blank lines between them) before parsing. (Use --file-scope to parse files individually.)

Specifying formats

The format of the input and output can be specified explicitly using command-line options. The input format can be specified using the -f/--from option, the output format using the -t/--to option. Thus, to convert hello.txt from Markdown to LaTeX, you could type:

ConvertDoc -f markdown -t latex hello.txt

To convert hello.html from HTML to Markdown:

ConvertDoc -f html -t markdown hello.html

Supported input and output formats are listed below under Options (see -f for input formats and -t for output formats). You can also use ConvertDoc --list-input-formats and ConvertDoc --list-output-formats to print lists of supported formats.

If the input or output format is not specified explicitly, ConvertDoc will attempt to guess it from the extensions of the filenames. Thus, for example,

ConvertDoc -o hello.tex hello.txt

will convert hello.txt from Markdown to LaTeX. If no output file is specified (so that output goes to stdout), or if the output file's extension is unknown, the output format will default to HTML. If no input file is specified (so that input comes from stdin), or if the input files' extensions are unknown, the input format will be assumed to be Markdown.
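
The fallback rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not ConvertDoc's implementation: the extension table is a small hypothetical sample, and the function names are invented for the example.

```python
import os

# Hypothetical sample table for illustration; the tool's real table is larger.
EXTENSION_FORMATS = {".tex": "latex", ".html": "html", ".md": "markdown"}


def guess_format(filename, default):
    """Return the format implied by the file extension, else the default."""
    if filename is None:  # no file given: input from stdin or output to stdout
        return default
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_FORMATS.get(extension, default)


def guess_input_format(filename=None):
    # Unknown or missing input extensions fall back to Markdown.
    return guess_format(filename, "markdown")


def guess_output_format(filename=None):
    # Unknown or missing output extensions fall back to HTML.
    return guess_format(filename, "html")
```

With this sketch, hello.tex maps to latex as an output file, while hello.txt (not in the sample table) falls back to markdown as an input file.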

Character encoding

ConvertDoc uses the UTF-8 character encoding for both input and output. If your local character encoding is not UTF-8, you should pipe input and output through iconv:

iconv -t utf-8 input.txt | ConvertDoc | iconv -f utf-8
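
Where iconv is not available, the same transcoding step could be done in a script before and after calling ConvertDoc. The following Python sketch shows only the byte conversion; the function names are illustrative, and the source encoding must be known in advance.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Convert bytes in a local encoding to UTF-8 (like `iconv -t utf-8`)."""
    return data.decode(source_encoding).encode("utf-8")


def transcode_from_utf8(data: bytes, target_encoding: str) -> bytes:
    """Convert UTF-8 bytes back to a local encoding (like `iconv -f utf-8`)."""
    return data.decode("utf-8").encode(target_encoding)
```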

Note that in some output formats (such as HTML, LaTeX, ConTeXt, RTF, OPML, DocBook, and Texinfo), information about the character encoding is included in the document header, which will only be included if you use the -s/--standalone option.

Creating a PDF

To produce a PDF, specify an output file with a .pdf extension:

ConvertDoc test.txt -o test.pdf

By default, ConvertDoc will use LaTeX to create the PDF, which requires that a LaTeX engine be installed (see --pdf-engine below). Alternatively, ConvertDoc can use ConTeXt, roff ms, or HTML as an intermediate format. To do this, specify an output file with a .pdf extension, as before, but add the --pdf-engine option or -t context, -t html, or -t ms to the command line. The tool used to generate the PDF from the intermediate format may be specified using --pdf-engine.

You can control the PDF style using variables, depending on the intermediate format used: see variables for LaTeX, variables for ConTeXt, variables for wkhtmltopdf, variables for ms. When HTML is used as an intermediate format, the output can be styled using --css.

To debug the PDF creation, it can be useful to look at the intermediate representation: instead of -o test.pdf, use for example -s -o test.tex to output the generated LaTeX. You can then test it with pdflatex test.tex.

When using LaTeX, the following packages need to be available (they are included with all recent versions of TeX Live): amsfonts, amsmath, lm, unicode-math, ifxetex, ifluatex, listings (if the --listings option is used), fancyvrb, longtable, booktabs, graphicx (if the document contains images), hyperref, xcolor, ulem, geometry (with the geometry variable set), setspace (with linestretch), and babel (with lang). The use of xelatex or lualatex as the PDF engine requires fontspec. xelatex uses polyglossia (with lang), xeCJK, and bidi (with the dir variable set). If the mathspec variable is set, xelatex will use mathspec instead of unicode-math. The csquotes package will be used for typography if the csquotes variable or metadata field is set to a true value. The natbib, biblatex, bibtex, and biber packages can optionally be used for citation rendering. The following packages will be used to improve output quality if present, but ConvertDoc does not require them to be present: upquote (for straight quotes in verbatim environments), microtype (for better spacing adjustments), parskip (for better inter-paragraph spaces), xurl (for better line breaks in URLs), bookmark (for better PDF bookmarks), and footnotehyper or footnote (to allow footnotes in tables).

Reading from the Web

Instead of an input file, an absolute URI may be given. In this case ConvertDoc will fetch the content using HTTP:

ConvertDoc -f html -t markdown https://veryutils.com

It is possible to supply a custom User-Agent string or other header when requesting a document from a URL:

ConvertDoc -f html -t markdown --request-header User-Agent:"Mozilla/5.0" https://veryutils.com

Options

General options

-f FORMAT, -r FORMAT, --from=FORMAT, --read=FORMAT

Specify input format. FORMAT can be:

  • commonmark (CommonMark Markdown)
  • creole (Creole 1.0)
  • csv (CSV table)
  • docbook (DocBook)
  • docx (Word docx)
  • dokuwiki (DokuWiki markup)
  • epub (EPUB)
  • fb2 (FictionBook2 e-book)
  • gfm (GitHub-Flavored Markdown), or the deprecated and less accurate markdown_github; use markdown_github only if you need extensions not supported in gfm.
  • haddock (Haddock markup)
  • html (HTML)
  • ipynb (Jupyter notebook)
  • jats (JATS XML)
  • jira (Jira wiki markup)
  • json (JSON version of native AST)
  • latex (LaTeX)
  • markdown (ConvertDoc's Markdown)
  • markdown_mmd (MultiMarkdown)
  • markdown_phpextra (PHP Markdown Extra)
  • markdown_strict (original unextended Markdown)
  • mediawiki (MediaWiki markup)
  • man (roff man)
  • muse (Muse)
  • native (native Haskell)
  • odt (ODT)
  • opml (OPML)
  • org (Emacs Org mode)
  • rst (reStructuredText)
  • t2t (txt2tags)
  • textile (Textile)
  • tikiwiki (TikiWiki markup)
  • twiki (TWiki markup)
  • vimwiki (Vimwiki)

Extensions can be individually enabled or disabled by appending +EXTENSION or -EXTENSION to the format name. See Extensions, below, for a list of extensions and their names. See also --list-input-formats and --list-extensions, below.

-t FORMAT, -w FORMAT, --to=FORMAT, --write=FORMAT

Specify output format. FORMAT can be:

  • asciidoc (AsciiDoc) or asciidoctor (AsciiDoctor)
  • beamer (LaTeX beamer slide show)
  • commonmark (CommonMark Markdown)
  • context (ConTeXt)
  • docbook or docbook4 (DocBook 4)
  • docbook5 (DocBook 5)
  • docx (Word docx)
  • dokuwiki (DokuWiki markup)
  • epub or epub3 (EPUB v3 book)
  • epub2 (EPUB v2)
  • fb2 (FictionBook2 e-book)
  • gfm (GitHub-Flavored Markdown), or the deprecated and less accurate markdown_github; use markdown_github only if you need extensions not supported in gfm.
  • haddock (Haddock markup)
  • html or html5 (HTML, i.e. HTML5/XHTML polyglot markup)
  • html4 (XHTML 1.0 Transitional)
  • icml (InDesign ICML)
  • ipynb (Jupyter notebook)
  • jats_archiving (JATS XML, Archiving and Interchange Tag Set)
  • jats_articleauthoring (JATS XML, Article Authoring Tag Set)
  • jats_publishing (JATS XML, Journal Publishing Tag Set)
  • jats (alias for jats_archiving)
  • jira (Jira wiki markup)
  • json (JSON version of native AST)
  • latex (LaTeX)
  • man (roff man)
  • markdown (ConvertDoc's Markdown)
  • markdown_mmd (MultiMarkdown)
  • markdown_phpextra (PHP Markdown Extra)
  • markdown_strict (original unextended Markdown)
  • mediawiki (MediaWiki markup)
  • ms (roff ms)
  • muse (Muse)
  • native (native Haskell)
  • odt (OpenOffice text document)
  • opml (OPML)
  • opendocument (OpenDocument)
  • org (Emacs Org mode)
  • pdf (PDF)
  • plain (plain text)
  • pptx (PowerPoint slide show)
  • rst (reStructuredText)
  • rtf (Rich Text Format)
  • texinfo (GNU Texinfo)
  • textile (Textile)
  • slideous (Slideous HTML and JavaScript slide show)
  • slidy (Slidy HTML and JavaScript slide show)
  • dzslides (DZSlides HTML5 + JavaScript slide show)
  • revealjs (reveal.js HTML5 + JavaScript slide show)
  • s5 (S5 HTML and JavaScript slide show)
  • tei (TEI Simple)
  • xwiki (XWiki markup)
  • zimwiki (ZimWiki markup)
  • the path of a custom Lua writer, see Custom writers below

Note that odt, docx, epub, and pdf output will not be directed to stdout unless forced with -o -.

Extensions can be individually enabled or disabled by appending +EXTENSION or -EXTENSION to the format name. See Extensions, below, for a list of extensions and their names. See also --list-output-formats and --list-extensions, below.

-o FILE, --output=FILE

Write output to FILE instead of stdout. If FILE is -, output will go to stdout, even if a non-textual format (docx, odt, epub2, epub3) is specified.

--data-dir=DIRECTORY

Specify the user data directory to search for ConvertDoc data files. If this option is not specified, the default user data directory will be used. On *nix and macOS systems this will be the ConvertDoc subdirectory of the XDG data directory (by default, $HOME/.local/share, overridable by setting the XDG_DATA_HOME environment variable). If that directory does not exist, $HOME/.ConvertDoc will be used (for backwards compatibility). In Windows the default user data directory is C:\Users\USERNAME\AppData\Roaming\ConvertDoc. You can find the default user data directory on your system by looking at the output of ConvertDoc --version. A reference.odt, reference.docx, epub.css, templates, slidy, slideous, or s5 directory placed in this directory will override ConvertDoc's normal defaults.

-d FILE, --defaults=FILE

Specify a set of default option settings. FILE is a YAML file whose fields correspond to command-line option settings. All options for document conversion, including input and output files, can be set using a defaults file. The file will be searched for first in the working directory, and then in the defaults subdirectory of the user data directory (see --data-dir). The .yaml extension may be omitted. See the section Default files for more information on the file format. Settings from the defaults file may be overridden or extended by subsequent options on the command line.

--bash-completion

Generate a bash completion script. To enable bash completion with ConvertDoc, add this to your .bashrc:

eval "$(ConvertDoc --bash-completion)"

--verbose

Give verbose debugging output. Currently this only has an effect with PDF output.

--quiet

Suppress warning messages.

--fail-if-warnings

Exit with error status if there are any warnings.

--log=FILE

Write log messages in machine-readable JSON format to FILE. All messages above DEBUG level will be written, regardless of verbosity settings (--verbose, --quiet).

--list-input-formats

List supported input formats, one per line.

--list-output-formats

List supported output formats, one per line.

--list-extensions[=FORMAT]

List supported extensions for FORMAT, one per line, preceded by a + or - indicating whether it is enabled by default in FORMAT. If FORMAT is not specified, defaults for ConvertDoc's Markdown are given.

--list-highlight-languages

List supported languages for syntax highlighting, one per line.

--list-highlight-styles

List supported styles for syntax highlighting, one per line. See --highlight-style.

-v, --version

Print version.

-h, --help

Show usage message.

Reader options

--shift-heading-level-by=NUMBER

Shift heading levels by a positive or negative integer. For example, with --shift-heading-level-by=-1, level 2 headings become level 1 headings, and level 3 headings become level 2 headings. Headings cannot have a level less than 1, so a heading that would be shifted below level 1 becomes a regular paragraph. Exception: with a shift of -N, a level-N heading at the beginning of the document replaces the metadata title. --shift-heading-level-by=-1 is a good choice when converting HTML or Markdown documents that use an initial level-1 heading for the document title and level-2+ headings for sections. --shift-heading-level-by=1 may be a good choice for converting Markdown documents that use level-1 headings for sections to HTML, since ConvertDoc uses a level-1 heading to render the document title.
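
The basic rule can be expressed as a small Python function. This is a sketch of the behavior described above, with invented names; it deliberately leaves out the exception in which a heading at the beginning of the document replaces the metadata title.

```python
def shift_heading(level: int, shift: int):
    """Apply a heading-level shift to one heading.

    Returns ("heading", new_level), or ("paragraph", None) when the
    shifted level would fall below 1.
    """
    new_level = level + shift
    if new_level < 1:
        return ("paragraph", None)
    return ("heading", new_level)
```

For instance, with a shift of -1 a level-2 heading becomes a level-1 heading, and a level-1 heading in the middle of the document becomes a regular paragraph.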

--base-header-level=NUMBER

Deprecated. Use --shift-heading-level-by=X instead, where X = NUMBER - 1. Specify the base level for headings (defaults to 1).

--strip-empty-paragraphs

Deprecated. Use the +empty_paragraphs extension instead. Ignore paragraphs with no content. This option is useful for converting word processing documents where users have used empty paragraphs to create inter-paragraph space.

--indented-code-classes=CLASSES

Specify classes to use for indented code blocks, for example perl,numberLines or haskell. Multiple classes may be separated by spaces or commas.

--default-image-extension=EXTENSION

Specify a default extension to use when image paths/URLs have no extension. This allows you to use the same source for formats that require different kinds of images. Currently this option only affects the Markdown and LaTeX readers.

--file-scope

Parse each file individually before combining for multifile documents. This will allow footnotes in different files with the same identifiers to work as expected. If this option is set, footnotes and links will not work across files. Reading binary files (docx, odt, epub) implies --file-scope.

-F PROGRAM, --filter=PROGRAM

Specify an executable to be used as a filter transforming the ConvertDoc AST after the input is parsed and before the output is written. The executable should read JSON from stdin and write JSON to stdout. The JSON must be formatted like ConvertDoc's own JSON input and output. The name of the output format will be passed to the filter as the first argument. Hence,

ConvertDoc --filter ./caps.py -t latex

is equivalent to

ConvertDoc -t json | ./caps.py latex | ConvertDoc -f json -t latex

The latter form may be useful for debugging filters.

Filters may be written in any language. Text.ConvertDoc.JSON exports toJSONFilter to facilitate writing filters in Haskell. Those who would prefer to write filters in Python can use the module ConvertDocfilters, installable from PyPI. There are also ConvertDoc filter libraries in PHP, Perl, and JavaScript/node.js.
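
As a rough illustration of what a filter such as caps.py might do, here is a Python sketch that uses only the standard library. It assumes that AST elements are serialized as JSON objects with a "t" field naming the element type and a "c" field holding its contents (the Lua example's use of inline.c for Str elements suggests this, but the exact JSON layout is an assumption here). The function names are invented for the example.

```python
import json


def walk(node, action):
    """Rebuild a JSON tree, applying action to every object with a "t" tag."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        rebuilt = {key: walk(value, action) for key, value in node.items()}
        return action(rebuilt) if "t" in rebuilt else rebuilt
    return node


def caps(element):
    """Upper-case the text of Str elements; leave other elements unchanged."""
    if element.get("t") == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return element


def run_filter(instream, outstream):
    """Read a JSON document from instream and write the filtered JSON."""
    json.dump(walk(json.load(instream), caps), outstream)
```

A script used with --filter would call run_filter(sys.stdin, sys.stdout) after importing sys. The output format name arrives as the first command-line argument, which this sketch ignores.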

In order of preference, ConvertDoc will look for filters in

  1. a specified full or relative path (executable or non-executable)

  2. $DATADIR/filters (executable or non-executable) where $DATADIR is the user data directory (see --data-dir, above).

  3. $PATH (executable only)

Filters and Lua-filters are applied in the order specified on the command line.

-L SCRIPT, --lua-filter=SCRIPT

Transform the document in a similar fashion as JSON filters (see --filter), but use ConvertDoc's built-in Lua filtering system. The given Lua script is expected to return a list of Lua filters which will be applied in order. Each Lua filter must contain element-transforming functions indexed by the name of the AST element on which the filter function should be applied.

The ConvertDoc Lua module provides helper functions for element creation. It is always loaded into the script's Lua environment.

The following is an example Lua script for macro-expansion:

function expand_hello_world(inline)
  if inline.c == '{{helloworld}}' then
    return ConvertDoc.Emph{ ConvertDoc.Str "Hello, World" }
  else
    return inline
  end
end

return {{Str = expand_hello_world}}

In order of preference, ConvertDoc will look for Lua filters in

  1. a specified full or relative path (executable or non-executable)

  2. $DATADIR/filters (executable or non-executable) where $DATADIR is the user data directory (see --data-dir, above).

-M KEY[=VAL], --metadata=KEY[:VAL]

Set the metadata field KEY to the value VAL. A value specified on the command line overrides a value specified in the document using YAML metadata blocks. Values will be parsed as YAML boolean or string values. If no value is specified, the value will be treated as Boolean true. Like --variable, --metadata causes template variables to be set. But unlike --variable, --metadata affects the metadata of the underlying document (which is accessible from filters and may be printed in some output formats) and metadata values will be escaped when inserted into the template.

--metadata-file=FILE

Read metadata from the supplied YAML (or JSON) file. This option can be used with every input format, but string scalars in the YAML file will always be parsed as Markdown. Generally, the input will be handled the same as in YAML metadata blocks. This option can be used repeatedly to include multiple metadata files; values in files specified later on the command line will be preferred over those specified in earlier files. Metadata values specified inside the document, or by using -M, overwrite values specified with this option.
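
The precedence rules for metadata can be summarized with a short Python sketch. It models each source as a plain dictionary and is only an illustration of the ordering described for --metadata-file and -M; the function and parameter names are invented.

```python
def merge_metadata(metadata_files, document_metadata, command_line_metadata):
    """Combine metadata sources, lowest precedence first.

    metadata_files: list of dicts, in command-line order.
    document_metadata: dict of values set inside the document.
    command_line_metadata: dict of values set with -M.
    """
    merged = {}
    for file_values in metadata_files:    # later files win over earlier ones
        merged.update(file_values)
    merged.update(document_metadata)      # the document overrides the files
    merged.update(command_line_metadata)  # -M overrides the document
    return merged
```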

-p, --preserve-tabs

Preserve tabs instead of converting them to spaces. (By default, ConvertDoc converts tabs to spaces before parsing its input.) Note that this will only affect tabs in literal code spans and code blocks. Tabs in regular text are always treated as spaces.

--tab-stop=NUMBER

Specify the number of spaces per tab (default is 4).

--track-changes=accept|reject|all

Specifies what to do with insertions, deletions, and comments produced by the MS Word "Track Changes" feature. accept (the default) inserts all insertions and ignores all deletions. reject inserts all deletions and ignores insertions. Both accept and reject ignore comments. all puts in insertions, deletions, and comments, wrapped in spans with insertion, deletion, comment-start, and comment-end classes, respectively. The author and time of change are included. all is useful for scripting: only accepting changes from a certain reviewer, say, or before a certain date. If a paragraph is inserted or deleted, --track-changes=all produces a span with the class paragraph-insertion/paragraph-deletion before the affected paragraph break. This option only affects the docx reader.

--extract-media=DIR

Extract images and other media contained in or linked from the source document to the path DIR, creating it if necessary, and adjust the image references in the document so they point to the extracted files. If the source format is a binary container (docx, epub, or odt), the media is extracted from the container and the original filenames are used. Otherwise the media is read from the file system or downloaded, and new filenames are constructed based on SHA1 hashes of the contents.
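
A content-based filename of this kind could be derived as in the following Python sketch. Only the use of a SHA1 hash of the contents comes from the description above; keeping the original file extension and using the hexadecimal digest as the base name are assumptions made for illustration.

```python
import hashlib
import os


def extracted_media_name(contents: bytes, original_path: str) -> str:
    """Build a filename from the SHA1 hash of the media contents.

    Assumption: the original extension is kept so the file type stays
    recognizable; the tool's actual naming scheme may differ.
    """
    digest = hashlib.sha1(contents).hexdigest()
    extension = os.path.splitext(original_path)[1]
    return digest + extension
```

One consequence of hashing the contents is that two references to identical media map to the same extracted file.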

--abbreviations=FILE

Specifies a custom abbreviations file, with abbreviations one to a line. If this option is not specified, ConvertDoc will read the data file abbreviations from the user data directory or fall back on a system default. To see the system default, use ConvertDoc --print-default-data-file=abbreviations. The only use ConvertDoc makes of this list is in the Markdown reader. Strings ending in a period that are found in this list will be followed by a nonbreaking space, so that the period will not produce sentence-ending space in formats like LaTeX.

General writer options

-s, --standalone

Produce output with an appropriate header and footer (e.g. a standalone HTML, LaTeX, TEI, or RTF file, not a fragment). This option is set automatically for pdf, epub, epub3, fb2, docx, and odt output. For native output, this option causes metadata to be included; otherwise, metadata is suppressed.

--template=FILE|URL

Use the specified file as a custom template for the generated document. Implies --standalone. See Templates, below, for a description of template syntax. If no extension is specified, an extension corresponding to the writer will be added, so that --template=special looks for special.html for HTML output. If the template is not found, ConvertDoc will search for it in the templates subdirectory of the user data directory (see --data-dir). If this option is not used, a default template appropriate for the output format will be used (see -D/--print-default-template).

-V KEY[=VAL], --variable=KEY[:VAL]

Set the template variable KEY to the value VAL when rendering the document in standalone mode. If no VAL is specified, the key will be given the value true.

-D FORMAT, --print-default-template=FORMAT

Print the system default template for an output FORMAT. (See -t for a list of possible FORMATs.) Templates in the user data directory are ignored. This option may be used with -o/--output to redirect output to a file, but -o/--output must come before --print-default-template on the command line.

Note that some of the default templates use partials, for example styles.html. To print the partials, use --print-default-data-file: for example, --print-default-data-file=templates/styles.html.

--print-default-data-file=FILE

Print a system default data file. Files in the user data directory are ignored. This option may be used with -o/--output to redirect output to a file, but -o/--output must come before --print-default-data-file on the command line.

--eol=crlf|lf|native

Manually specify line endings: crlf (Windows), lf (macOS/Linux/UNIX), or native (line endings appropriate to the OS on which ConvertDoc is being run). The default is native.

--dpi=NUMBER

Specify the default dpi (dots per inch) value for conversion from pixels to inch/centimeters and vice versa. (Technically, the correct term would be ppi: pixels per inch.) The default is 96dpi. When images contain information about dpi internally, the encoded value is used instead of the default specified by this option.
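
The conversions involved are simple ratios. The following Python sketch shows the arithmetic with the default of 96 dpi; the function names are illustrative, and the rounding in the last function is an assumption.

```python
CENTIMETERS_PER_INCH = 2.54


def pixels_to_inches(pixels, dpi=96):
    """Convert a pixel length to inches at the given resolution."""
    return pixels / dpi


def pixels_to_centimeters(pixels, dpi=96):
    """Convert a pixel length to centimeters at the given resolution."""
    return pixels_to_inches(pixels, dpi) * CENTIMETERS_PER_INCH


def inches_to_pixels(inches, dpi=96):
    """Convert inches to whole pixels (rounded to the nearest pixel)."""
    return round(inches * dpi)
```

At the default setting, an image 96 pixels wide is treated as 1 inch (2.54 cm) wide; with --dpi=300 the same image would be treated as 0.32 inches wide.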

--wrap=auto|none|preserve

Determine how text is wrapped in the output (the source code, not the rendered version). With auto (the default), ConvertDoc will attempt to wrap lines to the column width specified by --columns (default 72). With none, ConvertDoc will not wrap lines at all. With preserve, ConvertDoc will attempt to preserve the wrapping from the source document (that is, where there are nonsemantic newlines in the source, there will be nonsemantic newlines in the output as well). Automatic wrapping does not currently work in HTML output. In ipynb output, this option affects wrapping of the contents of markdown cells.

--columns=NUMBER

Specify length of lines in characters. This affects text wrapping in the generated source code (see --wrap). It also affects calculation of column widths for plain text tables (see Tables below).

--toc, --table-of-contents

Include an automatically generated table of contents (or, in the case of latex, context, docx, odt, opendocument, rst, or ms, an instruction to create one) in the output document. This option has no effect unless -s/--standalone is used, and it has no effect on man, docbook4, docbook5, or jats output.

Note that if you are producing a PDF via ms, the table of contents will appear at the beginning of the document, before the title. If you would prefer it to be at the end of the document, use the option --pdf-engine-opt=--no-toc-relocation.

--toc-depth=NUMBER

Specify the number of section levels to include in the table of contents. The default is 3 (which means that level-1, 2, and 3 headings will be listed in the contents).
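
In effect, the depth acts as a cutoff on heading levels. A minimal Python sketch, assuming headings are represented as (level, title) pairs (a representation chosen for this example only):

```python
def toc_entries(headings, depth=3):
    """Keep only headings whose level is within the table-of-contents depth."""
    return [(level, title) for level, title in headings if level <= depth]
```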

--strip-comments

Strip out HTML comments in the Markdown or Textile source, rather than passing them on to Markdown, Textile or HTML output as raw HTML. This does not apply to HTML comments inside raw HTML blocks when the markdown_in_html_blocks extension is not set.

--no-highlight

Disables syntax highlighting for code blocks and inlines, even when a language attribute is given.

--highlight-style=STYLE|FILE

Specifies the coloring style to be used in highlighted source code. Options are pygments (the default), kate, monochrome, breezeDark, espresso, zenburn, haddock, and tango. For more information on syntax highlighting in ConvertDoc, see Syntax highlighting, below. See also --list-highlight-styles.

Instead of a STYLE name, a JSON file with extension .theme may be supplied. This will be parsed as a KDE syntax highlighting theme and (if valid) used as the highlighting style.

To generate the JSON version of an existing style, use --print-highlight-style.

--print-highlight-style=STYLE|FILE

Prints a JSON version of a highlighting style, which can be modified, saved with a .theme extension, and used with --highlight-style. This option may be used with -o/--output to redirect output to a file, but -o/--output must come before --print-highlight-style on the command line.

--syntax-definition=FILE

Instructs ConvertDoc to load a KDE XML syntax definition file, which will be used for syntax highlighting of appropriately marked code blocks. This can be used to add support for new languages or to use altered syntax definitions for existing languages. This option may be repeated to add multiple syntax definitions.

-H FILE, --include-in-header=FILE|URL

Include contents of FILE, verbatim, at the end of the header. This can be used, for example, to include special CSS or JavaScript in HTML documents. This option can be used repeatedly to include multiple files in the header. They will be included in the order specified. Implies --standalone.

-B FILE, --include-before-body=FILE|URL

Include contents of FILE, verbatim, at the beginning of the document body (e.g. after the <body> tag in HTML, or the \begin{document} command in LaTeX). This can be used to include navigation bars or banners in HTML documents. This option can be used repeatedly to include multiple files. They will be included in the order specified. Implies --standalone.

-A FILE, --include-after-body=FILE|URL

Include contents of FILE, verbatim, at the end of the document body (before the </body> tag in HTML, or the \end{document} command in LaTeX). This option can be used repeatedly to include multiple files. They will be included in the order specified. Implies --standalone.

--resource-path=SEARCHPATH

List of paths to search for images and other resources. The paths should be separated by : on Linux, UNIX, and macOS systems, and by ; on Windows. If --resource-path is not specified, the default resource path is the working directory. Note that, if --resource-path is specified, the working directory must be explicitly listed or it will not be searched. For example: --resource-path=.:test will search the working directory and the test subdirectory, in that order.

--resource-path only has an effect if (a) the output format embeds images (for example, docx, pdf, or html with --self-contained) or (b) it is used together with --extract-media.
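
The search order can be pictured with a short Python sketch. It is a simplified model of the lookup described above, with invented names; os.pathsep is : on Linux, UNIX, and macOS and ; on Windows, which matches the separators given for SEARCHPATH.

```python
import os


def find_resource(name, search_path=".", separator=os.pathsep):
    """Return the first existing file called name along search_path.

    Directories are tried in the order listed; None means not found.
    The working directory is searched only if it is listed (as ".").
    """
    for directory in search_path.split(separator):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```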

--request-header=NAME:VAL

Set the request header NAME to the value VAL when making HTTP requests (for example, when a URL is given on the command line, or when resources used in a document must be downloaded). If you're behind a proxy, you also need to set the environment variable http_proxy to http://....

Options affecting specific writers

--self-contained

Produce a standalone HTML file with no external dependencies, using data: URIs to incorporate the contents of linked scripts, stylesheets, images, and videos. Implies --standalone. The resulting file should be "self-contained", in the sense that it needs no external files and no net access to be displayed properly by a browser. This option works only with HTML output formats, including html4, html5, html+lhs, html5+lhs, s5, slidy, slideous, dzslides, and revealjs. Scripts, images, and stylesheets at absolute URLs will be downloaded; those at relative URLs will be sought relative to the working directory (if the first source file is local) or relative to the base URL (if the first source file is remote). Elements with the attribute data-external="1" will be left alone; the documents they link to will not be incorporated in the document. Limitation: resources that are loaded dynamically through JavaScript cannot be incorporated; as a result, --self-contained does not work with --mathjax, and some advanced features (e.g. zoom or speaker notes) may not work in an offline "self-contained" reveal.js slide show.

--html-q-tags

Use <q> tags for quotes in HTML.

--ascii

Use only ASCII characters in output. Currently supported for XML and HTML formats (which use entities instead of UTF-8 when this option is selected), CommonMark, gfm, and Markdown (which use entities), roff ms (which uses hexadecimal escapes), and to a limited degree LaTeX (which uses standard commands for accented characters when possible). roff man output uses ASCII by default.

--reference-links

Use reference-style links, rather than inline links, in writing Markdown or reStructuredText. By default inline links are used. The placement of link references is affected by the --reference-location option.

--reference-location=block|section|document

Specify whether footnotes (and references, if --reference-links is set) are placed at the end of the current (top-level) block, the current section, or the document. The default is document. Currently only affects the markdown writer.

--atx-headers

Use ATX-style headings in Markdown output. The default is to use setext-style headings for levels 1 to 2, and then ATX headings. (Note: for gfm output, ATX headings are always used.) This option also affects markdown cells in ipynb output.

--top-level-division=[default|section|chapter|part]

Treat top-level headings as the given division type in LaTeX, ConTeXt, DocBook, and TEI output. The hierarchy order is part, chapter, then section; all headings are shifted such that the top-level heading becomes the specified type. The default behavior is to determine the best division type via heuristics: unless other conditions apply, section is chosen. When the documentclass variable is set to report, book, or memoir (unless the article option is specified), chapter is implied as the setting for this option. If beamer is the output format, specifying either chapter or part will cause top-level headings to become \part{..}, while second-level headings remain as their default type.

-N, --number-sections

Number section headings in LaTeX, ConTeXt, HTML, or EPUB output. By default, sections are not numbered. Sections with class unnumbered will never be numbered, even if --number-sections is specified.

--number-offset=NUMBER[,NUMBER,...]

Offset for section headings in HTML output (ignored in other output formats). The first number is added to the section number for top-level headings, the second for second-level headings, and so on. So, for example, if you want the first top-level heading in your document to be numbered "6", specify --number-offset=5. If your document starts with a level-2 heading which you want to be numbered "1.5", specify --number-offset=1,4. Offsets are 0 by default. Implies --number-sections.
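
The offset arithmetic can be sketched in Python as follows. This is a simplified model, not ConvertDoc's numbering code: it keeps one counter per heading level and adds the matching offset to each counter when the number is printed. The function name number_headings is hypothetical.

```python
def number_headings(levels, offsets=()):
    """Return section numbers for a sequence of heading levels (1-based).

    offsets[i] is added to the counter for level i + 1; missing offsets
    count as 0.
    """
    counters = []
    numbers = []
    for level in levels:
        # Make sure there is a counter for every level up to this one.
        while len(counters) < level:
            counters.append(0)
        # Entering a heading resets the counters of all deeper levels.
        del counters[level:]
        counters[level - 1] += 1
        shifted = [
            count + (offsets[i] if i < len(offsets) else 0)
            for i, count in enumerate(counters)
        ]
        numbers.append(".".join(str(n) for n in shifted))
    return numbers


if __name__ == "__main__":
    # One top-level heading followed by two level-2 headings, offset by 5.
    print(number_headings([1, 2, 2], offsets=(5,)))  # ['6', '6.1', '6.2']
```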

--listings

Use the listings package for LaTeX code blocks. The package does not support multi-byte encoding for source code. To handle UTF-8 you would need to use a custom template. This issue is fully documented here: Encoding issue with the listings package.

-i, --incremental

Make list items in slide shows display incrementally (one by one). The default is for lists to be displayed all at once.

--slide-level=NUMBER

Specifies that headings with the specified level create slides (for beamer, s5, slidy, slideous, dzslides). Headings above this level in the hierarchy are used to divide the slide show into sections; headings below this level create subheads within a slide. Note that content that is not contained under slide-level headings will not appear in the slide show. The default is to set the slide level based on the contents of the document; see Structuring the slide show.

--section-divs

Wrap sections in <section> tags (or <div> tags for html4), and attach identifiers to the enclosing <section> (or <div>) rather than the heading itself. See Heading identifiers, below.

--email-obfuscation=none|javascript|references

Specify a method for obfuscating mailto: links in HTML documents. none leaves mailto: links as they are. javascript obfuscates them using JavaScript. references obfuscates them by printing their letters as decimal or hexadecimal character references. The default is none.
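
As an illustration of the references method, the Python sketch below rewrites each character of an address as a decimal or hexadecimal character reference. A browser decodes the references back to the original text, which the sketch checks with html.unescape. The function name obfuscate and the sample address are hypothetical, and ConvertDoc's actual choice between decimal and hexadecimal forms is not modeled.

```python
import html


def obfuscate(text: str, hexadecimal: bool = False) -> str:
    """Rewrite every character as a numeric character reference."""
    if hexadecimal:
        return "".join(f"&#x{ord(ch):x};" for ch in text)
    return "".join(f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    link = "mailto:someone@example.org"
    encoded = obfuscate(link)
    print(encoded)
    # Decoding the references restores the original link.
    print(html.unescape(encoded) == link)
```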

--id-prefix=STRING

Specify a prefix to be added to all identifiers and internal links in HTML and DocBook output, and to footnote numbers in Markdown and Haddock output. This is useful for preventing duplicate identifiers when generating fragments to be included in other pages.

-T STRING, --title-prefix=STRING

Specify STRING as a prefix at the beginning of the title that appears in the HTML header (but not in the title as it appears at the beginning of the HTML body). Implies --standalone.

-c URL, --css=URL

Link to a CSS style sheet. This option can be used repeatedly to include multiple files. They will be included in the order specified.

A stylesheet is required for generating EPUB. If none is provided using this option (or the css or stylesheet metadata fields), ConvertDoc will look for a file epub.css in the user data directory (see --data-dir). If it is not found there, sensible defaults will be used.

--reference-doc=FILE

Use the specified file as a style reference in producing a docx or ODT file.

Docx

For best results, the reference docx should be a modified version of a docx file produced using ConvertDoc. The contents of the reference docx are ignored, but its stylesheets and document properties (including margins, page size, header, and footer) are used in the new docx. If no reference docx is specified on the command line, ConvertDoc will look for a file reference.docx in the user data directory (see --data-dir). If this is not found either, sensible defaults will be used.

To produce a custom reference.docx, first get a copy of the default reference.docx: ConvertDoc -o custom-reference.docx --print-default-data-file reference.docx. Then open custom-reference.docx in Word, modify the styles as you wish, and save the file. For best results, do not make changes to this file other than modifying the styles used by ConvertDoc:

Paragraph styles:

  • Normal
  • Body Text
  • First Paragraph
  • Compact
  • Title
  • Subtitle
  • Author
  • Date
  • Abstract
  • Bibliography
  • Heading 1
  • Heading 2
  • Heading 3
  • Heading 4
  • Heading 5
  • Heading 6
  • Heading 7
  • Heading 8
  • Heading 9
  • Block Text
  • Footnote Text
  • Definition Term
  • Definition
  • Caption
  • Table Caption
  • Image Caption
  • Figure
  • Captioned Figure
  • TOC Heading

Character styles:

  • Default Paragraph Font
  • Body Text Char
  • Verbatim Char
  • Footnote Reference
  • Hyperlink

Table style:

  • Table

ODT

For best results, the reference ODT should be a modified version of an ODT produced using ConvertDoc. The contents of the reference ODT are ignored, but its stylesheets are used in the new ODT. If no reference ODT is specified on the command line, ConvertDoc will look for a file reference.odt in the user data directory (see --data-dir). If this is not found either, sensible defaults will be used.

To produce a custom reference.odt, first get a copy of the default reference.odt: ConvertDoc -o custom-reference.odt --print-default-data-file reference.odt. Then open custom-reference.odt in LibreOffice, modify the styles as you wish, and save the file.

PowerPoint

Templates included with Microsoft PowerPoint 2013 (either with .pptx or .potx extension) are known to work, as are most templates derived from these.

The specific requirement is that the first four layouts in the template be the following, in this order:

  1. Title Slide
  2. Title and Content
  3. Section Header
  4. Two Content

All templates included with a recent version of MS PowerPoint will fit these criteria. (You can click on Layout under the Home menu to check.)

You can also modify the default reference.pptx: first run ConvertDoc -o custom-reference.pptx --print-default-data-file reference.pptx, and then modify custom-reference.pptx in MS PowerPoint (ConvertDoc will use the first four layout slides, as mentioned above).

--epub-cover-image=FILE

Use the specified image as the EPUB cover. It is recommended that the image be less than 1000px in width and height. Note that in a Markdown source document you can also specify cover-image in a YAML metadata block (see EPUB Metadata, below).

--epub-metadata=FILE

Look in the specified XML file for metadata for the EPUB. The file should contain a series of Dublin Core elements. For example:

 <dc:rights>Creative Commons</dc:rights>
 <dc:language>es-AR</dc:language>

By default, ConvertDoc will include the following metadata elements: <dc:title> (from the document title), <dc:creator> (from the document authors), <dc:date> (from the document date, which should be in ISO 8601 format), <dc:language> (from the lang variable, or, if it is not set, the locale), and <dc:identifier id="BookId"> (a randomly generated UUID). Any of these may be overridden by elements in the metadata file.

Note: if the source document is Markdown, a YAML metadata block in the document can be used instead. See below under EPUB Metadata.

--epub-embed-font=FILE

Embed the specified font in the EPUB. This option can be repeated to embed multiple fonts. Wildcards can also be used: for example, DejaVuSans-*.ttf. However, if you use wildcards on the command line, be sure to escape them or put the whole filename in single quotes, to prevent them from being interpreted by the shell. To use the embedded fonts, you will need to add declarations like the following to your CSS (see --css):

@font-face {
  font-family: DejaVuSans;
  font-style: normal;
  font-weight: normal;
  src: url("DejaVuSans-Regular.ttf");
}
@font-face {
  font-family: DejaVuSans;
  font-style: normal;
  font-weight: bold;
  src: url("DejaVuSans-Bold.ttf");
}
@font-face {
  font-family: DejaVuSans;
  font-style: italic;
  font-weight: normal;
  src: url("DejaVuSans-Oblique.ttf");
}
@font-face {
  font-family: DejaVuSans;
  font-style: italic;
  font-weight: bold;
  src: url("DejaVuSans-BoldOblique.ttf");
}
body { font-family: "DejaVuSans"; }

--epub-chapter-level=NUMBER

Specify the heading level at which to split the EPUB into separate "chapter" files. The default is to split into chapters at level-1 headings. This option only affects the internal composition of the EPUB, not the way chapters and sections are displayed to users. Some readers may be slow if the chapter files are too large, so for large documents with few level-1 headings, one might want to use a chapter level of 2 or 3.
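
The effect of the chapter level on file splitting can be modeled with a short Python sketch. It assumes that a new chapter file starts at every heading whose level is at or above the chosen chapter level (that is, numerically less than or equal to it); the function name split_chapters and the sample outline are hypothetical, and the real splitting operates on document content rather than on a list of titles.

```python
def split_chapters(headings, chapter_level=1):
    """Group (level, title) pairs into chapters.

    A new chapter starts at each heading with level <= chapter_level.
    Leading material before the first such heading forms its own chapter.
    """
    chapters = []
    for level, title in headings:
        if level <= chapter_level or not chapters:
            chapters.append([])
        chapters[-1].append(title)
    return chapters


if __name__ == "__main__":
    outline = [(1, "Book"), (2, "First"), (3, "Detail"), (2, "Second")]
    print(len(split_chapters(outline, chapter_level=1)))  # 1
    print(len(split_chapters(outline, chapter_level=2)))  # 3
```

With few level-1 headings, raising the chapter level yields more, smaller chapter files.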

--epub-subdirectory=DIRNAME

Specify the subdirectory in the OCF container that is to hold the EPUB-specific contents. The default is EPUB. To put the EPUB contents in the top level, use an empty string.

--ipynb-output=all|none|best

Determines how ipynb output cells are treated. all means that all of the data formats included in the original are preserved. none means that the contents of data cells are omitted. best causes ConvertDoc to try to pick the richest data block in each output cell that is compatible with the output format. The default is best.

--pdf-engine=PROGRAM

Use the specified engine when producing PDF output. Valid values are pdflatex, lualatex, xelatex, latexmk, tectonic, wkhtmltopdf, weasyprint, prince, context, and pdfroff. If the engine is not in your PATH, the full path of the engine may be specified here. If this option is not specified, ConvertDoc uses the following defaults depending on the output format specified using -t/--to:

--pdf-engine-opt=STRING

Use the given string as a command-line argument to the pdf-engine. For example, to use a persistent directory foo for latexmk's auxiliary files, use --pdf-engine-opt=-outdir=foo. Note that no check for duplicate options is done.

Citation rendering

--bibliography=FILE

Set the bibliography field in the document's metadata to FILE, overriding any value set in the metadata, and process citations using ConvertDoc-citeproc. (This is equivalent to --metadata bibliography=FILE --filter ConvertDoc-citeproc.) If --natbib or --biblatex is also supplied, ConvertDoc-citeproc is not used, making this equivalent to --metadata bibliography=FILE. If you supply this argument multiple times, each FILE will be added to bibliography.

--csl=FILE

Set the csl field in the document's metadata to FILE, overriding any value set in the metadata. (This is equivalent to --metadata csl=FILE.) This option is only relevant with ConvertDoc-citeproc.

--citation-abbreviations=FILE

Set the citation-abbreviations field in the document's metadata to FILE, overriding any value set in the metadata. (This is equivalent to --metadata citation-abbreviations=FILE.) This option is only relevant with ConvertDoc-citeproc.

--natbib

Use natbib for citations in LaTeX output. This option is not for use with the ConvertDoc-citeproc filter or with PDF output. It is intended for use in producing a LaTeX file that can be processed with bibtex.

--biblatex

Use biblatex for citations in LaTeX output. This option is not for use with the ConvertDoc-citeproc filter or with PDF output. It is intended for use in producing a LaTeX file that can be processed with bibtex or biber.

Math rendering in HTML

The default is to render TeX math as far as possible using Unicode characters. Formulas are put inside a span with class="math", so that they may be styled differently from the surrounding text if needed. However, this gives acceptable results only for basic math; usually you will want to use --mathjax or another of the following options.

--mathjax[=URL]

Use MathJax to display embedded TeX math in HTML output. TeX math will be put between \(...\) (for inline math) or \[...\] (for display math) and wrapped in <span> tags with class math. Then the MathJax JavaScript will render it. The URL should point to the MathJax.js load script. If a URL is not provided, a link to the Cloudflare CDN will be inserted.

--mathml

Convert TeX math to MathML (in epub3, docbook4, docbook5, jats, html4 and html5). This is the default in odt output. Note that currently only Firefox and Safari (and select e-book readers) natively support MathML.

--webtex[=URL]

Convert TeX formulas to <img> tags that link to an external script that converts formulas to images. The formula will be URL-encoded and concatenated with the URL provided. For SVG images you can use, for example, --webtex https://latex.codecogs.com/svg.latex?. If no URL is specified, the CodeCogs URL generating PNGs will be used (https://latex.codecogs.com/png.latex?). Note: the --webtex option will affect Markdown output as well as HTML, which is useful if you're targeting a version of Markdown without native math support.
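
The "URL-encode and concatenate" step can be sketched in Python as follows. The function name webtex_src is hypothetical, and the exact set of characters ConvertDoc escapes may differ from what urllib.parse.quote produces; the sketch only shows how an image URL is assembled from a base URL and a formula.

```python
from urllib.parse import quote

# Default base URL named in the option description above.
DEFAULT_WEBTEX_URL = "https://latex.codecogs.com/png.latex?"


def webtex_src(formula: str, base_url: str = DEFAULT_WEBTEX_URL) -> str:
    """Build the image URL for a TeX formula by percent-encoding it."""
    return base_url + quote(formula, safe="")


if __name__ == "__main__":
    print(webtex_src("x^2"))
    print(webtex_src(r"\frac{a}{b}", "https://latex.codecogs.com/svg.latex?"))
```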

--katex[=URL]

Use KaTeX to display embedded TeX math in HTML output. The URL is the base URL for the KaTeX library. That directory should contain a katex.min.js and a katex.min.css file. If a URL is not provided, a link to the KaTeX CDN will be inserted.

--gladtex

Enclose TeX math in <eq> tags in HTML output. The resulting HTML can then be processed by GladTeX to produce images of the typeset formulas and an HTML file with links to these images. So, the procedure is:

ConvertDoc -s --gladtex input.md -o myfile.htex
gladtex -d myfile-images myfile.htex
# produces myfile.html and images in myfile-images

Options for wrapper scripts

--dump-args

Print information about command-line arguments to stdout, then exit. This option is intended primarily for use in wrapper scripts. The first line of output contains the name of the output file specified with the -o option, or - (for stdout) if no output file was specified. The remaining lines contain the command-line arguments, one per line, in the order they appear. These do not include regular ConvertDoc options and their arguments, but do include any options appearing after a -- separator at the end of the line.
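
A wrapper script might consume this output along the following lines. The Python sketch assumes only the layout described above (output file name on the first line, one argument per following line); the function name parse_dump_args and the sample text are hypothetical.

```python
def parse_dump_args(output: str):
    """Split --dump-args output into (output_file, arguments).

    output_file is None when the first line is "-", meaning stdout.
    """
    lines = output.splitlines()
    if not lines:
        raise ValueError("expected at least one line of --dump-args output")
    target = None if lines[0] == "-" else lines[0]
    return target, lines[1:]


if __name__ == "__main__":
    sample = "foo.html\nfoo.txt\n-e\nlatin1\n"
    print(parse_dump_args(sample))
```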

--ignore-args

Ignore command-line arguments (for use in wrapper scripts). Regular ConvertDoc options are not ignored. Thus, for example,

ConvertDoc --ignore-args -o foo.html -s foo.txt -- -e latin1

is equivalent to

ConvertDoc -o foo.html -s

Exit codes

If ConvertDoc completes successfully, it will return exit code 0. Nonzero exit codes have the following meanings:

Code  Error
3     ConvertDocFailOnWarningError
4     ConvertDocAppError
5     ConvertDocTemplateError
6     ConvertDocOptionError
21    ConvertDocUnknownReaderError
22    ConvertDocUnknownWriterError
23    ConvertDocUnsupportedExtensionError
31    ConvertDocEpubSubdirectoryError
43    ConvertDocPDFError
47    ConvertDocPDFProgramNotFoundError
61    ConvertDocHttpError
62    ConvertDocShouldNeverHappenError
63    ConvertDocSomeError
64    ConvertDocParseError
65    ConvertDocParsecError
66    ConvertDocMakePDFError
67    ConvertDocSyntaxMapError
83    ConvertDocFilterError
91    ConvertDocMacroLoop
92    ConvertDocUTF8DecodingError
93    ConvertDocIpynbDecodingError
97    ConvertDocCouldNotFindDataFileError
99    ConvertDocResourceNotFound
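
A calling script can translate these codes into readable messages. The Python sketch below copies the table into a dictionary and wraps a subprocess call; it assumes the executable is named ConvertDoc and is on the PATH, and the helper names describe_exit and convert are hypothetical.

```python
import subprocess

# Exit codes as listed in the table above.
EXIT_CODES = {
    3: "ConvertDocFailOnWarningError",
    4: "ConvertDocAppError",
    5: "ConvertDocTemplateError",
    6: "ConvertDocOptionError",
    21: "ConvertDocUnknownReaderError",
    22: "ConvertDocUnknownWriterError",
    23: "ConvertDocUnsupportedExtensionError",
    31: "ConvertDocEpubSubdirectoryError",
    43: "ConvertDocPDFError",
    47: "ConvertDocPDFProgramNotFoundError",
    61: "ConvertDocHttpError",
    62: "ConvertDocShouldNeverHappenError",
    63: "ConvertDocSomeError",
    64: "ConvertDocParseError",
    65: "ConvertDocParsecError",
    66: "ConvertDocMakePDFError",
    67: "ConvertDocSyntaxMapError",
    83: "ConvertDocFilterError",
    91: "ConvertDocMacroLoop",
    92: "ConvertDocUTF8DecodingError",
    93: "ConvertDocIpynbDecodingError",
    97: "ConvertDocCouldNotFindDataFileError",
    99: "ConvertDocResourceNotFound",
}


def describe_exit(code: int) -> str:
    """Map an exit code to the error name from the table."""
    if code == 0:
        return "success"
    return EXIT_CODES.get(code, f"unknown error ({code})")


def convert(args):
    """Run ConvertDoc (assumed to be on the PATH) and describe the result."""
    result = subprocess.run(["ConvertDoc", *args])
    return result.returncode, describe_exit(result.returncode)


if __name__ == "__main__":
    print(describe_exit(43))  # ConvertDocPDFError
```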

Templates

When the -s/--standalone option is used, ConvertDoc uses a template to add header and footer material that is needed for a self-standing document. To see the default template that is used, just type

ConvertDoc -D FORMAT

where FORMAT is the name of the output format. A custom template can be specified using the --template option. You can also override the system default templates for a given output format FORMAT by putting a file templates/default.FORMAT in the user data directory (see --data-dir, above). Exceptions:

Templates contain variables, which allow for the inclusion of arbitrary information at any point in the file. They may be set at the command line using the -V/--variable option. If a variable is not set, ConvertDoc will look for the key in the document's metadata, which can be set using either YAML metadata blocks or the -M/--metadata option. In addition, some variables are given default values by ConvertDoc. See Variables below for a list of variables used in ConvertDoc's default templates.
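
The lookup order for a template variable can be pictured with a small Python sketch. It models the three sources as plain dictionaries and assumes that built-in defaults are consulted last; that ordering for defaults is an assumption of this sketch, as is the function name resolve_variable.

```python
def resolve_variable(name, variables, metadata, defaults):
    """Return the value a template would see for name, or None if unset.

    variables: values given with -V/--variable
    metadata:  values from YAML metadata blocks or -M/--metadata
    defaults:  built-in default values (assumed to be consulted last)
    """
    for source in (variables, metadata, defaults):
        if name in source:
            return source[name]
    return None


if __name__ == "__main__":
    variables = {"lang": "en-GB"}
    metadata = {"lang": "es-AR", "title": "Report"}
    defaults = {"documentclass": "article"}
    for key in ("lang", "title", "documentclass", "missing"):
        print(key, resolve_variable(key, variables, metadata, defaults))
```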

If you use custom templates, you may need to revise them as ConvertDoc changes. We recommend tracking the changes in the default templates, and modifying your custom templates accordingly. An easy way to do this is to fork the ConvertDoc-templates repository and merge in changes after each ConvertDoc release.