ConvertDoc
[options] [input-file]
ConvertDoc is a command-line application for converting from one markup format to another.
ConvertDoc can convert between numerous markup and word processing formats, including, but not limited to, various flavors of Markdown, HTML, LaTeX and Word docx. For the full lists of input and output formats, see the --from and --to options below. ConvertDoc can also produce PDF output: see creating a PDF, below.
ConvertDoc's enhanced version of Markdown includes syntax for tables, definition lists, metadata blocks, footnotes, citations, math, and much more. See below under ConvertDoc's Markdown.
ConvertDoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document (an abstract syntax tree or AST), and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer. Users can also run custom ConvertDoc filters to modify the intermediate AST.
Because ConvertDoc's intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. ConvertDoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into ConvertDoc's simple document model. While conversions from ConvertDoc's Markdown to all formats aspire to be perfect, conversions from formats more expressive than ConvertDoc's Markdown can be expected to be lossy.
If no input-files are specified, input is read from stdin. Output goes to stdout by default. For output to a file, use the -o option:
ConvertDoc -o output.html input.txt
By default, ConvertDoc produces a document fragment. To produce a standalone document (e.g. a valid HTML file including <head> and <body>), use the -s or --standalone flag:
ConvertDoc -s -o output.html input.txt
For more information on how standalone documents are produced, see Templates below.
If multiple input files are given, ConvertDoc will concatenate them all (with blank lines between them) before parsing. (Use --file-scope to parse files individually.)
The format of the input and output can be specified explicitly using command-line options. The input format can be specified using the -f/--from option, the output format using the -t/--to option. Thus, to convert hello.txt from Markdown to LaTeX, you could type:
ConvertDoc -f markdown -t latex hello.txt
To convert hello.html from HTML to Markdown:
ConvertDoc -f html -t markdown hello.html
Supported input and output formats are listed below under Options (see -f for input formats and -t for output formats). You can also use ConvertDoc --list-input-formats and ConvertDoc --list-output-formats to print lists of supported formats.
If the input or output format is not specified explicitly, ConvertDoc will attempt to guess it from the extensions of the filenames. Thus, for example,
ConvertDoc -o hello.tex hello.txt
will convert hello.txt from Markdown to LaTeX. If no output file is specified (so that output goes to stdout), or if the output file's extension is unknown, the output format will default to HTML. If no input file is specified (so that input comes from stdin), or if the input files' extensions are unknown, the input format will be assumed to be Markdown.
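The fallback rules above can be sketched in Python. This is only an illustration under stated assumptions: the extension table is a small hypothetical sample, not ConvertDoc's actual mapping, and the function names are invented for this example.

```python
import os

# Hypothetical sample of extension-to-format mappings; the real table is
# whatever ConvertDoc ships with and is not reproduced here.
KNOWN_EXTENSIONS = {
    ".md": "markdown",
    ".html": "html",
    ".tex": "latex",
    ".docx": "docx",
}


def guess_format(filename, default):
    """Guess a format from a filename's extension, else return `default`.

    `filename` may be None, meaning stdin (for input) or stdout (for output).
    """
    if filename is None:
        return default
    extension = os.path.splitext(filename)[1].lower()
    return KNOWN_EXTENSIONS.get(extension, default)


def guess_input_format(filename=None):
    # Unknown extension or stdin: assume Markdown, as described above.
    return guess_format(filename, "markdown")


def guess_output_format(filename=None):
    # Unknown extension or stdout: default to HTML, as described above.
    return guess_format(filename, "html")
```

With this sketch, hello.txt (an unknown extension) is treated as Markdown input, and hello.tex selects LaTeX output, matching the example command above.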
ConvertDoc uses the UTF-8 character encoding for both input and output. If your local character encoding is not UTF-8, you should pipe input and output through iconv:
iconv -t utf-8 input.txt | ConvertDoc | iconv -f utf-8
Note that in some output formats (such as HTML, LaTeX, ConTeXt, RTF, OPML, DocBook, and Texinfo), information about the character encoding is included in the document header, which will only be included if you use the -s/--standalone option.
To produce a PDF, specify an output file with a .pdf extension:
ConvertDoc test.txt -o test.pdf
By default, ConvertDoc will use LaTeX to create the PDF, which requires that a LaTeX engine be installed (see --pdf-engine below). Alternatively, ConvertDoc can use ConTeXt, roff ms, or HTML as an intermediate format. To do this, specify an output file with a .pdf extension, as before, but add the --pdf-engine option or -t context, -t html, or -t ms to the command line. The tool used to generate the PDF from the intermediate format may be specified using --pdf-engine.
You can control the PDF style using variables, depending on the intermediate format used: see variables for LaTeX, variables for ConTeXt, variables for wkhtmltopdf, variables for ms. When HTML is used as an intermediate format, the output can be styled using --css.
To debug the PDF creation, it can be useful to look at the intermediate representation: instead of -o test.pdf, use for example -s -o test.tex to output the generated LaTeX. You can then test it with pdflatex test.tex.
When using LaTeX, the following packages need to be available (they are included with all recent versions of TeX Live): amsfonts, amsmath, lm, unicode-math, ifxetex, ifluatex, listings (if the --listings option is used), fancyvrb, longtable, booktabs, graphicx (if the document contains images), hyperref, xcolor, ulem, geometry (with the geometry variable set), setspace (with linestretch), and babel (with lang). The use of xelatex or lualatex as the PDF engine requires fontspec. xelatex uses polyglossia (with lang), xecjk, and bidi (with the dir variable set). If the mathspec variable is set, xelatex will use mathspec instead of unicode-math. The upquote and microtype packages are used if available, and csquotes will be used for typography if the csquotes variable or metadata field is set to a true value. The natbib, biblatex, bibtex, and biber packages can optionally be used for citation rendering. The following packages will be used to improve output quality if present, but ConvertDoc does not require them to be present: upquote (for straight quotes in verbatim environments), microtype (for better spacing adjustments), parskip (for better inter-paragraph spaces), xurl (for better line breaks in URLs), bookmark (for better PDF bookmarks), and footnotehyper or footnote (to allow footnotes in tables).
Instead of an input file, an absolute URI may be given. In this case ConvertDoc will fetch the content using HTTP:
ConvertDoc -f html -t markdown https://veryutils.com
It is possible to supply a custom User-Agent string or other header when requesting a document from a URL:
ConvertDoc -f html -t markdown --request-header User-Agent:"Mozilla/5.0" https://veryutils.com
-f FORMAT, -r FORMAT, --from=FORMAT, --read=FORMAT
Specify input format. FORMAT can be:
commonmark (CommonMark Markdown)
Extensions can be individually enabled or disabled by appending +EXTENSION or -EXTENSION to the format name. See Extensions below, for a list of extensions and their names. See --list-input-formats and --list-extensions, below.
-t FORMAT, -w FORMAT, --to=FORMAT, --write=FORMAT
Specify output format. FORMAT can be:
asciidoc (AsciiDoc) or asciidoctor (AsciiDoctor)
Note that odt, docx, epub, and pdf output will not be directed to stdout unless forced with -o -.
Extensions can be individually enabled or disabled by appending +EXTENSION or -EXTENSION to the format name. See Extensions below, for a list of extensions and their names. See --list-output-formats and --list-extensions, below.
-o FILE, --output=FILE
Write output to FILE instead of stdout. If FILE is -, output will go to stdout, even if a non-textual format (docx, odt, epub2, epub3) is specified.
--data-dir=DIRECTORY
Specify the user data directory to search for ConvertDoc data files. If this option is not specified, the default user data directory will be used. On *nix and macOS systems this will be the ConvertDoc subdirectory of the XDG data directory (by default, $HOME/.local/share, overridable by setting the XDG_DATA_HOME environment variable). If that directory does not exist, $HOME/.ConvertDoc will be used (for backwards compatibility). On Windows the default user data directory is C:\Users\USERNAME\AppData\Roaming\ConvertDoc. You can find the default user data directory on your system by looking at the output of ConvertDoc --version. A reference.odt, reference.docx, epub.css, templates, slidy, slideous, or s5 directory placed in this directory will override ConvertDoc's normal defaults.
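The *nix/macOS lookup order described above can be written out as a short Python sketch. It follows the wording of this section literally; the function name and the injected `dir_exists` predicate are inventions for illustration, and the Windows case is not covered.

```python
import posixpath


def default_user_data_dir(env, dir_exists):
    """Return the default user data directory on *nix/macOS, per the rules above.

    `env` is a mapping of environment variables; `dir_exists` is a predicate
    reporting whether a directory exists (injected so the logic can be tested
    without touching the real file system).
    """
    home = env["HOME"]
    # XDG_DATA_HOME overrides the default XDG data directory when it is set.
    xdg_data_home = env.get("XDG_DATA_HOME") or posixpath.join(home, ".local", "share")
    xdg_dir = posixpath.join(xdg_data_home, "ConvertDoc")
    if dir_exists(xdg_dir):
        return xdg_dir
    # Backwards-compatible fallback location.
    return posixpath.join(home, ".ConvertDoc")
```

In practice, the output of ConvertDoc --version remains the authoritative way to find the directory on a given system.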
-d FILE, --defaults=FILE
Specify a set of default option settings. FILE is a YAML file whose fields correspond to command-line option settings. All options for document conversion, including input and output files, can be set using a defaults file. The file will be searched for first in the working directory, and then in the defaults subdirectory of the user data directory (see --data-dir). The .yaml extension may be omitted. See the section Default files for more information on the file format. Settings from the defaults file may be overridden or extended by subsequent options on the command line.
--bash-completion
Generate a bash completion script. To enable bash completion with ConvertDoc, add this to your .bashrc:
eval "$(ConvertDoc --bash-completion)"
--verbose
Give verbose debugging output. Currently this only has an effect with PDF output.
--quiet
Suppress warning messages.
--fail-if-warnings
Exit with error status if there are any warnings.
--log=FILE
Write log messages in machine-readable JSON format to FILE. All messages above DEBUG level will be written, regardless of verbosity settings (--verbose, --quiet).
--list-input-formats
List supported input formats, one per line.
--list-output-formats
List supported output formats, one per line.
--list-extensions[=FORMAT]
List supported extensions for FORMAT, one per line, preceded by a + or - indicating whether it is enabled by default in FORMAT. If FORMAT is not specified, defaults for ConvertDoc's Markdown are given.
--list-highlight-languages
List supported languages for syntax highlighting, one per line.
--list-highlight-styles
List supported styles for syntax highlighting, one per line. See --highlight-style.
-v, --version
Print version.
-h, --help
Show usage message.
--shift-heading-level-by=NUMBER
Shift heading levels by a positive or negative integer. For example, with --shift-heading-level-by=-1, level 2 headings become level 1 headings, and level 3 headings become level 2 headings. Headings cannot have a level less than 1, so a heading that would be shifted below level 1 becomes a regular paragraph. Exception: with a shift of -N, a level-N heading at the beginning of the document replaces the metadata title. --shift-heading-level-by=-1 is a good choice when converting HTML or Markdown documents that use an initial level-1 heading for the document title and level-2+ headings for sections. --shift-heading-level-by=1 may be a good choice for converting Markdown documents that use level-1 headings for sections to HTML, since ConvertDoc uses a level-1 heading to render the document title.
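The shifting rules, including the title exception, can be illustrated with a small Python sketch. The block representation here (plain tuples) is invented for the example and is not ConvertDoc's internal AST; only the rules stated above are modeled.

```python
def shift_headings(blocks, shift, title=None):
    """Apply a heading shift to a list of blocks.

    Blocks are tuples: ("heading", level, text) or ("para", text).
    Returns (title, new_blocks).
    """
    result = []
    for index, block in enumerate(blocks):
        if block[0] != "heading":
            result.append(block)
            continue
        _, level, text = block
        new_level = level + shift
        if new_level >= 1:
            result.append(("heading", new_level, text))
        elif index == 0 and new_level == 0:
            # A level-N heading that opens the document, shifted by -N,
            # replaces the metadata title.
            title = text
        else:
            # Headings cannot go below level 1: demote to a paragraph.
            result.append(("para", text))
    return title, result
```

For a document that opens with a level-1 heading, a shift of -1 moves that heading into the title, promotes level-2 headings to level 1, and turns any later level-1 heading into a regular paragraph.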
--base-header-level=NUMBER
Deprecated. Use --shift-heading-level-by=X instead, where X = NUMBER - 1. Specify the base level for headings (defaults to 1).
--strip-empty-paragraphs
Deprecated. Use the +empty_paragraphs extension instead. Ignore paragraphs with no content. This option is useful for converting word processing documents where users have used empty paragraphs to create inter-paragraph space.
--indented-code-classes=CLASSES
Specify classes to use for indented code blocks, for example perl,numberLines or haskell. Multiple classes may be separated by spaces or commas.
--default-image-extension=EXTENSION
Specify a default extension to use when image paths/URLs have no extension. This allows you to use the same source for formats that require different kinds of images. Currently this option only affects the Markdown and LaTeX readers.
--file-scope
Parse each file individually before combining for multifile documents. This will allow footnotes in different files with the same identifiers to work as expected. If this option is set, footnotes and links will not work across files. Reading binary files (docx, odt, epub) implies --file-scope.
-F PROGRAM, --filter=PROGRAM
Specify an executable to be used as a filter transforming the ConvertDoc AST after the input is parsed and before the output is written. The executable should read JSON from stdin and write JSON to stdout. The JSON must be formatted like ConvertDoc's own JSON input and output. The name of the output format will be passed to the filter as the first argument. Hence,
ConvertDoc --filter ./caps.py -t latex
is equivalent to
ConvertDoc -t json | ./caps.py latex | ConvertDoc -f json -t latex
The latter form may be useful for debugging filters.
Filters may be written in any language. Text.ConvertDoc.JSON exports toJSONFilter to facilitate writing filters in Haskell. Those who would prefer to write filters in Python can use the module ConvertDocfilters, installable from PyPI. There are also ConvertDoc filter libraries in PHP, Perl, and JavaScript/node.js.
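As a rough illustration of the stdin-to-stdout contract, here is a minimal Python sketch of a filter body that uppercases text. It uses only the standard library rather than any filter helper module. The JSON shape is an assumption: nodes are taken to be objects with a "t" (type) field and a "c" (content) field, with plain text held in "Str" nodes. Check this against the actual output of ConvertDoc -t json before relying on it.

```python
import json


def uppercase_strings(node):
    """Recursively uppercase the content of every assumed {"t": "Str"} node."""
    if isinstance(node, list):
        return [uppercase_strings(item) for item in node]
    if isinstance(node, dict):
        # Assumed node shape: {"t": "Str", "c": "some text"}.
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            return {**node, "c": node["c"].upper()}
        return {key: uppercase_strings(value) for key, value in node.items()}
    return node


def run_filter(in_stream, out_stream):
    """Read a JSON document from in_stream and write the transformed JSON."""
    document = json.load(in_stream)
    json.dump(uppercase_strings(document), out_stream)
```

A script such as the caps.py used in the commands above would end by calling run_filter(sys.stdin, sys.stdout) after importing sys; the output format name would be available as sys.argv[1] if the filter needed it.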
In order of preference, ConvertDoc will look for filters in
1. a specified full or relative path (executable or non-executable)
2. $DATADIR/filters (executable or non-executable) where $DATADIR is the user data directory (see --data-dir, above).
3. $PATH (executable only)
Filters and Lua-filters are applied in the order specified on the command line.
-L SCRIPT, --lua-filter=SCRIPT
Transform the document in a similar fashion as JSON filters (see --filter), but use ConvertDoc's built-in Lua filtering system. The given Lua script is expected to return a list of Lua filters which will be applied in order. Each Lua filter must contain element-transforming functions indexed by the name of the AST element on which the filter function should be applied.
The ConvertDoc Lua module provides helper functions for element creation. It is always loaded into the script's Lua environment.
The following is an example Lua script for macro-expansion:
function expand_hello_world(inline)
  if inline.c == '{{helloworld}}' then
    return ConvertDoc.Emph{ ConvertDoc.Str "Hello, World" }
  else
    return inline
  end
end

return {{Str = expand_hello_world}}
In order of preference, ConvertDoc will look for Lua filters in
1. a specified full or relative path (executable or non-executable)
2. $DATADIR/filters (executable or non-executable) where $DATADIR is the user data directory (see --data-dir, above).
-M KEY[=VAL], --metadata=KEY[:VAL]
Set the metadata field KEY to the value VAL. A value specified on the command line overrides a value specified in the document using YAML metadata blocks. Values will be parsed as YAML boolean or string values. If no value is specified, the value will be treated as Boolean true. Like --variable, --metadata causes template variables to be set. But unlike --variable, --metadata affects the metadata of the underlying document (which is accessible from filters and may be printed in some output formats) and metadata values will be escaped when inserted into the template.
--metadata-file=FILE
Read metadata from the supplied YAML (or JSON) file. This option can be used with every input format, but string scalars in the YAML file will always be parsed as Markdown. Generally, the input will be handled the same as in YAML metadata blocks. This option can be used repeatedly to include multiple metadata files; values in files specified later on the command line will be preferred over those specified in earlier files. Metadata values specified inside the document, or by using -M, overwrite values specified with this option.
-p, --preserve-tabs
Preserve tabs instead of converting them to spaces. (By default, ConvertDoc converts tabs to spaces before parsing its input.) Note that this will only affect tabs in literal code spans and code blocks. Tabs in regular text are always treated as spaces.
--tab-stop=NUMBER
Specify the number of spaces per tab (default is 4).
--track-changes=accept|reject|all
Specifies what to do with insertions, deletions, and comments produced by the MS Word "Track Changes" feature. accept (the default) inserts all insertions and ignores all deletions. reject inserts all deletions and ignores insertions. Both accept and reject ignore comments. all puts in insertions, deletions, and comments, wrapped in spans with insertion, deletion, comment-start, and comment-end classes, respectively. The author and time of change are included. all is useful for scripting: only accepting changes from a certain reviewer, say, or before a certain date. If a paragraph is inserted or deleted, track-changes=all produces a span with the class paragraph-insertion/paragraph-deletion before the affected paragraph break. This option only affects the docx reader.
--extract-media=DIR
Extract images and other media contained in or linked from the source document to the path DIR, creating it if necessary, and adjust the image references in the document so they point to the extracted files. If the source format is a binary container (docx, epub, or odt), the media is extracted from the container and the original filenames are used. Otherwise the media is read from the file system or downloaded, and new filenames are constructed based on SHA1 hashes of the contents.
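To make the idea of hash-based filenames concrete, here is a minimal Python sketch. The exact naming scheme ConvertDoc uses is not specified in this section; in particular, keeping the original file extension is an assumption of this example, and the function name is invented.

```python
import hashlib
import os


def media_filename(contents, original_name):
    """Build a filename from the SHA1 hash of `contents`.

    Keeping the extension of `original_name` is an assumption of this sketch.
    """
    digest = hashlib.sha1(contents).hexdigest()
    extension = os.path.splitext(original_name)[1]
    return digest + extension
```

Because the name depends only on the bytes of the file, identical media referenced from several places map to the same extracted filename.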
--abbreviations=FILE
Specifies a custom abbreviations file, with abbreviations one to a line. If this option is not specified, ConvertDoc will read the data file abbreviations from the user data directory or fall back on a system default. To see the system default, use ConvertDoc --print-default-data-file=abbreviations. The only use ConvertDoc makes of this list is in the Markdown reader. Strings ending in a period that are found in this list will be followed by a nonbreaking space, so that the period will not produce sentence-ending space in formats like LaTeX.
-s, --standalone
Produce output with an appropriate header and footer (e.g. a standalone HTML, LaTeX, TEI, or RTF file, not a fragment). This option is set automatically for pdf, epub, epub3, fb2, docx, and odt output. For native output, this option causes metadata to be included; otherwise, metadata is suppressed.
--template=FILE|URL
Use the specified file as a custom template for the generated document. Implies --standalone. See Templates, below, for a description of template syntax. If no extension is specified, an extension corresponding to the writer will be added, so that --template=special looks for special.html for HTML output. If the template is not found, ConvertDoc will search for it in the templates subdirectory of the user data directory (see --data-dir). If this option is not used, a default template appropriate for the output format will be used (see -D/--print-default-template).
-V KEY[=VAL], --variable=KEY[:VAL]
Set the template variable KEY to the value VAL when rendering the document in standalone mode. If no VAL is specified, the key will be given the value true.
-D FORMAT, --print-default-template=FORMAT
Print the system default template for an output FORMAT. (See -t for a list of possible FORMATs.) Templates in the user data directory are ignored. This option may be used with -o/--output to redirect output to a file, but -o/--output must come before --print-default-template on the command line.
Note that some of the default templates use partials, for example styles.html. To print the partials, use --print-default-data-file: for example, --print-default-data-file=templates/styles.html.
--print-default-data-file=FILE
Print a system default data file. Files in the user data directory are ignored. This option may be used with -o/--output to redirect output to a file, but -o/--output must come before --print-default-data-file on the command line.
--eol=crlf|lf|native
Manually specify line endings: crlf (Windows), lf (macOS/Linux/UNIX), or native (line endings appropriate to the OS on which ConvertDoc is being run). The default is native.
--dpi=NUMBER
Specify the default dpi (dots per inch) value for conversion from pixels to inches/centimeters and vice versa. (Technically, the correct term would be ppi: pixels per inch.) The default is 96dpi. When images contain information about dpi internally, the encoded value is used instead of the default specified by this option.
--wrap=auto|none|preserve
Determine how text is wrapped in the output (the source code, not the rendered version). With auto (the default), ConvertDoc will attempt to wrap lines to the column width specified by --columns (default 72). With none, ConvertDoc will not wrap lines at all. With preserve, ConvertDoc will attempt to preserve the wrapping from the source document (that is, where there are nonsemantic newlines in the source, there will be nonsemantic newlines in the output as well). Automatic wrapping does not currently work in HTML output. In ipynb output, this option affects wrapping of the contents of markdown cells.
--columns=NUMBER
Specify length of lines in characters. This affects text wrapping in the generated source code (see --wrap). It also affects calculation of column widths for plain text tables (see Tables below).
--toc, --table-of-contents
Include an automatically generated table of contents (or, in the case of latex, context, docx, odt, opendocument, rst, or ms, an instruction to create one) in the output document. This option has no effect unless -s/--standalone is used, and it has no effect on man, docbook4, docbook5, or jats output.
Note that if you are producing a PDF via ms, the table of contents will appear at the beginning of the document, before the title. If you would prefer it to be at the end of the document, use the option --pdf-engine-opt=--no-toc-relocation.
--toc-depth=NUMBER
Specify the number of section levels to include in the table of contents. The default is 3 (which means that level-1, 2, and 3 headings will be listed in the contents).
--strip-comments
Strip out HTML comments in the Markdown or Textile source, rather than passing them on to Markdown, Textile or HTML output as raw HTML. This does not apply to HTML comments inside raw HTML blocks when the markdown_in_html_blocks extension is not set.
--no-highlight
Disables syntax highlighting for code blocks and inlines, even when a language attribute is given.
--highlight-style=STYLE|FILE
Specifies the coloring style to be used in highlighted source code. Options are pygments (the default), kate, monochrome, breezeDark, espresso, zenburn, haddock, and tango. For more information on syntax highlighting in ConvertDoc, see Syntax highlighting, below. See also --list-highlight-styles.
Instead of a STYLE name, a JSON file with extension .theme may be supplied. This will be parsed as a KDE syntax highlighting theme and (if valid) used as the highlighting style.
To generate the JSON version of an existing style, use --print-highlight-style.
--print-highlight-style=STYLE|FILE
Prints a JSON version of a highlighting style, which can be modified, saved with a .theme extension, and used with --highlight-style. This option may be used with -o/--output to redirect output to a file, but -o/--output must come before --print-highlight-style on the command line.
--syntax-definition=FILE
Instructs ConvertDoc to load a KDE XML syntax definition file, which will be used for syntax highlighting of appropriately marked code blocks. This can be used to add support for new languages or to use altered syntax definitions for existing languages. This option may be repeated to add multiple syntax definitions.
-H FILE, --include-in-header=FILE|URL
Include contents of FILE, verbatim, at the end of the header. This can be used, for example, to include special CSS or JavaScript in HTML documents. This option can be used repeatedly to include multiple files in the header. They will be included in the order specified. Implies --standalone.
-B FILE, --include-before-body=FILE|URL
Include contents of FILE, verbatim, at the beginning of the document body (e.g. after the <body> tag in HTML, or the \begin{document} command in LaTeX). This can be used to include navigation bars or banners in HTML documents. This option can be used repeatedly to include multiple files. They will be included in the order specified. Implies --standalone.
-A FILE, --include-after-body=FILE|URL
Include contents of FILE, verbatim, at the end of the document body (before the </body> tag in HTML, or the \end{document} command in LaTeX). This option can be used repeatedly to include multiple files. They will be included in the order specified. Implies --standalone.
--resource-path=SEARCHPATH
List of paths to search for images and other resources. The paths should be separated by : on Linux, UNIX, and macOS systems, and by ; on Windows. If --resource-path is not specified, the default resource path is the working directory. Note that, if --resource-path is specified, the working directory must be explicitly listed or it will not be searched. For example: --resource-path=.:test will search the working directory and the test subdirectory, in that order.
--resource-path only has an effect if (a) the output format embeds images (for example, docx, pdf, or html with --self-contained) or (b) it is used together with --extract-media.
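The ordered search described for --resource-path can be sketched in Python as follows. The function name is hypothetical and the sketch covers only local files, not downloads. Python's os.pathsep happens to match the separators named above (: on Linux, UNIX, and macOS; ; on Windows).

```python
import os


def find_resource(name, resource_path=None, separator=os.pathsep):
    """Return the first existing path for `name`, searching directories in order.

    With no resource path, only the working directory (".") is searched.
    When a resource path is given, the working directory is searched only
    if it is listed explicitly.
    """
    directories = resource_path.split(separator) if resource_path else ["."]
    for directory in directories:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None
```

On a Unix-like system, find_resource("figure.png", ".:test") would check ./figure.png first and test/figure.png second, mirroring the --resource-path=.:test example.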
--request-header=NAME:VAL
Set the request header NAME to the value VAL when making HTTP requests (for example, when a URL is given on the command line, or when resources used in a document must be downloaded). If you're behind a proxy, you also need to set the environment variable http_proxy to http://...
--self-contained
Produce a standalone HTML file with no external dependencies, using data: URIs to incorporate the contents of linked scripts, stylesheets, images, and videos. Implies --standalone. The resulting file should be "self-contained", in the sense that it needs no external files and no net access to be displayed properly by a browser. This option works only with HTML output formats, including html4, html5, html+lhs, html5+lhs, s5, slidy, slideous, dzslides, and revealjs. Scripts, images, and stylesheets at absolute URLs will be downloaded; those at relative URLs will be sought relative to the working directory (if the first source file is local) or relative to the base URL (if the first source file is remote). Elements with the attribute data-external="1" will be left alone; the documents they link to will not be incorporated in the document. Limitation: resources that are loaded dynamically through JavaScript cannot be incorporated; as a result, --self-contained does not work with --mathjax, and some advanced features (e.g. zoom or speaker notes) may not work in an offline "self-contained" reveal.js slide show.
--html-q-tags
Use <q> tags for quotes in HTML.
--ascii
Use only ASCII characters in output. Currently supported for XML and HTML formats (which use entities instead of UTF-8 when this option is selected), CommonMark, gfm, and Markdown (which use entities), roff ms (which uses hexadecimal escapes), and to a limited degree LaTeX (which uses standard commands for accented characters when possible). roff man output uses ASCII by default.
--reference-links
Use reference-style links, rather than inline links, in writing Markdown or reStructuredText. By default inline links are used. The placement of link references is affected by the --reference-location option.
--reference-location=block|section|document
Specify whether footnotes (and references, if --reference-links is set) are placed at the end of the current (top-level) block, the current section, or the document. The default is document. Currently only affects the markdown writer.
--atx-headers
Use ATX-style headings in Markdown output. The default is to use setext-style headings for levels 1 to 2, and then ATX headings. (Note: for gfm output, ATX headings are always used.) This option also affects markdown cells in ipynb output.
--top-level-division=[default|section|chapter|part]
Treat top-level headings as the given division type in LaTeX, ConTeXt, DocBook, and TEI output. The hierarchy order is part, chapter, then section; all headings are shifted such that the top-level heading becomes the specified type. The default behavior is to determine the best division type via heuristics: unless other conditions apply, section is chosen. When the documentclass variable is set to report, book, or memoir (unless the article option is specified), chapter is implied as the setting for this option. If beamer is the output format, specifying either chapter or part will cause top-level headings to become \part{..}, while second-level headings remain as their default type.
-N, --number-sections
Number section headings in LaTeX, ConTeXt, HTML, or EPUB output. By default, sections are not numbered. Sections with class unnumbered will never be numbered, even if --number-sections is specified.
--number-offset=NUMBER[,NUMBER,...]
Offset for section headings in HTML output (ignored in other output formats). The first number is added to the section number for top-level headings, the second for second-level headings, and so on. So, for example, if you want the first top-level heading in your document to be numbered "6", specify --number-offset=5. If your document starts with a level-2 heading which you want to be numbered "1.5", specify --number-offset=1,4. Offsets are 0 by default. Implies --number-sections.
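The arithmetic behind --number-offset can be checked with a short Python sketch. This models only the rule stated above (each offset is added to the counter for its heading level); the function name is invented, and how ConvertDoc treats edge cases beyond the two documented examples is an assumption.

```python
def number_headings(levels, offsets=()):
    """Return dotted section numbers for a sequence of heading levels.

    `offsets[i]` is added to the counter for heading level i + 1.
    """
    counters = []
    numbers = []
    for level in levels:
        # Grow the counter list as deeper levels appear.
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        # Starting a heading resets the counters of all deeper levels.
        del counters[level:]
        parts = [
            count + (offsets[i] if i < len(offsets) else 0)
            for i, count in enumerate(counters)
        ]
        numbers.append(".".join(str(part) for part in parts))
    return numbers
```

With an offset of 5, the first top-level heading has counter 1 and is numbered 1 + 5 = 6. With offsets 1,4, a document that opens with a level-2 heading has counters 0 and 1, giving 0 + 1 = 1 and 1 + 4 = 5, hence 1.5.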
--listings
Use the listings package for LaTeX code blocks. The package does not support multi-byte encoding for source code. To handle UTF-8 you would need to use a custom template. This issue is fully documented here: Encoding issue with the listings package.
-i, --incremental
Make list items in slide shows display incrementally (one by one). The default is for lists to be displayed all at once.
--slide-level=NUMBER
Specifies that headings with the specified level create slides (for beamer, s5, slidy, slideous, dzslides). Headings above this level in the hierarchy are used to divide the slide show into sections; headings below this level create subheads within a slide. Note that content that is not contained under slide-level headings will not appear in the slide show. The default is to set the slide level based on the contents of the document; see Structuring the slide show.
--section-divs
Wrap sections in <section> tags (or <div> tags for html4), and attach identifiers to the enclosing <section> (or <div>) rather than the heading itself. See Heading identifiers, below.
--email-obfuscation=none|javascript|references
Specify a method for obfuscating mailto: links in HTML documents. none leaves mailto: links as they are. javascript obfuscates them using JavaScript. references obfuscates them by printing their letters as decimal or hexadecimal character references. The default is none.
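To show what the references method amounts to, here is a minimal Python sketch that encodes a string as HTML character references. Alternating between decimal and hexadecimal forms is an arbitrary choice made for this example; how ConvertDoc actually picks between the two is not specified here.

```python
def to_character_references(text):
    """Encode each character as a decimal or hexadecimal character reference.

    Alternating between the two forms is an arbitrary choice for this sketch.
    """
    encoded = []
    for index, char in enumerate(text):
        if index % 2 == 0:
            encoded.append("&#{};".format(ord(char)))
        else:
            encoded.append("&#x{:X};".format(ord(char)))
    return "".join(encoded)
```

A browser decodes these references back to the original address, so the link still works for readers while the literal address no longer appears in the HTML source.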
--id-prefix=STRING
Specify a prefix to be added to all identifiers and internal links in HTML and DocBook output, and to footnote numbers in Markdown and Haddock output. This is useful for preventing duplicate identifiers when generating fragments to be included in other pages.
-T STRING, --title-prefix=STRING
Specify STRING as a prefix at the beginning of the title that appears in the HTML header (but not in the title as it appears at the beginning of the HTML body). Implies --standalone.
-c URL, --css=URL
Link to a CSS style sheet. This option can be used repeatedly to include multiple files. They will be included in the order specified.
A stylesheet is required for generating EPUB. If none is provided using this option (or the css or stylesheet metadata fields), ConvertDoc will look for a file epub.css in the user data directory (see --data-dir). If it is not found there, sensible defaults will be used.
--reference-doc=FILE
Use the specified file as a style reference in producing a docx or ODT file.
For best results, the reference docx should be a modified version of a docx file produced using ConvertDoc. The contents of the reference docx are ignored, but its stylesheets and document properties (including margins, page size, header, and footer) are used in the new docx. If no reference docx is specified on the command line, ConvertDoc will look for a file reference.docx in the user data directory (see --data-dir). If this is not found either, sensible defaults will be used.
To produce a custom reference.docx, first get a copy of the default reference.docx: ConvertDoc -o custom-reference.docx --print-default-data-file reference.docx. Then open custom-reference.docx in Word, modify the styles as you wish, and save the file. For best results, do not make changes to this file other than modifying the styles used by ConvertDoc:
Paragraph styles:
Character styles:
Table style:
For best results, the reference ODT should be a modified version of an ODT produced using ConvertDoc. The contents of the reference ODT are ignored, but its stylesheets are used in the new ODT. If no reference ODT is specified on the command line, ConvertDoc will look for a file reference.odt in the user data directory (see --data-dir). If this is not found either, sensible defaults will be used.
To produce a custom reference.odt, first get a copy of the default reference.odt: ConvertDoc -o custom-reference.odt --print-default-data-file reference.odt. Then open custom-reference.odt in LibreOffice, modify the styles as you wish, and save the file.
For pptx output, templates included with Microsoft PowerPoint 2013 (either with .pptx or .potx extension) are known to work, as are most templates derived from these.
The specific requirement is that the template should begin with the following first four layouts:
All templates included with a recent version of MS PowerPoint will fit these criteria. (You can click on Layout under the Home menu to check.)
You can also modify the default reference.pptx: first run ConvertDoc -o custom-reference.pptx --print-default-data-file reference.pptx, and then modify custom-reference.pptx in MS PowerPoint (ConvertDoc will use the first four layout slides, as mentioned above).
--epub-cover-image=FILE
Use the specified image as the EPUB cover. It is recommended that the image be less than 1000px in width and height. Note that in a Markdown source document you can also specify cover-image in a YAML metadata block (see EPUB Metadata, below).
--epub-metadata=FILE
Look in the specified XML file for metadata for the EPUB. The file should contain a series of Dublin Core elements. For example:
<dc:rights>Creative Commons</dc:rights>
<dc:language>es-AR</dc:language>
By default, ConvertDoc will include the following metadata elements: <dc:title> (from the document title), <dc:creator> (from the document authors), <dc:date> (from the document date, which should be in ISO 8601 format), <dc:language> (from the lang variable, or, if it is not set, the locale), and <dc:identifier id="BookId"> (a randomly generated UUID). Any of these may be overridden by elements in the metadata file.
Note: if the source document is Markdown, a YAML metadata block in the document can be used instead. See below under EPUB Metadata.
--epub-embed-font=FILE
Embed the specified font in the EPUB. This option can be repeated to embed multiple fonts. Wildcards can also be used: for example, DejaVuSans-*.ttf. However, if you use wildcards on the command line, be sure to escape them or put the whole filename in single quotes, to prevent them from being interpreted by the shell. To use the embedded fonts, you will need to add declarations like the following to your CSS (see --css):
@font-face {
  font-family: DejaVuSans;
  font-style: normal;
  font-weight: normal;
  src: url("DejaVuSans-Regular.ttf");
}
@font-face {
  font-family: DejaVuSans;
  font-style: normal;
  font-weight: bold;
  src: url("DejaVuSans-Bold.ttf");
}
@font-face {
  font-family: DejaVuSans;
  font-style: italic;
  font-weight: normal;
  src: url("DejaVuSans-Oblique.ttf");
}
@font-face {
  font-family: DejaVuSans;
  font-style: italic;
  font-weight: bold;
  src: url("DejaVuSans-BoldOblique.ttf");
}
body { font-family: "DejaVuSans"; }
--epub-chapter-level=NUMBER
Specify the heading level at which to split the EPUB into separate "chapter" files. The default is to split into chapters at level-1 headings. This option only affects the internal composition of the EPUB, not the way chapters and sections are displayed to users. Some readers may be slow if the chapter files are too large, so for large documents with few level-1 headings, one might want to use a chapter level of 2 or 3.
--epub-subdirectory=DIRNAME
Specify the subdirectory in the OCF container that is to hold the EPUB-specific contents. The default is EPUB. To put the EPUB contents in the top level, use an empty string.
--ipynb-output=all|none|best
Determines how ipynb output cells are treated. all means that all of the data formats included in the original are preserved. none means that the contents of data cells are omitted. best causes ConvertDoc to try to pick the richest data block in each output cell that is compatible with the output format. The default is best.
--pdf-engine=PROGRAM
Use the specified engine when producing PDF output. Valid values are pdflatex, lualatex, xelatex, latexmk, tectonic, wkhtmltopdf, weasyprint, prince, context, and pdfroff. If the engine is not in your PATH, the full path of the engine may be specified here. If this option is not specified, ConvertDoc uses the following defaults depending on the output format specified using -t/--to:
-t latex or none: pdflatex (other options: xelatex, lualatex, tectonic, latexmk)
-t context: context
-t html: wkhtmltopdf (other options: prince, weasyprint)
-t ms: pdfroff
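The defaults listed above can be read as a simple lookup from output format to engine. A minimal Python sketch follows; the names DEFAULT_PDF_ENGINES and default_pdf_engine are hypothetical, and the table only restates the list rather than querying ConvertDoc.

```python
from typing import Optional

# Default engine per -t/--to value, restating the list above.
DEFAULT_PDF_ENGINES = {
    "latex": "pdflatex",
    "context": "context",
    "html": "wkhtmltopdf",
    "ms": "pdfroff",
}


def default_pdf_engine(to_format: Optional[str] = None) -> str:
    """Return the default engine for an output format.

    None stands for "no -t given", which shares the latex default.
    """
    fmt = "latex" if to_format is None else to_format
    try:
        return DEFAULT_PDF_ENGINES[fmt]
    except KeyError:
        raise ValueError(f"no default PDF engine listed for {fmt!r}") from None


if __name__ == "__main__":
    for fmt in (None, "context", "html", "ms"):
        print(fmt, "->", default_pdf_engine(fmt))
```

A wrapper script might use such a table to check that the chosen engine is installed before starting a long conversion.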
--pdf-engine-opt=STRING
Use the given string as a command-line argument to the pdf-engine. For example, to use a persistent directory foo for latexmk's auxiliary files, use --pdf-engine-opt=-outdir=foo. Note that no check for duplicate options is done.
--bibliography=FILE
Set the bibliography field in the document's metadata to FILE, overriding any value set in the metadata, and process citations using ConvertDoc-citeproc. (This is equivalent to --metadata bibliography=FILE --filter ConvertDoc-citeproc.) If --natbib or --biblatex is also supplied, ConvertDoc-citeproc is not used, making this equivalent to --metadata bibliography=FILE. If you supply this argument multiple times, each FILE will be added to bibliography.
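The equivalence described above can be sketched as a small function that expands --bibliography into the longer form. This is an illustration in Python, not part of ConvertDoc: the name expand_bibliography is hypothetical, and treating repeated files as repeated --metadata arguments is an assumption.

```python
from typing import List


def expand_bibliography(files: List[str],
                        natbib: bool = False,
                        biblatex: bool = False) -> List[str]:
    """Expand --bibliography=FILE into its documented equivalent."""
    args: List[str] = []
    for path in files:
        # Each FILE is added to the bibliography metadata field.
        args += ["--metadata", f"bibliography={path}"]
    # The citeproc filter is skipped when --natbib or --biblatex is given.
    if files and not (natbib or biblatex):
        args += ["--filter", "ConvertDoc-citeproc"]
    return args


if __name__ == "__main__":
    print(expand_bibliography(["refs.bib"]))
    print(expand_bibliography(["refs.bib"], natbib=True))
```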
--csl=FILE
Set the csl field in the document's metadata to FILE, overriding any value set in the metadata. (This is equivalent to --metadata csl=FILE.) This option is only relevant with ConvertDoc-citeproc.
--citation-abbreviations=FILE
Set the citation-abbreviations field in the document's metadata to FILE, overriding any value set in the metadata. (This is equivalent to --metadata citation-abbreviations=FILE.) This option is only relevant with ConvertDoc-citeproc.
--natbib
Use natbib for citations in LaTeX output. This option is not for use with the ConvertDoc-citeproc filter or with PDF output. It is intended for use in producing a LaTeX file that can be processed with bibtex.
--biblatex
Use biblatex for citations in LaTeX output. This option is not for use with the ConvertDoc-citeproc filter or with PDF output. It is intended for use in producing a LaTeX file that can be processed with bibtex or biber.
The default is to render TeX math as far as possible using Unicode characters. Formulas are put inside a span with class="math", so that they may be styled differently from the surrounding text if needed. However, this gives acceptable results only for basic math; usually you will want to use --mathjax or another of the following options.
--mathjax[=URL]
Use MathJax to display embedded TeX math in HTML output. TeX math will be put between \(...\) (for inline math) or \[...\] (for display math) and wrapped in <span> tags with class math. Then the MathJax JavaScript will render it. The URL should point to the MathJax.js load script. If a URL is not provided, a link to the Cloudflare CDN will be inserted.
--mathml
Convert TeX math to MathML (in epub3, docbook4, docbook5, jats, html4 and html5). This is the default in odt output. Note that currently only Firefox and Safari (and select e-book readers) natively support MathML.
--webtex[=URL]
Convert TeX formulas to <img> tags that link to an external script that converts formulas to images. The formula will be URL-encoded and concatenated with the URL provided. For example, --webtex https://latex.codecogs.com/svg.latex? produces SVG images. If no URL is specified, the CodeCogs URL generating PNGs will be used (https://latex.codecogs.com/png.latex?). Note: the --webtex option will affect Markdown output as well as HTML, which is useful if you're targeting a version of Markdown without native math support.
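As a rough illustration of how the image URL is formed, the following Python sketch URL-encodes a formula and appends it to the base URL. The helper name webtex_url is hypothetical, and the exact set of characters ConvertDoc escapes is an assumption; here every reserved character is percent-encoded.

```python
from urllib.parse import quote

# Default base URL named in the text above (CodeCogs, PNG output).
CODECOGS_PNG = "https://latex.codecogs.com/png.latex?"


def webtex_url(formula: str, base: str = CODECOGS_PNG) -> str:
    """Return base + URL-encoded formula, usable as an <img> src."""
    # safe="" makes quote() percent-encode "/" as well.
    return base + quote(formula, safe="")


if __name__ == "__main__":
    print(webtex_url(r"\frac{1}{2}"))
    print(webtex_url("x^2", base="https://latex.codecogs.com/svg.latex?"))
```

Because the formula is simply appended, the base URL must end with whatever separator the remote script expects, such as the trailing ? in the CodeCogs URLs.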
--katex[=URL]
Use KaTeX to display embedded TeX math in HTML output. The URL is the base URL for the KaTeX library. That directory should contain a katex.min.js and a katex.min.css file. If a URL is not provided, a link to the KaTeX CDN will be inserted.
--gladtex
Enclose TeX math in <eq> tags in HTML output. The resulting HTML can then be processed by GladTeX to produce images of the typeset formulas and an HTML file with links to these images. So, the procedure is:
ConvertDoc -s --gladtex input.md -o myfile.htex
gladtex -d myfile-images myfile.htex
# produces myfile.html and images in myfile-images
--dump-args
Print information about command-line arguments to stdout, then exit. This option is intended primarily for use in wrapper scripts. The first line of output contains the name of the output file specified with the -o option, or - (for stdout) if no output file was specified. The remaining lines contain the command-line arguments, one per line, in the order they appear. These do not include regular ConvertDoc options and their arguments, but do include any options appearing after a -- separator at the end of the line.
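A wrapper script could consume this output along the following lines. This is a minimal Python sketch that parses text in the layout just described; the function name parse_dump_args and the sample text are made up for illustration, and no ConvertDoc process is actually run.

```python
from typing import List, Optional, Tuple


def parse_dump_args(output: str) -> Tuple[Optional[str], List[str]]:
    """Split --dump-args output into (output file, remaining arguments).

    The output file is None when the first line is "-", meaning stdout.
    """
    lines = output.splitlines()
    if not lines:
        raise ValueError("expected at least one line of --dump-args output")
    out_file = None if lines[0] == "-" else lines[0]
    return out_file, lines[1:]


if __name__ == "__main__":
    # Hypothetical output for: -o foo.html -s foo.txt -- -e latin1
    sample = "foo.html\nfoo.txt\n-e\nlatin1\n"
    print(parse_dump_args(sample))
```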
--ignore-args
Ignore command-line arguments (for use in wrapper scripts). Regular ConvertDoc options are not ignored. Thus, for example,
ConvertDoc --ignore-args -o foo.html -s foo.txt -- -e latin1
is equivalent to
ConvertDoc -o foo.html -s
If ConvertDoc completes successfully, it will return exit code 0. Nonzero exit codes have the following meanings:
Code | Error |
---|---|
3 | ConvertDocFailOnWarningError |
4 | ConvertDocAppError |
5 | ConvertDocTemplateError |
6 | ConvertDocOptionError |
21 | ConvertDocUnknownReaderError |
22 | ConvertDocUnknownWriterError |
23 | ConvertDocUnsupportedExtensionError |
31 | ConvertDocEpubSubdirectoryError |
43 | ConvertDocPDFError |
47 | ConvertDocPDFProgramNotFoundError |
61 | ConvertDocHttpError |
62 | ConvertDocShouldNeverHappenError |
63 | ConvertDocSomeError |
64 | ConvertDocParseError |
65 | ConvertDocParsecError |
66 | ConvertDocMakePDFError |
67 | ConvertDocSyntaxMapError |
83 | ConvertDocFilterError |
91 | ConvertDocMacroLoop |
92 | ConvertDocUTF8DecodingError |
93 | ConvertDocIpynbDecodingError |
97 | ConvertDocCouldNotFindDataFileError |
99 | ConvertDocResourceNotFound |
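For scripts that call ConvertDoc, the table above can be turned into a lookup so that failures are reported by name. The sketch below is in Python; the dictionary restates the table, while the function name describe_exit_code and the wording of the fallback messages are assumptions.

```python
# Exit codes and error names, restating the table above.
EXIT_CODES = {
    3: "ConvertDocFailOnWarningError",
    4: "ConvertDocAppError",
    5: "ConvertDocTemplateError",
    6: "ConvertDocOptionError",
    21: "ConvertDocUnknownReaderError",
    22: "ConvertDocUnknownWriterError",
    23: "ConvertDocUnsupportedExtensionError",
    31: "ConvertDocEpubSubdirectoryError",
    43: "ConvertDocPDFError",
    47: "ConvertDocPDFProgramNotFoundError",
    61: "ConvertDocHttpError",
    62: "ConvertDocShouldNeverHappenError",
    63: "ConvertDocSomeError",
    64: "ConvertDocParseError",
    65: "ConvertDocParsecError",
    66: "ConvertDocMakePDFError",
    67: "ConvertDocSyntaxMapError",
    83: "ConvertDocFilterError",
    91: "ConvertDocMacroLoop",
    92: "ConvertDocUTF8DecodingError",
    93: "ConvertDocIpynbDecodingError",
    97: "ConvertDocCouldNotFindDataFileError",
    99: "ConvertDocResourceNotFound",
}


def describe_exit_code(code: int) -> str:
    """Map a process exit code to the error name from the table."""
    if code == 0:
        return "success"
    return EXIT_CODES.get(code, f"unknown exit code {code}")


if __name__ == "__main__":
    for code in (0, 43, 47, 1):
        print(code, describe_exit_code(code))
```

In a wrapper, the value passed in would typically be the returncode attribute of the result of subprocess.run.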
When the -s/--standalone option is used, ConvertDoc uses a template to add header and footer material that is needed for a self-standing document. To see the default template that is used, just type
ConvertDoc -D *FORMAT*
where FORMAT is the name of the output format. A custom template can be specified using the --template option. You can also override the system default templates for a given output format FORMAT by putting a file templates/default.*FORMAT* in the user data directory (see --data-dir, above). Exceptions:
For odt output, customize the default.opendocument template.
For pdf output, customize the default.latex template (or the default.context template, if you use -t context, or the default.ms template, if you use -t ms, or the default.html template, if you use -t html).
docx and pptx have no template (however, you can use --reference-doc to customize the output).
Templates contain variables, which allow for the inclusion of arbitrary information at any point in the file. They may be set at the command line using the -V/--variable option. If a variable is not set, ConvertDoc will look for the key in the document's metadata, which can be set using either YAML metadata blocks or the -M/--metadata option. In addition, some variables are given default values by ConvertDoc. See Variables below for a list of variables used in ConvertDoc's default templates.
If you use custom templates, you may need to revise them as ConvertDoc changes. We recommend tracking the changes in the default templates, and modifying your custom templates accordingly. An easy way to do this is to fork the ConvertDoc-templates repository and merge in changes after each ConvertDoc release.