kagisearch/quickmark

quickmark

The library is based on the architecture of markdown-it-py, which makes it easy to port markdown-it-py plugins.

Dev Dependencies

Run make install_toolchain to install the development toolchain (Rust, Cargo, and uv).

Build with:

# Installs toolchain (cargo + uv)
make install_toolchain
# Builds entire project
make build
# OPTIONAL: Install built package in local python env
uv run maturin develop --release

NOTE: Builds can break when using conda/mamba on macOS. Reinstalling mamba helps.

Usage (Rust)

let parser = &mut crate::MarkdownIt::new();
crate::plugins::cmark::add(parser);
crate::plugins::extra::add(parser);

let ast  = parser.parse("Hello **world**!");
let html = ast.render();

print!("{html}");
// prints "<p>Hello <strong>world</strong>!</p>"

Usage (Python)

If you built the Python package, it should be installed as quickmark in the local Python environment. The syntax resembles that of markdown-it-py:

from quickmark import MDParser

md = MDParser("commonmark").enable("table")
md.render("# Hello, world!")
# '<h1>Hello, world!</h1>\n'

The preset name passed to the constructor selects the enabled plugins:

  • MDParser("zero") does not enable any plugins.
  • MDParser("commonmark") enables all CommonMark plugins.
  • MDParser("gfm") enables the CommonMark and GitHub Flavoured Markdown plugins.

Python CLI

A CLI is provided in python/quickmark/cli.py and can be used like this:

# replace - with filename to read from a file
# see `quickmark --help` for more
echo "# Hello, world!" | quickmark html -
# <h1>Hello, world!</h1>

echo "# Hello, world!" | quickmark ast -
# <root>
#   <heading>
#     <text>

Python AST walking

markdown-it.rs does not generate a token stream, but instead directly generates a Node tree. This is similar to markdown-it-py's SyntaxTreeNode class, although the API is not identical. Source mapping is also provided by byte offset, rather than by line only.

md = (
  MDParser("commonmark")
  .enable("table")
  .enable_many(["linkify", "strikethrough"])
)
node = md.tree("# Hello, world!")
print(node.walk())
# [Node(root), Node(heading), Node(text)]
print(node.pretty(srcmap=True, meta=True))
# <root srcmap="0:15">
#   <heading srcmap="0:15">
#     level: 1
#     <text srcmap="2:15">
#       content: Hello, world!
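
Because srcmap values are byte offsets, they cannot be used directly as Python str indices once the source contains non-ASCII characters. The following is a minimal sketch of how such spans could be resolved. It assumes the "start:end" form shown above is a half-open range of UTF-8 byte offsets that fall on character boundaries; slice_by_srcmap and byte_offset_to_line_col are hypothetical helpers, not part of quickmark.

```python
def slice_by_srcmap(source: str, srcmap: str) -> str:
    """Return the source text covered by a "start:end" byte-offset span."""
    start, end = (int(part) for part in srcmap.split(":"))
    # Assumption: offsets count UTF-8 bytes, so slice the encoded source.
    return source.encode("utf-8")[start:end].decode("utf-8")


def byte_offset_to_line_col(source: str, offset: int) -> tuple[int, int]:
    """Convert a byte offset to a 1-based (line, column) pair.

    The column is counted in characters, not bytes.
    """
    prefix = source.encode("utf-8")[:offset].decode("utf-8")
    line = prefix.count("\n") + 1
    # rfind returns -1 when there is no newline, so the line starts at index 0.
    line_start = prefix.rfind("\n") + 1
    col = len(prefix) - line_start + 1
    return line, col


source = "# Hello, world!"
print(slice_by_srcmap(source, "2:15"))     # Hello, world!
print(byte_offset_to_line_col(source, 2))  # (1, 3)
```

Slicing the encoded bytes rather than the str keeps the result correct when multi-byte characters precede the span.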

Note: Attributes of the Node class, such as Node.attrs, return a copy of the underlying data, so mutating the returned value does not affect what is stored on the node. For example:

from quickmark import Node
node = Node("name")
# don't do this!
node.attrs["key"] = "value"
print(node.attrs) # {}
# do this instead (Python 3.9+)
node.attrs = node.attrs | {"key": "value"}
print(node.attrs) # {"key": "value"}
# Node.children is only a shallow copy though, so this is fine
child = Node("child")
node.children = [child]
node.children[0].name = "other"
print(child.name) # "other"
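
The copy-on-read behaviour can be reproduced in plain Python, which may help when reasoning about it. The sketch below uses a hypothetical stand-in class, CopyingNode, whose properties return copies on every read; it is an illustration of the semantics described above, not quickmark's actual Node implementation.

```python
class CopyingNode:
    """Hypothetical stand-in that mimics copy-on-read attributes."""

    def __init__(self, name):
        self.name = name
        self._attrs = {}
        self._children = []

    @property
    def attrs(self):
        # A fresh dict on every read: edits to it never reach self._attrs.
        return dict(self._attrs)

    @attrs.setter
    def attrs(self, value):
        self._attrs = dict(value)

    @property
    def children(self):
        # Shallow copy: a new list, but the same child objects.
        return list(self._children)

    @children.setter
    def children(self, value):
        self._children = list(value)


node = CopyingNode("name")
node.attrs["key"] = "value"   # mutates a temporary copy only
print(node.attrs)             # {}
node.attrs = {**node.attrs, "key": "value"}
print(node.attrs)             # {'key': 'value'}

child = CopyingNode("child")
node.children = [child]
node.children[0].name = "other"   # reaches the shared child object
print(child.name)                 # other
```

Assigning through the setter is the only way to change the stored mapping, which is why the read-modify-assign pattern is needed.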

WASM Build

A WebAssembly build is available in the example demos.

Extending

For a guide on how to extend the parser, see the examples folder.

When translating markdown-it plugins to Rust, the following equivalences are useful:

  • state.bMarks[startLine] + state.tShift[startLine] is equivalent to state.line_offsets[line].first_nonspace
  • state.eMarks[startLine] is equivalent to state.line_offsets[line].line_end
  • state.sCount[line] is equivalent to state.line_offsets[line].indent_nonspace
  • state.sCount[line] - state.blkIndent is equivalent to state.line_indent(state.line)
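
To make the mapping above more concrete, here is a rough Python sketch of what per-line offsets of this kind could contain. The field names follow the list above, but their exact semantics are assumptions: first_nonspace is taken as the offset of the first non-whitespace character, line_end as the offset just past the last character of the line, and indent_nonspace as the indent width with tabs expanded to 4-column tab stops. LineOffset, compute_line_offsets, and line_indent are hypothetical, and the sketch uses character indices rather than byte offsets for simplicity.

```python
from dataclasses import dataclass


@dataclass
class LineOffset:
    line_start: int       # offset of the first character of the line
    first_nonspace: int   # ~ bMarks[line] + tShift[line]
    indent_nonspace: int  # ~ sCount[line]
    line_end: int         # ~ eMarks[line]


def compute_line_offsets(src: str) -> list[LineOffset]:
    """Compute offsets for every line of src (assumed semantics, see above)."""
    offsets = []
    pos = 0
    for line in src.split("\n"):
        indent = 0
        i = 0
        while i < len(line) and line[i] in " \t":
            if line[i] == "\t":
                indent += 4 - indent % 4  # advance to the next tab stop
            else:
                indent += 1
            i += 1
        offsets.append(LineOffset(pos, pos + i, indent, pos + len(line)))
        pos += len(line) + 1  # skip the newline
    return offsets


def line_indent(offsets: list[LineOffset], line: int, blk_indent: int) -> int:
    """Indent relative to the enclosing block: sCount[line] - blkIndent."""
    return offsets[line].indent_nonspace - blk_indent


offsets = compute_line_offsets("# a\n  > b\n\tc")
print(offsets[1])
# LineOffset(line_start=4, first_nonspace=6, indent_nonspace=2, line_end=9)
print(line_indent(offsets, 2, 2))  # 2
```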

Plugins

All syntax rules in markdown-it.rs are implemented as plugins. Plugins can be added to the parser by calling enable or enable_many with the name of the plugin. The following plugins are currently supported:

CommonMark Blocks:

  • blockquote: Block quotes with >
  • code: Indented code blocks
  • fence: Backtick code blocks
  • heading: # ATX headings
  • hr: --- horizontal rules
  • lheading: --- underline setext headings
  • list: * unordered lists and 1. ordered lists
  • paragraph: Paragraphs
  • reference: Link reference definitions [id]: src "title"

CommonMark Inlines:

  • autolink: <http://example.com>
  • backticks: `code`
  • emphasis: _emphasis_, *emphasis*, **strong**, __strong__
  • entity: &amp;
  • escape: backslash escaping \
  • image: ![alt](src "title")
  • link: [text](src "title"), [text][id], [text]
  • newline: hard line breaks
  • html_block: HTML blocks
  • html_inline: HTML inline
  • sourcepos: Add source mapping to rendered HTML, looks like this: <stuff data-sourcepos="1:1-2:3">, i.e. line:col-line:col
  • replacements: Typographic replacements, like -- to –
  • smartquotes: Smart quotes, like " to “
  • linkify: Automatically linkify URLs with https://crates.io/crates/linkify (note currently this only matches URLs with a scheme, e.g. https://example.com)
  • heading_anchors: Add heading anchors, with defaults like GitHub
  • front_matter: YAML front matter
  • footnote: Pandoc-style footnotes (see https://pandoc.org/MANUAL.html#footnotes)

GitHub Flavoured Markdown (https://github.github.com/gfm):

  • table:

    | foo | bar |
    | --- | --- |
    | baz | bim |

  • strikethrough: ~~strikethrough~~
  • tasklist: - [x] tasklist item
  • autolink_ext: Extended autolink detection with "bare URLs" like https://example.com and www.example.com
  • tagfilter: HTML tag filtering, e.g. <script> tags are removed

About

Quick Markdown-to-HTML parser in Rust with Python bindings
