Redesign migration API around Conversion/Parse result types#30
Draft
Redesign migration API around Conversion/Parse result types#30
Conversation
Step 1 of the migration-API redesign: - Add `Markbridge::Parse` (Data.define) carrying ast, format, unknown_tags, diagnostics. Returned by `parse_*` methods. - Add `Markbridge::Conversion` (Data.define) carrying markdown, ast, format, unknown_tags, diagnostics, emissions, errors. Returned by `*_to_markdown` methods. `to_s` delegates to markdown so puts and string interpolation keep working. - Drop `Markbridge::Configuration`, `.configuration`, `.configure`, `.reset_defaults!`, and the four `default_*` memoized accessors. Per-call kwargs replace global state; `escape_hard_line_breaks` capability returns in step 3 via `Markbridge.discourse_renderer`. - Rewrite `spec/markbridge_spec.rb` for the new shape. - Update system specs to use `expect(result.markdown)`. - Strip stale doc references to the deleted singleton API.
Step 2 of the migration-API redesign. Aligns MediaWiki with the sibling parser APIs (BBCode, HTML, TextFormatter all use `handlers:`). The kwarg type is unchanged — still an `InlineTagRegistry` — only the parameter name moves. Affects: - Markbridge.parse_mediawiki / .mediawiki_to_markdown - Parsers::MediaWiki::Parser#initialize - Parsers::MediaWiki::InlineParser#initialize - Specs and docs that referenced the old name.
Step 3 of the migration-API redesign: route render-side customization (custom Tags, custom escaper) through a one-line factory plus a single per-call kwarg, instead of multiple per-call kwargs. - TagLibrary#unregister(klass): drop a binding so the renderer falls through to render_children (existing path at renderer.rb:25-33). - TagLibrary#merge(hash): bulk register, with nil values unregistering. - Markbridge.discourse_renderer(tags:, tag_library:, unregister:, escaper:, escape_hard_line_breaks:): one-line factory returning a configured Renderer. - renderer: kwarg on the four *_to_markdown methods. When given, the supplied Renderer is used; otherwise a fresh default is built per call.
Step 4 of the migration-API redesign. Tags can now record
side-channel records during rendering without mutating
constructor-injected collections, and the records surface on
Conversion#emissions.
- Renderer: @emission_buffer keyed by Symbol; record_emission appends.
Lifecycle parallels @interface_cache except the buffer is preserved
after the root call so callers can drain it via #emissions.
- RenderingInterface#emit(key, payload) — no-return-value helper Tags
call from #render. Tag's render return value is unaffected.
- with_provisional_emissions { |c| ...; c.commit if keep } —
snapshot/rollback for tags whose first render attempt may be
discarded. Used by TableTag's Markdown-vs-HTML decision so
emissions from the throwaway pass don't double-emit.
- TableTag: wraps the Markdown attempt in with_provisional_emissions;
commits only when the Markdown form survives.
- Conversion#emissions populated from renderer.emissions after render.
Specs cover: emission round-trip, buffer reset on subsequent root
calls, no-op outside render, provisional rollback when not committed,
provisional commit kept, and the TableTag double-emit regression
(both markdown-compatible and HTML-fallback paths).
Step 5 of the migration-API redesign. Custom handlers can now
delegate to defaults instead of reinventing them.
Each registry gains:
registry.overlay("a") do |default|
LinkifyingHandler.new(default:)
end
The block runs at registration time and receives the previously
bound handler (which may be nil — the block must tolerate that).
Also adds TextFormatter::HandlerRegistry#[] for parity with the
sibling registries; previously only #has_handler? and the
internal #process_element exposed the mapping.
Step 6 of the migration-API redesign. Closes the gap that made examples/custom_text_formatter_mappings.rb un-runnable. - BaseHandler#process and every TextFormatter handler subclass now accept `processor:` (defaulting to nil for backward-friendly use in custom subclasses). - HandlerRegistry#process_element forwards `processor:` and dispatches to .process for class-style handlers, .call for Proc/lambda handlers (matching how HTML's HandlerRegistry already handled lambdas). - Parser passes self as the processor argument so handlers can recurse via `processor.process_children(element, ast_node)`. - Repaired the example file: lambdas now return either the AST node (so the parser auto-processes children) or nil (for leaf nodes); they don't return the NodeSet from process_children (which fed back into parser.rb:106 and double-processed). Specs cover lambda dispatch and the regression where a registered handler returning nil must not be tracked as unknown.
Step 7 of the migration-API redesign. Brings MediaWiki to parity with BBCode/HTML/TextFormatter parsers — unknown inline tags now appear in Parse#unknown_tags / Conversion#unknown_tags. - InlineParser shares a single unknown_tags Hash with depth-recursive child instances so nested parses contribute to the same tally. - Only opening tags whose names aren't in the registry get tracked; closing forms and self-closing tags pass through silently (they typically pair with an opening already counted). - Parser exposes attr_reader :unknown_tags and clears it at the start of every #parse call. - Markbridge.parse_mediawiki snapshots parser.unknown_tags into the Parse result.
Step 8 of the migration-API redesign.
- Markbridge.convert(input, format:, **kwargs) — thin dispatcher
over the four *_to_markdown methods. Useful when format is
data-driven (e.g. iterating posts whose :format column varies).
Raises ArgumentError on unknown formats.
- Markbridge.render(ast, format: :discourse, renderer:) — renders
an existing AST. Returns a Conversion with format: :discourse,
unknown_tags/diagnostics: {} (parser-side data not available
when starting from an AST).
Step 9 of the migration-API redesign. The cleanup logic that ran inside Markbridge#cleanup_markdown is now a tiny replaceable collaborator on the Renderer. - Renderers::Discourse::Postprocessor#call(text) — collapses 3+ newlines, removes whitespace-only lines, strips. DEFAULT is a module-level singleton instance. - Renderer accepts postprocessor: kwarg, defaulting to Postprocessor::DEFAULT. - Markbridge.discourse_renderer accepts postprocessor: and forwards it to the Renderer. - Markbridge#build_conversion and #render now go through renderer.postprocessor.call(...) instead of a private cleanup_markdown method. Custom postprocessors compose by subclassing Postprocessor and overriding #call. Spec exercises the end-to-end path.
Step 10 of the migration-API redesign. RawHandler used to call @element_class.new(language:) unconditionally, which forced any downstream AST class reused with RawHandler to declare a language: kwarg even when irrelevant. The handler now introspects the AST class's #initialize parameters once and caches the result, only passing language: when the class actually accepts it. AST::Code keeps its language: attribute; ImageHandler overrides #create_element so it's unaffected.
Step 11 of the migration-API redesign. Importers can now isolate per-row failures from the conversion loop instead of crashing on a single bad post. - raise_on_error: kwarg on the four *_to_markdown methods and on Markbridge.render. Defaults to true (preserves current behavior — render-time exceptions propagate). - When raise_on_error: false, render-time StandardError is caught, the Conversion's markdown comes back as "" (whatever partial output had been built before the failure), and the exception is surfaced via Conversion#errors.
Step 12 of the migration-API redesign. - examples/forum_migration.rb (NEW) — canonical end-to-end importer shape exercising every new path: discourse_renderer factory, custom Tag, FontHandler/FontNode, overlay, unregister:, emit, Conversion#emissions/#errors/#unknown_tags, raise_on_error: false, Markbridge.convert(format:) dispatch, custom escaper subclass. - UPGRADING.md (NEW) — covers Conversion vs String returns, removed singleton API, removed per-call render-side kwargs, MediaWiki kwarg rename, TextFormatter processor: addition, emit migration shape, RawHandler language: relaxation, raise_on_error:. - examples/basic_usage.rb — rewritten for the new API (Conversion result, discourse_renderer factory). - docs/extending.md — surfaces the auto-passthrough fact for unregistered AST classes and the unregister: primitive.
Fixes the lint CI failure on PR #30 — six files had drifted from syntax_tree's expected format. No semantic changes.
Inline snapshot_emissions/rollback_emissions into the only caller (with_provisional_emissions), drop the lazy buffer init, and guarantee @emission_buffer is always present from the constructor. This removes the dead-code guards mutant correctly identifies as unkillable, and fixes the unit-spec entry path that built a Renderer + RenderingInterface manually without going through #render (those tests crashed once #with_provisional_emissions stopped tolerating a nil buffer). Also normalize TagLibrary#merge to an explicit each_pair / if-else form so mutant's hash-iteration mutations have a single matching shape to target. Renderer#emissions now returns transform_values dup of the always-present buffer; its no-buffer fallback was dead. Updated the renderer_spec example that asserted "ignores emissions made outside a render call" — emissions from before the first render are now retained until the first root call clears them.
Mutant flagged the convenience methods because the tests verified "this returns a Conversion/Parse" but didn't pin every field of the Data result. The mutations replaced individual fields with nil or wrong values; tests now pin them. - Drop redundant `handlers ||= ...` fallbacks in parse_bbcode / parse_html / parse_text_formatter_xml — every parser already defaults to its own .default registry when handed nil. - Drop the .dup on parser.unknown_tags / parser.unclosed_raw_tags in the convenience methods. Each convenience call constructs a fresh parser, so there's no aliasing risk through the public API; the .dup was equivalent through Markbridge.* and mutant correctly identified it as such. Direct users of Parsers::*::Parser still see the original (un-dup'd) hashes which is the documented contract. - Add tests pinning Conversion#ast, #format, #diagnostics, #unknown_tags shape; per-method "raise_on_error defaults to true" tests; #ast type via to_s coercion path. - Make build_conversion's `renderer:` and `raise_on_error:` required kwargs — every caller already passes them explicitly, so the defaults were dead code.
Covers eight more subjects (Markbridge.convert, .render, .discourse_renderer; Renderer#emissions, #initialize, #with_provisional_emissions; RenderingInterface#emit, #with_provisional_emissions; Postprocessor#call) by adding focused tests rather than expanding ignore lists. Notable additions: - Per-format `convert` kwargs-forwarding tests (one per branch). - `render` field-shape tests for ast/format/unknown_tags/ diagnostics/errors and a "raise_on_error defaults to true" test. - discourse_renderer tests for tag_library: base, postprocessor: forwarding. - Renderer specs: postprocessor default fallback; deep snapshot semantics on emissions and with_provisional_emissions; explicit return-value test for with_provisional_emissions; postprocessor swap end-to-end via Markbridge.bbcode_to_markdown. - RenderingInterface unit specs for #emit (delegation order, always-nil return) and #with_provisional_emissions (block-forwarding via and_yield). - Postprocessor: multi-run gsub vs sub regression.
Final batch of mutation-coverage fixes for the migration-API redesign. Covers 8 remaining subjects: - HandlerRegistry#overlay (BBCode and HTML): array-of-names test pinning that each name's *previous* handler is yielded individually (kills the `Array(x).each` -> `[x].each` mutation). - RawHandler#accepts_language?: drop the introspection cache (mutant correctly identified the cached/uncached branches as observably equivalent — the AST class never changes for the handler's lifetime, but the cache is unobservable through the public API). Add a public-API test using an AST class with non-:language kwargs to pin the predicate's contract. - TextFormatter::HandlerRegistry#process_element: switch @mappings[name.upcase] to self[name] — equivalent path through the existing #[] (which already upcases), and removes a mutation surface. - MediaWiki::Parser#parse: tests pinning the unknown_tags clear on re-parse, line-ending normalization (Unicode separator), and handlers: kwarg propagation into InlineParser. - TagLibrary#merge: regression test that `nil` value triggers *deletion*, observable through #each iteration (registering `nil` would leave the class in iteration with a nil value). - TableTag#empty_table?: pin the "rows-less-children" path with a Table whose only child is a stray Text node. - TableTag#render: simplify `next nil` -> `next` (equivalent).
Two new shapes for callers that need to modify, append to, or
replace the AST between parse and render — without dropping to the
parser/renderer classes directly.
- Markbridge.render now accepts a Parse OR an AST::Node. Given a
Parse, the resulting Conversion preserves the parser's
unknown_tags, diagnostics, and source format; given an AST node
the fields default to empty / :discourse (current behavior).
- *_to_markdown methods (bbcode/html/text_formatter_xml/mediawiki)
and Markbridge.convert accept an optional block. The block is
yielded the parsed AST between parse and render; mutations
persist in Conversion#ast.
The block form reads cleanly in migration loops:
Markbridge.convert(post.body, format: post.format, renderer:) do |ast|
orphan_attachments.each { |a| ast << AttachmentRef.new(a.id) }
end
Updated forum_migration.rb to demo the orphan-attachment pattern
and UPGRADING.md to document both shapes.
Mutant: 100% on the touched subjects (Markbridge.render, .convert,
and all four *_to_markdown methods).
gschlager
added a commit
that referenced
this pull request
May 6, 2026
Pivot the docs site to the task-oriented "I'm migrating a forum to Discourse" perspective and bring every page in sync with the migration- API redesign coming in PR #30. New top-level "Migrating to Discourse" section with overview, placeholders, and a full walkthrough that ports examples/forum_migration.rb end-to-end. The placeholders page makes explicit what was implicit: Tag return strings are spliced verbatim — only AST::Text goes through the escaper — so placeholder sigils reach the output untouched. The old guides/configuration.md is gone; renderer-side knobs move to a new customization/customizing-renderer.md describing the Markbridge.discourse_renderer(...) factory. PR30's UPGRADING.md is ported under reference/upgrading.md. Existing pages updated for: Conversion return type, removed Markbridge.configure global, MediaWiki handlers: kwarg rename, TextFormatter handler processor: arg, interface.emit and html_mode? on the rendering interface table, the build-once / reuse-many renderer pattern. MIGRATION_API_PLAN.md updated to reflect §1-3 implemented in PR30, §4 (DSL) considered and rejected, and the resolved open questions. Every code sample in the migration section was verified against PR30's branch via a sibling worktree before commit.
Two new escaper customisation knobs that replace the subclass-and-override pattern from importers like Liferay/Wolfram. - MarkdownEscaper.new(allow:) accepts a Symbol or Array of Symbols. Recognised keys: :bullet_list, :ordered_list, :atx_heading, :block_quote. Alias :lists expands to [:bullet_list, :ordered_list]. Unknown keys raise. Storage is a private Array (≤4 elements) so #include? is O(N) with N ≤ 4 — observably identical to a Set and avoids the allocation. Hot-path lookup is on the cold side of each block arm, so no measurable bench impact. - pass_first_char_inline / pass_marker_inline helpers preserve the marker verbatim and inline-escape the rest, so e.g. `* item` with allow: :bullet_list comes out as `* item` (the star is NOT re-escaped by escape_inline). - IdentityEscaper: a tiny class whose #escape(text) returns text || "". Public so callers can point at it. - discourse_renderer(escape: false) routes to IdentityEscaper. Mutually exclusive with escape_hard_line_breaks: / allow: — raises ArgumentError if combined. An explicit escaper: still wins (mirrors the existing precedence). forum_migration.rb now uses `allow: :lists` instead of an inline `ListPermissiveEscaper < MarkdownEscaper` subclass. Specs cover each key, alias, error path, edge cases (7-hash non-heading, empty `# `, setext underlines under :bullet_list), and the IdentityEscaper / mutual-exclusion paths. Mutant: 100% on the touched subjects (MarkdownEscaper#initialize, #resolve_allow, #pass_first_char_inline, #pass_marker_inline, #escape_block_ordered_list, IdentityEscaper#escape, Markbridge.discourse_renderer, Markbridge.build_escaper). Bench: 0.014x noise vs main on the no-allow path (within measurement error).
Adds two unit specs under MarkdownEscaper#initialize that pin the default `escape_hard_line_breaks: false` (preserves trailing-space hard breaks) vs. the explicit `true` form (strips them). The markbridge_spec already exercised both shapes through the discourse_renderer factory but those tests aren't selected for the MarkdownEscaper#initialize subject under mutant's description-matching test selection.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR reshapes the Markbridge public API to return structured result objects (
ParseandConversion) instead of raw strings/ASTs, and consolidates render-side customization into a singlerenderer:parameter. The changes eliminate the global configuration singleton and per-process default registries in favor of explicit, reusable renderer instances. Two follow-up rounds added AST-mutation hooks (block form on*_to_markdown, polymorphicMarkbridge.render) and selective escaper customisation (allow:kwarg +IdentityEscaper) that replace the subclass-and-override pattern from importers like Liferay/Wolfram.Key Changes
New result types: Introduced
Parse(for parse-only calls) andConversion(for render calls) —Data.definevalue objects wrapping the AST, markdown output, format, unknown tags, diagnostics, emissions, and errorsparse_bbcode,parse_html,parse_text_formatter_xml,parse_mediawikinow returnParseobjectsbbcode_to_markdown,html_to_markdown,text_formatter_xml_to_markdown,mediawiki_to_markdownnow returnConversionobjectsConversion#to_sdelegates tomarkdownfor string-coercion contexts (interpolation, puts)Removed global configuration: Deleted
Markbridge::Configurationclass and all singleton methodsMarkbridge.configuration,Markbridge.configure,Markbridge.reset_defaults!Markbridge.default_handlers,Markbridge.default_html_handlers,Markbridge.default_text_formatter_handlers,Markbridge.default_tag_libraryRenderer factory: Added
Markbridge.discourse_renderer(...)to build reusableRendererinstances with customizationstags:,tag_library:,unregister:,escaper:,escape:,escape_hard_line_breaks:,allow:,postprocessor:tag_library:,escaper:,escape_hard_line_breaks:parametersSimplified method signatures: All
*_to_markdownmethods andconvertnow accept only:handlers:— parser handler registry (optional, uses default)renderer:— pre-built Renderer (optional, creates fresh default)raise_on_error:— boolean (defaulttrue)Markbridge.renderaccepts aParseor anAST::Node: when given aParse, the resultingConversioncarries the parser'sunknown_tags,diagnostics, and sourceformatforward; when given an AST node those fields default to empty /:discourse(current behavior). Useful for the parse → mutate → render flow.Handler registry enhancements: Added
#overlaymethod to BBCode, HTML, and TextFormatter registries for wrapping existing handlersTagLibrary#unregister: New method to remove tag bindings, enabling auto-passthrough for unregistered AST classes
Postprocessor: New
Markbridge::Renderers::Discourse::Postprocessorclass that cleans up rendered markdown (collapses excess newlines, strips whitespace-only lines, trims document edges)Emissions tracking:
Renderernow tracks side-channel emissions viainterface.emit(:type, data)and exposes them via#emissionsandConversion#emissionsMarkdownEscaper#allow:for selective passthrough: replaces the subclass-and-override pattern. Recognised keys::bullet_list,:ordered_list,:atx_heading,:block_quote. Alias:listsexpands to[:bullet_list, :ordered_list]. Unknown keys raiseArgumentError. Marker bytes pass through verbatim while inline content after the marker is still escaped.IdentityEscaper+escape: falsesugar: new no-op escaper class for migration paths where source content is already trusted Markdown.Markbridge.discourse_renderer(escape: false)swaps inIdentityEscaper. Mutually exclusive withescape_hard_line_breaks:/allow:; an explicitescaper:always wins.MediaWiki API consistency: Renamed
inline_tag_registry:parameter tohandlers:for parity with other parsersTextFormatter handler signature: All TextFormatter handlers now accept
processor:parameter for consistencyNotable Implementation Details
ParseandConversionuseData.define(frozen, structural equality)raise_on_error: falsemode allows per-row failure isolation in batch migrationswith_provisional_emissions { }snapshots the buffer forTableTag's throwaway Markdown-vs-HTML passMarkdownEscaper#allow:storage is a private Array (≤ 4 elements) —#include?is observably identical toSet#include?at this size, no allocationexamples/forum_migration.rb) demonstrates the migration API end-to-end:overlay,emit,tags:,unregister:,allow: :lists, the AST-mutation block,Conversion#emissions,raise_on_error: false, and theconvert(format:)dispatcherTests
Conversion#emissions,Renderer#emissions,#overlay,#unregister,Postprocessor,MarkdownEscaper#allow:,IdentityEscaper, the polymorphicMarkbridge.render(Parse|AST), and the AST-mutation block formmain(1994+ mutations, 0 alive on the local pass)https://claude.ai/code/session_01X7TU9zitHfqv6HmszqgfYa