Replace AWK/shell docs preprocessing with AsciidoctorJ extension #3418

Draft
Cole-Greer wants to merge 6 commits into 3.8-dev from docsBuild-3.8


Cole-Greer commented May 3, 2026

Overview

This PR introduces the gremlin-docs module — a custom AsciidoctorJ TreeProcessor extension that replaces the entire AWK/shell preprocessing pipeline (10 AWK scripts, 5 shell scripts) used to build TinkerPop documentation.

The new system delegates Gremlin code execution to a real Gremlin Console process via stdin/stdout, then generates language variant tabs using the ANTLR-based GremlinTranslator. This provides the full console environment (correct result formatting, Sugar plugin, SPARQL, Neo4j, remote connections) while eliminating the AWK pipeline and external Hadoop dependency.

This PR is currently a draft to collect early feedback. Several known issues still need to be resolved, and I suspect more will surface on closer inspection of the generated docs.

Architecture

  • GremlinTreeProcessor — AsciidoctorJ TreeProcessor that walks the document AST, finds [gremlin-groovy] blocks, orchestrates execution and translation, and builds tabbed HTML output
  • ConsoleExecutor — Launches bin/gremlin.sh as a long-running subprocess and communicates with it via stdin/stdout, using per-statement sentinel boundaries. Handles the console's "Display stack trace?" error prompt robustly via a double-sentinel protocol
  • VariantTranslator — Wraps GremlinTranslator to produce Java, Python, JavaScript, C#, and Go tabs for every translatable Gremlin example
  • process-docs.sh — Build entry point that installs console plugins, starts Gremlin Server (for remote examples) and Gephi mock (for visualization examples), and passes configuration to the AsciidoctorJ extension

What it does

  • Executes [gremlin-groovy] code blocks via the real Gremlin Console, producing live console output with correct gremlin> prompts, ==> results, and proper datatype formatting (Path, Map, etc.)
  • Auto-generates language variant tabs (Java, Python, JavaScript, C#, Go) for every translatable Gremlin example using the ANTLR-based GremlinTranslator
  • Supports standalone tab groups for manually-authored [source,LANG,tab] blocks
  • Syntax highlighting via Rouge (server-side, build-time) — replaces both CodeRay (no C# support) and highlight.js (destroyed callout markers)
  • Callout numbers render correctly as black circled numbers in code blocks

How to use

  # Prerequisites: build console and server distributions
  mvn clean install -pl :gremlin-server,:gremlin-console -am -DskipTests

  # Full build with live gremlin execution
  bin/process-docs.sh

  # Dry-run: skip execution (fast, for layout checks; no console/server needed)
  bin/process-docs.sh --dry-run

Changes

  • New gremlin-docs/ module (not in the Maven reactor — built separately by the build script)
  • New ConsoleExecutor.java — per-statement stdin/stdout protocol with the Gremlin Console
  • New VariantTranslator.java — ANTLR-based translation to all GLV languages
  • Updated bin/process-docs.sh — builds extension, installs plugins, starts server/Gephi mock
  • Restored bin/gephi-mock.py — simple HTTP mock for Gephi visualization examples
  • Root pom.xml: switched syntax highlighter to Rouge, added SnakeYAML 1.33 override for Rouge/Psych compatibility
  • Removed old AWK/shell pipeline (10 AWK scripts, 5 shell scripts, gephi-mock)
  • Updated development-environment.asciidoc to document new build requirements

Known issues

Neo4j examples fall back to dry-run output. The neo4j-gremlin plugin is excluded from the default plugin list because its Spark jars conflict with spark-gremlin on the classpath. The old system handled this by swapping plugins per-document file; the new system processes the entire reference book as one document, making per-file swapping impractical. Neo4j is deprecated in TinkerPop.

A few Hadoop hdfs.head() calls may fail. The hdfs.head('output/~g') and hdfs.head('output', GryoInputFormat) calls may produce different output in local-mode Spark vs a real cluster. This was also an issue in the old system.

Not changed

  • All AsciiDoc source files are unchanged (no doc content modifications)
  • Document structure, sections, anchors, and static content are preserved

Cole-Greer added 6 commits May 2, 2026 22:24
Introduce the gremlin-docs module, a custom AsciidoctorJ TreeProcessor
extension that replaces the entire AWK/shell preprocessing pipeline
(10 AWK scripts, 5 shell scripts) used to build TinkerPop documentation.

The new system executes [gremlin-groovy] code blocks in an embedded
GremlinGroovyScriptEngine during Asciidoctor rendering, eliminating the
need for a running Gremlin Server, Hadoop daemons, or the Gremlin Console
distribution at build time.

Key features:
- Embedded execution of gremlin code blocks with live query results
- Auto-generated language variant tabs (Java, Python, JavaScript, C#, Go)
  using the ANTLR-based GremlinTranslator infrastructure
- Standalone tab group support for manually-authored [source,LANG,tab] blocks
- Hadoop/Spark example support via local-mode Spark with sandboxed HDFS
- Console utility functions (describeGraph) loaded from gremlin-console
- GremlinPlugin SPI loading for hadoop-gremlin/spark-gremlin imports and bindings
- Syntax highlighting via highlight.js 11.9.0 (replaces CodeRay)
- Console command detection (:remote, :>, :submit) for static rendering
- Multi-line statement joining with bracket balancing
- Callout marker preservation through execution

New files:
- gremlin-docs/ - Maven module (not in reactor, built separately)
  - GremlinDocsExtension.java - SPI entry point for AsciidoctorJ
  - GremlinTreeProcessor.java - Main TreeProcessor that walks the AST
  - GremlinExecutor.java - Embedded script engine wrapper
  - VariantTranslator.java - GremlinTranslator wrapper for all GLVs
  - GremlinExecutorTest.java - Unit tests
- bin/process-docs-new.sh - New build entry point

Root pom.xml changes:
- Added gremlin-docs as asciidoctor-maven-plugin dependency
- Switched source-highlighter from coderay to highlightjs 11.9.0
- Added highlightjs-languages for groovy support
- Added tabs-1 CSS rule for single-tab blocks

Usage:
  bin/process-docs-new.sh              # full build
  bin/process-docs-new.sh --dry-run    # skip gremlin execution

Assisted-by: Kiro:claude-opus-4.6
Replace the embedded GremlinGroovyScriptEngine (GremlinExecutor) with
ConsoleExecutor, which launches bin/gremlin.sh as a long-running subprocess
and communicates via stdin/stdout. This provides the full console environment
including correct result formatting, Sugar plugin, SPARQL, Neo4j, and remote
connection support — resolving PR known issues 2-6.

Key changes:
- New ConsoleExecutor with per-statement sentinel protocol that handles
  the console's 'Display stack trace? [yN]' error prompt robustly
- GremlinTreeProcessor reads console-home and hadoop-libs from document
  attributes; falls back to dry-run when no console is available
- process-docs.sh builds console distribution, installs plugins, starts
  Gremlin Server for remote examples, and sets up conf/hadoop in
  CONSOLE_HOME (where the console process resolves relative paths)
- Port 8182 conflict detection before starting Gremlin Server
- Dropped 5 heavy compile dependencies from gremlin-docs (gremlin-groovy,
  tinkergraph, hadoop-gremlin, spark-gremlin, gremlin-console) — only
  gremlin-core needed for ANTLR-based variant translation
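The fallback to dry-run when no console is available could look roughly like the sketch below. Only the `console-home` attribute name and the `bin/gremlin.sh` launcher come from the description above; the class name `DryRunFallback`, the method name, and the exact check (launcher exists and is executable) are assumptions, and the attribute map stands in for the AsciidoctorJ document attributes.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration of the "no console, so dry-run" decision.
final class DryRunFallback {

    /** Returns true when live execution is not possible and dry-run output should be used. */
    static boolean shouldDryRun(Map<String, Object> attributes) {
        Object home = attributes.get("console-home");
        if (home == null || home.toString().trim().isEmpty()) {
            return true; // attribute not set: no console distribution configured
        }
        // Assumed check: the console launcher must exist and be executable.
        Path launcher = Paths.get(home.toString(), "bin", "gremlin.sh");
        return !Files.isExecutable(launcher);
    }

    public static void main(String[] args) {
        // No attributes at all, so the sketch falls back to dry-run.
        System.out.println(shouldDryRun(Collections.emptyMap())); // prints true
    }
}
```

Deciding this once, before any block is processed, keeps a layout-only build from failing halfway through a document because the console distribution was never built.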

Assisted-by: Kiro:claude-sonnet-4-20250514
- Add docs/src/docinfo-footer.html that configures highlight.js with
  ignoreUnescapedHTML:true, preventing it from destroying <b class=conum>
  callout markers inside code blocks (fixes PR known issue 1)
- Restore bin/gephi-mock.py and start it in process-docs.sh for Gephi
  plugin doc examples that send HTTP requests to localhost:8080
- Exclude neo4j-gremlin from default plugin list to avoid classpath
  conflicts with spark-gremlin (Neo4j examples fall back to dry-run)
- Update development-environment.asciidoc to document that full builds
  require built Console and Server distributions

Assisted-by: Kiro:claude-sonnet-4-20250514
Replace client-side highlight.js with server-side Rouge syntax highlighter.
Rouge runs at build time during Asciidoctor rendering, producing static
pre-colored HTML. This eliminates the callout marker destruction issue
(highlight.js was stripping <i class=conum> elements from code blocks),
removes the CDN dependency, and adds C# syntax highlighting support
(which CodeRay lacked and was the original reason for switching to
highlight.js).

Rouge is bundled in AsciidoctorJ 2.5.8 but requires SnakeYAML 1.x for
JRuby's Psych YAML extension. The project uses SnakeYAML 2.0, so a 1.33
override is added scoped to the asciidoctor-maven-plugin.

- Change source-highlighter from highlightjs to rouge in all 12
  asciidoctor executions in pom.xml
- Remove highlightjsdir and highlightjs-languages attributes
- Add SnakeYAML 1.33 plugin dependency to fix Rouge/Psych compatibility
- Delete docs/src/docinfo-footer.html (highlight.js workarounds)

Assisted-by: Kiro:claude-sonnet-4-20250514
The EXIT trap's kill/wait on the Gremlin Server and Gephi mock returned
non-zero (the killed process's exit status), which propagated as the
script's exit code due to set -e. This caused publish-docs.sh to abort
at 'bin/process-docs.sh || exit 1'. Adding || true to kill/wait ensures
the cleanup always succeeds.

Assisted-by: Kiro:claude-sonnet-4-20250514
Comment thread gremlin-docs/pom.xml
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>

If this is a standalone utility/tool, should it use TinkerPop as a parent? Could we tuck gremlin-docs away in the docs/ directory? Perhaps even better would be to call it "tinkerpop-docs" to not confuse it with our standard "gremlin" prefixing/suffixing?

bigger thinking - i think there's going to be more and more of these types of little tools and things for various bits of automation and such. not sure where those things should live. i think docs/upgrade was one of those to start it. maybe the examples too. they all sorta build outside of the primary maven reactor. Do we need something else that ties these together better? almost need a top-level command system of sorts to keep all these tools and scripts unified somehow.
