Replace AWK/shell docs preprocessing with AsciidoctorJ extension #3418
Cole-Greer wants to merge 6 commits into 3.8-dev
Introduce the gremlin-docs module, a custom AsciidoctorJ TreeProcessor extension that replaces the entire AWK/shell preprocessing pipeline (10 AWK scripts, 5 shell scripts) used to build TinkerPop documentation. The new system executes [gremlin-groovy] code blocks in an embedded GremlinGroovyScriptEngine during Asciidoctor rendering, eliminating the need for a running Gremlin Server, Hadoop daemons, or the Gremlin Console distribution at build time.

Key features:
- Embedded execution of gremlin code blocks with live query results
- Auto-generated language variant tabs (Java, Python, JavaScript, C#, Go) using the ANTLR-based GremlinTranslator infrastructure
- Standalone tab group support for manually-authored [source,LANG,tab] blocks
- Hadoop/Spark example support via local-mode Spark with sandboxed HDFS
- Console utility functions (describeGraph) loaded from gremlin-console
- GremlinPlugin SPI loading for hadoop-gremlin/spark-gremlin imports and bindings
- Syntax highlighting via highlight.js 11.9.0 (replaces CodeRay)
- Console command detection (:remote, :>, :submit) for static rendering
- Multi-line statement joining with bracket balancing
- Callout marker preservation through execution

New files:
- gremlin-docs/ - Maven module (not in reactor, built separately)
- GremlinDocsExtension.java - SPI entry point for AsciidoctorJ
- GremlinTreeProcessor.java - Main TreeProcessor that walks the AST
- GremlinExecutor.java - Embedded script engine wrapper
- VariantTranslator.java - GremlinTranslator wrapper for all GLVs
- GremlinExecutorTest.java - Unit tests
- bin/process-docs-new.sh - New build entry point

Root pom.xml changes:
- Added gremlin-docs as asciidoctor-maven-plugin dependency
- Switched source-highlighter from coderay to highlightjs 11.9.0
- Added highlightjs-languages for groovy support
- Added tabs-1 CSS rule for single-tab blocks

Usage:
bin/process-docs-new.sh # full build
bin/process-docs-new.sh --dry-run # skip gremlin execution

Assisted-by: Kiro:claude-opus-4.6
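The "multi-line statement joining with bracket balancing" feature can be illustrated with a small shell sketch. This is not the extension's Java code: the function names `bracket_depth` and `join_statements` are hypothetical, and the sketch deliberately ignores brackets inside string literals and comments, which a real implementation would presumably need to handle.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the function names are hypothetical.

# Print the number of unclosed brackets in a string.
# Simplification: brackets inside quoted strings are counted too.
bracket_depth() {
  local s=$1 depth=0 i ch
  for (( i = 0; i < ${#s}; i++ )); do
    ch=${s:i:1}
    case $ch in
      '(' | '[' | '{') depth=$(( depth + 1 )) ;;
      ')' | ']' | '}') depth=$(( depth - 1 )) ;;
    esac
  done
  printf '%s\n' "$depth"
}

# Read lines from stdin and emit one complete statement per output line.
# A line is held back and joined with the next one while brackets are open.
join_statements() {
  local line buf=""
  while IFS= read -r line || [[ -n $line ]]; do
    buf="${buf:+$buf }$line"
    if (( $(bracket_depth "$buf") <= 0 )); then
      printf '%s\n' "$buf"
      buf=""
    fi
  done
  # Flush an unbalanced trailing statement rather than dropping it.
  [[ -n $buf ]] && printf '%s\n' "$buf"
  return 0
}

# Usage: the first two input lines form a single statement.
join_statements <<'EOF'
g.V().has('name',
'marko')
g.E()
EOF
```

Joining before execution matters because each statement is evaluated as a unit; sending the first half of a traversal on its own would produce a parse error instead of a result.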
Replace the embedded GremlinGroovyScriptEngine (GremlinExecutor) with ConsoleExecutor, which launches bin/gremlin.sh as a long-running subprocess and communicates via stdin/stdout. This provides the full console environment including correct result formatting, Sugar plugin, SPARQL, Neo4j, and remote connection support, resolving PR known issues 2-6.

Key changes:
- New ConsoleExecutor with per-statement sentinel protocol that handles the console's 'Display stack trace? [yN]' error prompt robustly
- GremlinTreeProcessor reads console-home and hadoop-libs from document attributes; falls back to dry-run when no console is available
- process-docs.sh builds console distribution, installs plugins, starts Gremlin Server for remote examples, and sets up conf/hadoop in CONSOLE_HOME (where the console process resolves relative paths)
- Port 8182 conflict detection before starting Gremlin Server
- Dropped 5 heavy compile dependencies from gremlin-docs (gremlin-groovy, tinkergraph, hadoop-gremlin, spark-gremlin, gremlin-console); only gremlin-core is needed for ANTLR-based variant translation

Assisted-by: Kiro:claude-sonnet-4-20250514
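The general idea behind a per-statement sentinel protocol can be sketched in shell. Everything here is an assumption made for illustration: a plain `bash` process stands in for bin/gremlin.sh, `run_statement` is a hypothetical name, and the sketch does not handle the console's 'Display stack trace? [yN]' prompt that the real ConsoleExecutor has to deal with.

```bash
#!/usr/bin/env bash
# Illustrative sketch only. A plain `bash` process stands in for the
# long-running console; the real ConsoleExecutor is Java and talks to
# bin/gremlin.sh.

# Start the stand-in REPL as a coprocess: REPL[1] is its stdin,
# REPL[0] is its combined stdout/stderr.
coproc REPL { bash 2>&1; }

# Send one statement, then ask the REPL to print a unique sentinel.
# Everything read before the sentinel is that statement's output.
# Limitation: output that lacks a trailing newline would be glued to
# the sentinel line and the loop would not detect the end.
run_statement() {
  local stmt=$1 sentinel="__END_${RANDOM}_${RANDOM}__" line
  printf '%s\n' "$stmt" >&"${REPL[1]}"
  printf 'echo %s\n' "$sentinel" >&"${REPL[1]}"
  while IFS= read -r -u "${REPL[0]}" line; do
    [[ $line == "$sentinel" ]] && break
    printf '%s\n' "$line"
  done
}

# Usage: each call returns only the output of its own statement.
run_statement 'echo "first statement"'
run_statement 'echo "1 + 1 = $((1 + 1))"'
```

The sentinel gives the caller an unambiguous end-of-output marker, so a single subprocess can serve many statements without being restarted and without relying on timeouts or prompt parsing.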
- Add docs/src/docinfo-footer.html that configures highlight.js with ignoreUnescapedHTML:true, preventing it from destroying <b class=conum> callout markers inside code blocks (fixes PR known issue 1)
- Restore bin/gephi-mock.py and start it in process-docs.sh for Gephi plugin doc examples that send HTTP requests to localhost:8080
- Exclude neo4j-gremlin from default plugin list to avoid classpath conflicts with spark-gremlin (Neo4j examples fall back to dry-run)
- Update development-environment.asciidoc to document that full builds require built Console and Server distributions

Assisted-by: Kiro:claude-sonnet-4-20250514
Replace client-side highlight.js with server-side Rouge syntax highlighter. Rouge runs at build time during Asciidoctor rendering, producing static pre-colored HTML. This eliminates the callout marker destruction issue (highlight.js was stripping <i class=conum> elements from code blocks), removes the CDN dependency, and adds C# syntax highlighting support (which CodeRay lacked and was the original reason for switching to highlight.js).

Rouge is bundled in AsciidoctorJ 2.5.8 but requires SnakeYAML 1.x for JRuby's Psych YAML extension. The project uses SnakeYAML 2.0, so a 1.33 override is added scoped to the asciidoctor-maven-plugin.

- Change source-highlighter from highlightjs to rouge in all 12 asciidoctor executions in pom.xml
- Remove highlightjsdir and highlightjs-languages attributes
- Add SnakeYAML 1.33 plugin dependency to fix Rouge/Psych compatibility
- Delete docs/src/docinfo-footer.html (highlight.js workarounds)

Assisted-by: Kiro:claude-sonnet-4-20250514
The EXIT trap's kill/wait on the Gremlin Server and Gephi mock returned non-zero (the killed process's exit status), which propagated as the script's exit code due to set -e. This caused publish-docs.sh to abort at 'bin/process-docs.sh || exit 1'. Adding || true to kill/wait ensures the cleanup always succeeds.

Assisted-by: Kiro:claude-sonnet-4-20250514
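The failure mode and the fix can be reproduced in isolation. The sketch below is not the actual process-docs.sh: `build_docs_demo` is a hypothetical name and a background `sleep` stands in for the Gremlin Server. With `set -e` active, `wait` on a process killed by SIGTERM returns 143, so without `|| true` the trap would turn a successful run into a failing exit code.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; names and the `sleep` stand-in are assumptions.

# The function body runs in a subshell so that `set -e` and the EXIT trap
# stay local to the demo.
build_docs_demo() (
  set -e

  # Stand-in for the background Gremlin Server process.
  sleep 300 >/dev/null 2>&1 &
  server_pid=$!

  cleanup() {
    # `wait` reports the killed process's status (143 for SIGTERM).
    # `|| true` keeps that status from becoming the script's exit code.
    kill "$server_pid" 2>/dev/null || true
    wait "$server_pid" 2>/dev/null || true
  }
  trap cleanup EXIT

  echo "docs built"
)

# Usage: the exit status reflects the build, not the cleanup.
build_docs_demo
echo "exit status: $?"
```

Guarding both commands matters: `kill` can fail if the process has already exited, and `wait` fails by design whenever the kill succeeded.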
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
If this is a standalone utility/tool, should it use TinkerPop as a parent? Could we tuck gremlin-docs away in the docs/ directory? Perhaps even better would be to call it "tinkerpop-docs" to not confuse it with our standard "gremlin" prefixing/suffixing?
Bigger picture: I think there are going to be more and more of these little tools for various bits of automation. I'm not sure where they should live. I think docs/upgrade was one of the first, and maybe the examples too. They all build outside of the primary Maven reactor. Do we need something else that ties these together better? We almost need a top-level command system of sorts to keep all these tools and scripts unified.
Overview
This PR introduces the gremlin-docs module — a custom AsciidoctorJ TreeProcessor extension that replaces the entire AWK/shell preprocessing pipeline (10 AWK scripts, 5 shell scripts) used to build TinkerPop documentation.
The new system delegates Gremlin code execution to a real Gremlin Console process via stdin/stdout, then generates language variant tabs using the ANTLR-based GremlinTranslator. This provides the full console environment (correct result formatting, Sugar plugin, SPARQL, Neo4j, remote connections) while eliminating the AWK pipeline and external Hadoop dependency.
This PR is currently in draft state to collect early feedback. There are several known issues that need to be resolved, and I suspect more will surface on further inspection of the docs.
Architecture
What it does
How to use
```bash
# Prerequisites: build console and server distributions
mvn clean install -pl :gremlin-server,:gremlin-console -am -DskipTests

# Full build with live gremlin execution
bin/process-docs.sh

# Dry-run: skip execution (fast, for layout checks; no console/server needed)
bin/process-docs.sh --dry-run
```
Changes
Known issues
Neo4j examples fall back to dry-run output. The neo4j-gremlin plugin is excluded from the default plugin list because its Spark jars conflict with spark-gremlin on the classpath. The old system handled this by swapping plugins per-document file; the new system processes the entire reference book as one document, making per-file swapping impractical. Neo4j is deprecated in TinkerPop.
A few Hadoop hdfs.head() calls may fail. The hdfs.head('output/~g') and hdfs.head('output', GryoInputFormat) calls may produce different output in local-mode Spark vs a real cluster. This was also an issue in the old system.
Not changed