-
Notifications
You must be signed in to change notification settings - Fork 1
Add test suite, CI workflow, and technical documentation #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
c21bdd8
3071a8d
973be36
5ee9fae
a265eb3
1aa4563
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| name: Tests | ||
|
|
||
| on: | ||
| push: | ||
| branches: [ main, test-suite ] | ||
| pull_request: | ||
| branches: [ main ] | ||
|
|
||
| jobs: | ||
| test: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Set up OCaml | ||
| uses: ocaml/setup-ocaml@v3 | ||
| with: | ||
| ocaml-compiler: 5.1.1 | ||
| dune-cache: true | ||
| opam-repositories: | | ||
| default: https://github.com/ocaml/opam-repository.git | ||
|
|
||
| - name: Install dependencies | ||
| run: | | ||
| sudo apt-get update | ||
| sudo apt-get install -y npm xz-utils libomp-dev llvm-dev | ||
| opam install . --deps-only --update-invariant | ||
| npm install --no-save typescript browserify pug-lexer pug-parser pug-walk | ||
|
|
||
| - name: Install QuickJS | ||
| run: | | ||
| curl https://bellard.org/quickjs/quickjs-2021-03-27.tar.xz > quickjs.tar.xz | ||
| tar xvf quickjs.tar.xz && rm quickjs.tar.xz | ||
| mv quickjs-2021-03-27 quickjs | ||
| cd quickjs && make | ||
|
|
||
| - name: Install Flow | ||
| run: | | ||
| git clone --branch v0.183.1 --depth 1 https://github.com/facebook/flow.git flow | ||
| ln -s "$(pwd)/flow/src/parser" src/flow_parser | ||
| ln -s "$(pwd)/flow/src/third-party/sedlex" src/sedlex | ||
| ln -s "$(pwd)/flow/src/hack_forked/utils/collections" src/collections | ||
|
|
||
| - name: Run tests | ||
| run: | | ||
| mkdir -p strings | ||
| opam exec -- dune runtest tests/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -23,3 +23,5 @@ bad/ | |
| src/flow_parser | ||
| src/sedlex | ||
| src/collections | ||
|
|
||
| tests/integration_test_run/ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| # Agent Information - String Extractor | ||
|
|
||
| This repository contains an OCaml-based internationalization (i18n) string extraction tool. It parses source files (JS, TS, Vue, Pug, HTML) and extracts strings for translation management. | ||
|
|
||
| ## Documentation | ||
|
|
||
| - **[ARCHITECTURE.md](ARCHITECTURE.md)**: Contains a deep-dive into the codebase layout, directory structure, and a comprehensive API reference. **Read this file first** when: | ||
| - Starting a new task to understand which files are relevant. | ||
| - Investigating the impact of changes across the system. | ||
| - Looking for specific functionality or function definitions before searching. | ||
|
|
||
| - **[DEVELOPMENT.md](DEVELOPMENT.md)**: Contains instructions for environment setup, build processes for various platforms, and release workflows. **Read this file first** when: | ||
| - Setting up the development environment or installing dependencies (OCaml, JS, QuickJS). | ||
| - Building the project for development or release. | ||
| - Executing the tool for manual verification or testing. | ||
| - Managing version numbers or release artifacts. | ||
|
|
||
| ## Project Overview | ||
|
|
||
| - **Language**: OCaml (5.1.1) with some C++ (QuickJS bridge) and JavaScript (parsers via Browserify). | ||
| - **Architecture**: | ||
| - `src/cli/`: Main entry point, command-line interface, and output generation logic. | ||
| - `src/parsing/`: OCaml parsers using `Angstrom` for custom formats and `Flow_parser` for JS. | ||
| - `src/quickjs/`: Bridge to QuickJS to run JavaScript-based parsers (TypeScript/Pug) from OCaml. | ||
| - `src/utils/`: Common utilities for collection, timing, and I/O. | ||
| - **Key Libraries**: `Core`, `Eio` (concurrency), `Angstrom` (parsing), `Yojson`, `Ppx_jane`. | ||
|
|
||
| ## Essential Commands | ||
|
|
||
| ### Build | ||
| - **Development build**: `dune build src/cli/strings.exe` | ||
| - **Watch mode**: `dune build src/cli/strings.exe -w` | ||
| - **Release build (MacOS)**: `DUNE_PROFILE=release dune build src/cli/strings.exe` | ||
| - **Full release cycle**: See `DEVELOPMENT.md` for `cp`, `strip`, and Docker commands. | ||
|
|
||
| ### Run | ||
| - After building: `./_build/default/src/cli/strings.exe [directory-to-extract-from]` | ||
| - The CLI expects to be run from the root of a project containing a `strings/` directory (or it will create one if a `.git` folder is present). | ||
|
|
||
| ### Installation (Dev Setup) | ||
| Refer to `DEVELOPMENT.md` for specific `opam` and `npm` setup steps, as the project has several external dependencies (Flow, QuickJS, pug-lexer, etc.). | ||
|
|
||
| ## Code Conventions & Patterns | ||
|
|
||
| ### Parsing Strategy | ||
| 1. **Direct Parsers**: Simple formats like `.strings`, `HTML`, and basic `Vue` tags are parsed using `Angstrom` in `src/parsing/`. | ||
| 2. **JS/TS Parsing**: | ||
| - Javascript uses `Flow_parser` and a custom AST walker in `src/parsing/js_ast.ml`. | ||
| - TypeScript uses the official TS parser running inside QuickJS (`src/quickjs/`). | ||
| 3. **Pug Parsing**: Has a "fast" OCaml implementation (`src/parsing/pug.ml`) and a "slow" official Pug implementation via QuickJS (`src/quickjs/`). | ||
|
|
||
| ### Extraction Pattern | ||
| - Content is extracted into a `Utils.Collector.t`. | ||
| - The collector tracks found strings, potential scripts (to be further parsed), and file errors. | ||
| - **Convention**: Strings found inside `L("...")` calls are treated as translations in JS/TS. | ||
|
|
||
| ### Concurrency | ||
| - Uses OCaml 5's `Eio` for multicore processing. | ||
| - Parallel traversal of directories is handled in `src/cli/strings.ml` via `Fiber.List.iter` and `Eio.Executor_pool`. | ||
| - JS workers (QuickJS) are managed via a pool in `src/quickjs/quickjs.ml`. | ||
|
|
||
| ## Important Gotchas | ||
|
|
||
| - **QuickJS Dependency**: Requires a compiled `quickjs` directory at the project root for building. `dune` rules in `src/quickjs/dune` copy headers and libraries from there. | ||
| - **Generated Headers**: `src/quickjs/runtime.h` is generated from `src/quickjs/parsers.js` using `browserify` and `qjsc`. | ||
| - **Linking**: MacOS builds use specific link flags (e.g., `ld64.lld`) defined in `src/cli/link_flags.*`. | ||
| - **OCamlFormat**: `.ocamlformat` is present; ensure you format OCaml code before submitting. | ||
| - **Memory Safety**: Be cautious with C++ FFI code in `src/quickjs/quickjs.cpp`, particularly regarding OCaml's GC interaction (`CAMLparam`, `CAMLreturn`, `caml_release_runtime_system`). | ||
|
|
||
| ## Testing Approach | ||
|
|
||
| - **Inline Tests**: The project uses `ppx_inline_test`. Parsers in `src/parsing/` can be tested directly within the OCaml files or in the `tests/` directory. | ||
| - **Test Suite**: A standard test suite is located in `tests/test_runner.ml`. It covers JS, HTML, Pug, and `.strings` file parsing. | ||
| - **Integration Tests**: Verification can be performed by running the built binary against fixtures in `tests/fixtures/` and checking the generated output in the `strings/` directory. | ||
| - **Debug Flags**: Use `--show-debugging` or `--debug-pug` / `--debug-html` flags in the CLI to inspect internal parsing results. | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### "File modified since last read" | ||
| If you receive an error stating that a file has been **"modified since it was last read"**, it usually indicates a discrepancy between the file's filesystem timestamp and the internal state tracking. | ||
|
|
||
| **Example Error:** | ||
| > `Edit failed: The file '/path/to/file' was modified since it was last read. Please read the file again before trying to edit it.` | ||
|
|
||
| **Recommended Fix:** | ||
| 1. Execute `touch filename` to reset the file's modification time to the current system time. | ||
| 2. Re-read the file using the `view` tool. | ||
| 3. Attempt the edit again. | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| # Architecture Documentation - String Extractor | ||
|
|
||
| This document provides a high-level overview of the String Extractor's architecture, directory structure, and internal APIs. | ||
|
|
||
| ## Project Entry Point | ||
|
|
||
| The main entry point of the application is **`src/cli/strings.ml`**. It handles command-line argument parsing using `Core.Command`, sets up the `Eio` runtime, and initiates the file traversal process. | ||
|
|
||
| ## Directory Structure | ||
|
|
||
| ```text | ||
| / | ||
| ├── src/ | ||
| │ ├── cli/ # Main CLI application logic | ||
| │ │ ├── strings.ml # CLI entry point, traversal coordination | ||
| │ │ ├── vue.ml # Vue-specific parsing and extraction logic | ||
| │ │ └── generate.ml # Localization file generation (.strings, .json) | ||
| │ ├── parsing/ # Core parsers using Angstrom and Flow | ||
| │ │ ├── basic.ml # Common parsing utilities and combinators | ||
| │ │ ├── js_ast.ml # Flow AST walker for string extraction | ||
| │ │ ├── js.ml # JavaScript string extraction entry point | ||
| │ │ ├── pug.ml # Native Pug template parsing | ||
| │ │ ├── html.ml # HTML template parsing | ||
| │ │ ├── strings.ml # .strings file parsing logic | ||
| │ │ └── ... # Other specialized parsers (vue blocks, styles) | ||
| │ ├── quickjs/ # Interface to QuickJS for JS/TS/Pug parsing | ||
| │ │ ├── quickjs.ml # OCaml FFI to QuickJS | ||
| │ │ ├── quickjs.cpp # C++ implementation of the bridge | ||
| │ │ └── parsers.js # JS-based parsers running in QuickJS | ||
| │ └── utils/ # Shared utility modules | ||
| │ ├── collector.ml # State container for collected strings/errors | ||
| │ ├── io.ml # I/O helpers | ||
| │ ├── timing.ml # Performance measurement | ||
| │ └── exception.ml # Exception handling | ||
| ├── strings/ # Directory where .strings files are managed | ||
| ├── dune-project # Dune build system configuration | ||
| └── README.md # Project overview and usage instructions | ||
| ``` | ||
|
|
||
| ## Core API Reference | ||
|
|
||
| ### `src/cli/` | ||
| - **`Strings.main`**: Coordinates the entire run, including directory traversal and result generation. | ||
| - **`Vue.parse`**: Splits a `.vue` file into its constituent parts (template, script, style). | ||
| - **`Generate.write_english`**: Creates `english.strings` and `english.json` from the collected strings. | ||
| - **`Generate.write_other`**: Updates existing translations for other languages. | ||
|
|
||
| ### `src/parsing/` | ||
| - **`Parsing.Basic`**: Provides foundational Angstrom parsers for whitespace, strings, and standard error handling. | ||
| - **`Parsing.Js.extract_to_collector`**: Entry point for scanning JavaScript source code. | ||
| - **`Parsing.Js_ast.extract`**: A comprehensive walker for the Flow AST that identifies and extracts strings from `L("...")` calls. | ||
| - **`Parsing.Pug.collect`**: Traverses the native Pug AST to extract strings. | ||
| - **`Parsing.Strings.parse`**: Parses existing `.strings` files into a lookup table. | ||
|
|
||
| ### `src/quickjs/` | ||
| - **`Quickjs.extract_to_collector`**: Offloads extraction to QuickJS for TypeScript and advanced Pug templates. | ||
|
|
||
| ### `src/utils/` | ||
| - **`Utils.Collector.create`**: Initializes a new string collection state for a specific file. (type `t = { path: string; strings: string Queue.t; ... }`) | ||
| - **`Utils.Collector.blit_transfer`**: Merges results from one collector into another. | ||
|
|
||
| ## Control Flow | ||
| 1. **Initiation**: `strings.exe` starts, parses CLI flags, and identifies the target directory. | ||
| 2. **Traversal**: Uses `Eio` to recursively walk the directory tree. | ||
| 3. **Dispatch**: For each supported file extension, the corresponding parser in `src/parsing` is invoked. | ||
| 4. **Collection**: Parsers find strings (usually inside `L()`) and add them to a `Collector.t`. | ||
| 5. **Generation**: `Generate.ml` aggregates strings from all collectors and updates the `strings/` directory. | ||
|
|
||
| ## Testing Setup | ||
|
|
||
| The project implements a multi-layered testing strategy: | ||
|
|
||
| 1. **Inline Tests**: Using `ppx_inline_test`, logic can be tested directly within the source files. This is primarily used for parser validation in `src/parsing/`. | ||
| 2. **Standard Test Suite**: Located in `tests/test_runner.ml`, this suite uses `ppx_expect` and `ppx_assert` to verify: | ||
| - JavaScript string extraction via `Flow_parser`. | ||
| - HTML and Pug extraction via `SZXX` and `Angstrom`. | ||
| - Apple-style `.strings` file parsing. | ||
| 3. **Integration Testing**: The `tests/fixtures/` directory contains sample files of all supported types. The CLI can be run against these fixtures to verify end-to-end extraction and output generation (`.strings` and `.json` files). | ||
|
|
||
| The `tests/dune` file configures the test library and enables inline tests for the module. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| () |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| (library | ||
| (name parsing_tests) | ||
| (inline_tests) | ||
| (libraries parsing utils core eio_main) | ||
| (preprocess (pps ppx_jane ppx_inline_test)) | ||
| ) | ||
|
|
||
| (rule | ||
| (alias runtest) | ||
| (deps | ||
| ../src/cli/strings.exe | ||
| (source_tree fixtures)) | ||
| (action | ||
| (bash " | ||
| TMP_DIR=\"integration_test_run\" | ||
| rm -rf $TMP_DIR | ||
| mkdir -p $TMP_DIR/strings | ||
| mkdir -p $TMP_DIR/.git | ||
| printf '\"Hello from HTML\" = \"Bonjour de HTML\";\n' > $TMP_DIR/strings/french.strings | ||
| cp -r fixtures $TMP_DIR/ | ||
| cd $TMP_DIR | ||
| ../../src/cli/strings.exe fixtures --output strings &> /dev/null | ||
| cd .. | ||
|
|
||
| if ! grep -q \"Bonjour de HTML\" $TMP_DIR/strings/french.strings; then | ||
| echo \"Error: French translation lost in .strings\" | ||
| exit 1 | ||
| fi | ||
| if ! grep -q \"Bonjour de HTML\" $TMP_DIR/strings/french.json; then | ||
| echo \"Error: French translation lost in .json\" | ||
| exit 1 | ||
| fi | ||
| if ! grep -q \"MISSING TRANSLATION - demo.pug\" $TMP_DIR/strings/french.strings; then | ||
| echo \"Error: Missing translation marker not found in .strings\" | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo \"✅ French integration test passed\" | ||
| rm -rf $TMP_DIR | ||
|
|
||
| # Help user populate root strings/ if it exists | ||
| # We use absolute paths to ensure we hit the real source directory | ||
| # even if sandboxed in _build/default/tests | ||
| # The dune-project is used as a landmark for the root | ||
| # Traverse up to find the root of the source tree | ||
| # In dune, we are at _build/default/tests | ||
| ROOT_SRC=\"$(cd ../../.. && pwd)\" | ||
| if [ -d \"$ROOT_SRC/strings\" ]; then | ||
| # Extraction generates 5 strings. | ||
| # We pre-populate 3 translations. | ||
| printf '\"Hello from HTML\" = \"Bonjour de HTML\";\n\"Hello from JS\" = \"Bonjour de JS\";\n\"Hello from Pug\" = \"Bonjour de Pug\";\n' > \"$ROOT_SRC/strings/french.strings\" | ||
| ./../src/cli/strings.exe \"$ROOT_SRC/tests/fixtures\" --output \"$ROOT_SRC/strings\" | ||
| fi | ||
|
Comment on lines
+47
to
+53
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔴 Integration test overwrites real strings/french.strings in repo root (data loss) when strings/ exists The New code: ROOT_SRC="$(cd ../../.. && pwd)"
if [ -d "$ROOT_SRC/strings" ]; then
printf '...\n' > "$ROOT_SRC/strings/french.strings"
./../src/cli/strings.exe "$ROOT_SRC/tests/fixtures" --output "$ROOT_SRC/strings"
fiActual behavior: running Impact: potential data loss / corruption of translation files during routine test runs; also makes tests non-reproducible depending on whether Recommendation: Remove this side-effecting block, or gate it behind an explicit env var (e.g. Was this helpful? React with 👍 or 👎 to provide feedback.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note: I did this for my convenience to verify that the test command produced the correct output. @SGrondin If you prefer some other location feel free to modify as you see fit.
Comment on lines
+41
to
+53
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 Integration test helper computes repo root incorrectly (lands in _build instead of source root) In # In dune, we are at _build/default/tests
ROOT_SRC="$(cd ../../.. && pwd)"From Actual: helper block is effectively skipped even when the real repo-root Impact: reduces usefulness of the test rule for developers; does not typically break CI because the helper is conditional. Recommendation: Use dune variables instead of relative traversal, e.g. Was this helpful? React with 👍 or 👎 to provide feedback. |
||
| "))) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| <i18n>Hello from HTML</i18n> |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| L('Hello from JS'); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| i18n Hello from Pug |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| <template> | ||
| <i18n>Hello from Vue Template</i18n> | ||
| </template> | ||
|
|
||
| <script> | ||
| export default { | ||
| data() { | ||
| return { | ||
| msg: L('Hello from Vue Script') | ||
| } | ||
| } | ||
| } | ||
| </script> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 QuickJS dune rules cd to %{project_root}/../../ causing build to run outside repo
The
src/quickjs/dunerulescdto%{project_root}/../../before runningbrowserify, and also reference QuickJS tools/headers via%{project_root}/../../quickjs/....%{project_root}is already the repository root, so appending/../../moves the working directory two levels above the repo. That makes paths likesrc/quickjs/parsers.jsandquickjs/qjscresolve to non-existent locations.Example (new code):
src/quickjs/parsers.jsis expected to be in the repo root, but aftercdit will not be.Actual behavior: dune rules fail during build/test (cannot find
src/quickjs/parsers.js/quickjs/qjsc/quickjs.h/libquickjs.a).Expected behavior: these rules should run within the repo root (or otherwise reference repo-root-relative paths correctly).
Impact: CI and local builds that rely on these rules will break (QuickJS bridge can’t build, which typically blocks TypeScript/Pug parsing and may prevent the entire binary from linking).
(Refers to lines 24-50)
Recommendation: Remove the
../../hops and operate from%{project_root}(repo root), e.g.cd %{project_root}and use%{project_root}/quickjs/...paths. Alternatively, avoidcdand pass absolute%{project_root}-based paths directly tobrowserify/qjsc/cp.Was this helpful? React with 👍 or 👎 to provide feedback.