mobilityDCAT-AP · atibaut · Apr 13, 2026
diff --git a/.gitignore b/.gitignore
@@ -24,5 +24,6 @@ Thumbs.db
 
 # Validation outputs
 reports/
+/logs/
 *.log
 validation_results*.txt
diff --git a/README.md b/README.md
@@ -20,6 +20,80 @@ cd validation
 uv sync  # or: docker build -t mobilitydcat-validator .
 ```
 
+## CLI Usage
+
+Two separate workflows are available:
+- Universal validation: `scripts/validate.py`
+- Suite testing (expected outcomes inferred from `positives`/`negatives` folders): `scripts/validate_suite.py`
+
+Run validator help:
+```bash
+uv run scripts/validate.py --help
+```
+
+Run suite tester help:
+```bash
+uv run scripts/validate_suite.py --help
+```
+
+Minimal default run:
+```bash
+uv run scripts/validate.py
+```
+
+Defaults used in the minimal run:
+- `--data data/`
+- `--shacl shacl/`
+- `--vocab sample_data/vocabularies/`
+- `--report-file logs/validation-report.txt`
+
+Example (directory validation):
+```bash
+uv run scripts/validate.py \
+  --data sample_data/ \
+  --shacl shacl/
+```
+
+Optional tuning example:
+```bash
+uv run scripts/validate.py \
+  --data sample_data/ \
+  --shacl shacl/ \
+  --timeout 30 \
+  --max-files-report 50
+```
+
+Report behavior:
+- Terminal output is compact by default for large runs.
+- Full violation details are always written to a report file.
+- Default report path (if `--report-file` is omitted): `logs/validation-report.txt`.
+
+Supported RDF serializations:
+- The validator accepts multiple RDF serializations in one run, including `.ttl`, `.rdf`, `.xml`, `.nt`, `.n3`, `.jsonld`, `.json`, `.trig`, and `.nq`.
+- You can point `--data` to a directory containing mixed formats; all supported files are discovered and validated.
+
+Common options:
+- `--data`: Input RDF file or directory
+- `--shacl`: SHACL file or directory
+- `--vocab`: Vocabulary stubs directory
+- `--verbose` / `--no-verbose`: Show or hide detailed violations in terminal
+- `--progress` / `--no-progress`: Per-file progress while validating directories
+- `--timeout`: Per-file validation timeout in seconds (`0` disables timeout)
+- `--max-files-report`: Safety option to cap VALID/INVALID terminal output on large runs (`50` default, `0` means unlimited)
+- `--report-file`: Path for full detailed validation report
+
+Suite testing workflow:
+- Use `scripts/validate_suite.py` for `positives`/`negatives` test folders.
+- Expected outcomes are inferred from directory names (`positives` => should conform, `negatives` => should violate).
+- Files outside those folder patterns are reported as unclassified and fail the suite run.
+
+Why `--vocab` is important:
+- Some SHACL checks rely on external controlled vocabularies being available as RDF resources at validation time.
+- Typical examples are EU File Type, EU Frequency, and mobility theme terms that are referenced by URI in datasets.
+- The validator loads all `.ttl` files from the `--vocab` directory and merges them into each data graph before running SHACL.
+- This prevents false violations caused by missing vocabulary resources during class/range checks.
+- In most cases you can use the default path; override `--vocab` only when validating against a different vocabulary source.
+
 ## Structure
 ```
 validation/

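The suite-testing rules in the README hunk above (mixed serializations discovered by extension, expected outcome inferred from `positives`/`negatives` directory names) can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/validate_suite.py`: the function names `discover_rdf_files` and `expected_outcome` and the outcome labels are hypothetical, and giving `positives` precedence when a path contains both folder names is an assumption.

```python
from pathlib import Path

# Extensions the README lists as supported RDF serializations.
RDF_SUFFIXES = {".ttl", ".rdf", ".xml", ".nt", ".n3", ".jsonld", ".json", ".trig", ".nq"}


def discover_rdf_files(root: Path) -> list[Path]:
    """Return every file under root whose suffix is a supported serialization."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in RDF_SUFFIXES
    )


def expected_outcome(path: Path) -> str:
    """Infer the expected result from the directory names in the file's path."""
    folders = {part.lower() for part in path.parent.parts}
    if "positives" in folders:   # assumption: positives wins if both names appear
        return "conform"
    if "negatives" in folders:
        return "violate"
    # Per the README, such files are reported as unclassified and fail the suite.
    return "unclassified"
```

A suite runner built on this would fail the run whenever `expected_outcome` returns `"unclassified"` or the SHACL result disagrees with the inferred expectation.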
diff --git a/docs/README.md b/docs/README.md
@@ -2,6 +2,12 @@
 
 Validation suite for mobilityDCAT-AP 1.1.0 compliance using SHACL shapes.
 
+## Workflows
+
+This repository has two separate CLI workflows:
+- Universal validation: `scripts/validate.py` (pure SHACL conformance checks)
+- Suite testing: `scripts/validate_suite.py` (expects `positives`/`negatives` folder semantics)
+
 ## Quick Start
 ```bash
 # Install dependencies
@@ -14,28 +20,96 @@ uv run scripts/validate.py --data sample_data/baseline-dcat-ap/negatives/B-N-01-
 uv run scripts/validate.py --data sample_data/mobility/negatives/M-N-01-missing-mandatory-properties-dataset.ttl --shacl shacl/ -v
 ```
 
+## CLI Options
+
+Show all options:
+```bash
+uv run scripts/validate.py --help
+```
+
+Show suite tester options:
+```bash
+uv run scripts/validate_suite.py --help
+```
+
+Minimal default run:
+```bash
+uv run scripts/validate.py
+```
+
+Defaults used in the minimal run:
+- `--data data/`
+- `--shacl shacl/`
+- `--vocab sample_data/vocabularies/`
+- `--report-file logs/validation-report.txt`
+
+Current key options:
+- `--data`: Path to RDF file or directory
+- `--shacl`: Path to SHACL file or directory
+- `--vocab`: Path to vocabulary stubs directory (default: `sample_data/vocabularies`)
+- `--verbose` / `--no-verbose`: Toggle detailed violation output in terminal
+- `--progress` / `--no-progress`: Toggle per-file progress output for directory validation
+- `--timeout`: Per-file timeout in seconds (`0` disables timeout)
+- `--max-files-report`: Caps VALID/INVALID terminal output to keep VS Code responsive on large runs (default: `50`; `0` means unlimited)
+- `--report-file`: Write full detailed report (default: `logs/validation-report.txt`)
+
+Why `--vocab` is important:
+- Several shapes expect terms from external controlled vocabularies to be present as RDF resources.
+- Common examples include EU file types, EU frequency values, and mobility theme concepts.
+- The validator reads all `.ttl` files from the `--vocab` folder and merges them into each input graph before validation.
+- This helps avoid false violations caused by unresolved vocabulary resources in class/range constraints.
+- Keep the default in normal runs; set a custom `--vocab` path when you need to validate against another vocabulary snapshot.
+
+Notes:
+- Terminal output is intentionally compact by default for stability on large runs.
+- Full violation details are written to the report file.
+
+Supported RDF serializations:
+- Validation supports multiple RDF serializations: `.ttl`, `.rdf`, `.xml`, `.nt`, `.n3`, `.jsonld`, `.json`, `.trig`, and `.nq`.
+- A single directory run can include mixed serializations; all supported files are discovered automatically.
+
+Example with explicit report file:
+```bash
+uv run scripts/validate.py \
+  --data sample_data/ \
+  --shacl shacl/ \
+  --report-file logs/validation-report.txt
+```
+
+Optional tuning example:
+```bash
+uv run scripts/validate.py \
+  --data sample_data/ \
+  --shacl shacl/ \
+  --timeout 30 \
+  --max-files-report 50 \
+  --report-file logs/validation-report-latest.txt
+```
+
 ## Run All Test Suites
+
+Use the dedicated suite runner so expectations are evaluated from `positives`/`negatives` paths:
 ```bash
 # All baseline DCAT-AP tests
-uv run scripts/validate.py --data sample_data/baseline-dcat-ap/ --shacl shacl/
+uv run scripts/validate_suite.py --data sample_data/baseline-dcat-ap/ --shacl shacl/
 
 # All mobility-specific tests
-uv run scripts/validate.py --data sample_data/mobility/ --shacl shacl/
+uv run scripts/validate_suite.py --data sample_data/mobility/ --shacl shacl/
 
 # All multilingual tests
-uv run scripts/validate.py --data sample_data/multilingual/ --shacl shacl/
+uv run scripts/validate_suite.py --data sample_data/multilingual/ --shacl shacl/
 
 # All partial graph tests
-uv run scripts/validate.py --data sample_data/partial_graphs/ --shacl shacl/
+uv run scripts/validate_suite.py --data sample_data/partial_graphs/ --shacl shacl/
 
 # All range constraint tests
-uv run scripts/validate.py --data sample_data/ranges/ --shacl shacl/
+uv run scripts/validate_suite.py --data sample_data/ranges/ --shacl shacl/
 
 # All vocabulary tests
-uv run scripts/validate.py --data sample_data/vocabularies/ --shacl shacl/
+uv run scripts/validate_suite.py --data sample_data/vocabularies/ --shacl shacl/
 
 # Run everything
-uv run scripts/validate.py --data sample_data/ --shacl shacl/
+uv run scripts/validate_suite.py --data sample_data/ --shacl shacl/
 ```
 
 ## Docker Usage