`.gitignore`: 1 addition, 0 deletions
@@ -24,5 +24,6 @@ Thumbs.db

# Validation outputs
reports/
/logs/
*.log
validation_results*.txt
`README.md`: 74 additions, 0 deletions
@@ -20,6 +20,80 @@ cd validation
```bash
uv sync # or: docker build -t mobilitydcat-validator .
```

## CLI Usage

Two separate workflows are available:
- Universal validation: `scripts/validate.py`
- Suite testing against expected outcomes (`positives`/`negatives` folders): `scripts/validate_suite.py`

Run validator help:
```bash
uv run scripts/validate.py --help
```

Run suite tester help:
```bash
uv run scripts/validate_suite.py --help
```

Minimal default run:
```bash
uv run scripts/validate.py
```

Defaults used in the minimal run:
- `--data data/`
- `--shacl shacl/`
- `--vocab sample_data/vocabularies/`
- `--report-file logs/validation-report.txt`

Example (directory validation):
```bash
uv run scripts/validate.py \
--data sample_data/ \
--shacl shacl/
```

Optional tuning example:
```bash
uv run scripts/validate.py \
--data sample_data/ \
--shacl shacl/ \
--timeout 30 \
--max-files-report 50
```

Report behavior:
- Terminal output is compact by default for large runs.
- Full violation details are always written to a report file.
- Default report path (if `--report-file` is omitted): `logs/validation-report.txt`.
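
The report behavior above can be sketched in Python, the language the validator scripts are written in. This is a minimal, hypothetical illustration rather than the code in `scripts/validate.py`: the function name `emit_results` and the `(file_name, conforms, details)` result tuples are assumptions. It prints at most `max_files_report` VALID/INVALID lines (treating `0` as unlimited) and always writes every result to the report file, creating its parent directory (such as `logs/`) if needed.

```python
from __future__ import annotations

from pathlib import Path


def emit_results(results: list[tuple[str, bool, str]], max_files_report: int,
                 report_file: Path) -> int:
    """Print a capped VALID/INVALID summary and write full details to report_file.

    `results` holds hypothetical (file_name, conforms, details) tuples.
    Returns the number of per-file lines hidden from the terminal.
    """
    # 0 means "unlimited", mirroring the documented --max-files-report semantics.
    limit = len(results) if max_files_report == 0 else max_files_report
    for name, conforms, _ in results[:limit]:
        print(f"{'VALID' if conforms else 'INVALID'}: {name}")
    hidden = max(0, len(results) - limit)
    if hidden:
        print(f"... {hidden} more file(s); see {report_file}")

    # The report file always receives every result, including full details.
    report_file.parent.mkdir(parents=True, exist_ok=True)
    with report_file.open("w", encoding="utf-8") as fh:
        for name, conforms, details in results:
            fh.write(f"{'VALID' if conforms else 'INVALID'}: {name}\n")
            if details:
                fh.write(details + "\n")
    return hidden
```

Keeping the terminal output bounded while the file stays complete is what lets large directory runs remain readable without losing any violation details.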

Supported RDF serializations:
- The validator accepts multiple RDF serializations in one run, including `.ttl`, `.rdf`, `.xml`, `.nt`, `.n3`, `.jsonld`, `.json`, `.trig`, and `.nq`.
- You can point `--data` to a directory containing mixed formats; all supported files are discovered and validated.
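
A minimal sketch of this kind of format discovery follows. It assumes the validator maps file extensions to rdflib parser names and searches directories recursively; both are assumptions, as is treating `.json` as JSON-LD. `discover_rdf_files` and `RDF_FORMATS` are hypothetical names, not part of the project.

```python
from __future__ import annotations

from pathlib import Path

# Assumed mapping from file extension to rdflib parser name.
# Treating ".json" as JSON-LD is an assumption about this project.
RDF_FORMATS = {
    ".ttl": "turtle",
    ".rdf": "xml",
    ".xml": "xml",
    ".nt": "nt",
    ".n3": "n3",
    ".jsonld": "json-ld",
    ".json": "json-ld",
    ".trig": "trig",
    ".nq": "nquads",
}


def discover_rdf_files(data: Path) -> list[tuple[Path, str]]:
    """Return (path, rdflib_format) pairs for a single file or, recursively, a directory."""
    candidates = [data] if data.is_file() else sorted(data.rglob("*"))
    found = []
    for path in candidates:
        fmt = RDF_FORMATS.get(path.suffix.lower())
        # Skip directories and files with unsupported extensions.
        if path.is_file() and fmt is not None:
            found.append((path, fmt))
    return found
```

Each returned pair could then be loaded with rdflib as `Graph().parse(str(path), format=fmt)`, so mixed serializations in one directory need no special handling.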

Common options:
- `--data`: Input RDF file or directory
- `--shacl`: SHACL file or directory
- `--vocab`: Vocabulary stubs directory
- `--verbose` / `--no-verbose`: Show or hide detailed violations in the terminal
- `--progress` / `--no-progress`: Show or hide per-file progress while validating directories
- `--timeout`: Per-file validation timeout in seconds (`0` disables timeout)
- `--max-files-report`: Safety option that caps the number of per-file VALID/INVALID lines printed to the terminal on large runs (default `50`; `0` means unlimited)
- `--report-file`: Path for full detailed validation report
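
For readers wiring up a similar CLI, the options above map naturally onto Python's `argparse`. The sketch below is hypothetical and is not the parser in `scripts/validate.py`: the `--data`, `--shacl`, `--vocab`, `--max-files-report`, and `--report-file` defaults come from this README, while the `--verbose`, `--progress`, and `--timeout` defaults are assumptions. `argparse.BooleanOptionalAction` (Python 3.9+) generates the paired `--x` / `--no-x` flags.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    p = argparse.ArgumentParser(description="mobilityDCAT-AP SHACL validator (sketch)")
    p.add_argument("--data", default="data/", help="Input RDF file or directory")
    p.add_argument("--shacl", default="shacl/", help="SHACL file or directory")
    p.add_argument("--vocab", default="sample_data/vocabularies/",
                   help="Vocabulary stubs directory")
    # BooleanOptionalAction creates both --verbose and --no-verbose, etc.
    # The False defaults are assumptions (terminal output is compact by default).
    p.add_argument("--verbose", action=argparse.BooleanOptionalAction, default=False)
    p.add_argument("--progress", action=argparse.BooleanOptionalAction, default=False)
    # Assumed default of 0, which disables the per-file timeout.
    p.add_argument("--timeout", type=float, default=0,
                   help="Per-file timeout in seconds (0 disables)")
    p.add_argument("--max-files-report", type=int, default=50,
                   help="Cap on VALID/INVALID terminal lines (0 means unlimited)")
    p.add_argument("--report-file", default="logs/validation-report.txt")
    return p
```

Hyphenated option names become underscore attributes, so `--max-files-report` is read back as `args.max_files_report`.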

Suite testing workflow:
- Use `scripts/validate_suite.py` for `positives`/`negatives` test folders.
- Expected outcomes are inferred from directory names (`positives` => should conform, `negatives` => should violate).
- Files outside those folder patterns are reported as unclassified and fail the suite run.
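
The classification rule can be sketched as follows. This is a hypothetical illustration of the documented behavior, not the logic in `scripts/validate_suite.py`: the helper names are invented, and deciding the expectation from an exact `positives`/`negatives` directory component anywhere in the path is an assumption.

```python
from __future__ import annotations

from pathlib import PurePath


def expected_outcome(path: str) -> bool | None:
    """True if the file should conform, False if it should violate, None if unclassified."""
    parts = PurePath(path).parts[:-1]  # directory components only, not the file name
    if "positives" in parts:
        return True
    if "negatives" in parts:
        return False
    return None


def suite_passes(results: dict[str, bool]) -> bool:
    """`results` maps a file path to the actual SHACL `conforms` flag."""
    for path, conforms in results.items():
        expected = expected_outcome(path)
        # Unclassified files and expectation mismatches both fail the run.
        if expected is None or expected != conforms:
            return False
    return True
```

Failing on unclassified files, rather than skipping them, keeps a misplaced test file from silently dropping out of the suite.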

Why `--vocab` is important:
- Some SHACL checks rely on external controlled vocabularies being available as RDF resources at validation time.
- Typical examples are EU File Type, EU Frequency, and mobility theme terms that are referenced by URI in datasets.
- The validator loads all `.ttl` files from the `--vocab` directory and merges them into each data graph before running SHACL.
- This prevents false violations caused by missing vocabulary resources during class/range checks.
- In most cases you can use the default path; override `--vocab` only when validating against a different vocabulary source.
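
To see why merging vocabularies matters, here is a deliberately simplified toy model in plain Python; it is neither pySHACL nor the validator's code. Graphs are sets of string triples, merging is set union, and `class_violations` (a hypothetical helper) mimics a SHACL `sh:class` check while ignoring the `rdfs:subClassOf` reasoning that real SHACL applies. The `dct:accrualPeriodicity`/`dct:Frequency` pairing and the EU Frequency `DAILY` URI are illustrative choices; the shapes in `shacl/` may constrain these terms differently.

```python
# Toy model only: a "graph" is a set of (subject, predicate, object) strings.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
FREQ = "http://publications.europa.eu/resource/authority/frequency/"

Triple = tuple[str, str, str]


def class_violations(graph: set[Triple], path: str, expected_class: str) -> list[str]:
    """Mimic sh:class: every value of `path` must be typed `expected_class` in `graph`."""
    values = {o for _, p, o in graph if p == path}
    return sorted(v for v in values if (v, RDF_TYPE, expected_class) not in graph)


# Hypothetical data graph: one dataset pointing at an EU Frequency term.
data = {("urn:example:dataset", DCT + "accrualPeriodicity", FREQ + "DAILY")}
# Hypothetical vocabulary stub, as it might be loaded from a --vocab .ttl file.
vocab = {(FREQ + "DAILY", RDF_TYPE, DCT + "Frequency")}

before = class_violations(data, DCT + "accrualPeriodicity", DCT + "Frequency")
after = class_violations(data | vocab, DCT + "accrualPeriodicity", DCT + "Frequency")
print(before)  # ['http://publications.europa.eu/resource/authority/frequency/DAILY']
print(after)   # []
```

Without the vocabulary triples the term's type is simply unknown, so the class check reports a violation even though the dataset is correct.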

## Structure
```
validation/
...
```
`docs/README.md`: 81 additions, 7 deletions
@@ -2,6 +2,12 @@

Validation suite for mobilityDCAT-AP 1.1.0 compliance using SHACL shapes.

## Workflows

This repository has two separate CLI workflows:
- Universal validation: `scripts/validate.py` (pure SHACL conformance checks)
- Suite testing: `scripts/validate_suite.py` (expects `positives`/`negatives` folder semantics)

## Quick Start
```bash
# Install dependencies
@@ -14,28 +20,96 @@ uv run scripts/validate.py --data sample_data/baseline-dcat-ap/negatives/B-N-01-
uv run scripts/validate.py --data sample_data/mobility/negatives/M-N-01-missing-mandatory-properties-dataset.ttl --shacl shacl/ -v
```

## CLI Options

Show all options:
```bash
uv run scripts/validate.py --help
```

Show suite tester options:
```bash
uv run scripts/validate_suite.py --help
```

Minimal default run:
```bash
uv run scripts/validate.py
```

Defaults used in the minimal run:
- `--data data/`
- `--shacl shacl/`
- `--vocab sample_data/vocabularies/`
- `--report-file logs/validation-report.txt`

Key options:
- `--data`: Path to RDF file or directory
- `--shacl`: Path to SHACL file or directory
- `--vocab`: Path to vocabulary stubs directory (default: `sample_data/vocabularies`)
- `--verbose` / `--no-verbose`: Toggle detailed violation output in terminal
- `--progress` / `--no-progress`: Toggle per-file progress output for directory validation
- `--timeout`: Per-file timeout in seconds (`0` disables timeout)
- `--max-files-report`: Safety option that caps the number of per-file VALID/INVALID lines printed to the terminal, keeping VS Code responsive on large runs (default: `50`; `0` means unlimited)
- `--report-file`: Write full detailed report (default: `logs/validation-report.txt`)
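
One way to honor a per-file `--timeout` is sketched below using only the standard library. It is an assumption, not necessarily how `scripts/validate.py` does it, and `run_with_timeout` is a hypothetical helper. It treats `0` as "no timeout", as documented, and reports `TIMEOUT` instead of raising. A Python thread cannot be forcibly stopped, so after a timeout the worker keeps running in the background; a real implementation might validate each file in a subprocess so it can be terminated.

```python
from __future__ import annotations

import concurrent.futures
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_timeout(func: Callable[[], T], timeout: float) -> tuple[str, T | None]:
    """Return ("OK", result) or ("TIMEOUT", None); timeout <= 0 disables the limit."""
    if timeout <= 0:
        return "OK", func()
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func)
    try:
        return "OK", future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return "TIMEOUT", None
    finally:
        # Do not block on a still-running worker; threads cannot be killed.
        executor.shutdown(wait=False)
```

A directory run would wrap each file's validation call in `run_with_timeout` and record `TIMEOUT` files in the report alongside VALID and INVALID ones.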

Why `--vocab` is important:
- Several shapes expect terms from external controlled vocabularies to be present as RDF resources.
- Common examples include EU file types, EU frequency values, and mobility theme concepts.
- The validator reads all `.ttl` files from the `--vocab` folder and merges them into each input graph before validation.
- This avoids spurious violations caused by vocabulary resources that cannot be resolved during class/range constraint checks.
- Keep the default in normal runs; set a custom `--vocab` path when you need to validate against another vocabulary snapshot.

Notes:
- Terminal output is intentionally compact by default for stability on large runs.
- Full violation details are written to the report file.

Supported RDF serializations:
- Validation supports multiple RDF serializations: `.ttl`, `.rdf`, `.xml`, `.nt`, `.n3`, `.jsonld`, `.json`, `.trig`, and `.nq`.
- A single directory run can include mixed serializations; all supported files are discovered automatically.

Example with explicit report file:
```bash
uv run scripts/validate.py \
--data sample_data/ \
--shacl shacl/ \
--report-file logs/validation-report.txt
```

Optional tuning example:
```bash
uv run scripts/validate.py \
--data sample_data/ \
--shacl shacl/ \
--timeout 30 \
--max-files-report 50 \
--report-file logs/validation-report-latest.txt
```

## Run All Test Suites

Use the dedicated suite runner so expectations are evaluated from `positives`/`negatives` paths:
```bash
# All baseline DCAT-AP tests
uv run scripts/validate_suite.py --data sample_data/baseline-dcat-ap/ --shacl shacl/

# All mobility-specific tests
uv run scripts/validate_suite.py --data sample_data/mobility/ --shacl shacl/

# All multilingual tests
uv run scripts/validate_suite.py --data sample_data/multilingual/ --shacl shacl/

# All partial graph tests
uv run scripts/validate_suite.py --data sample_data/partial_graphs/ --shacl shacl/

# All range constraint tests
uv run scripts/validate_suite.py --data sample_data/ranges/ --shacl shacl/

# All vocabulary tests
uv run scripts/validate_suite.py --data sample_data/vocabularies/ --shacl shacl/

# Run everything
uv run scripts/validate_suite.py --data sample_data/ --shacl shacl/
```

## Docker Usage