da_build: add PDF/UA checker with severity classification and optional strict mode #86

Open
nonprofittechy wants to merge 2 commits into main from add-pac-to-build

@nonprofittechy
Member

This PR adds automated PDF accessibility validation to the da_build action using veraPDF (https://verapdf.org/), ensuring that PDFs in Docassemble repositories comply with the PDF/UA-1 (ISO 14289-1) standard.

For now, accessibility failures are reported as warnings only. A future version of this action (probably in about 30 days) will start failing builds for repos that do not pass the PDF accessibility checks.

Key Features

  • Smart Severity Levels: Categorizes accessibility rules to prioritize critical blockers while still surfacing advisory warnings:
    • Fail: Critical issues (e.g., untagged content, missing alt text, non-embedded fonts).
    • Warning: Advisory issues (e.g., missing metadata title, language tags).
    • Info/Suppressed: Technical details or rules that only matter in specific contexts (like non-flattened forms).
  • Comprehensive Reporting:
    • GitHub Job Summary: Generates a detailed breakdown for every PDF, including rule descriptions and failure counts.
    • Annotations: Emits GitHub warning or error annotations directly in the action logs.
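
As a rough illustration of the annotation output, a script can print GitHub workflow commands to stdout. The helper name `format_annotation` and the sample message are hypothetical and not taken from check_pdf_accessibility.py; only the `::warning file=...::message` command shape comes from GitHub Actions.

```python
def format_annotation(level: str, file: str, message: str) -> str:
    """Build a GitHub Actions workflow command such as '::warning file=a.pdf::msg'."""
    if level not in ("notice", "warning", "error"):
        raise ValueError(f"unsupported annotation level: {level}")
    # Workflow commands are single-line, so percent-encode %, CR, and LF in the message.
    # Property values (the file path) have extra escaping rules not handled in this sketch.
    escaped = message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    return f"::{level} file={file}::{escaped}"


if __name__ == "__main__":
    # Hypothetical path and message, for illustration only.
    print(format_annotation("warning", "docs/form.pdf", "Figure is missing alt text"))
```

Printing the returned string from a step is what makes the annotation appear in the run; the function itself only formats text.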

Configuration & Syntax

Add these optional inputs to the da_build step in your workflow:

 - uses: SuffolkLITLab/ALActions/da_build@main
   with:
     # Set to "error" to fail the build on accessibility failures (default is "warning")
     verapdf-validation-mode: "error"

     # Set to "true" to enforce tab-order and form-field annotation rules (default is "false")
     verapdf-strict: "true"

     # Set to "true" to skip the PDF check and veraPDF installation entirely
     skip-pdf-check: "true"

How to Adjust Strictness

  • Default (Non-Strict): By default, rules related to tab-order (§7.18.3) and widget annotation structure (§7.18.4) are suppressed. Since many Docassemble interviews flatten form fields before they reach the user, these rules are often irrelevant.
  • Strict Mode: Set verapdf-strict: "true" to treat these suppressed rules as regular failures.

How to Turn It Off

  • Advisory Only: Keep the default verapdf-validation-mode: "warning". The check will run and report issues, but it will never fail your build.
  • Complete Disable: Set skip-pdf-check: "true". This prevents the action from downloading/installing veraPDF and skips the scanning process entirely.
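
The way these inputs could combine into a pass/fail decision can be sketched as follows. This is a simplified model, not the checker's actual code: the function name `exit_code` and its parameters are assumptions, and in the real action `skip-pdf-check` prevents the step from running at all rather than being passed as a flag.

```python
def exit_code(validation_mode: str, failure_count: int, skip: bool = False) -> int:
    """Return 0 (build passes) or 1 (build fails) for one run of the check."""
    if skip:
        # Models skip-pdf-check: "true"; nothing is scanned, so nothing can fail.
        return 0
    if validation_mode == "error" and failure_count > 0:
        # Only "error" mode turns accessibility failures into a failed build.
        return 1
    # "warning" mode reports issues but never fails the build.
    return 0
```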

Rules are now classified into four levels based on their real-world impact:

- **fail**: structural failures that break screen readers (missing StructTreeRoot,
  untagged content, figures without alt text, missing font/ToUnicode, etc.)
- **warning**: advisory issues that don't break AT but should be fixed (missing
  dc:title, missing document language, missing DisplayDocTitle, etc.)
- **info**: administrative metadata rules suppressed by default (§5 PDF/UA
  identifier, optional-content config, PrinterMark annotations)
- **form_annotation**: tab-order (§7.18.3) and widget annotation structure
  (§7.18.4) rules — suppressed in non-strict mode because forms are often
  flattened before users see them; treated as failures in strict mode
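
A minimal sketch of how the four levels might resolve to an outcome, assuming a hypothetical `Severity` enum and `effective_outcome` helper (neither name is confirmed to exist in the checker script):

```python
from enum import Enum


class Severity(Enum):
    FAIL = "fail"
    WARNING = "warning"
    INFO = "info"
    FORM_ANNOTATION = "form_annotation"


def effective_outcome(severity: Severity, strict: bool) -> str:
    """Map a rule's severity to 'fail', 'warning', or 'suppressed'."""
    if severity is Severity.FAIL:
        return "fail"
    if severity is Severity.WARNING:
        return "warning"
    if severity is Severity.FORM_ANNOTATION:
        # Form-annotation rules count as failures only in strict mode.
        return "fail" if strict else "suppressed"
    # Info-level rules are suppressed by default.
    return "suppressed"
```

Keeping `form_annotation` as its own level, rather than folding it into `fail` or `info`, lets one flag flip its behavior without touching the other classifications.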

New input: `verapdf-strict` (default `false`).
  Set to `true` to activate form-annotation structure rules.

Job summary now groups results into failure / advisory-warning / passing
sections, with advisory warnings and suppressed rules in collapsible details.
Console output shows a per-PDF breakdown (N failure(s), N warning(s), N suppressed).
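
For illustration, the per-PDF console line could be produced by counting outcomes per file. The `breakdown` function and the outcome strings below are assumptions made for this sketch, not the script's real data model.

```python
from collections import Counter


def breakdown(outcomes: list[str]) -> str:
    """Summarize one PDF's rule outcomes as 'N failure(s), N warning(s), N suppressed'."""
    counts = Counter(outcomes)  # missing keys count as 0
    return (
        f"{counts['fail']} failure(s), "
        f"{counts['warning']} warning(s), "
        f"{counts['suppressed']} suppressed"
    )
```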

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Adds automated PDF/UA-1 accessibility validation to the da_build composite GitHub Action using veraPDF, with rule severity classification and an optional strict mode to control how form/tab-order related rules are treated.

Changes:

  • Add check_pdf_accessibility.py to run veraPDF, classify rule severities, emit annotations, and write a GitHub Step Summary report.
  • Extend da_build/action.yml with new inputs (verapdf-validation-mode, verapdf-strict), install veraPDF, and run the checker.
  • Document the new PDF accessibility behavior and inputs in README.md.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| da_build/check_pdf_accessibility.py | New checker script that runs veraPDF UA1 validation, buckets failures by severity, and outputs annotations + job summary. |
| da_build/action.yml | Adds inputs and new steps to install veraPDF and invoke the checker as part of da_build. |
| README.md | Documents PDF accessibility checking and the new action inputs. |

Comment thread da_build/action.yml
Comment on lines +17 to +29
verapdf-validation-mode:
  description: >-
    How to report PDF/UA-1 accessibility failures found by veraPDF.
    'warning' annotates the job without failing it (default).
    'error' fails the build.
  default: "warning"
verapdf-strict:
  description: >-
    Enable strict PDF/UA-1 checking.
    When 'false' (default), tab-order and annotation structure rules for form
    fields are suppressed because forms are often flattened before users see
    them. Set to 'true' to treat those rules as failures.
  default: "false"
Comment thread da_build/action.yml
Comment on lines +111 to +121
- name: Install veraPDF
  run: |
    # veraPDF 1.28+ is required for compatibility with Java 21 (GitHub Actions default).
    VERAPDF_VERSION="1.28.1"
    VERAPDF_MINOR="1.28"
    INSTALL_DIR="${RUNNER_TEMP}/verapdf"

    if command -v verapdf &>/dev/null; then
      echo "veraPDF already available: $(verapdf --version 2>&1 | head -1)"
      exit 0
    fi
Comment on lines +184 to +187
    """Run veraPDF on a list of PDFs; return (stdout_xml, stderr)."""
    cmd = [verapdf_cmd, "--flavour", "ua1", "--format", "xml"] + [str(p) for p in pdfs]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
    return result.stdout, result.stderr
Comment on lines +428 to +430
# Passing PDFs
passing = [r for r in results if r.get("compliant")]
if passing: