dkfancska/DagRuff

DagRuff

An extremely fast Python linter for Apache Airflow DAG files, written in Python.

DagRuff is a linter designed to catch common errors and enforce best practices in Apache Airflow DAG files. It ships 31 built-in rules covering DAG structure, best practices, and Airflow-specific patterns.

Features

  • Fast: Built with performance in mind, using AST parsing for static analysis
  • Caching: Results are cached based on file hash for improved performance
  • Comprehensive: 31+ lint rules covering DAG structure, best practices, and Airflow patterns
  • Auto-fix: Automatically fix many common issues with --fix
  • Configurable: Configure rules via pyproject.toml or .dagruff.toml with validation
  • Plugin Support: Extend functionality with custom rule plugins via entry points
  • No Airflow Required: Works without Airflow for AST-based checks (optional DagBag validation requires Airflow)

Installation

# Basic installation (no Airflow, AST checks only)
pip install dagruff

# With Airflow support (recommended for full DagBag validation)
pip install dagruff[airflow]

Or install from source:

git clone https://github.com/dkfancska/dagruff.git
cd dagruff
pip install -e ".[airflow]"

Note: Basic installation works without Airflow and performs all static checks via AST. For DagBag validation (import checking and code execution), install with the airflow extra.

Usage

After installation, use the dagruff command:

# Check a single file
dagruff examples/example_dag_good.py

# Check a directory
dagruff examples/

# Filter by severity
dagruff examples/ --severity warning

# JSON output
dagruff examples/ --format json

# Use configuration file
dagruff --config .dagruff.toml

# Without path - uses paths from config
dagruff

# Auto-fix all fixable issues
dagruff examples/ --fix

# Auto-fix specific rules
dagruff examples/ --fix DAG001 DAG009 AIR003

# Ignore specific rules
dagruff examples/ --ignore DAG006 DAG007

# Disable caching (useful for CI/CD)
dagruff examples/ --no-cache

# Verbose logging
dagruff examples/ --log-level debug
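
For CI scripts it can be convenient to assemble these invocations programmatically. The following is a minimal sketch, not part of DagRuff itself: `build_dagruff_command` and `run_dagruff` are hypothetical helper names, and only flags shown above are used.

```python
import subprocess
from typing import List, Optional, Sequence


def build_dagruff_command(
    paths: Sequence[str],
    output_format: Optional[str] = None,
    ignore: Sequence[str] = (),
    no_cache: bool = False,
) -> List[str]:
    """Assemble a dagruff argv list from the flags shown above (hypothetical helper)."""
    cmd = ["dagruff", *paths]
    if output_format:
        cmd += ["--format", output_format]
    if ignore:
        cmd += ["--ignore", *ignore]
    if no_cache:
        cmd.append("--no-cache")
    return cmd


def run_dagruff(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command and capture its output; requires dagruff on PATH."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.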

Lint Rules

DagRuff implements 31 lint rules from various sources:

DAG Rules (13 rules)

  • DAG import and definition checks
  • dag_id validation and uniqueness
  • Required DAG parameters (dag_id, start_date)
  • Recommended parameters (doc_md)
  • Special checks for KubernetesPodOperator (requires container_resources and executor_resources)

Ruff AIR Rules (4 rules)

  • AIR002: Check for start_date presence
  • AIR003: Check catchup parameter
  • AIR013: Recommend max_active_runs
  • AIR014: Recommend max_active_tasks for Airflow 2+ (warn about deprecated concurrency)
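
To illustrate how a check like AIR002 or AIR003 can run without importing Airflow, here is a small sketch built on Python's standard `ast` module. It is not DagRuff's actual implementation: `find_dag_calls_missing` is a hypothetical name, and the sketch only inspects keywords passed directly to `DAG(...)`, not values supplied through `default_args`.

```python
import ast
from typing import List, Tuple


def find_dag_calls_missing(source: str, keyword: str) -> List[Tuple[int, str]]:
    """Return (line, message) for each DAG(...) call lacking `keyword`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both `DAG(...)` and `airflow.DAG(...)` style calls.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "DAG":
            continue
        # A `**kwargs` expansion has arg=None; its contents are not visible statically.
        present = {kw.arg for kw in node.keywords}
        if keyword not in present and None not in present:
            findings.append((node.lineno, f"DAG call is missing '{keyword}'"))
    return findings


sample = 'from airflow import DAG\ndag = DAG("etl", start_date=None)\n'
print(find_dag_calls_missing(sample, "catchup"))
# → [(2, "DAG call is missing 'catchup'")]
```

Because the source is only parsed and never executed, the `from airflow import DAG` line in the sample does not require Airflow to be installed.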

flake8-airflow Rules (4 rules)

  • AF001: Forbid SubDagOperator usage
  • AF002: Security warnings for BashOperator
  • AF003: Check task_id uniqueness
  • AF004: Detect deprecated operators
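
A uniqueness check in the spirit of AF003 can be sketched the same way. This is an illustration under assumptions, not DagRuff's code: `find_duplicate_task_ids` is a hypothetical name, and only `task_id` values written as string literals are compared.

```python
import ast
from collections import defaultdict
from typing import Dict, List


def find_duplicate_task_ids(source: str) -> Dict[str, List[int]]:
    """Map each literal task_id used more than once to the lines where it appears."""
    seen = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            value = kw.value
            # Only string literals can be compared statically.
            if (
                kw.arg == "task_id"
                and isinstance(value, ast.Constant)
                and isinstance(value.value, str)
            ):
                seen[value.value].append(node.lineno)
    return {task_id: sorted(lines) for task_id, lines in seen.items() if len(lines) > 1}
```

Task IDs built at runtime (for example with f-strings in a loop) are skipped by this sketch, which is one reason optional DagBag validation can catch problems that static checks miss.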

airflint AST Rules (4 rules)

  • AIRFLINT001: Check task dependencies
  • AIRFLINT002: Check XCom usage
  • AIRFLINT003: Check Variables usage
  • AIRFLINT004: Check required operator parameters

Best Practices Rules (6 rules)

  • BP001: Check for top-level code avoidance
  • BP002: Check datetime function usage
  • BP003: Recommend execution_timeout for tasks
  • BP004: Check dependency method consistency
  • BP005: Recommend docstrings for tasks
  • BP006: Recommend dagrun_timeout for DAGs

Full documentation:

  • 📖 RULES.md - Complete rule descriptions with examples, quick reference, and grouping
  • 🔌 PLUGINS.md - Plugin system documentation
  • 🔧 CONTRIBUTING.md - Contribution guidelines
  • ✅ PRE_COMMIT.md - Pre-commit hooks setup

Auto-fix (--fix)

DagRuff supports automatic fixing of many issues via the --fix flag:

Fixable Rules:

  • DAG001 - Adds from airflow import DAG import
  • DAG005 - Removes extra spaces in dag_id
  • DAG009 - Adds "owner": "airflow" to default_args
  • DAG010 - Adds "retries": 1 to default_args
  • AIR003 - Adds catchup=False to DAG
  • AIR013 - Adds max_active_runs=1 to DAG
  • AIR014 - Replaces concurrency with max_active_tasks or adds max_active_tasks=1

Usage:

# Fix all fixable issues
dagruff examples/ --fix

# Fix only specific rules
dagruff examples/ --fix DAG001 DAG009

# Combine with other options
dagruff examples/ --fix DAG001 --severity warning

Note: Auto-fix preserves code formatting and checks for duplicates before adding parameters. It uses an AST-based approach for more reliable fixes and falls back to regex when needed.
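
One way a formatting-preserving fix such as AIR003 could work is to locate the call with `ast` and then edit the original text at that position, instead of regenerating the code. The sketch below is an assumption-laden illustration, not the contents of DagRuff's autofix.py: `add_dag_keyword` and `_missing_calls` are hypothetical names, and the regex fallback is omitted.

```python
import ast
from typing import List


def _missing_calls(tree: ast.AST, keyword: str) -> List[ast.Call]:
    """DAG(...) calls that do not pass `keyword` explicitly."""
    calls = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        present = {kw.arg for kw in node.keywords}
        if name == "DAG" and keyword not in present and None not in present:
            calls.append(node)
    return calls


def add_dag_keyword(source: str, keyword: str, value: str) -> str:
    """Return `source` with `keyword=value` added to DAG calls that lack it."""
    data = source.encode("utf-8")  # ast column offsets count UTF-8 bytes
    line_starts = [0]
    for line in data.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    def offset(lineno: int, col: int) -> int:
        return line_starts[lineno - 1] + col

    edits = []  # (byte position, text to insert)
    for call in _missing_calls(ast.parse(source), keyword):
        operands = list(call.args) + [kw.value for kw in call.keywords]
        if operands:
            # Insert right after the last argument, before any trailing comma.
            last = max(operands, key=lambda n: (n.end_lineno, n.end_col_offset))
            edits.append((offset(last.end_lineno, last.end_col_offset), f", {keyword}={value}"))
        else:
            # Empty call: insert just before the closing parenthesis.
            edits.append((offset(call.end_lineno, call.end_col_offset) - 1, f"{keyword}={value}"))

    # Apply from the end of the file so earlier offsets stay valid.
    for pos, text in sorted(edits, reverse=True):
        data = data[:pos] + text.encode("utf-8") + data[pos:]
    fixed = data.decode("utf-8")

    # Safety net: keep the edit only if the result parses and nothing is still missing.
    try:
        if _missing_calls(ast.parse(fixed), keyword):
            return source
    except SyntaxError:
        return source
    return fixed
```

The final re-parse means an edit that would break the file (for instance when the last argument is wrapped in extra parentheses) is discarded and the original source is returned unchanged.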

Configuration

DagRuff can be configured via pyproject.toml or .dagruff.toml:

[tool.dagruff]
# Enable/disable specific rules
select = ["DAG001", "DAG002", "AIR003"]
ignore = ["DAG006", "BP005"]

# Set minimum severity level
severity = "error"  # or "warning", "info"

# Paths to check (automatically validated)
paths = ["dags/", "custom_dags/"]

# Per-file ignores
[tool.dagruff.per-file-ignores]
"legacy_dags/*.py" = ["DAG006", "DAG007"]

Configuration Validation: DagRuff validates configuration values:

  • Ensures paths and ignore are lists of strings
  • Validates rule ID format (e.g., DAG001, AIR002)
  • Normalizes whitespace and filters empty values
  • Gracefully handles invalid values with warnings
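
The validation steps listed above can be sketched as follows. This is a minimal illustration, not DagRuff's config.py: the function names are hypothetical, and the rule ID pattern (uppercase letters followed by three digits) is an assumption inferred from IDs such as DAG001 and AIRFLINT004.

```python
import re
import warnings
from typing import Any, Dict, List

RULE_ID = re.compile(r"^[A-Z]+\d{3}$")  # assumed format, e.g. DAG001, AIRFLINT004


def clean_string_list(name: str, value: Any) -> List[str]:
    """Coerce a config value to a list of non-empty, stripped strings."""
    if not isinstance(value, list):
        warnings.warn(f"'{name}' must be a list of strings; ignoring {value!r}")
        return []
    cleaned = []
    for item in value:
        if not isinstance(item, str):
            warnings.warn(f"'{name}' contains a non-string entry {item!r}; skipping it")
            continue
        item = item.strip()
        if item:  # drop empty or whitespace-only entries
            cleaned.append(item)
    return cleaned


def validate_config(raw: Dict[str, Any]) -> Dict[str, List[str]]:
    """Validate the `paths`, `select`, and `ignore` keys of a parsed config table."""
    config = {"paths": clean_string_list("paths", raw.get("paths", []))}
    for key in ("select", "ignore"):
        rule_ids = []
        for rule_id in clean_string_list(key, raw.get(key, [])):
            if RULE_ID.match(rule_id):
                rule_ids.append(rule_id)
            else:
                warnings.warn(f"'{key}' has malformed rule ID {rule_id!r}; skipping it")
        config[key] = rule_ids
    return config
```

Invalid entries produce a warning and are dropped, so a single typo in the configuration does not abort the whole lint run.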

Caching: Results are cached by default, keyed by file hash; use --no-cache to disable caching. The cache provides:

  • Automatic cache invalidation on file changes
  • Memory-efficient singleton cache
  • Deep copy returns for safety
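
The three cache properties above can be pictured with a short sketch. It is a hypothetical illustration, not DagRuff's cache.py: the class name `ResultCache`, the use of SHA-256, and the in-memory dictionary are all assumptions.

```python
import copy
import hashlib
from typing import Any, Optional


class ResultCache:
    """In-memory cache keyed by file path and content hash (hypothetical sketch)."""

    _instance: Optional["ResultCache"] = None

    def __new__(cls) -> "ResultCache":
        # Singleton: every ResultCache() call returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._entries = {}
        return cls._instance

    @staticmethod
    def _digest(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def get(self, path: str, content: bytes) -> Optional[Any]:
        entry = self._entries.get(path)
        # A changed file yields a different hash, so stale results are never returned.
        if entry is None or entry[0] != self._digest(content):
            return None
        return copy.deepcopy(entry[1])  # callers cannot mutate the cached copy

    def put(self, path: str, content: bytes, results: Any) -> None:
        self._entries[path] = (self._digest(content), copy.deepcopy(results))
```

Returning deep copies costs some time per lookup but keeps callers such as an auto-fixer from accidentally modifying results that a later lookup will reuse.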

Examples

The examples/ directory contains:

  • example_dag_good.py - Example of a correct DAG
  • example_dag_bad.py - Example DAG with errors to demonstrate the linter

Plugins

DagRuff supports custom rule plugins via Python entry points. See PLUGINS.md for detailed documentation.

Quick Example:

# my_plugin/__init__.py
from typing import List
from dagruff.rules.ast_collector import ASTCollector
from dagruff.models import LintIssue, Severity

def check_all_custom_rules(collector: ASTCollector, file_path: str) -> List[LintIssue]:
    """Custom rule checker following RuleChecker protocol."""
    issues = []
    # Your custom logic here
    return issues

# pyproject.toml
[project.entry-points."dagruff.rules"]
my_custom_rule = "my_plugin:check_all_custom_rules"

Contributing

Contributions are welcome and highly appreciated! To get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Ensure tests pass (pytest tests/) - 296+ tests with 77% code coverage
  5. Ensure code is formatted (ruff format) and linted (ruff check)
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Pre-commit Hooks: Tests run automatically before each commit. See PRE_COMMIT.md for setup.

Development

Setup

# Clone the repository
git clone https://github.com/dkfancska/dagruff.git
cd dagruff

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode (with Airflow for full functionality)
pip install -e ".[airflow,dev]"
# or using uv
uv pip install -e ".[airflow,dev]"

# Run tests (296+ tests)
pytest tests/

# Run tests with coverage (current coverage: 77%)
pytest --cov=dagruff tests/

# Format code
ruff format dagruff tests/

# Lint code
ruff check dagruff tests/

# Run specific test file
pytest tests/test_linter.py -v

Project Structure

dagruff/
├── dagruff/                # Main package
│   ├── __init__.py
│   ├── cli/                # CLI package (refactored)
│   │   ├── __init__.py     # Main entry point
│   │   ├── runner.py       # CLI orchestrator
│   │   ├── linter.py       # Linting functions
│   │   ├── commands/       # Command pattern
│   │   │   ├── base.py     # BaseCommand
│   │   │   ├── check.py    # CheckCommand
│   │   │   └── fix.py      # FixCommand
│   │   ├── formatters/     # Output formatters
│   │   │   ├── human.py    # Human-readable format
│   │   │   └── json.py     # JSON format
│   │   └── utils/          # CLI utilities
│   │       ├── args.py     # Argument parsing
│   │       ├── files.py    # File utilities
│   │       ├── config_handler.py
│   │       └── autofix_handler.py
│   ├── config.py           # Configuration handling with validation
│   ├── linter.py           # Main linter with caching
│   ├── cache.py            # Caching implementation
│   ├── models.py           # Data models
│   ├── autofix.py          # Auto-fix implementation
│   ├── plugins.py          # Plugin system
│   ├── validation.py       # Input validation
│   ├── logger.py           # Logging setup
│   └── rules/              # Lint rules
│       ├── base.py         # Protocols (RuleChecker, Linter, Autofixer)
│       ├── ast_collector.py # AST data collector
│       ├── dag_rules.py    # DAG-specific rules
│       ├── ruff_air_rules.py
│       ├── best_practices_rules.py
│       ├── airflint_rules.py
│       └── utils.py        # Rule utilities
├── tests/                  # Tests (296+ tests)
├── examples/               # Example DAG files
├── pyproject.toml          # Project configuration
├── README.md               # This file
└── RULES.md                # Rule descriptions

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

DagRuff draws inspiration from:

Special thanks to the Apache Airflow community for their excellent documentation and tooling.

Support

Having trouble? Check out the existing Issues or feel free to open a new one.
