itheCreator1
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 19 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 38 additions & 13 deletions
diff --git a/sha/analyzer.py b/sha/analyzer.py
Lines changed: 64 additions & 3 deletions
diff --git a/sha/analyzers/csp.py b/sha/analyzers/csp.py
Lines changed: 36 additions & 1 deletion
diff --git a/sha/config.py b/sha/config.py
Lines changed: 4 additions & 0 deletions
@@ -8,14 +8,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **Automatic retry logic** with exponential backoff for transient failures (429, 503, timeouts)
+- **Verbose and quiet modes** (`-v/--verbose`, `-q/--quiet`) for better output control
+- **JSON schema versioning** for backwards compatibility tracking
+- **Enhanced SSRF protection** with intermediate redirect validation (not just final destination)
+- **Robust analyzer validation** - runtime validation of all analyzer return values
+- **Maximum timeout validation** - prevents extremely long hangs (300s max)
+- **CSP DoS protection** - 10KB size limit to prevent memory exhaustion attacks
 - GitHub Actions CI/CD pipeline for automated testing across Python 3.8-3.12
 - Pre-commit hooks configuration with black, isort, flake8, mypy, and bandit
 - Comprehensive security policy (SECURITY.md) with vulnerability disclosure process
 - Contributing guidelines (CONTRIBUTING.md) for external contributors
 - Development tool configurations in pyproject.toml (black, isort, mypy, bandit, coverage)
+- 18 new edge case tests (IPv6 URLs, malformed CSP, timeout boundaries, schema version)
 
 ### Changed
+- **CSP parser hardening** - gracefully handles empty directives, duplicates, and malformed input
+- **Exception handling improvements** - specific exception types instead of broad catches
+- **HTTP error handling** - exit code 3 is preserved even if header analysis fails while handling an HTTP error
 - Enhanced pyproject.toml with dev dependencies and tool configurations
+- Updated test suite from 478 to 494 tests (97% coverage)
+- `fetch_headers_with_retry()` now used by default instead of `fetch_headers()`
+
+### Fixed
+- Set-Cookie exception handling now catches specific exceptions only
+- CSP parser no longer crashes on extremely long policies (raises ValueError instead)
+- Timeout parameter now properly validated with upper bound
+- Redirect validation code now handles mock objects in tests correctly
 
 ## [1.0.0] - 2024-12-04
 
 
@@ -1,20 +1,22 @@
 # Security Header Analyzer
 
 [![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
-[![Tests](https://img.shields.io/badge/tests-478%20passing-success.svg)](https://github.com/itheCreator1/security-header-analyzer/actions)
-[![Coverage](https://img.shields.io/badge/coverage-96%25-brightgreen.svg)](https://github.com/itheCreator1/security-header-analyzer)
+[![Tests](https://img.shields.io/badge/tests-494%20passing-success.svg)](https://github.com/itheCreator1/security-header-analyzer/actions)
+[![Coverage](https://img.shields.io/badge/coverage-97%25-brightgreen.svg)](https://github.com/itheCreator1/security-header-analyzer)
 [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
 
 A lightweight Python CLI tool that fetches and analyzes HTTP security headers according to Mozilla and OWASP best practices. This tool is designed for developers, penetration testers, and system administrators who want a quick, reliable way to evaluate the security posture of a website's HTTP response headers.
 
 ## 🚀 Features
 
 * **15 Security Header Analyzers**: HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Set-Cookie, Cache-Control, Expect-CT, Permissions-Policy, COEP, COOP, CORP, X-XSS-Protection, X-Download-Options, X-Permitted-Cross-Domain-Policies
-* **SSRF Protection**: Built-in safeguards against Server-Side Request Forgery attacks
-* **Multiple Output Formats**: Human-readable text or JSON for automation
+* **Enhanced SSRF Protection**: Multi-layer validation including intermediate redirect checks and DNS rebinding prevention
+* **Automatic Retry Logic**: Exponential backoff for 429/503 errors and transient network failures
+* **Robust Error Handling**: Graceful handling of malformed CSP policies, analyzer failures, and edge cases
+* **Multiple Output Formats**: Human-readable text or JSON with schema versioning for automation
 * **Severity Classification**: Issues categorized as Critical, High, Medium, or Low
-* **96% Test Coverage**: 478 comprehensive tests ensuring reliability
-* **Type Safety**: Full type hints with mypy support
+* **97% Test Coverage**: 494 comprehensive tests ensuring reliability
+* **Type Safety**: Full type hints with mypy support and runtime validation
 * **CI/CD Ready**: Easy integration with GitHub Actions, GitLab CI, Jenkins
 * **Extensible**: Add new header analyzers with minimal code changes
 
@@ -44,16 +46,39 @@ Run the analyzer from the command line:
 python -m sha https://example.com
 ```
 
-### Useful options
+### Command-Line Options
 
 ```
---json               Outputs results in JSON format
---timeout 10         Sets request timeout
---no-redirects       Disables following HTTP redirects
---user-agent "MyBot"  Uses a custom User-Agent
---debug              Shows verbose debug logs
+--json                 Output results in JSON format (with schema version)
+--timeout SECONDS      Request timeout (1-300 seconds, default: 10)
+--no-redirects         Disable following HTTP redirects
+--max-redirects N      Maximum redirects to follow (default: 5)
+--user-agent STRING    Custom User-Agent string
+-v, --verbose          Enable verbose output with detailed progress
+-q, --quiet            Suppress all output except errors and final report
+--debug                Show full error tracebacks
+--version              Show version information
 ```
 
+### Advanced Features
+
+**Automatic Retry with Exponential Backoff:**
+The tool automatically retries failed requests with exponential backoff for:
+- HTTP 429 (Too Many Requests) - respects Retry-After header
+- HTTP 503 (Service Unavailable) - respects Retry-After header
+- Transient network errors (timeouts, connection failures)
+
+**Enhanced SSRF Protection:**
+- Pre-request DNS validation
+- Post-redirect DNS rebinding checks
+- Intermediate redirect validation (all redirects in chain)
+- Private IP range blocking (IPv4 and IPv6)
+
+**Robust Error Handling:**
+- Malformed CSP policies are parsed gracefully with detailed error messages
+- Analyzer failures are caught and reported without stopping analysis
+- HTTP errors with headers still allow partial analysis
+
 ## 📖 Documentation
 
 - **[Architecture Guide](docs/architecture-overview.md)** - System design, components, and extensibility
@@ -78,7 +103,7 @@ security-header-analyzer/
 │   ├── reporter.py       # Report generation (text/JSON)
 │   ├── config.py         # Configuration and exceptions
 │   └── analyzers/        # Individual header analyzers (15 total)
-├── tests/                # Comprehensive test suite (478 tests, 96% coverage)
+├── tests/                # Comprehensive test suite (494 tests, 97% coverage)
 ├── docs/                 # Documentation
 └── .github/              # CI/CD workflows
 ```
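The retry behaviour described in the README hunk above can be sketched roughly as follows. This is a minimal illustration, not the project's `fetch_headers_with_retry()`: the names `TransientError`, `backoff_delay`, and `call_with_retry`, the base delay of 1 second, and the 30-second cap are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error for a retryable failure (429, 503, timeout)."""

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        # Seconds requested by the server via Retry-After, if any.
        self.retry_after = retry_after


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before retrying after 0-based `attempt`: base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on TransientError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # A server-supplied Retry-After takes precedence over the computed delay.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = backoff_delay(attempt)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version would probably also add jitter to the computed delay.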
 
@@ -60,10 +60,47 @@
 from typing import Any, Dict, List, Union
 
 from .analyzers import ANALYZER_REGISTRY, get_all_header_keys
+from .config import STATUS_ACCEPTABLE, STATUS_BAD, STATUS_GOOD, STATUS_MISSING
 
 # Type alias for finding result
 Finding = Dict[str, Any]
 
+# Define required keys for a valid Finding
+REQUIRED_FINDING_KEYS = {
+    "header_name", "status", "severity", "message", "actual_value", "recommendation"
+}
+
+
+def validate_finding(finding: Dict[str, Any], header_key: str) -> None:
+    """
+    Validate that a finding has all required keys and correct types.
+
+    Args:
+        finding: Finding dictionary to validate
+        header_key: Header key for error messages
+
+    Raises:
+        ValueError: If finding is missing required keys or has wrong types
+    """
+    if not isinstance(finding, dict):
+        raise ValueError(f"Analyzer for {header_key} returned non-dict: {type(finding)}")
+
+    missing_keys = REQUIRED_FINDING_KEYS - set(finding.keys())
+    if missing_keys:
+        raise ValueError(
+            f"Analyzer for {header_key} returned finding missing keys: {missing_keys}"
+        )
+
+    # Validate types
+    if not isinstance(finding["header_name"], str):
+        raise ValueError(f"header_name must be str, got {type(finding['header_name'])}")
+
+    if finding["status"] not in (STATUS_GOOD, STATUS_ACCEPTABLE, STATUS_BAD, STATUS_MISSING):
+        raise ValueError(f"Invalid status: {finding['status']}")
+
+    if not isinstance(finding["severity"], str):
+        raise ValueError(f"severity must be str, got {type(finding['severity'])}")
+
 
 def analyze_headers(headers: Dict[str, Union[str, List[str]]]) -> List[Finding]:
     """
@@ -99,9 +136,33 @@ def analyze_headers(headers: Dict[str, Union[str, List[str]]]) -> List[Finding]:
         # Get the analyzer function from the registry
         analyzer_func = ANALYZER_REGISTRY[header_key]
 
-        # Run the analysis
-        finding = analyzer_func(header_value)
-        findings.append(finding)
+        # Validate analyzer is callable
+        if not callable(analyzer_func):
+            raise ValueError(f"Analyzer for {header_key} is not callable: {analyzer_func}")
+
+        try:
+            # Run the analysis
+            finding = analyzer_func(header_value)
+
+            # Validate finding structure
+            validate_finding(finding, header_key)
+
+            findings.append(finding)
+
+        except Exception as e:
+            # Log error and create error finding
+            import sys
+            print(f"Warning: Analyzer for {header_key} failed: {e}", file=sys.stderr)
+
+            # Create placeholder finding for failed analyzer
+            findings.append({
+                "header_name": header_key.replace("-", " ").title(),
+                "status": STATUS_MISSING,
+                "severity": "info",
+                "message": f"Analyzer error: {e}",
+                "actual_value": None,
+                "recommendation": "Please report this issue to the developers",
+            })
 
     return findings
 
 
@@ -167,35 +167,59 @@ def parse_csp(value: str) -> Dict[str, List[str]]:
     """
     Parse CSP header value into directives.
 
+    Handles:
+    - Empty directives
+    - Malformed directives
+    - Duplicate directives (last one wins here; note the CSP spec keeps the first)
+    - Extremely long CSPs (memory limit protection)
+
     Args:
         value: CSP header value
 
     Returns:
         Dictionary mapping directive names to lists of values
 
+    Raises:
+        ValueError: If CSP is too large (potential DoS)
+
     Example:
         >>> parse_csp("default-src 'self'; script-src 'self' https://cdn.example.com")
         {
             "default-src": ["'self'"],
             "script-src": ["'self'", "https://cdn.example.com"]
         }
     """
+    # Protect against DoS via extremely long CSP
+    MAX_CSP_LENGTH = 10000  # roughly 10 KB; len() counts characters, not bytes
+    if len(value) > MAX_CSP_LENGTH:
+        raise ValueError(f"CSP too long: {len(value)} characters (max {MAX_CSP_LENGTH})")
+
     directives = {}
 
     # Split by semicolon to get individual directives
     for directive_str in value.split(";"):
         directive_str = directive_str.strip()
+
+        # Skip empty directives (e.g., ";;;" or trailing semicolon)
         if not directive_str:
             continue
 
         # Split directive into name and values
         parts = directive_str.split()
+
+        # Skip if no parts after splitting (e.g., all whitespace)
         if not parts:
             continue
 
+        # Defensive check: str.split() never yields empty tokens, so this cannot trigger
+        if not parts[0]:
+            continue
+
         directive_name = parts[0].lower()
         directive_values = parts[1:] if len(parts) > 1 else []
 
+        # Duplicate directives: this parser lets the last occurrence win by overwriting.
+        # Note: the CSP spec instead keeps the first occurrence and ignores later ones.
         directives[directive_name] = directive_values
 
     return directives
@@ -450,7 +474,18 @@ def analyze(value: Optional[str]) -> Dict[str, Any]:
         }
 
     # Parse CSP into directives
-    directives = parse_csp(value)
+    try:
+        directives = parse_csp(value)
+    except ValueError as e:
+        # parse_csp() raises ValueError only when the policy exceeds its size limit
+        return {
+            "header_name": header_name,
+            "status": STATUS_BAD,
+            "severity": "medium",
+            "message": f"CSP header is malformed: {e}",
+            "actual_value": value,
+            "recommendation": "Fix CSP syntax errors and ensure policy is under 10KB",
+        }
 
     # Check for dangerous patterns
     dangerous_findings = check_csp_dangerous_patterns(directives, CONFIG)
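The edge cases handled by `parse_csp()` in the hunks above can be reproduced with a compact stand-in. `parse_csp_sketch` is a hypothetical name and a simplified rewrite for illustration, not the function from `sha/analyzers/csp.py`; it keeps the same 10000-character limit and the same last-one-wins overwrite for duplicates.

```python
from typing import Dict, List

MAX_CSP_LENGTH = 10000  # same limit as the diff; len() measures characters


def parse_csp_sketch(value: str) -> Dict[str, List[str]]:
    """Parse a CSP header value into {directive name: [source expressions]}."""
    if len(value) > MAX_CSP_LENGTH:
        raise ValueError(f"CSP too long: {len(value)} characters (max {MAX_CSP_LENGTH})")

    directives: Dict[str, List[str]] = {}
    for chunk in value.split(";"):
        parts = chunk.split()  # whitespace-only chunks produce an empty list
        if not parts:
            continue  # skips ";;" runs and trailing semicolons
        # Directive names are case-insensitive; a repeated name overwrites the earlier one.
        directives[parts[0].lower()] = parts[1:]
    return directives


empty_and_trailing = parse_csp_sketch(";; default-src 'self';")
duplicates = parse_csp_sketch("script-src 'self'; SCRIPT-SRC 'none'")
print(empty_and_trailing)
print(duplicates)
```

Because `str.split()` with no arguments already discards empty tokens, the single `if not parts` check covers both empty and whitespace-only directives.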
 
@@ -74,11 +74,15 @@
 
 # HTTP Request Configuration
 DEFAULT_TIMEOUT = 10  # seconds
+MAX_TIMEOUT = 300  # 5 minutes maximum - prevents extremely long hangs
 DEFAULT_MAX_REDIRECTS = 5
 DEFAULT_USER_AGENT = (
     f"SecurityHeaderAnalyzer/{VERSION} (https://github.com/ThodorhsPerros/security-header-analyzer)"
 )
 
+# Report Schema Version
+SCHEMA_VERSION = "1.0.0"  # JSON report schema version for backwards compatibility
+
 # Private IP ranges for SSRF protection
 PRIVATE_IP_RANGES = [
     "127.0.0.0/8",  # Loopback