MetaSanitize - Project Structure

C:\Users\smwlc\proj\Cyber\
│
├── 📄 README.md                      # Quick start guide
├── 📄 USER_GUIDE.md                 # Comprehensive user manual (8KB)
├── 📄 ARCHITECTURE.md               # Technical design document (15KB)
├── 📄 EXAMPLES.md                   # Usage examples & configs (4KB)
├── 📄 IMPLEMENTATION_SUMMARY.md     # Project completion summary
├── 📄 LICENSE                       # MIT License + ethical use notice
├── 📄 .gitignore                    # Git ignore patterns
├── 📄 requirements.txt              # Python dependencies
├── 📄 setup.py                      # Package installation config
├── 📄 pytest.ini                    # Test configuration
│
├── 📁 metasanitize/                 # Main package
│   ├── 📄 __init__.py              # Package exports
│   ├── 📄 __main__.py              # CLI entry point
│   │
│   ├── 📁 core/                    # Core engine
│   │   ├── 📄 __init__.py
│   │   ├── 📄 models.py            # Data models (Pydantic)
│   │   └── 📄 engine.py            # MetadataEngine orchestrator
│   │
│   ├── 📁 handlers/                # Format-specific handlers
│   │   ├── 📄 __init__.py
│   │   ├── 📄 base.py              # BaseHandler abstract class
│   │   ├── 📄 registry.py          # Handler dispatcher
│   │   ├── 📄 image_handler.py    # JPG/PNG/HEIC (EXIF/XMP)
│   │   ├── 📄 pdf_handler.py      # PDF metadata + XMP
│   │   ├── 📄 audio_handler.py    # MP3/FLAC (ID3/Vorbis)
│   │   ├── 📄 video_handler.py    # MP4/MOV (scan only)
│   │   └── 📄 document_handler.py # DOCX/PPTX (Office XML)
│   │
│   ├── 📁 privacy/                 # Privacy analysis
│   │   ├── 📄 __init__.py
│   │   └── 📄 analyzer.py          # Risk scoring & diff generation
│   │
│   ├── 📁 profiles/                # Dummy metadata generation
│   │   ├── 📄 __init__.py
│   │   └── 📄 generator.py         # Realistic fake metadata
│   │
│   ├── 📁 gui/                     # Graphical User Interface
│   │   ├── 📄 __init__.py
│   │   ├── 📄 app.py               # App entry point & styling
│   │   └── 📄 main_window.py       # Main interaction logic
│   │
│   ├── 📁 cli/                     # Command-line interface
│   │   ├── 📄 __init__.py
│   │   └── 📄 main.py              # Click-based CLI (scan, clean, inject, diff)
│   │
│   └── 📁 utils/                   # Utilities
│       ├── 📄 __init__.py
│       └── 📄 batch.py             # Batch processor (parallel)
│
└── 📁 tests/                        # Test suite
    ├── 📄 __init__.py
    ├── 📄 test_engine.py           # Core engine tests
    ├── 📄 test_privacy.py          # Privacy analyzer tests
    └── 📄 test_profiles.py         # Generator tests

File Statistics

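| Category           | Files | Lines of Code (est.) |
|--------------------|-------|----------------------|
| Core Engine        | 3     | ~800                 |
| GUI Interface      | 2     | ~900                 |
| Handlers           | 7     | ~1,400               |
| Privacy & Profiles | 2     | ~600                 |
| CLI & Utils        | 2     | ~500                 |
| Tests              | 4     | ~400                 |
| Documentation      | 6     | ~3,000               |
| Config             | 5     | ~100                 |
| TOTAL              | 31    | ~7,700               |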

Key Files by Purpose

🚀 Getting Started

  1. README.md - Quick overview and installation
  2. USER_GUIDE.md - Step-by-step usage instructions
  3. requirements.txt - Install dependencies

🔧 Development

  1. metasanitize/core/engine.py - Main orchestrator
  2. metasanitize/handlers/ - Add new format support here
  3. tests/ - Test suite

📖 Documentation

  1. ARCHITECTURE.md - Technical deep-dive
  2. EXAMPLES.md - Code samples & configs
  3. IMPLEMENTATION_SUMMARY.md - Project status

🎯 Usage

  1. metasanitize/gui/app.py - GUI application launcher
  2. metasanitize/cli/main.py - CLI commands
  3. metasanitize/__main__.py - Run as module
  4. setup.py - Install as package

Module Dependencies

┌─────────────────────────────────────────────────────────────┐
│                          CLI (main.py)                       │
└──────────────────┬──────────────────────────────────────────┘
                   │
          ┌────────┴────────┬──────────────┐
          │                 │              │
┌─────────▼────────┐ ┌──────▼─────┐ ┌─────▼──────┐
│  MetadataEngine  │ │  Analyzer  │ │ Generator  │
│   (core/)        │ │ (privacy/) │ │(profiles/) │
└────────┬─────────┘ └────────────┘ └────────────┘
         │
         │
┌────────▼────────────────────────────────────────────────────┐
│              HandlerRegistry (handlers/)                     │
│  ┌──────────┬──────────┬──────────┬──────────┬──────────┐  │
│  │  Image   │   PDF    │  Audio   │  Video   │ Document │  │
│  │ Handler  │ Handler  │ Handler  │ Handler  │ Handler  │  │
│  └──────────┴──────────┴──────────┴──────────┴──────────┘  │
└──────────────────────────────────────────────────────────────┘

Import Graph

# Main package
from metasanitize import MetadataEngine, PrivacyAnalyzer, DummyMetadataGenerator

# Core models
from metasanitize.core.models import (
    MetadataRecord,
    MetadataField,
    PrivacyRisk,
    ProcessingResult,
)

# Batch processing
from metasanitize.utils.batch import BatchProcessor

# CLI
from metasanitize.cli.main import cli

Size Breakdown

| Component   | Approx. Size | Description                         |
|-------------|--------------|-------------------------------------|
| Handlers    | 40%          | Format-specific extraction/cleaning |
| Core Engine | 20%          | Orchestration & backup management   |
| Privacy     | 15%          | Risk analysis & scoring             |
| CLI         | 15%          | User interface & reporting          |
| Utils       | 5%           | Batch processing & helpers          |
| Profiles    | 5%           | Dummy metadata generation           |

Extensibility Points

Add New File Format

  1. Create metasanitize/handlers/new_format_handler.py
  2. Inherit from BaseHandler
  3. Implement extract_metadata(), clean_metadata(), modify_metadata()
  4. Register in handlers/registry.py
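
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `BaseHandler` shown here is a stand-in with an assumed interface (only the three method names come from this document), and `TextNoteHandler`, the `.note` format, `HANDLERS`, `register()`, and `handler_for()` are all hypothetical. The real `handlers/base.py` and `handlers/registry.py` may look different.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseHandler(ABC):
    """Stand-in for handlers/base.py; the real signatures are assumed."""

    extensions: tuple = ()

    @abstractmethod
    def extract_metadata(self, path: Path) -> dict: ...

    @abstractmethod
    def clean_metadata(self, path: Path) -> dict: ...

    @abstractmethod
    def modify_metadata(self, path: Path, updates: dict) -> dict: ...


class TextNoteHandler(BaseHandler):
    """Hypothetical toy format: 'Key: value' header lines, a blank line, then the body."""

    extensions = (".note",)

    def _split(self, path: Path):
        # Everything before the first blank line is treated as metadata.
        head, _, body = path.read_text(encoding="utf-8").partition("\n\n")
        fields = dict(line.split(": ", 1) for line in head.splitlines() if ": " in line)
        return fields, body

    def _write(self, path: Path, fields: dict, body: str) -> None:
        head = "\n".join(f"{key}: {value}" for key, value in fields.items())
        path.write_text(f"{head}\n\n{body}", encoding="utf-8")

    def extract_metadata(self, path: Path) -> dict:
        return self._split(path)[0]

    def clean_metadata(self, path: Path) -> dict:
        fields, body = self._split(path)
        self._write(path, {}, body)
        return fields  # the fields that were removed

    def modify_metadata(self, path: Path, updates: dict) -> dict:
        fields, body = self._split(path)
        fields.update(updates)
        self._write(path, fields, body)
        return fields


# Hypothetical registry: maps a lowercase file extension to a handler instance.
HANDLERS: dict = {}


def register(handler_cls):
    for ext in handler_cls.extensions:
        HANDLERS[ext] = handler_cls()
    return handler_cls


def handler_for(path):
    """Return the handler for this file's extension, or None if unsupported."""
    return HANDLERS.get(Path(path).suffix.lower())


register(TextNoteHandler)
```

Keeping the extension list on the handler class means the registry needs no per-format knowledge; adding a format is then one new file plus one `register()` call.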

Add New Cleaning Profile

  1. Update handlers/base.py or specific handler
  2. Add profile logic in clean_metadata() method
  3. Document in USER_GUIDE.md
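
One possible shape for the profile logic in step 2 is a lookup table of named profiles that `clean_metadata()` consults. The sketch below is an assumption, not the tool's implementation: the profile names (`location_only`, `strict`), the `PROFILES` table, and `apply_profile()` are hypothetical, and the keys are ordinary EXIF tag names used for illustration.

```python
from __future__ import annotations

# Hypothetical profiles. "drop" lists the keys to remove; "*" means remove
# everything except the keys listed under "keep".
PROFILES = {
    "location_only": {
        "drop": {"GPSLatitude", "GPSLongitude", "GPSAltitude"},
        "keep": set(),
    },
    "strict": {
        "drop": {"*"},
        "keep": {"ColorSpace", "Orientation"},
    },
}


def apply_profile(metadata: dict, profile_name: str) -> tuple[dict, dict]:
    """Split metadata into (kept, removed) according to the named profile."""
    try:
        profile = PROFILES[profile_name]
    except KeyError:
        raise ValueError(f"unknown cleaning profile: {profile_name!r}") from None

    drop_all = "*" in profile["drop"]
    kept, removed = {}, {}
    for key, value in metadata.items():
        should_drop = (drop_all and key not in profile["keep"]) or key in profile["drop"]
        (removed if should_drop else kept)[key] = value
    return kept, removed
```

Returning the removed fields alongside the kept ones would let a caller report what was stripped, which fits the diff generation mentioned for the privacy analyzer.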

Add New Dummy Profile

  1. Update profiles/generator.py
  2. Add device/software variants
  3. Ensure cross-field consistency
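
Cross-field consistency (step 3) is easiest to keep if related fields are chosen together rather than independently. The following is a sketch under assumptions: `DEVICES`, `generate_dummy_profile()`, and `is_consistent()` are hypothetical names, the device entries are illustrative values, and the real `profiles/generator.py` may be organized differently. The field names are standard EXIF tags, and the date format is the EXIF `YYYY:MM:DD HH:MM:SS` convention.

```python
import random
from datetime import datetime, timedelta

EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"

# Hypothetical device variants. Each entry bundles fields that must agree,
# so a generated profile never pairs one vendor's model with another's software.
DEVICES = [
    {"Make": "Canon", "Model": "Canon EOS 80D", "Software": "Digital Photo Professional"},
    {"Make": "Apple", "Model": "iPhone 12", "Software": "iOS 15.4"},
]


def generate_dummy_profile(seed=None):
    """Build a fake metadata profile whose fields are mutually consistent."""
    rng = random.Random(seed)  # seeded for reproducible output
    device = rng.choice(DEVICES)
    # Capture time falls somewhere in 2021; the edit time is always later.
    taken = datetime(2021, 1, 1) + timedelta(seconds=rng.randrange(365 * 24 * 3600))
    edited = taken + timedelta(minutes=rng.randrange(1, 7 * 24 * 60))
    return {
        **device,
        "DateTimeOriginal": taken.strftime(EXIF_DATE_FORMAT),
        "DateTime": edited.strftime(EXIF_DATE_FORMAT),
    }


def is_consistent(profile):
    """Check the device triple is a known variant and timestamps are ordered."""
    device = {key: profile[key] for key in ("Make", "Model", "Software")}
    taken = datetime.strptime(profile["DateTimeOriginal"], EXIF_DATE_FORMAT)
    edited = datetime.strptime(profile["DateTime"], EXIF_DATE_FORMAT)
    return device in DEVICES and edited >= taken
```

A check like `is_consistent()` could double as a test helper when new device variants are added.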

Add New CLI Command

  1. Update cli/main.py
  2. Add new @cli.command() function
  3. Wire up to engine methods

Configuration Files

Generated at Runtime

  • .originals/ - Backup directory (auto-created)
  • htmlcov/ - Test coverage reports (pytest)
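
As an illustration of how an auto-created `.originals/` backup directory might work, here is a minimal sketch. It assumes the directory sits next to the file being processed and that name collisions are resolved with a numeric suffix; neither detail is confirmed by this document, and `backup_original()` is a hypothetical helper, not the engine's API.

```python
import shutil
from pathlib import Path

BACKUP_DIR_NAME = ".originals"


def backup_original(path):
    """Copy a file into a sibling .originals/ directory and return the copy's path."""
    src = Path(path)
    backup_dir = src.parent / BACKUP_DIR_NAME
    backup_dir.mkdir(exist_ok=True)  # created on first use

    # Never overwrite an earlier backup: add a numeric suffix on collision.
    dest = backup_dir / src.name
    counter = 1
    while dest.exists():
        dest = backup_dir / f"{src.stem}.{counter}{src.suffix}"
        counter += 1

    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```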

User-Customizable (future)

  • config.yaml - User preferences
  • profiles/*.yaml - Custom cleaning profiles
  • rules.yaml - Automation rules

Documentation Size

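| Document                  | Size   | Purpose             |
|---------------------------|--------|---------------------|
| USER_GUIDE.md             | ~8 KB  | How to use the tool |
| ARCHITECTURE.md           | ~15 KB | Technical design    |
| EXAMPLES.md               | ~4 KB  | Code samples        |
| README.md                 | ~2 KB  | Quick start         |
| IMPLEMENTATION_SUMMARY.md | ~6 KB  | Project status      |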

Total Documentation: ~35 KB (comprehensive)


All components implemented and documented. Ready for installation and testing.