Python project for detecting identity-theft / sensitive-credential requests in text.
It uses a rule-first pipeline with an optional ML fallback and returns structured
decisions: SAFE, REVIEW, or BLOCK, plus evidence and suggested actions.
Sensitive entities:
- Government ID: SSN, passport
- Financial: credit/debit card, CVV/CVC
- Account takeover: passwords, OTP/2FA codes, PINs, recovery codes
Outputs:
- Decision: SAFE, REVIEW, BLOCK
- Action: allow, ask_clarification, refuse
- Entities + evidence + confidence
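As a rough illustration of what such a structured result could look like, here is a minimal sketch. The names `Decision`, `GuardResult`, and `ACTION_FOR` are hypothetical and not taken from the package, and the one-to-one mapping from decision to action is an assumption based on the order in which the values are listed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Decision(Enum):
    SAFE = "SAFE"
    REVIEW = "REVIEW"
    BLOCK = "BLOCK"


# Assumption: each decision maps to exactly one action, in the listed order.
ACTION_FOR = {
    Decision.SAFE: "allow",
    Decision.REVIEW: "ask_clarification",
    Decision.BLOCK: "refuse",
}


@dataclass
class GuardResult:
    """Hypothetical container for a decision plus its supporting details."""
    decision: Decision
    entities: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    confidence: float = 0.0

    @property
    def action(self) -> str:
        # Derive the action from the decision instead of storing it twice.
        return ACTION_FOR[self.decision]


result = GuardResult(
    Decision.BLOCK,
    entities=["ssn", "passport"],
    evidence=["send your SSN"],
    confidence=0.9,
)
print(result.decision.value, result.action)
```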
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m privcheck_agent.cli "Please send your SSN and passport number."
- Rule detector: normalize text, match entity patterns, require nearby collection intent verbs (send/provide/enter/submit).
- ML fallback (optional): catch paraphrases and ambiguous requests.
- Decision: SAFE/REVIEW/BLOCK with evidence and confidence.
- Action: allow, ask for clarification, or refuse.
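The rule-detector step can be sketched roughly as follows. This is not the code in privcheck_agent/rules.py: the patterns, the 40-character `WINDOW`, and the function names are illustrative assumptions, as is the policy of returning REVIEW when an entity is mentioned without a nearby intent verb.

```python
import re

# Hypothetical patterns for illustration; the project's real rules may differ.
ENTITY_PATTERNS = {
    "ssn": re.compile(r"\b(?:ssn|social security (?:number|no))\b"),
    "passport": re.compile(r"\bpassport\b"),
    "card": re.compile(r"\b(?:credit|debit) card\b"),
    "cvv": re.compile(r"\bcv[vc]\b"),
    "password": re.compile(r"\bpasswords?\b"),
    "otp": re.compile(r"\b(?:otp|2fa code|verification code)\b"),
    "pin": re.compile(r"\bpin\b"),
}
INTENT = re.compile(r"\b(?:send|provide|enter|submit)\b")
WINDOW = 40  # assumed max character distance between intent verb and entity


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def detect(text):
    """Split matched entities into (requested, mentioned) lists of names."""
    norm = normalize(text)
    intent_positions = [m.start() for m in INTENT.finditer(norm)]
    requested, mentioned = [], []
    for name, pattern in ENTITY_PATTERNS.items():
        starts = [m.start() for m in pattern.finditer(norm)]
        if not starts:
            continue
        # An entity counts as requested only if an intent verb is close by.
        near = any(abs(s - i) <= WINDOW for s in starts for i in intent_positions)
        (requested if near else mentioned).append(name)
    return requested, mentioned


def decide(text):
    """Assumed policy: request -> BLOCK, bare mention -> REVIEW, else SAFE."""
    requested, mentioned = detect(text)
    if requested:
        return "BLOCK"
    if mentioned:
        return "REVIEW"
    return "SAFE"


print(decide("Please send your SSN and passport number."))
```

Requiring an intent verb near the entity is what keeps a sentence that merely mentions a password from being treated like a request for one. In the full pipeline, the REVIEW cases are the natural candidates to hand to the ML fallback.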
python train_ml.py
This creates model.pkl. The agent automatically loads it if present.
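The "load it if present" behavior could be implemented along these lines. This is a sketch under assumptions: it supposes model.pkl is a standard pickle file in the working directory, and `load_model` is a hypothetical helper name, not necessarily what privcheck_agent/ml.py defines.

```python
import pickle
from pathlib import Path


def load_model(path="model.pkl"):
    """Return the unpickled model, or None when the file does not exist."""
    model_path = Path(path)
    if not model_path.exists():
        return None  # no trained model: the caller stays on rules only
    with model_path.open("rb") as fh:
        return pickle.load(fh)


model = load_model()
print("ML fallback enabled" if model is not None else "rules only")
```

Because unpickling can execute arbitrary code, only load a model.pkl that you trained yourself or obtained from a trusted source.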
- privcheck_agent/agent.py: pipeline and guard logic
- privcheck_agent/rules.py: regex, intent, and extraction rules
- privcheck_agent/ml.py: ML fallback and model loading
- privcheck_agent/cli.py: command line entry
- data/sample_train.jsonl: starter training data