Preview/Microsoft-Presidio-Analyzer #101
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| reviews: | ||
| review_details: true | ||
| tools: | ||
| presidio: | ||
| enabled: true | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| # presidio_smoke.py — noisy samples to exercise Microsoft Presidio–style detectors. | ||
| # Default CodeRabbit Presidio pass: CREDIT_CARD, US_SSN, CRYPTO, PHONE_NUMBER only. | ||
| | ||
| # --- Credit cards (CREDIT_CARD): common test PAN shapes --- | ||
| # Dashed | ||
| visa_dashed = "4111-1111-1111-1111" | ||
| mc_dashed = "5500-0000-0000-0004" | ||
| amex_dashed = "3400-000000-00009" | ||
| discover_dashed = "6011-0000-0000-0004" | ||
| | ||
| # Spaced | ||
| visa_spaced = "4111 1111 1111 1111" | ||
| mc_spaced = "5500 0000 0000 0004" | ||
| | ||
| # Same digits, compact (extra shape) | ||
| mc_compact = "5500000000000004" | ||
| | ||
| # Remediation example PAN (dashed + spaced + compact — filter hits only exact `4242 4242 4242 4242` / `4242424242424242` at Presidio column) | ||
| remediation_pan_spaced = "4242 4242 4242 4242" | ||
| remediation_pan_compact = "4242424242424242" | ||
Comment on lines +6 to +20

Make PAN test literals scanner-safe to avoid CI/compliance noise.

Lines 6-20 are intentionally test data, but they currently trigger multiple OpenGrep errors (listed under Tools below).

Example pattern (apply consistently to all PAN literals):

+def _join(parts: tuple[str, ...], sep: str = "") -> str:
+    return sep.join(parts)
+
-visa_dashed = "4111-1111-1111-1111"
+visa_dashed = _join(("4111", "1111", "1111", "1111"), "-")
-mc_compact = "5500000000000004"
+mc_compact = _join(("5500", "0000", "0000", "0004"))

🧰 Tools
🪛 OpenGrep (1.20.0)
[ERROR] 6-6, 7-7, 8-8, 9-9, 12-12, 13-13, 18-18, 19-19: Possible credit card number with dashes or spaces detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead. (coderabbit.pii.credit-card-number-dashed)
[ERROR] 16-16, 18-18, 20-20: Possible credit card number (PAN) detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead. (coderabbit.pii.credit-card-number)
| | ||
| # --- US SSN / ITIN-shaped (US_SSN / sometimes ITIN recognizers) --- | ||
| ssn_like = "078-05-1120" | ||
| ssn_invalid_example = "000-00-0000" # exact remediation literal (skipped only if entity+col match) | ||
| itin_like = "900-70-0000" # exact remediation literal for US_ITIN when that entity is enabled | ||
| odd_dashed_15 = "856-45-6790" # short dashed number — may or may not fire; kept from your file | ||
| | ||
| # --- Crypto (CRYPTO) --- | ||
| btc_testnet = "tb1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4" # exact remediation literal | ||
| eth_zero = "0x0000000000000000000000000000000000000000" # exact remediation literal | ||
| eth_nonzero = "0x742d35Cc6634C0532925a3b844Bc9e7595f213b" # extra shape | ||
| | ||
| # --- Phone (PHONE_NUMBER) --- | ||
| phone_fictional = "+1-555-0100" # exact remediation literal | ||
| phone_fictional_template = "555-01XX" # exact remediation literal (unusual in real code) | ||
| phone_plain = "5550100199" | ||
| | ||
| # --- Email (EMAIL_ADDRESS) — only if entity enabled in Presidio config --- | ||
| email_example = "user@example.com" # exact remediation literal | ||
| | ||
| # --- IBAN (IBAN_CODE) — only if entity enabled --- | ||
| iban_example = "DE89370400440532013000" # exact remediation literal | ||
| | ||
| # --- US bank routing (US_BANK_NUMBER) — only if entity enabled --- | ||
| routing_nines = "000000000" # exact remediation literal | ||
| routing_doc_like = "021000021" # realistic-shaped 9 digits; may fire if recognizer exists | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| """Customer-service handler — sample with PII embedded in security antipatterns. | ||
| | ||
| Each PII value is placed on a line that is independently security-relevant | ||
| (hardcoded credential, logged PII, hardcoded test fixture in production code) | ||
| to maximize the chance the LLM reviewer will comment on the same line and | ||
| trigger the inline tool-tag render path despite the upstream render-suppression | ||
| bug in successful tool runs. | ||
| """ | ||
| | ||
| import logging | ||
| import sqlite3 | ||
| | ||
| logger = logging.getLogger(__name__) | ||
| | ||
| DATABASE_URL = "postgres://admin:Sup3rS3cret123@db.internal.example.com:5432/users" | ||
| ADMIN_EMAIL = "alice.thompson@demoapp.example.com" | ||
| ADMIN_PHONE = "+1-555-0100" | ||
| ADMIN_SSN = "000-00-0000" | ||
| SUPPORT_HOTLINE = "555-0100" | ||
| SERVER_IP = "192.168.1.100" | ||
| PAYMENT_TEST_CARD = "4111-1111-1111-1111" | ||
| ROUTING_NUMBER = "021000021" | ||
| WALLET_TEST = "tb1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4" | ||
| | ||
| | ||
| def authenticate_admin(): | ||
| logger.info("Authenticating admin %s from %s", ADMIN_EMAIL, SERVER_IP) | ||
| conn = sqlite3.connect(DATABASE_URL) | ||
| return conn | ||
| | ||
| | ||
| def process_payment(card_number, customer_email, customer_phone): | ||
| logger.info( | ||
| "Payment attempt: card=%s email=%s phone=%s", | ||
| card_number, | ||
| customer_email, | ||
| customer_phone, | ||
| ) | ||
| if card_number == PAYMENT_TEST_CARD: | ||
| return {"status": "test", "ssn": ADMIN_SSN, "routing": ROUTING_NUMBER} | ||
| return {"status": "approved"} | ||
| | ||
| | ||
| def send_welcome_emails(): | ||
| test_customers = [ | ||
| {"name": "Bob Smith", "email": "bob.smith@example.com", "phone": "555-0100"}, | ||
| {"name": "Carol Jones", "email": "carol.jones@example.com", "phone": "555-0101"}, | ||
| ] | ||
| for c in test_customers: | ||
| logger.info("Welcome %s, contact %s / %s", c["name"], c["email"], c["phone"]) | ||
| | ||
| | ||
| def emergency_wallet_payout(): | ||
| logger.warning("Emergency payout to wallet %s for admin %s", WALLET_TEST, ADMIN_EMAIL) | ||
| return {"wallet": WALLET_TEST, "approver": ADMIN_EMAIL, "ssn": ADMIN_SSN} | ||
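The handler above logs raw card numbers, emails, and phone numbers on purpose, as stated in its docstring. For contrast, here is a minimal sketch of what a masked version of the payment log line might look like. The helpers `mask_pan`, `mask_email`, and `log_payment_attempt` are hypothetical names introduced only for this illustration and do not appear in the PR.

```python
import logging
import re

logger = logging.getLogger(__name__)


def mask_pan(card_number: str) -> str:
    """Replace all but the last four digits of a card number with '*'."""
    digits = re.sub(r"\D", "", card_number)
    return "*" * max(len(digits) - 4, 0) + digits[-4:]


def mask_email(address: str) -> str:
    """Keep the first character of the local part and the domain."""
    local, sep, domain = address.partition("@")
    if not sep:
        # Not an address shape we recognize; reveal nothing.
        return "***"
    return f"{local[:1]}***@{domain}"


def log_payment_attempt(card_number: str, customer_email: str) -> None:
    """Log a payment attempt without writing raw PII to the log."""
    logger.info(
        "Payment attempt: card=%s email=%s",
        mask_pan(card_number),
        mask_email(customer_email),
    )
```

This is a sketch rather than a recommendation for the PR itself, since the sample file exists to trigger detectors and masking would defeat that purpose.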
🧩 Analysis chain
🏁 Script executed:
Repository: coderabbitai/coderabbit-pr-review
Length of output: 1159
Consider renaming to follow Python module naming conventions if this becomes a shared fixture.
The hyphenated filename `cards-test.py` cannot be imported with standard Python syntax (e.g., `import cards-test` fails). While this file is not currently imported anywhere in the codebase, renaming to `cards_test.py` follows Python conventions and prevents future breakage. Also update Line 1 to reflect the correct filename (currently shows `presidio_smoke.py`).
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 1-1: Comment contains ambiguous `–` (EN DASH). Did you mean `-` (HYPHEN-MINUS)? (RUF003)
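To make the naming point in the comment above concrete, the sketch below checks whether a filename can be used in a plain `import` statement and shows the `importlib` workaround that a hyphenated file would otherwise require. The helper names `is_plain_importable` and `load_module_from_path` are hypothetical and chosen for illustration only.

```python
import importlib.util
import keyword
from pathlib import Path


def is_plain_importable(filename: str) -> bool:
    """True when the file stem is usable in a plain `import` statement."""
    stem = Path(filename).stem
    return stem.isidentifier() and not keyword.iskeyword(stem)


def load_module_from_path(path: Path, module_name: str):
    """Load a Python file by path, even if its name is not a valid identifier."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Loading by path works, but every caller then has to know the file location, which is why the rename to an underscore-separated name is the simpler fix if the file is ever shared.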