A production-ready serverless pattern for intelligent data normalization using Claude Haiku via AWS Bedrock
This pattern combines LLM-based normalization with statistical validation and regex post-processing to achieve high-quality data cleansing for roughly $0.07 per 1,000 records.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Messy Input │────▶│ Claude Haiku │────▶│ Clean Output │
│ "CRA 15 #100" │ │ (via Bedrock) │ │ "Cra. 15 #100" │
│ "BOGOTA" │ │ │ │ "Bogotá D.C." │
│ "ing sistemas" │ │ + Post-process │ │ "Ing. Sistemas"│
└─────────────────┘ └─────────────────┘ └─────────────────┘
Dual-layer architecture that combines:
- LLM intelligence for context-aware normalization
- Regex post-processing to catch LLM inconsistencies
- Statistical validation with 95% confidence intervals to detect quality drift
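A minimal sketch of the two layers, assuming the AWS SDK v3 Bedrock Runtime client. The field names, prompt wording, and regex rules are illustrative, not the repository's exact pipeline:

```typescript
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const bedrock = new BedrockRuntimeClient({});

// Layer 1: context-aware normalization with Claude 3 Haiku
export async function normalizeWithHaiku(
  fields: Record<string, string>
): Promise<Record<string, string>> {
  const prompt =
    "Normalize these lead fields (address, city, profession, company). " +
    "Respond with ONLY a JSON object using the same keys.\n" +
    JSON.stringify(fields);

  const response = await bedrock.send(new InvokeModelCommand({
    modelId: "anthropic.claude-3-haiku-20240307-v1:0",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({
      anthropic_version: "bedrock-2023-05-31",
      max_tokens: 512,
      messages: [{ role: "user", content: [{ type: "text", text: prompt }] }],
    }),
  }));

  const payload = JSON.parse(new TextDecoder().decode(response.body));
  return JSON.parse(payload.content[0].text); // the prompt asks for bare JSON
}

// Layer 2: deterministic post-processing that catches LLM formatting slips
export function postProcess(value: string): string {
  return value
    .replace(/\.{2,}/g, ".")   // collapse the "double-dot" artifact ("Cra.. 15" -> "Cra. 15")
    .replace(/\s{2,}/g, " ")   // collapse repeated whitespace
    .trim();
}
```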
| Metric | Value |
|---|---|
| Records processed | 652 leads |
| Fields normalized | 4,280 |
| Improvement rate | 70.4% |
| Coverage | 99.2% |
| Cost per 1K records | $0.07 |
| Bug detection | Caught systematic "double-dot" bug via statistical analysis |
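The drift check behind that last row can be as simple as a 95% confidence interval on each batch's improvement rate. This is a sketch under assumed thresholds (normal-approximation interval, illustrative batch counts), not the repository's exact validation code:

```typescript
// 95% confidence interval on a batch's improvement rate (normal approximation).
// A batch whose entire interval sits below the historical baseline flags a
// quality regression, which is how a systematic formatting bug surfaces in the numbers.
function improvementRateCI(improved: number, total: number, z = 1.96) {
  const p = improved / total;
  const margin = z * Math.sqrt((p * (1 - p)) / total);
  return { rate: p, low: Math.max(0, p - margin), high: Math.min(1, p + margin) };
}

const baseline = 0.704;                      // improvement rate from the reference run
const batch = improvementRateCI(612, 920);   // illustrative batch counts
if (batch.high < baseline) {
  console.warn(
    `Quality drift detected: ${(batch.rate * 100).toFixed(1)}% ` +
    `(95% CI ${(batch.low * 100).toFixed(1)}%-${(batch.high * 100).toFixed(1)}%)`
  );
}
```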
# Clone the repo
git clone https://github.com/gabanox/llm-data-normalization-pattern.git
cd llm-data-normalization-pattern
# Follow the 90-minute tutorial
open docs/en/TUTORIAL.md

┌────────────────────┐
│ EventBridge │──▶ Daily at 2 AM
│ Scheduled Rule │
└─────────┬──────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Normalize Leads Lambda │
│ ┌───────────────────────────────────────────┐ │
│ │ 1. Query leads needing normalization │ │
│ │ 2. Generate field-specific prompts │ │
│ │ 3. Call Claude Haiku via Bedrock │ │
│ │ 4. Parse JSON response │ │
│ │ 5. Apply post-processing regex pipeline │ │ ◀─ Self-healing
│ │ 6. Store in normalizedData attribute │ │
│ │ 7. Track metrics (coverage, improvements) │ │
│ └───────────────────────────────────────────┘ │
└────────┬──────────────────────────┬─────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌─────────────────────┐
│ DynamoDB │ │ AWS Bedrock │
│ leads table │ │ Claude 3 Haiku │
└──────────────────┘ └─────────────────────┘
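A condensed outline of the handler's seven steps, reusing the `normalizeWithHaiku` and `postProcess` helpers sketched earlier. The `leads` table and `normalizedData` attribute come from the diagram; the scan filter, key names, import path, and metric shape are assumptions:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, ScanCommand, UpdateCommand } from "@aws-sdk/lib-dynamodb";
import { normalizeWithHaiku, postProcess } from "./normalize"; // helpers from the earlier sketch

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async (): Promise<{ processed: number; improved: number }> => {
  // 1. Query leads that have not been normalized yet (filter expression is illustrative)
  const { Items = [] } = await ddb.send(new ScanCommand({
    TableName: "leads",
    FilterExpression: "attribute_not_exists(normalizedData)",
  }));

  let improved = 0;
  for (const lead of Items) {
    // 2-4. Build the field-specific prompt, call Claude Haiku, parse the JSON response
    const raw = { address: lead.address, city: lead.city, profession: lead.profession };
    const normalized = await normalizeWithHaiku(raw);

    // 5. Self-healing regex pass over every field the model returned
    for (const key of Object.keys(normalized)) {
      normalized[key] = postProcess(normalized[key]);
    }

    // 6. Store the result alongside the original record
    await ddb.send(new UpdateCommand({
      TableName: "leads",
      Key: { id: lead.id },
      UpdateExpression: "SET normalizedData = :n",
      ExpressionAttributeValues: { ":n": normalized },
    }));

    // 7. Track improvements for the coverage and drift metrics
    if (JSON.stringify(normalized) !== JSON.stringify(raw)) improved++;
  }

  return { processed: Items.length, improved };
};
```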
| Goal | Document |
|---|---|
| Understand the pattern | README → Architecture |
| Implement it yourself | Tutorial ⭐ → Implementation |
| Understand the "why" | Explanation docs |
| Validate quality | Statistical Validation |
| Avoid pitfalls | Lessons Learned |
- Developers: Tutorial → Implementation
- Architects: Explanation → Architecture
- Data Engineers: Statistical Validation
- Managers: Cost Analysis
This pattern is ideal for:
- User-submitted form data (names, addresses, cities, companies)
- Data quality improvement for analytics/reporting
- LLM input preparation for downstream AI processes
- Compliance scenarios requiring audit trails
| Approach | Cost per 1K records | Notes |
|---|---|---|
| Manual data entry ($15/hr) | $75.00 | 5 min per record |
| Rule-based ETL | $0.00 | No per-record cost, but weeks of engineering upfront |
| Claude 3.5 Sonnet (LLM only) | $1.20 | ~17x the cost of this pattern |
| This pattern (Haiku + rules) | $0.07 | Best cost/quality ratio |
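As a rough sanity check on the $0.07 figure, the arithmetic below uses Bedrock's published Claude 3 Haiku list prices (about $0.25 per million input tokens and $1.25 per million output tokens) with illustrative per-record token counts:

```typescript
// Back-of-the-envelope cost check; pricing and token counts are assumptions, not measurements.
const inputTokensPerRecord = 100;   // prompt template + raw field values
const outputTokensPerRecord = 35;   // normalized JSON
const costPerRecord =
  (inputTokensPerRecord / 1e6) * 0.25 + (outputTokensPerRecord / 1e6) * 1.25;
console.log(`~$${(costPerRecord * 1000).toFixed(3)} per 1K records`);
// -> ~$0.069 per 1K records under these assumptions, consistent with the measured $0.07
```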
- AWS Lambda (Node.js 22.x)
- AWS Bedrock (Claude 3 Haiku)
- DynamoDB (pay-per-request)
- EventBridge (scheduled triggers)
- AWS SAM (Infrastructure as Code)
Contributions are welcome! Please read CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
Gabriel Isaías Ramírez Melgarejo
AWS Community Hero | Founder, Bootcamp Institute
- GitHub: @gabanox
- LinkedIn: Gabriel Ramírez
- Twitter/X: @gabanox_
⭐ If you find this pattern useful, please star the repo!
