A production-ready serverless pattern for intelligent data normalization using Claude Haiku via AWS Bedrock

LLM-Powered Data Normalization Pattern

License: MIT AWS Claude


What is this?

This pattern combines LLM-based normalization with regex post-processing and statistical validation to cleanse data at low cost: about $0.07 per 1,000 records in the production run reported below.

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Messy Input    │────▶│  Claude Haiku   │────▶│  Clean Output   │
│  "CRA 15 #100"  │     │  (via Bedrock)  │     │  "Cra. 15 #100" │
│  "BOGOTA"       │     │                 │     │  "Bogotá D.C."  │
│  "ing sistemas" │     │  + Post-process │     │  "Ing. Sistemas"│
└─────────────────┘     └─────────────────┘     └─────────────────┘

Key Innovation

A dual-layer normalization pipeline (items 1 and 2) monitored by a statistical check (item 3):

  1. LLM intelligence for context-aware normalization
  2. Regex post-processing to catch LLM inconsistencies
  3. Statistical validation with 95% confidence intervals to detect quality drift
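
The second layer can be pictured as a small list of regex rules applied to every value the model returns. The sketch below is illustrative only: the rule names, the `RULES` array, and the `postProcess` function are assumptions, not the repository's actual pipeline. It shows the kind of cleanup that would catch a repeated-dot inconsistency.

```javascript
// Hypothetical post-processing rules; the repository's real pipeline may differ.
const RULES = [
  // Collapse runs of dots ("Cra.. 15" -> "Cra. 15").
  { name: "collapse-dots", pattern: /\.{2,}/g, replacement: "." },
  // Collapse repeated whitespace into a single space.
  { name: "collapse-spaces", pattern: /\s{2,}/g, replacement: " " },
];

// Apply every rule in order and report which rules changed the value.
function postProcess(value) {
  let result = value.trim();
  const applied = [];
  for (const rule of RULES) {
    const next = result.replace(rule.pattern, rule.replacement);
    if (next !== result) applied.push(rule.name);
    result = next;
  }
  return { value: result, applied };
}

console.log(postProcess("Cra.. 15  #100"));
```

Returning the list of applied rules alongside the value makes it easy to count how often each rule fires. One caveat of this particular rule set: a legitimate ellipsis would also be collapsed, so real rules need to be scoped to the fields where that is safe.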

Production Results

| Metric | Value |
|---|---|
| Records processed | 652 leads |
| Fields normalized | 4,280 |
| Improvement rate | 70.4% |
| Coverage | 99.2% |
| Cost per 1K records | $0.07 |
| Bug detection | Caught systematic "double-dot" bug via statistical analysis |
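
One way to turn a rate like the one above into a drift check is a 95% confidence interval for a proportion. The following is a minimal sketch using the normal (Wald) approximation. It assumes the 70.4% improvement rate is measured over the 4,280 normalized fields (roughly 3,013 improved fields), which this README does not state, and `proportionCI` and `hasDrifted` are hypothetical helper names.

```javascript
// Normal-approximation (Wald) confidence interval for a proportion.
// z = 1.96 corresponds to a 95% confidence level.
function proportionCI(successes, total, z = 1.96) {
  if (total <= 0) throw new RangeError("total must be positive");
  const p = successes / total;
  const margin = z * Math.sqrt((p * (1 - p)) / total);
  return { p, lower: Math.max(0, p - margin), upper: Math.min(1, p + margin) };
}

// Flag a batch whose observed rate falls outside the baseline interval.
// Simplification: the batch's own sampling error is ignored.
function hasDrifted(baseline, batchRate) {
  return batchRate < baseline.lower || batchRate > baseline.upper;
}

// Assumption: about 3,013 of 4,280 fields improved (70.4%).
const baseline = proportionCI(3013, 4280);
console.log(baseline.lower.toFixed(3), baseline.upper.toFixed(3));
console.log(hasDrifted(baseline, 0.62));
```

Under these assumptions the interval is roughly 69.0% to 71.8%, so a later batch with a rate far outside that band would be worth inspecting by hand.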

Quick Start

# Clone the repo
git clone https://github.com/gabanox/llm-data-normalization-pattern.git
cd llm-data-normalization-pattern

# Follow the 90-minute tutorial
open docs/en/TUTORIAL.md

Architecture

┌────────────────────┐
│  EventBridge       │──▶ Daily at 2 AM
│  Scheduled Rule    │
└─────────┬──────────┘
          │
          ▼
┌─────────────────────────────────────────────────┐
│         Normalize Leads Lambda                  │
│  ┌───────────────────────────────────────────┐  │
│  │ 1. Query leads needing normalization      │  │
│  │ 2. Generate field-specific prompts        │  │
│  │ 3. Call Claude Haiku via Bedrock          │  │
│  │ 4. Parse JSON response                    │  │
│  │ 5. Apply post-processing regex pipeline   │  │ ◀─ Self-healing
│  │ 6. Store in normalizedData attribute      │  │
│  │ 7. Track metrics (coverage, improvements) │  │
│  └───────────────────────────────────────────┘  │
└────────┬──────────────────────────┬─────────────┘
         │                          │
         ▼                          ▼
┌──────────────────┐      ┌─────────────────────┐
│   DynamoDB       │      │   AWS Bedrock       │
│   leads table    │      │   Claude 3 Haiku    │
└──────────────────┘      └─────────────────────┘
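
Steps 2 to 4 of the Lambda can be sketched without any AWS dependency. The code below is an illustration under stated assumptions: the prompt wording, `FIELD_HINTS`, and the function names are hypothetical, and the model ID should be checked against the models enabled in your account. The request body follows the Anthropic Messages format used by Bedrock; in a real Lambda it would be sent through the Bedrock runtime's InvokeModel operation.

```javascript
// Assumed model ID for Claude 3 Haiku on Bedrock; verify it for your region.
const MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0";

// Hypothetical per-field instructions (step 2: field-specific prompts).
const FIELD_HINTS = {
  address: "Abbreviate street types consistently (for example 'Cra.').",
  city: "Use the official city name, including accents.",
};

function buildPrompt(fields) {
  const hints = Object.keys(fields)
    .filter((key) => FIELD_HINTS[key])
    .map((key) => `- ${key}: ${FIELD_HINTS[key]}`);
  return [
    "Normalize these lead fields. Return only a JSON object with the same keys.",
    ...hints,
    JSON.stringify(fields),
  ].join("\n");
}

// Step 3: request body in the Anthropic Messages format for Bedrock.
function buildRequestBody(fields) {
  return JSON.stringify({
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 512,
    temperature: 0,
    messages: [{ role: "user", content: buildPrompt(fields) }],
  });
}

// Step 4: extract the JSON object from the model's text reply.
function parseModelJson(responseBody) {
  const text = responseBody.content?.[0]?.text ?? "";
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null; // malformed JSON: the caller can keep the original values
  }
}

// Simulated reply, so the sketch runs without AWS credentials.
const fakeReply = {
  content: [{ type: "text", text: 'Here you go: {"city": "Bogotá D.C."}' }],
};
console.log(MODEL_ID, parseModelJson(fakeReply));
```

Returning `null` on unparseable output, rather than throwing, lets a batch job skip one bad record and continue with the rest.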

Documentation

By Goal

| Goal | Document |
|---|---|
| Understand the pattern | README, Architecture |
| Implement it yourself | Tutorial ⭐ → Implementation |
| Understand the "why" | Explanation docs |
| Validate quality | Statistical Validation |
| Avoid pitfalls | Lessons Learned |

By Role

Use Cases

This pattern is ideal for:

  • User-submitted form data (names, addresses, cities, companies)
  • Data quality improvement for analytics/reporting
  • LLM input preparation for downstream AI processes
  • Compliance scenarios requiring audit trails

Cost Comparison

| Approach | Cost per 1K records | Notes |
|---|---|---|
| Manual data entry ($15/hr) | $1,250.00 | 5 min per record |
| Rule-based ETL | $0.00 | Weeks of engineering |
| Claude 3.5 Sonnet (LLM only) | $1.20 | About 17x the cost of this pattern |
| This pattern (Haiku + rules) | $0.07 | Best cost/quality ratio |
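
To project these per-1K rates onto a given workload, a few lines of arithmetic are enough. This sketch takes the two LLM rates from the table above; `RATE_PER_1K` and `projectCost` are hypothetical names, and the projection assumes cost scales linearly with record count.

```javascript
// Per-1K-record rates in USD, taken from the cost comparison table.
const RATE_PER_1K = { haikuWithRules: 0.07, sonnetOnly: 1.2 };

// Linear projection: records / 1000 * rate per 1K records.
function projectCost(records, ratePer1k) {
  return (records / 1000) * ratePer1k;
}

for (const [name, rate] of Object.entries(RATE_PER_1K)) {
  // 652 is the number of leads in the production run reported above.
  console.log(name, "652 records:", projectCost(652, rate).toFixed(4));
  console.log(name, "1M records:", projectCost(1_000_000, rate).toFixed(2));
}
```

At the reported rate, the 652-lead run works out to under five cents.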

Tech Stack

  • AWS Lambda (Node.js 22.x)
  • AWS Bedrock (Claude 3 Haiku)
  • DynamoDB (pay-per-request)
  • EventBridge (scheduled triggers)
  • AWS SAM (Infrastructure as Code)

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Gabriel Isaías Ramírez Melgarejo
AWS Community Hero | Founder, Bootcamp Institute


⭐ If you find this pattern useful, please star the repo!
