Feature request: Add DataMasking utility for encrypting and masking sensitive data #4960

@walmsles

Description

Use case

AWS Lambda functions often process sensitive data (PII, credentials, financial information) that needs to be protected when:

  • Storing data in logs, databases, or checkpoints (e.g., AWS Lambda Durable Functions)
  • Passing data between services
  • Meeting compliance requirements (GDPR, HIPAA, PCI-DSS)

Current challenges:

  1. No built-in masking utilities - Developers manually implement masking logic, leading to inconsistent approaches
  2. Full payload encryption loses visibility - Encrypting entire objects makes debugging, querying, and monitoring impossible
  3. Field-level encryption is complex - Requires significant boilerplate to encrypt specific fields while preserving structure
  4. No integration with AWS Encryption SDK - Developers must configure envelope encryption, key caching, and KMS integration from scratch

Real-world scenarios:

  • Durable Functions checkpointing: Store workflow state with encrypted PII while keeping workflow metadata visible for queries
  • Multi-tenant applications: Encrypt customer data with tenant-specific encryption context for isolation
  • Compliance logging: Mask sensitive fields before logging while preserving log structure for analysis
  • API responses: Redact sensitive fields before returning data to clients

Powertools for Python has a DataMasking utility that solves these problems. TypeScript needs equivalent functionality.

Solution/User Experience

Add a new @aws-lambda-powertools/data-masking package that provides:

1. Core DataMasking Class

import { DataMasking } from '@aws-lambda-powertools/data-masking';
import { AWSEncryptionSDKProvider } from '@aws-lambda-powertools/data-masking/providers';

// Initialize with encryption provider
const provider = new AWSEncryptionSDKProvider({ keys: [KMS_KEY_ARN] });
const dataMasker = new DataMasking({ provider });

2. Three Primary Operations

Erase (Irreversible Masking):

const masked = dataMasker.erase(data, { 
  fields: ['email', 'address.street', 'customer.ssn'] 
});

// Result:
// {
//   "email": "*****",
//   "address": { "street": "*****", "city": "Anytown" },
//   "customer": { "ssn": "*****" }
// }
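The dot-path masking shown above can be sketched in a few lines. This is only an illustration of the idea, assuming plain JSON-compatible input and simple dotted paths; `eraseFields` and its default mask are hypothetical names, not the proposed package's implementation.

```typescript
// Hypothetical helper: masks the values found at simple dotted paths.
type Json = Record<string, unknown>;

function eraseFields<T extends Json>(data: T, fields: string[], mask = '*****'): T {
  // Deep-copy so the caller's object is left untouched.
  const copy = JSON.parse(JSON.stringify(data)) as Json;

  for (const path of fields) {
    const keys = path.split('.');
    const last = keys.pop();
    if (last === undefined) continue;

    // Walk down to the parent of the target field.
    let node: unknown = copy;
    for (const key of keys) {
      if (typeof node !== 'object' || node === null) break;
      node = (node as Json)[key];
    }

    // Only mask fields that actually exist; missing paths are skipped.
    if (typeof node === 'object' && node !== null && last in (node as Json)) {
      (node as Json)[last] = mask;
    }
  }
  return copy as T;
}

const sample = {
  email: 'john@example.com',
  address: { street: '123 Main St', city: 'Anytown' },
  customer: { ssn: '123-45-6789' },
};

const erased = eraseFields(sample, ['email', 'address.street', 'customer.ssn']);
console.log(JSON.stringify(erased));
```

Copying before masking keeps the original object usable for the later encrypt step; a real implementation would also need to handle arrays and full JSONPath expressions.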

Encrypt (Full Payload):

const encrypted = await dataMasker.encrypt(data, {
  context: { tenantId: 'acme-corp', dataType: 'pii' }
});
// Returns: base64-encoded ciphertext string

Encrypt (Field-Level with Structure Preservation):

const encrypted = await dataMasker.encrypt(data, {
  fields: ['customer.ssn', 'payment.creditCard'],
  context: { tenantId: 'acme-corp' }
});

// Result:
// {
//   "orderId": "12345",  // Visible for queries
//   "customer": {
//     "name": "John",  // Visible
//     "ssn": { "__encrypted": "customer.ssn" }  // Placeholder
//   },
//   "payment": {
//     "creditCard": { "__encrypted": "payment.creditCard" },
//     "amount": 99.99  // Visible
//   },
//   "__powertools_encrypted_data": "AQICAHh8...",  // Encrypted blob
//   "__powertools_encryption_context": { "tenantId": "acme-corp" }
// }
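One way to arrive at the layout above is to split the work into two steps: extract the selected values while leaving placeholders behind, then attach the encrypted blob under the reserved keys. The sketch below assumes simple dotted paths; `extractFields` and `toEnvelope` are hypothetical names, and the ciphertext is a placeholder string because the actual encryption would be done by the provider.

```typescript
type Json = Record<string, unknown>;

// Step 1: copy the input, pull out the selected values, leave placeholders behind.
function extractFields(data: Json, fields: string[]): { skeleton: Json; secrets: Json } {
  const skeleton = JSON.parse(JSON.stringify(data)) as Json;
  const secrets: Json = {};

  for (const path of fields) {
    const keys = path.split('.');
    const last = keys.pop() as string;
    let node: unknown = skeleton;
    for (const key of keys) {
      if (typeof node !== 'object' || node === null) break;
      node = (node as Json)[key];
    }
    if (typeof node === 'object' && node !== null && last in node) {
      const parent = node as Json;
      secrets[path] = parent[last];          // collect the real value
      parent[last] = { __encrypted: path };  // placeholder keeps the structure
    }
  }
  return { skeleton, secrets };
}

// Step 2: attach the encrypted blob and the context under the reserved keys.
function toEnvelope(skeleton: Json, ciphertext: string, context: Record<string, string>): Json {
  return {
    ...skeleton,
    __powertools_encrypted_data: ciphertext,
    __powertools_encryption_context: context,
  };
}

const order: Json = {
  orderId: '12345',
  customer: { name: 'John', ssn: '123-45-6789' },
  payment: { creditCard: '4111111111111111', amount: 99.99 },
};

const { skeleton, secrets } = extractFields(order, ['customer.ssn', 'payment.creditCard']);
// JSON.stringify(secrets) is what would be handed to the encryption provider.
// The string below is a stand-in, NOT real ciphertext.
const envelope = toEnvelope(skeleton, '<ciphertext from provider>', { tenantId: 'acme-corp' });
```

Collecting every selected value into one `secrets` map is what makes a single batched encryption possible, since only one plaintext is passed to the provider.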

Decrypt:

const decrypted = await dataMasker.decrypt(encrypted);
// Automatically detects format (full or field-level) and restores original data
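Format detection could be as simple as checking the payload's shape. The sketch below assumes the two formats shown earlier (a string for full-payload encryption, an object carrying `__powertools_encrypted_data` for field-level encryption); `detectFormat` is a hypothetical name used only for illustration.

```typescript
const DATA_KEY = '__powertools_encrypted_data';

type EncryptedFormat = 'full' | 'field-level';

function detectFormat(payload: unknown): EncryptedFormat {
  // Full-payload encryption yields a single opaque string.
  if (typeof payload === 'string') return 'full';
  // Field-level encryption yields an object that carries the reserved blob key.
  if (typeof payload === 'object' && payload !== null && DATA_KEY in payload) {
    return 'field-level';
  }
  throw new TypeError('Unrecognized encrypted payload');
}

console.log(detectFormat('AQICAHh8...'));
console.log(detectFormat({ orderId: '12345', [DATA_KEY]: 'AQICAHh8...' }));
```

After detection, the field-level path would decrypt the blob and write each value back over its `__encrypted` placeholder, then drop the two reserved keys.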

3. Advanced Masking Options

const masked = dataMasker.erase(data, {
  maskingRules: {
    email: { 
      regexPattern: /(.)(.*)(@.*)/, 
      maskFormat: '$1****$3'  // j****@example.com
    },
    age: { dynamicMask: true },  // Maintains length: "30" -> "**"
    'address.zip': { customMask: 'XXXXX' }
  }
});
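To make the three rule types concrete, here is a minimal sketch of how a single rule might be applied to one value. The `MaskingRule` shape mirrors the option names used above, but `applyRule` and the precedence between rule types (regex first, then dynamic, then custom) are assumptions for illustration.

```typescript
// Hypothetical rule shape mirroring the option names in the proposal.
interface MaskingRule {
  regexPattern?: RegExp;
  maskFormat?: string;
  dynamicMask?: boolean;
  customMask?: string;
}

function applyRule(value: unknown, rule: MaskingRule): string {
  const text = String(value);
  // Regex rule: keep the captured groups referenced in maskFormat.
  if (rule.regexPattern && rule.maskFormat !== undefined) {
    return text.replace(rule.regexPattern, rule.maskFormat);
  }
  // Dynamic rule: one mask character per original character.
  if (rule.dynamicMask) return '*'.repeat(text.length);
  // Custom rule: replace the whole value with a fixed string.
  if (rule.customMask !== undefined) return rule.customMask;
  // Fallback: the default mask used by plain erase().
  return '*****';
}

console.log(applyRule('john@example.com', { regexPattern: /(.)(.*)(@.*)/, maskFormat: '$1****$3' })); // j****@example.com
console.log(applyRule(30, { dynamicMask: true }));          // **
console.log(applyRule('90210', { customMask: 'XXXXX' }));   // XXXXX
```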

4. Lambda Handler Integration

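import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';
import { Logger } from '@aws-lambda-powertools/logger';
import { DataMasking } from '@aws-lambda-powertools/data-masking';
import { AWSEncryptionSDKProvider } from '@aws-lambda-powertools/data-masking/providers';

const logger = new Logger();
const provider = new AWSEncryptionSDKProvider({ keys: [process.env.KMS_KEY_ARN!] });
const dataMasker = new DataMasking({ provider });
const dynamodb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async (event: any) => {
  // event.body may arrive as a JSON string (e.g. from API Gateway); parse it if so
  const orderData = typeof event.body === 'string' ? JSON.parse(event.body) : event.body;

  // Mask before logging
  logger.info('Processing order', {
    order: dataMasker.erase(orderData, { fields: ['customer.ssn', 'payment.creditCard'] })
  });

  // Encrypt for storage
  const encrypted = await dataMasker.encrypt(orderData, {
    fields: ['customer.ssn', 'payment.creditCard'],
    context: { orderId: orderData.orderId }
  });

  // TABLE_NAME is an assumed environment variable for this example
  await dynamodb.send(new PutCommand({ TableName: process.env.TABLE_NAME!, Item: encrypted }));

  return { statusCode: 200 };
};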

5. Durable Functions Integration (Future)

// Custom SerDes for encrypted checkpoints (when SDK supports it)
class EncryptedFieldsSerDes implements SerDes<any> {
  constructor(
    private fields: string[],
    private dataMasker: DataMasking
  ) {}
  
  async serialize(value: any): Promise<string> {
    const encrypted = await this.dataMasker.encrypt(value, { 
      fields: this.fields 
    });
    return JSON.stringify(encrypted);
  }
  
  async deserialize(data: string): Promise<any> {
    const encrypted = JSON.parse(data);
    return this.dataMasker.decrypt(encrypted);
  }
}

6. Provider System

Support AWS Encryption SDK with KMS:

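const provider = new AWSEncryptionSDKProvider({
  keys: [KMS_KEY_ARN],
  // Optional: Configure caching
  localCacheCapacity: 100,
  maxCacheAgeSeconds: 300,
  maxMessagesEncrypted: 4294967296, // 2^32
  // 2^63 - 1 (9223372036854775807) cannot be represented exactly as a
  // JavaScript number, so cap at the largest safe integer instead
  maxBytesEncrypted: Number.MAX_SAFE_INTEGER
});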
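The caching options above bound how long, and for how much data, a cached data key may be reused. The sketch below shows one plausible reading of those limits; `CacheEntry`, `CacheLimits`, and `canReuse` are hypothetical names, and the exact boundary conditions in the AWS Encryption SDK may differ.

```typescript
// Hypothetical bookkeeping for one cached data key.
interface CacheEntry {
  createdAtMs: number;
  messagesEncrypted: number;
  bytesEncrypted: number;
}

interface CacheLimits {
  maxCacheAgeSeconds: number;
  maxMessagesEncrypted: number;
  maxBytesEncrypted: number;
}

// A cached key is reused only while it stays within all three limits.
function canReuse(entry: CacheEntry, limits: CacheLimits, nextMessageBytes: number, nowMs: number): boolean {
  const ageSeconds = (nowMs - entry.createdAtMs) / 1000;
  return (
    ageSeconds < limits.maxCacheAgeSeconds &&
    entry.messagesEncrypted + 1 <= limits.maxMessagesEncrypted &&
    entry.bytesEncrypted + nextMessageBytes <= limits.maxBytesEncrypted
  );
}

const limits: CacheLimits = {
  maxCacheAgeSeconds: 300,
  maxMessagesEncrypted: 2 ** 32,
  // JavaScript numbers are exact only up to Number.MAX_SAFE_INTEGER (2^53 - 1).
  maxBytesEncrypted: Number.MAX_SAFE_INTEGER,
};

const entry: CacheEntry = { createdAtMs: 0, messagesEncrypted: 10, bytesEncrypted: 4096 };
console.log(canReuse(entry, limits, 1024, 10_000));   // within limits
console.log(canReuse(entry, limits, 1024, 301_000));  // older than maxCacheAgeSeconds

// Values above the safe range silently lose precision:
const precisionLost = 2 ** 63 - 1 === 2 ** 63; // true
```

When any limit is exceeded, the provider would request a fresh data key from KMS, so tighter limits trade extra KMS calls for less data protected under each key.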

7. Key Features

  • JSONPath field selection: 'customer.ssn', 'orders[*].payment', '$.items[?(@.price > 100)]'
  • Encryption context: Additional authenticated data for KMS operations
  • Automatic format detection: Decrypt handles both full and field-level formats
  • Batched encryption: All specified fields are encrypted together in a single operation, rather than one KMS round trip per field
  • Type safety: Full TypeScript types for all operations
  • AWS Encryption SDK integration: Envelope encryption with data key caching
  • Structure preservation: Non-sensitive fields remain queryable/debuggable

8. API Parity with Python

Match the existing Python DataMasking API (with TypeScript idioms):

  • erase(data, options) → data_masker.erase(data, fields=[])
  • encrypt(data, options) → data_masker.encrypt(data, fields=[])
  • decrypt(data) → data_masker.decrypt(data)
  • Encryption providers → Same provider system
  • Field selection → Same JSONPath syntax

Alternative solutions

### 1. Manual Implementation

Developers can manually:

- Traverse objects to mask fields
- Configure AWS Encryption SDK directly
- Implement custom serialization logic

**Drawbacks:**

- Significant boilerplate (100+ lines per use case)
- Inconsistent implementations across teams
- Error-prone field traversal
- No reusable patterns

### 2. Third-party Libraries

Generic encryption libraries exist but:

- Not designed for Lambda/serverless patterns
- No AWS Encryption SDK integration
- No KMS key caching optimization
- Missing field-level encryption with structure preservation

### 3. AWS Services

- **AWS Secrets Manager**: For static secrets, not dynamic data masking
- **Amazon Macie**: For data discovery/classification, not runtime masking
- **AWS Encryption SDK**: Low-level, requires significant integration work

### 4. Redaction-only Solutions

Libraries like `redact-pii` provide masking but:

- No encryption support
- No reversible operations
- Limited field selection
- No AWS integration

**Why a Powertools utility is better:**

- ✅ Integrated with Powertools ecosystem (Logger, Tracer, Metrics)
- ✅ Optimized for Lambda (cold start, memory, caching)
- ✅ Follows Powertools patterns and conventions
- ✅ Tested, documented, and maintained by AWS
- ✅ Type-safe with full TypeScript support
- ✅ Parity with Python implementation

Acknowledgment

Future readers

Please react with 👍 and your use case to help us understand customer demand.

Metadata

Assignees

No one assigned

    Labels

    feature-request: This item refers to a feature request for an existing or new utility
    need-customer-feedback: Requires more customers feedback before making or revisiting a decision

    Type

    No type

    Projects

    Status

    Ideas

    Milestone

    No milestone
