
A containerized AWS Lambda data ingestion pipeline with automated CI/CD using GitHub and AWS CodeBuild, extracting data from the Calendly API and storing it in Amazon S3


johnathon-smith/containerized-lambda-cicd-data-ingestion


Containerized AWS Lambda Data Pipeline with CI/CD

A production-minded, serverless data ingestion pipeline that uses a containerized AWS Lambda function to extract meeting data from the Calendly API and store it in Amazon S3, with continuous deployment implemented via GitHub + AWS CodeBuild.

This project focuses on modern data engineering and DevOps patterns including Dockerized Lambda functions, secrets management, IAM permissions, and automated CI/CD workflows.


📌 Project Overview

The Lambda function is packaged as a Docker container and deployed using an automated CI/CD pipeline. On execution, the function retrieves meeting data from the Calendly API and writes the output to an S3 bucket.

The infrastructure was designed to be reproducible, secure, and cost-efficient, and was torn down after validation to avoid unnecessary cloud spend.


🏗 Architecture Overview

The pipeline follows a serverless architecture, with invocation on demand and deployment driven by GitHub events:

  1. AWS Lambda runs as a containerized function
  2. Lambda retrieves meeting data from the Calendly API
  3. Data is written to Amazon S3
  4. CI/CD pipeline rebuilds and redeploys the container image on every GitHub push

Architecture Diagram (image; see screenshots/architecture/)


🧰 Technology Stack

  • AWS Lambda (Container Image)
  • Docker
  • Amazon ECR
  • Amazon S3
  • AWS CodeBuild
  • AWS Secrets Manager
  • IAM
  • CloudWatch Logs
  • GitHub

🔄 Data Flow

  1. Lambda function is invoked manually or programmatically
  2. Lambda retrieves API credentials from AWS Secrets Manager
  3. Calendly API is queried for meeting data
  4. Data is serialized and written to Amazon S3
  5. Logs and execution metadata are captured in CloudWatch
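
The five steps above can be sketched as a single handler. This is a minimal illustration, not the repository's actual lambda_function.py: the environment variable names (SECRET_ID, CALENDLY_URL, BUCKET_NAME), the secret field CALENDLY_TOKEN, and the object key format are all assumptions, and the Calendly endpoint is passed in rather than hardcoded because the README does not say which one is queried.

```python
import json
import urllib.request
from datetime import datetime, timezone


def get_api_token(secrets_client, secret_id):
    """Step 2: read the API token from Secrets Manager.

    Assumes the secret is a JSON string with a (hypothetical) CALENDLY_TOKEN field.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])["CALENDLY_TOKEN"]


def fetch_meetings(token, url, opener=urllib.request.urlopen):
    """Step 3: call the API with a bearer token and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with opener(request, timeout=30) as response:
        return json.loads(response.read())


def write_to_s3(s3_client, bucket, key, payload):
    """Step 4: serialize the payload as JSON and write it to S3."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json",
    )


def lambda_handler(event, context):
    """Step 1: entry point, invoked manually or programmatically."""
    import os

    import boto3  # imported here so the helpers above stay testable without it

    token = get_api_token(boto3.client("secretsmanager"), os.environ["SECRET_ID"])
    meetings = fetch_meetings(token, os.environ["CALENDLY_URL"])
    key = f"meetings_{datetime.now(timezone.utc):%Y%m%dT%H%M%SZ}.json"
    write_to_s3(boto3.client("s3"), os.environ["BUCKET_NAME"], key, meetings)
    # Step 5: anything printed to stdout is captured in CloudWatch Logs.
    print(json.dumps({"written_key": key}))
    return {"statusCode": 200, "key": key}
```

Passing the clients and the URL opener in as arguments keeps each step testable locally with stand-in objects, without AWS credentials or network access.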

⚙️ Implementation Details

Lambda Function

  • Written in Python
  • Uses environment variables configured in AWS
  • Packaged using Docker with the AWS Lambda Python 3.9 base image
  • Runs in the default Lambda working directory (/var/task)

Containerization

  • Dependencies installed via requirements.txt
  • Uses AWS-provided Lambda runtime image
  • Entry point and handler configured for Lambda execution

Storage

  • Amazon S3 used as the persistent storage layer
  • Files written with timestamps to validate execution
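
One way to produce timestamped object keys that can later be checked against an invocation time is sketched below. The prefix, the date-partitioned layout, and both function names are hypothetical; the README only states that files are written with timestamps.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%Y%m%dT%H%M%SZ"


def build_object_key(prefix="calendly", now=None):
    """Return a key such as calendly/2024/01/02/meetings_20240102T030405Z.json.

    `now` should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"{prefix}/{now:%Y/%m/%d}/meetings_{now.strftime(STAMP_FORMAT)}.json"


def parse_run_time(key):
    """Recover the UTC run time embedded in a key built by build_object_key."""
    stamp = key.rsplit("_", 1)[1][: -len(".json")]
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
```

Because the timestamp is recoverable from the key, listing the bucket is enough to confirm when each run wrote its output.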

🚀 CI/CD Pipeline

The project automates the build and publish stages of deployment:

  1. Source control managed in GitHub
  2. AWS CodeBuild configured with a webhook trigger
  3. Every push or merged pull request triggers:
    • Docker image build
    • Image push to Amazon ECR

Note: Lambda does not pick up the new image automatically; the function must be updated manually to point to it. This step can be automated and may be a future project enhancement.

This keeps the latest container image available in ECR, ready to be deployed to the Lambda function.
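
As a sketch of how the manual update step might be automated, the snippet below points a function at a freshly pushed image using boto3's `update_function_code` with an `ImageUri`. It is not part of this repository; the helper names and the example account, region, repository, and function values are placeholders.

```python
def image_uri(account_id, region, repository, tag):
    """Build the ECR image URI in the standard registry format."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def deploy_image(lambda_client, function_name, uri):
    """Point the Lambda function at a new image and wait for the update to finish."""
    response = lambda_client.update_function_code(
        FunctionName=function_name,
        ImageUri=uri,
    )
    # Block until Lambda reports the update as complete.
    lambda_client.get_waiter("function_updated").wait(FunctionName=function_name)
    return response


def main():
    """Example wiring; every value here is a placeholder."""
    import boto3

    uri = image_uri("123456789012", "us-east-1", "calendly-ingest", "latest")
    deploy_image(boto3.client("lambda"), "calendly-ingest-fn", uri)
```

A script like this could run as a final CodeBuild phase, provided the CodeBuild role is also allowed to call `lambda:UpdateFunctionCode` and `lambda:GetFunctionConfiguration` on the target function.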


🔐 Security & Permissions

  • Secrets Management

    • Calendly API credentials stored in AWS Secrets Manager
    • No secrets hardcoded in the repository
  • IAM Roles

    • Lambda execution role grants least-privilege access to:
      • Amazon S3
      • Amazon ECR
      • CloudWatch Logs
    • CodeBuild role allows container builds and image pushes
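
To make "least-privilege" concrete, here is an illustrative policy document for the Lambda execution role, built as a Python dict. It is an assumption-laden sketch rather than the policy used in this project: the function name, statement IDs, and resource scoping are hypothetical; the Secrets Manager statement is included because the data flow reads a secret at runtime; and no ECR statement is shown because the README does not say which ECR actions the role grants.

```python
import json


def lambda_execution_policy(bucket, prefix, secret_arn, log_group_arn):
    """Return an IAM policy document scoped to specific resources and actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteIngestedData",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Only objects under the given prefix of one bucket.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Sid": "ReadCalendlySecret",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            },
            {
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                # Log streams inside the function's own log group.
                "Resource": f"{log_group_arn}:*",
            },
        ],
    }


def to_json(policy):
    """Serialize the policy for use with the IAM console, CLI, or SDK."""
    return json.dumps(policy, indent=2)
```

Each statement names explicit actions and a specific resource, so the role cannot read other buckets, other secrets, or other functions' logs.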

✅ Verification & Screenshots

The following screenshots provide evidence of successful implementation and execution.

Lambda Function Configuration

Lambda Function

Data Successfully Written to S3

S3 Output

CI/CD Pipeline Execution (CodeBuild)

CodeBuild History

Container Image in Amazon ECR

ECR Repository


📁 Repository Structure

├── lambda_function.py
├── requirements.txt
├── commands.sh
├── Dockerfile
├── buildspec.yml
├── screenshots/
│ ├── architecture/
│ ├── lambda/
│ ├── s3/
│ ├── cicd/
│ └── ecr/
└── README.md


🏁 Project Status

✔ Project completed
✔ CI/CD validated
✔ Data ingestion verified
✔ Infrastructure safely decommissioned to avoid ongoing AWS costs

All screenshots and documentation are preserved to demonstrate the full implementation and execution of the system.


📌 Key Takeaways

  • Demonstrates real-world AWS Lambda container usage
  • Shows CI/CD automation from GitHub push to image publish in Amazon ECR
  • Emphasizes security practices: credentials in Secrets Manager and least-privilege IAM roles
  • Designed with cost awareness and reproducibility in mind
