
Included in the Arctic Code Vault

Data Engineering Projects

Levels of Skill for Each Project

Blue:

  • All code follows best practices and was run through a linter

  • Classes and functions are used when appropriate

  • Project is organized logically into modules

Purple:

  • Code is testable without interacting with external dependencies

  • Code has unit tests with reasonable code coverage

  • Code has integration tests for external dependencies

  • Project has tests for all the infrastructure

  • Project is a package
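One way to read the first Purple bullet: keep I/O at the edges and inject it, so the core logic can be exercised with a stub instead of a live database or API. A minimal sketch (the function names and data shape here are invented for illustration):

```python
# Hypothetical sketch: code made testable without external dependencies
# by injecting the I/O boundary as a plain callable.
from typing import Callable, List


def top_user_ids(fetch_users: Callable[[], List[dict]], limit: int = 3) -> List[int]:
    """Pure logic: ranking and slicing are testable with any fake fetcher."""
    users = fetch_users()
    ranked = sorted(users, key=lambda u: u["score"], reverse=True)
    return [u["id"] for u in ranked[:limit]]


# In production, fetch_users would wrap a real DB or API call.
# In tests, a stub returning canned data stands in for it:
def fake_fetch():
    return [{"id": 1, "score": 5}, {"id": 2, "score": 9}, {"id": 3, "score": 7}]


print(top_user_ids(fake_fetch, limit=2))  # [2, 3]
```

Because `top_user_ids` never imports a database driver, its unit tests run anywhere, which is what makes the "reasonable code coverage" bullet achievable.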

Brown:

  • Project uses infrastructure as code (Terraform or CloudFormation)

  • Project uses Docker

  • Code uses fakes (mocks and stubs)

  • Project uses a CI/CD process with a tool such as Jenkins

  • Project uses concurrency when appropriate
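The "fakes (mocks and stubs)" bullet can be sketched with the standard library's `unittest.mock`: a mock stands in for an external client so the test verifies the interaction without real I/O. The `archive_report` function and the S3-style `put_object` call shape below are invented for illustration:

```python
# Hypothetical sketch: a mock as a fake for an external dependency
# (an imaginary S3-style client) so tests never touch the network.
from unittest.mock import MagicMock


def archive_report(s3_client, bucket: str, report: str) -> str:
    """Writes a report via the injected client and returns its key."""
    key = f"reports/{len(report)}.txt"  # toy key scheme for the example
    s3_client.put_object(Bucket=bucket, Key=key, Body=report.encode())
    return key


fake_s3 = MagicMock()
key = archive_report(fake_s3, "my-bucket", "daily totals")
fake_s3.put_object.assert_called_once()  # verify the interaction, not real I/O
print(key)  # reports/12.txt
```

Stubs return canned data (as in the Purple sketch); mocks additionally record how they were called, which is what `assert_called_once` checks here.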

Projects

Core:
Merge pipeline (DB and API)
Streaming PostgreSQL CDC to S3

Data Modeling and Data Warehouse:
Data Modeling in PostgreSQL
Data Warehouse (Redshift, Snowflake, or PostgreSQL)

Automating Data Pipelines:
Automate Data Pipeline (Airflow or Jenkins)

Moving Data:
REST API CRUD app (restaurant)
gRPC CRUD app (library)
AWS Lambda microservices (blockbuster)
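The Lambda microservices project boils down to small handler functions behind API Gateway. A minimal sketch of one handler, assuming the API Gateway proxy event format; the movie catalog and route are invented placeholders for the "blockbuster" idea:

```python
# Hypothetical sketch of an AWS Lambda handler for a movie-lookup
# microservice; the event shape follows the API Gateway proxy format.
import json

MOVIES = {"1": "Alien", "2": "Heat"}  # stands in for a real datastore


def handler(event, context):
    """Minimal GET /movies/{id} handler."""
    movie_id = (event.get("pathParameters") or {}).get("id")
    if movie_id in MOVIES:
        body = {"id": movie_id, "title": MOVIES[movie_id]}
        return {"statusCode": 200, "body": json.dumps(body)}
    return {"statusCode": 404, "body": json.dumps({"error": "not found"})}


# Lambda invokes handler(event, context); locally it can be called directly:
print(handler({"pathParameters": {"id": "1"}}, None)["statusCode"])  # 200
```

Because the handler is a plain function taking a dict, it is also easy to unit test without deploying, which ties back to the Purple-level testability goals.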

Streaming:
Kafka Project
Spark Streaming (not Structured Streaming)

...

Big Data:
Spark Data Lake
Spark Delta Lake ...

DBs:
Data Modeling in Cassandra
Data Modeling in MongoDB
Data Modeling in Elasticsearch
Redis In-Memory DB
Globally Distributed Database
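For the Redis in-memory DB project, the core idea is a key-value store with optional expiry. A toy stdlib-only sketch of that behavior (the class and its lazy-expiry scheme are invented for illustration, not Redis's actual implementation):

```python
# Hypothetical sketch of the idea behind an in-memory key-value store
# with TTL, Redis-style, using only the standard library.
import time


class TinyKV:
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and time.monotonic() > expires:
            del self._data[key]  # lazy expiry on read
            return None
        return value


kv = TinyKV()
kv.set("session", "abc123", ttl=60)
print(kv.get("session"))  # abc123
```

A real project would add persistence, eviction policies, and a network protocol, but even this toy version gives something concrete to unit test.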