
Included in the Arctic Code Vault

Data Engineering Projects

Levels of Skill for Each Project

Blue:

  • All code follows best practices and was run through a linter

  • Classes and functions are used when appropriate

  • Project is organized logically into modules

Purple:

  • Code is testable without interacting with external dependencies

  • Code has unit tests with reasonable code coverage

  • Code has integration tests for external dependencies

  • Project has tests for all the infrastructure

  • Project is a package
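One way to read the first Purple bullet: keep I/O at the edges and inject it, so the core logic can be exercised with a stub instead of a live database or API. A minimal sketch (the function names and data shape here are invented for illustration):

```python
# Hypothetical sketch: code made testable without external dependencies
# by injecting the I/O boundary as a plain callable.
from typing import Callable, List


def top_user_ids(fetch_users: Callable[[], List[dict]], limit: int = 3) -> List[int]:
    """Pure logic: ranking and slicing are testable with any fake fetcher."""
    users = fetch_users()
    ranked = sorted(users, key=lambda u: u["score"], reverse=True)
    return [u["id"] for u in ranked[:limit]]


# In production, fetch_users would wrap a real DB or API call.
# In tests, a stub returning canned data stands in for it:
def fake_fetch():
    return [{"id": 1, "score": 5}, {"id": 2, "score": 9}, {"id": 3, "score": 7}]


print(top_user_ids(fake_fetch, limit=2))  # [2, 3]
```

Because `top_user_ids` never imports a database driver, its unit tests run anywhere, which is what makes the "reasonable code coverage" bullet achievable.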

Brown:

  • Project uses infrastructure as code (Terraform or CloudFormation)

  • Project uses Docker

  • Code uses fakes (mocks and stubs)

  • Project uses a CI/CD process with a tool such as Jenkins

  • Project uses concurrency when appropriate
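The "fakes (mocks and stubs)" bullet can be sketched with the standard library's `unittest.mock`: a mock stands in for an external client so the test verifies the interaction without real I/O. The `archive_report` function and the S3-style `put_object` call shape below are invented for illustration:

```python
# Hypothetical sketch: a mock as a fake for an external dependency
# (an imaginary S3-style client) so tests never touch the network.
from unittest.mock import MagicMock


def archive_report(s3_client, bucket: str, report: str) -> str:
    """Writes a report via the injected client and returns its key."""
    key = f"reports/{len(report)}.txt"  # toy key scheme for the example
    s3_client.put_object(Bucket=bucket, Key=key, Body=report.encode())
    return key


fake_s3 = MagicMock()
key = archive_report(fake_s3, "my-bucket", "daily totals")
fake_s3.put_object.assert_called_once()  # verify the interaction, not real I/O
print(key)  # reports/12.txt
```

Stubs return canned data (as in the Purple sketch); mocks additionally record how they were called, which is what `assert_called_once` checks here.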

Projects

Core:
Merge pipeline (DB and API)
Streaming PostgreSQL CDC to S3

Data Modeling and Data Warehouse:
Data Modeling in PostgreSQL
Data Warehouse (Redshift, Snowflake, or PostgreSQL)

Automating Data Pipelines:
Automate Data Pipeline (Airflow or Jenkins)

Moving Data:
REST API CRUD app (restaurant)
gRPC CRUD app (library)
AWS Lambda microservices (blockbuster)
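The Lambda microservices project boils down to small handler functions behind API Gateway. A minimal sketch of one handler, assuming the API Gateway proxy event format; the movie catalog and route are invented placeholders for the "blockbuster" idea:

```python
# Hypothetical sketch of an AWS Lambda handler for a movie-lookup
# microservice; the event shape follows the API Gateway proxy format.
import json

MOVIES = {"1": "Alien", "2": "Heat"}  # stands in for a real datastore


def handler(event, context):
    """Minimal GET /movies/{id} handler."""
    movie_id = (event.get("pathParameters") or {}).get("id")
    if movie_id in MOVIES:
        body = {"id": movie_id, "title": MOVIES[movie_id]}
        return {"statusCode": 200, "body": json.dumps(body)}
    return {"statusCode": 404, "body": json.dumps({"error": "not found"})}


# Lambda invokes handler(event, context); locally it can be called directly:
print(handler({"pathParameters": {"id": "1"}}, None)["statusCode"])  # 200
```

Because the handler is a plain function taking a dict, it is also easy to unit test without deploying, which ties back to the Purple-level testability goals.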

Streaming:
Kafka Project
Spark Streaming (not Structured Streaming)

...

Big Data:
Spark Data Lake
Spark Delta Lake ...

DBs:
Data Modeling in Cassandra
Data Modeling in MongoDB
Data Modeling in Elasticsearch
Redis In-Memory DB
Globally Distributed Database
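For the Redis in-memory DB project, the core idea is a key-value store with optional expiry. A toy stdlib-only sketch of that behavior (the class and its lazy-expiry scheme are invented for illustration, not Redis's actual implementation):

```python
# Hypothetical sketch of the idea behind an in-memory key-value store
# with TTL, Redis-style, using only the standard library.
import time


class TinyKV:
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and time.monotonic() > expires:
            del self._data[key]  # lazy expiry on read
            return None
        return value


kv = TinyKV()
kv.set("session", "abc123", ttl=60)
print(kv.get("session"))  # abc123
```

A real project would add persistence, eviction policies, and a network protocol, but even this toy version gives something concrete to unit test.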