❯ Palmify
This repository contains the source code for the okini-dataplatform-etl project, a comprehensive ETL pipeline for the Okini Data Platform.
Resources (data sources used by the pipeline; a hedged fetch sketch follows the list):
- Calendarific: public holidays for Japan, Korea, Hong Kong, and Taiwan, sourced from Wikipedia
- Booking.com: hotel data for the Namba area
- Kyocera Dome: event schedules in Osaka
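For illustration, holiday data like this can be pulled from the public Calendarific REST API. A minimal sketch, assuming the v2 `holidays` endpoint and an API key in `CALENDARIFIC_API_KEY`; note the crawler itself may scrape Wikipedia instead, per the list above:

```python
# Hedged sketch: fetching public holidays from the Calendarific v2 API.
# The endpoint shape and CALENDARIFIC_API_KEY variable are assumptions.
import os

import requests


def fetch_holidays(country: str, year: int) -> list[dict]:
    resp = requests.get(
        "https://calendarific.com/api/v2/holidays",
        params={
            "api_key": os.environ["CALENDARIFIC_API_KEY"],
            "country": country,  # e.g. "JP", "KR", "HK", "TW"
            "year": year,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["response"]["holidays"]
```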
Before getting started with okini-dataplatform-etl, ensure your runtime environment meets the following requirements:
- Programming Language: Python 3.11
- Package Manager: Poetry or Pip
- Container Runtime: Docker
The project currently uses OAuth2 to authenticate with the BigQuery API. On first run, if there is no file named `assets/token.pickle`, the application prints a URL; open it in a browser and log in, and a new `assets/token.pickle` file will be created.
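A minimal sketch of that flow, assuming the standard `google-auth-oauthlib` installed-app pattern with OAuth client secrets in `assets/credentials.json` (the secrets path and scope are assumptions; only `assets/token.pickle` is from the project itself):

```python
# Hedged sketch of the OAuth2 flow described above, using the standard
# google-auth-oauthlib installed-app pattern. assets/credentials.json and
# the BigQuery scope are assumptions; assets/token.pickle is from the docs.
import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/bigquery"]
TOKEN_PATH = "assets/token.pickle"


def load_credentials():
    creds = None
    if os.path.exists(TOKEN_PATH):
        with open(TOKEN_PATH, "rb") as f:
            creds = pickle.load(f)  # reuse the cached token
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())  # refresh silently when possible
    if not creds or not creds.valid:
        # First run: a URL is printed; open it, log in, and a new
        # assets/token.pickle is written for subsequent runs.
        flow = InstalledAppFlow.from_client_secrets_file(
            "assets/credentials.json", SCOPES
        )
        creds = flow.run_local_server(port=0)
        with open(TOKEN_PATH, "wb") as f:
            pickle.dump(creds, f)
    return creds
```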
Install okini-dataplatform-etl using one of the following methods:
**Build from source:**
- Clone the okini-dataplatform-etl repository:
❯ git clone https://github.com/palmify/okini-dataplatform-etl
- Navigate to the project directory:
❯ cd okini-dataplatform-etl
- Copy the `.env` and `.env.prod` files to the root directory of the project (a sketch of how they might be loaded follows).
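For reference, a hedged sketch of how the application might consume these files, assuming `python-dotenv` (the library choice and the `APP_ENV` switch are assumptions):

```python
# Hedged sketch: how .env / .env.prod might be loaded at runtime.
# Assumptions: python-dotenv is the loader and APP_ENV picks the file.
import os

from dotenv import load_dotenv

env_file = ".env.prod" if os.getenv("APP_ENV") == "prod" else ".env"
load_dotenv(env_file)  # copies the file's key=value pairs into os.environ

s3_key = os.getenv("S3_ACCESS_KEY_ID")  # filled in during a later step
```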
- Run all the dependencies using `docker-compose`:
❯ docker compose up -d

**Build images on local machine:**
❯ docker build -t okini-dataplatform-etl -f docker/Dockerfile .
❯ docker tag okini-dataplatform-etl:latest louispalmify/development:okini-dataplatform-etl
❯ docker push louispalmify/development:okini-dataplatform-etl
- Next, go to the MinIO object storage GUI (http://localhost:9001) and create values for `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY`.
- Copy the created `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY` values into the `.env.prod` file. Leave `.env` as it is. (A quick sketch to verify the keys follows.)
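Before restarting the stack, you can sanity-check the new keys against the MinIO S3 API. A minimal sketch, assuming `boto3` is available and MinIO's S3 endpoint listens on port 9000 (the console at 9001 is a separate port; both details here are assumptions):

```python
# Hedged sketch: verify the MinIO keys from .env.prod against the S3 API.
# Assumptions: boto3 is installed and MinIO's S3 endpoint is localhost:9000.
import os

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id=os.environ["S3_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["S3_SECRET_ACCESS_KEY"],
)
# A successful call confirms the credentials; a bad key raises ClientError.
print([bucket["Name"] for bucket in s3.list_buckets()["Buckets"]])
```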
- Run `docker compose down` to stop the running containers without removing the volumes.
- Remove the built image of the `okini-dataplatform-etl-dagster-app` repo:
❯ docker rmi <IMAGE_ID>
- Now run `docker compose up -d` again to start the containers with the updated `.env.prod` file.
- Go to the Dagster dashboard at http://localhost:3124, open the `Job` tab, and turn on the job schedulers (a sketch of what such a schedule looks like in code follows).
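Each scheduler toggled in that tab corresponds to a `ScheduleDefinition` in the Dagster code. A minimal sketch of the pattern; the op, job, and cron values are hypothetical, not the project's actual definitions:

```python
# Hedged sketch of a Dagster job + schedule like the ones toggled above.
# The op/job names and cron string are hypothetical examples.
from dagster import Definitions, ScheduleDefinition, job, op


@op
def crawl_hotels():
    """Placeholder op standing in for a crawler step."""
    ...


@job
def hotel_etl_job():
    crawl_hotels()


hotel_etl_schedule = ScheduleDefinition(
    job=hotel_etl_job,
    cron_schedule="0 2 * * *",  # daily at 02:00
)

defs = Definitions(jobs=[hotel_etl_job], schedules=[hotel_etl_schedule])
```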
- TODO: forward these URLs to domains (or an equivalent) so the data crawler's logs can be tracked from outside the host:
http://localhost:9001 // MinIO
http://localhost:5601 // Kibana
http://localhost:3124 // Dagster Webserver
http://localhost:5900 // VNC Viewer

