Merged

57 commits
2616186
DVC
ManoharVit Nov 11, 2024
7604ef1
Track data directory with DVC
ManoharVit Nov 11, 2024
58f26e1
DVC
ManoharVit Nov 11, 2024
6c8db3a
DVC
ManoharVit Nov 11, 2024
d17a328
add to DVC
ManoharVit Nov 12, 2024
e9ca462
Clean up non-existent DVC references and unnecessary files
ManoharVit Nov 12, 2024
a140926
Clean up non-existent DVC references and unnecessary files
ManoharVit Nov 12, 2024
c470b6b
Add files
ManoharVit Nov 12, 2024
47f900e
Stop tracking merged_original_dataset.csv in Git
ManoharVit Nov 13, 2024
155f397
Models
ManoharVit Nov 13, 2024
f5b2eea
Merge pull request #28 from IE7374-MachineLearningOperations/Manohar
ManoharVit Nov 13, 2024
8ead3f4
add models
khanhvynguyen Nov 15, 2024
794c58d
add service_key_gcs to gitignore
khanhvynguyen Nov 15, 2024
666e450
remove service_key_gcs.json
khanhvynguyen Nov 15, 2024
308b845
Merge branch 'vy'
khanhvynguyen Nov 15, 2024
7bb28cd
Add Cloud Build configuration
ManoharVit Nov 15, 2024
37495b5
Merge pull request #29 from IE7374-MachineLearningOperations/Manohar
ManoharVit Nov 15, 2024
d8b243f
update model pipeline, wandb, bias detection
khanhvynguyen Nov 16, 2024
03a35a4
Merge remote-tracking branch 'origin/vy'
khanhvynguyen Nov 16, 2024
e4383a2
Model Dev
ManoharVit Nov 16, 2024
096b179
Model
ManoharVit Nov 16, 2024
e887dac
Model Dev
ManoharVit Nov 16, 2024
1e36981
Merge pull request #30 from IE7374-MachineLearningOperations/Manohar
ManoharVit Nov 16, 2024
8eef468
Model Dev
ManoharVit Nov 16, 2024
4b1436a
Update
ManoharVit Nov 18, 2024
9477eac
Update
ManoharVit Nov 18, 2024
3d86bb8
Merge pull request #31 from IE7374-MachineLearningOperations/Manohar
ManoharVit Nov 18, 2024
68b0f6a
update upload_blob
khanhvynguyen Nov 21, 2024
0d98bd1
update gitignore
khanhvynguyen Nov 21, 2024
7422fec
update requirements.txt
khanhvynguyen Nov 21, 2024
2e14000
switch ci/cd to citest
khanhvynguyen Nov 21, 2024
b6d49c4
fix path for step 4: run tests
khanhvynguyen Nov 21, 2024
9164955
fix GCS credentials
khanhvynguyen Nov 21, 2024
105043d
fix GCS key requires
khanhvynguyen Nov 21, 2024
72d8017
fix storage.Client()
khanhvynguyen Nov 21, 2024
65dd40a
fix global variable
khanhvynguyen Nov 21, 2024
c09263c
fix test_pca
khanhvynguyen Nov 21, 2024
5473f50
update test_pca
khanhvynguyen Nov 21, 2024
3ae6eaa
update test_pca
khanhvynguyen Nov 21, 2024
97bc76e
update scaler function
khanhvynguyen Nov 21, 2024
a236e5a
update test scaler function
khanhvynguyen Nov 21, 2024
e549a24
remove fixture
khanhvynguyen Nov 21, 2024
e4c36f0
fix technical_report
khanhvynguyen Nov 21, 2024
bdc6b2e
fix add_technical_indicators_constant_price
khanhvynguyen Nov 21, 2024
9e46c78
Merge branch 'citest'
khanhvynguyen Nov 21, 2024
ea6fb8c
update pytests in model.yml
khanhvynguyen Nov 21, 2024
79d8c7d
update service acc key in model.yml
khanhvynguyen Nov 21, 2024
2abdea9
update model.yml
khanhvynguyen Nov 21, 2024
7f66c29
CI
ManoharVit Nov 23, 2024
20911f9
Merge branch 'main' into Manohar
ManoharVit Nov 23, 2024
d3aa745
Merge pull request #32 from IE7374-MachineLearningOperations/Manohar
ManoharVit Nov 23, 2024
badedeb
CI
ManoharVit Nov 25, 2024
ec5b7f7
ci
ManoharVit Nov 25, 2024
7cd5ba0
Update .gitignore
ManoharVit Nov 25, 2024
9c9ab71
-
ManoharVit Nov 27, 2024
8659633
Merge pull request #35 from IE7374-MachineLearningOperations/Manohar
ManoharVit Nov 27, 2024
efd97de
Merge branch 'samarth' into main
Sampy13 Dec 3, 2024
2 changes: 1 addition & 1 deletion .dvc/.gitignore
@@ -1,3 +1,3 @@
-/cache
 /config.local
 /tmp
+/cache
2 changes: 2 additions & 0 deletions .dvc/config
@@ -3,3 +3,5 @@
     autostage = true
 ['remote "myremote"']
     url = gs://stock_price_prediction_dataset/DVC
+['remote "gcs_remote"']
+    url = gs://stock_price_prediction_dataset/Data
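
For context, a DVC remote configured this way would typically be exercised as follows — a minimal sketch, assuming the DVC CLI is installed and the service account has access to the bucket:

    # List the configured remotes; both "myremote" and "gcs_remote" should appear
    dvc remote list

    # Push tracked data to the new remote explicitly
    dvc push -r gcs_remote

    # Fetch the same data back on another machine
    dvc pull -r gcs_remote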
74 changes: 74 additions & 0 deletions .github/workflows/PyTest.yaml
@@ -0,0 +1,74 @@
name: Python Test Workflow

on:
  push:
    branches:
      - main
      - citest

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      # Step 1: Checkout the repository
      - name: Checkout repository
        uses: actions/checkout@v2

      # Step 2: Set up Python environment
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'

      # Step 3: Install dependencies
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r pipeline/requirements.txt

      # Step 4: Run tests and generate coverage report
      - name: Run tests
        run: |
          pip install coverage
          coverage run -m pytest pipeline/airflow/tests --maxfail=1 --disable-warnings
          coverage html

      # Step 5: Upload coverage report as artifact
      - name: Upload coverage report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: htmlcov/

      # Step 6: Install Google Cloud CLI
      - name: Install Google Cloud CLI
        run: |
          sudo apt-get update
          sudo apt-get install -y google-cloud-cli

      # Step 7: Decode GCP Service Account Key
      - name: Decode and Write GCP Service Account Key
        run: |
          echo "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}" | base64 -d > /tmp/gcp-key.json
        shell: bash

      # Step 8: Authenticate with GCP using the Temporary File
      - name: Authenticate with GCP
        env:
          GOOGLE_APPLICATION_CREDENTIALS: /tmp/gcp-key.json
        run: gcloud auth activate-service-account --key-file=${GOOGLE_APPLICATION_CREDENTIALS}

      # Step 9: Set GCP Project ID
      - name: Set GCP Project ID
        run: gcloud config set project ${{ secrets.GCP_PROJECT_ID }}

      # Step 10: Upload coverage report to GCP bucket
      - name: Upload coverage report to GCP bucket
        env:
          GCS_BUCKET_NAME: ${{ secrets.GCS_BUCKET_NAME }}
        run: |
          CURRENT_DATE=$(date +"%Y-%m-%d")
          zip -r coverage-report.zip htmlcov/
          gsutil -m cp coverage-report.zip gs://${GCS_BUCKET_NAME}/Pytest-reports/${CURRENT_DATE}/coverage-report.zip
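
Step 7 decodes GCP_SERVICE_ACCOUNT_KEY with base64 -d, so the secret must hold a base64-encoded copy of the service-account JSON. One way to produce and upload it — a sketch, assuming the GitHub CLI and the local service_key_gcs.json that earlier commits removed from Git:

    # Encode without line wrapping so the workflow can decode it in one pass
    base64 -w0 service_key_gcs.json > key.b64

    # Store it as a repository secret
    gh secret set GCP_SERVICE_ACCOUNT_KEY --repo IE7374-MachineLearningOperations/StockPricePrediction < key.b64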
93 changes: 93 additions & 0 deletions .github/workflows/airflowtrigeer.yaml
@@ -0,0 +1,93 @@
name: Trigger Airflow DAG Workflow

on:
  push:
    branches:
      - main
      - citest
  workflow_dispatch:

jobs:
  trigger_airflow_dag:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Set up Docker Compose
        run: |
          sudo apt-get update
          sudo apt-get install -y docker-compose

      - name: Initialize Airflow
        working-directory: ./pipeline/airflow
        run: |
          docker-compose up airflow-init

      - name: Start Airflow Services
        working-directory: ./pipeline/airflow
        run: |
          docker-compose up -d

      # # Step 6: Install Python packages inside Docker containers
      # - name: Install Python Packages
      #   working-directory: ./pipeline/airflow
      #   run: |
      #     docker-compose exec -T airflow-scheduler python3 -m pip install -r /opt/airflow/dags/requirements.txt
      #     # docker-compose exec -T airflow-webserver python3 -m pip install -r /opt/airflow/dags/requirements.txt

      # Step 6: Set permissions for Airflow logs inside the container
      - name: Set permissions for Airflow logs
        working-directory: ./pipeline/airflow
        run: |
          docker-compose exec -T --user root airflow-scheduler bash -c "chmod -R 777 /opt/airflow/logs/"

      - name: Wait for Airflow to Initialize
        working-directory: ./pipeline/airflow
        run: |
          timeout 300 bash -c 'until docker-compose exec -T airflow-webserver curl -f http://localhost:8080/health; do sleep 10; done'

      # Step 9: Delete .pyc Files
      - name: Delete .pyc Files
        working-directory: ./pipeline/airflow
        run: |
          docker-compose exec -T airflow-scheduler find /opt/airflow -name \*.pyc -delete
          docker-compose exec -T airflow-webserver find /opt/airflow -name \*.pyc -delete

      - name: List DAG Import Errors
        working-directory: ./pipeline/airflow
        run: |
          docker-compose exec -T airflow-scheduler airflow dags list-import-errors

      - name: Show Airflow DAGs
        working-directory: ./pipeline/airflow
        run: |
          docker-compose exec -T airflow-scheduler airflow dags list

      - name: Trigger Airflow DAG
        working-directory: ./pipeline/airflow
        run: |
          docker-compose exec -T airflow-scheduler airflow dags trigger -r manual_$(date +%Y%m%d%H%M%S) Group10_DataPipeline_MLOps

      - name: Monitor DAG Execution
        working-directory: ./pipeline/airflow
        run: |
          for i in {1..10}; do
            STATUS=$(docker-compose exec -T airflow-scheduler airflow dags state Group10_DataPipeline_MLOps $(date +%Y-%m-%d))
            echo "Current DAG status: $STATUS"
            if [ "$STATUS" = "success" ]; then
              echo "DAG completed successfully"
              break
            elif [ "$STATUS" = "failed" ]; then
              echo "DAG failed"
              exit 1
            fi
            sleep 60
          done

      - name: Stop Airflow Services
        if: always()
        working-directory: ./pipeline/airflow
        run: docker-compose down --volumes --rmi all
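
One caveat: the trigger step assigns a custom run id (manual_<timestamp>), while the monitor step polls airflow dags state with today's date as the execution date, so the poll may not line up with the run it just started. Listing the DAG's runs directly is one way to confirm what actually executed — a sketch, assuming the Airflow 2.x CLI inside the scheduler container:

    # Show recent runs of the DAG with their run ids and states
    docker-compose exec -T airflow-scheduler \
      airflow dags list-runs -d Group10_DataPipeline_MLOps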
78 changes: 78 additions & 0 deletions .github/workflows/cloudbuild.yaml
@@ -0,0 +1,78 @@
# steps:
#   # Step 1: Clone the repository
#   - name: 'gcr.io/cloud-builders/git'
#     entrypoint: 'bash'
#     args:
#       - '-c'
#       - |
#         git clone https://github.com/IE7374-MachineLearningOperations/StockPricePrediction.git &&
#         cd StockPricePrediction &&
#         echo "Repository cloned successfully"

#   # Step 2: Upload specific files to GCP Bucket
#   - name: 'gcr.io/cloud-builders/gsutil'
#     args:
#       - '-m'
#       - 'cp'
#       - '-r'
#       - 'StockPricePrediction/*.py'
#       - 'StockPricePrediction/*.ipynb'
#       - 'StockPricePrediction/*.pkl'
#       - 'gs://stock_price_prediction_dataset/'

#   # Step 3: Install dependencies
#   - name: 'gcr.io/cloud-builders/pip'
#     args:
#       - 'install'
#       - '-r'
#       - 'StockPricePrediction/requirements.txt'

#   # # Step 4: Train the model
#   # - name: 'gcr.io/cloud-builders/python'
#   #   args:
#   #     - 'StockPricePrediction/train.py'

#   # # Step 5: Validate the model
#   # - name: 'gcr.io/cloud-builders/python'
#   #   args:
#   #     - 'StockPricePrediction/validate.py'

#   # # Step 6: Conditional deployment if validation is successful
#   # - name: 'gcr.io/cloud-builders/bash'
#   #   id: 'Check Validation'
#   #   args:
#   #     - '-c'
#   #     - |
#   #       ACCURACY=$(python StockPricePrediction/validate.py --get_accuracy) &&
#   #       if (( $(echo "$ACCURACY > 0.70" | bc -l) )); then
#   #         echo "Model accuracy is sufficient, proceeding with deployment";
#   #       else
#   #         echo "Model accuracy is insufficient, stopping deployment";
#   #         exit 1;
#   #       fi

#   # # Step 7: Save the trained model to GCP Bucket
#   # - name: 'gcr.io/cloud-builders/gsutil'
#   #   args:
#   #     - 'cp'
#   #     - 'StockPricePrediction/models/*.h5'
#   #     - 'gs://stock_price_prediction_dataset/trained_models/'

#   # # Step 8: Run Unit Tests
#   # - name: 'gcr.io/cloud-builders/python'
#   #   args:
#   #     - '-m'
#   #     - 'unittest'
#   #     - 'discover'
#   #     - '-s'
#   #     - 'StockPricePrediction/tests'

#   # artifacts:
#   #   objects:
#   #     location: 'gs://stock_price_prediction_dataset/artifacts/'
#   #     paths:
#   #       - 'StockPricePrediction/*.py'
#   #       - 'StockPricePrediction/*.ipynb'
#   #       - 'StockPricePrediction/*.h5'

#   # timeout: '1200s'
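
As committed, every line in this file is commented out, so a Cloud Build invocation would run nothing. For reference, an active configuration along the same lines might look like the sketch below — the builder images and bucket path are taken from the commented steps, but this is an assumption about intent, not the project's actual build:

    steps:
      # Install the project's dependencies
      - name: 'gcr.io/cloud-builders/pip'
        args: ['install', '-r', 'requirements.txt']

      # Copy Python sources into the dataset bucket
      - name: 'gcr.io/cloud-builders/gsutil'
        args: ['-m', 'cp', '-r', '*.py', 'gs://stock_price_prediction_dataset/']

    timeout: '1200s'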
85 changes: 85 additions & 0 deletions .github/workflows/model.yml
@@ -0,0 +1,85 @@
name: Model Training

on:
  push:
    branches:
      - main
      - citest

jobs:
  train_model:
    runs-on: ubuntu-latest

    steps:
      # Step 1: Checkout repository
      - name: Checkout repository
        uses: actions/checkout@v2

      # Step 2: Set up Python
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10.5'

      # Step 3: Install all dependencies
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r pipeline/requirements.txt

      # Step 4: Run Py Tests
      - name: Run Tests
        run: |
          pytest pipeline/airflow/tests --maxfail=1 --disable-warnings

      # Step 5: Authenticate with GCP
      - name: Authenticate to GCP
        env:
          GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
          GCP_REGION: ${{ secrets.GOOGLE_CLOUD_REGION }}
          GCP_SERVICE_ACCOUNT_KEY: ${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}
        run: |
          echo "${GCP_SERVICE_ACCOUNT_KEY}" | base64 --decode > ${HOME}/gcp-key.json
          gcloud auth activate-service-account --key-file=${HOME}/gcp-key.json
          gcloud config set project ${GCP_PROJECT_ID}
          gcloud config set compute/region ${GCP_REGION}

      # # Step 6: Conditional Model Training
      # - name: Trigger Model Training
      #   env:
      #     GCP_BUCKET_NAME: ${{ secrets.GCP_BUCKET_NAME }}
      #   run: |
      #     if [ "${{ github.ref }}" == "refs/heads/main" ]; then
      #       echo "Running production model training on GCP"
      #       gcloud ai-platform jobs submit training model_training_$(date +%Y%m%d_%H%M%S) \
      #         --region ${GCP_REGION} \
      #         --module-name trainer.task \
      #         --package-path ./trainer \
      #         --python-version 3.10 \
      #         --runtime-version 2.5 \
      #         --job-dir gs://${GCP_BUCKET_NAME}/models/training_$(date +%Y%m%d_%H%M%S) \
      #         -- \
      #         --additional_training_args
      #     else
      #       echo "Running model training locally for testing"
      #       python trainer/task.py --test_data ./data/test_data.csv
      #     fi

      # # Step 7: Save Training Logs as Artifacts
      # - name: Upload Training Logs
      #   if: always()
      #   uses: actions/upload-artifact@v3
      #   with:
      #     name: training-logs
      #     path: logs/

      # # Step 8: Send notification on failure
      # - name: Notify on failure
      #   if: failure()
      #   uses: actions/github-script@v6
      #   with:
      #     script: |
      #       github.issues.createComment({
      #         issue_number: context.issue.number,
      #         body: 'Model training failed on `${{ github.ref }}` branch. Please check logs for details.'
      #       })
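
The commented Step 6 calls gcloud ai-platform jobs submit training, which Google has since deprecated in favor of Vertex AI custom jobs. If the training trigger is ever re-enabled, a roughly equivalent Vertex AI call might look like this — a sketch only; the machine type and prebuilt training image are placeholders:

    gcloud ai custom-jobs create \
      --region="${GCP_REGION}" \
      --display-name="model_training_$(date +%Y%m%d_%H%M%S)" \
      --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,executor-image-uri=us-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0:latest,local-package-path=trainer,python-module=trainer.task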