
End-to-End Data Science Project


This project is an end-to-end data science pipeline covering data processing, model training, and evaluation, with modular configuration and integrated logging. It is structured for clarity, scalability, and experimentation, so it is easy to understand and modify.

Project Structure

  • config/: Contains configuration files for the project.
    • config.yaml: Main configuration file.
  • src/DS/: Main source directory for the project.
    • components/: Contains the core components of the data processing and model pipeline.
    • config/: Configuration-related modules.
    • constants/: Defines constant values used across the project.
    • entity/: Contains entity definitions and configurations.
    • pipeline/: Manages the execution of the data science pipeline (a sketch of how these layers fit together follows this list).
    • utils/: Utility functions for common tasks such as reading and writing files.
  • research/: Contains Jupyter notebooks for exploratory data analysis and research.
  • templates/: HTML templates for any web-based components.
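
These directories follow a common layered layout: a configuration manager (src/DS/config) reads config.yaml, builds a typed entity (src/DS/entity), and hands it to a component (src/DS/components) that a pipeline stage (src/DS/pipeline) drives. Below is a minimal sketch of that wiring, assuming this layout; every class, field, and key name is hypothetical rather than taken from the repository:

    from dataclasses import dataclass
    from pathlib import Path

    import yaml  # PyYAML


    # entity/: a typed view of one section of config.yaml (hypothetical fields)
    @dataclass(frozen=True)
    class DataIngestionConfig:
        root_dir: Path
        source_url: str


    # config/: turns the raw YAML into typed entities
    class ConfigurationManager:
        def __init__(self, config_path: Path = Path("config/config.yaml")):
            # Assumes config/config.yaml exists with a data_ingestion section
            with open(config_path) as f:
                self.config = yaml.safe_load(f)

        def get_data_ingestion_config(self) -> DataIngestionConfig:
            section = self.config["data_ingestion"]
            return DataIngestionConfig(
                root_dir=Path(section["root_dir"]),
                source_url=section["source_url"],
            )


    # components/: does the actual work, driven by its entity
    class DataIngestion:
        def __init__(self, config: DataIngestionConfig):
            self.config = config

        def download(self) -> None:
            # Placeholder for the real download logic
            print(f"would download {self.config.source_url} to {self.config.root_dir}")


    # pipeline/: wires manager, entity, and component together for one stage
    if __name__ == "__main__":
        manager = ConfigurationManager()
        DataIngestion(manager.get_data_ingestion_config()).download()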

Key Features

  • Data Processing: Includes utilities for reading and writing YAML and JSON files and managing directories (a sketch of such helpers follows this list).
  • Model Training: Configurable pipeline for training machine learning models.
  • Logging: Integrated logging for tracking the execution of the pipeline.
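
A minimal sketch of what the file utilities in src/DS/utils might look like, assuming PyYAML is installed; the function names and logger name below are illustrative, not the repository's actual API:

    import json
    import logging
    from pathlib import Path

    import yaml  # PyYAML

    logger = logging.getLogger("DSProject")


    def read_yaml(path: Path) -> dict:
        """Load a YAML file (e.g. config/config.yaml) into a dict."""
        with open(path) as f:
            content = yaml.safe_load(f)
        logger.info("loaded yaml: %s", path)
        return content


    def write_json(path: Path, data: dict) -> None:
        """Persist a dict (e.g. evaluation metrics) as pretty-printed JSON."""
        with open(path, "w") as f:
            json.dump(data, f, indent=4)
        logger.info("wrote json: %s", path)


    def create_directories(paths: list[Path]) -> None:
        """Create each directory, skipping ones that already exist."""
        for p in paths:
            Path(p).mkdir(parents=True, exist_ok=True)
            logger.info("created directory: %s", p)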

Getting Started

  1. Install Dependencies: Use requirements.txt to install the necessary Python packages.

     pip install -r requirements.txt

  2. Run the Pipeline: Execute the main script to start the data science pipeline (a sketch of a minimal main.py follows this list).

     python main.py

  3. Explore Notebooks: Use the Jupyter notebooks in the research/ directory for data exploration and model evaluation.
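
As a rough sketch of how main.py might drive the stages with the integrated logging described above (the stage names and placeholder runners below are hypothetical; a real main.py would import stage classes from src/DS/pipeline instead):

    import logging

    logging.basicConfig(
        level=logging.INFO,
        format="[%(asctime)s] %(levelname)s %(name)s - %(message)s",
    )
    logger = logging.getLogger("DSProject")


    # Hypothetical stage runners standing in for the real pipeline stages
    def run_data_ingestion() -> None: ...
    def run_model_training() -> None: ...
    def run_model_evaluation() -> None: ...


    STAGES = [
        ("Data Ingestion", run_data_ingestion),
        ("Model Training", run_model_training),
        ("Model Evaluation", run_model_evaluation),
    ]

    if __name__ == "__main__":
        # Run each stage in order, logging start/finish and any failure
        for name, run in STAGES:
            logger.info(">>> stage %s started", name)
            try:
                run()
            except Exception:
                logger.exception("stage %s failed", name)
                raise
            logger.info(">>> stage %s completed", name)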
