akashmalbari/SparkOracleDataPipeline

Spark to Oracle Data Pipeline

This project demonstrates a data pipeline that extracts data from a CSV file and loads it into an Oracle database table using Apache Spark.

Prerequisites

  • Apache Spark
  • Python 3
  • Oracle JDBC Driver
  • Oracle Database

Setup

  1. Install dependencies:

    pip install -r requirements.txt
  2. Download the Oracle JDBC Driver:

    Download ojdbc8.jar and place it in a known directory, then update the driver path in scripts/load_data.py to point to that location.

  3. Update the script:

    Update the Oracle connection details in scripts/load_data.py with your database credentials and connection string.
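
The connection details mentioned in the setup steps could be gathered in one place along the following lines. This is a minimal sketch, not the contents of scripts/load_data.py: the function names, the environment variable names, and every default value (host, port, service name, jar path) are assumptions made for illustration. It uses the Oracle thin-driver URL in service-name form and the `oracle.jdbc.OracleDriver` class shipped in ojdbc8.jar.

```python
import os


def build_jdbc_url(host: str, port: int, service_name: str) -> str:
    """Return an Oracle thin-driver JDBC URL in service-name form."""
    return f"jdbc:oracle:thin:@//{host}:{port}/{service_name}"


def build_connection_properties(user: str, password: str) -> dict:
    """Return the properties dict that Spark's JDBC writer expects."""
    return {
        "user": user,
        "password": password,
        # Driver class provided by ojdbc8.jar
        "driver": "oracle.jdbc.OracleDriver",
    }


# Hypothetical settings: the variable names and defaults are placeholders,
# not values taken from the repository.
JDBC_JAR_PATH = os.environ.get("ORACLE_JDBC_JAR", "/path/to/ojdbc8.jar")
JDBC_URL = build_jdbc_url(
    os.environ.get("ORACLE_HOST", "localhost"),
    int(os.environ.get("ORACLE_PORT", "1521")),
    os.environ.get("ORACLE_SERVICE", "ORCLPDB1"),
)
CONNECTION_PROPERTIES = build_connection_properties(
    os.environ.get("ORACLE_USER", "your_user"),
    os.environ.get("ORACLE_PASSWORD", "your_password"),
)
```

Reading credentials from environment variables rather than hard-coding them keeps passwords out of version control; whether the original script does this is not stated.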

Running the Pipeline

Navigate to the scripts directory and run the script:

cd scripts
python load_data.py
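
For orientation, a CSV-to-Oracle load with PySpark generally follows the shape below. This is a hedged sketch rather than the actual load_data.py: the function names, the application name, and the choice of `append` mode are assumptions. It relies only on standard PySpark calls (`SparkSession.builder`, `spark.read.csv`, `DataFrame.write.jdbc`) and the `spark.jars` setting to put the JDBC driver on the classpath.

```python
def spark_jar_config(jar_path: str) -> dict:
    """Spark settings that make the Oracle JDBC driver jar available."""
    return {"spark.jars": jar_path}


def run_pipeline(csv_path: str, jdbc_url: str, table: str,
                 properties: dict, jar_path: str) -> int:
    """Read a CSV file and append its rows to an Oracle table.

    Hypothetical helper; returns the number of rows read from the CSV.
    """
    # Imported here so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("SparkOracleDataPipeline")
    for key, value in spark_jar_config(jar_path).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    try:
        # Extract: treat the first line as a header and infer column types.
        df = spark.read.csv(csv_path, header=True, inferSchema=True)
        row_count = df.count()

        # Load: append rows to the target table over JDBC.
        df.write.jdbc(url=jdbc_url, table=table, mode="append",
                      properties=properties)
        return row_count
    finally:
        spark.stop()
```

A call such as `run_pipeline("data.csv", jdbc_url, "MY_TABLE", properties, "/path/to/ojdbc8.jar")` would then perform the load, where the file name and table name are placeholders. With `mode="append"`, Spark creates the table if it does not exist and adds rows otherwise; `overwrite` would replace existing data instead.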
