This project demonstrates a data pipeline that extracts data from a CSV file and loads it into an Oracle database table using Apache Spark. It requires:
- Apache Spark
- Python 3
- Oracle JDBC Driver
- Oracle Database
- Install dependencies:
  pip install -r requirements.txt
- Download the Oracle JDBC Driver:
  Download the `ojdbc8.jar` file and place it in a known directory. Update the path in the `load_data.py` script.
- Update the script:
  Update the Oracle connection details in `scripts/load_data.py` with your database credentials and connection string.
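
As a rough illustration of the settings these steps refer to, the sketch below shows one way a script of this kind could assemble the Oracle connection options and hand the driver jar to Spark. It is not the actual contents of `load_data.py`: the function names `build_jdbc_options` and `load_csv_to_oracle`, the host, service name, table, file paths, and the `append` write mode are all hypothetical placeholders. The Spark calls shown (`spark.jars`, `spark.read.csv`, and the `jdbc` writer format) are standard PySpark.

```python
def build_jdbc_options(host, port, service_name, user, password, table):
    """Return Spark JDBC options for an Oracle thin-driver connection.

    All argument values are supplied by the caller; nothing here is
    taken from the project's real configuration.
    """
    return {
        # Oracle thin-driver URL in host:port/service_name form
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service_name}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }


def load_csv_to_oracle(csv_path, jdbc_jar_path, options):
    """Read a CSV file with Spark and append its rows to an Oracle table."""
    # Imported lazily so build_jdbc_options works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("csv-to-oracle")             # hypothetical app name
        .config("spark.jars", jdbc_jar_path)  # location of ojdbc8.jar
        .getOrCreate()
    )
    try:
        # Assumes the CSV has a header row; column types are inferred.
        df = spark.read.csv(csv_path, header=True, inferSchema=True)
        df.write.format("jdbc").options(**options).mode("append").save()
    finally:
        spark.stop()


# Example call with placeholder values (replace every argument with your own):
# opts = build_jdbc_options("localhost", 1521, "ORCLPDB1",
#                           "my_user", "my_password", "MY_TABLE")
# load_csv_to_oracle("data/input.csv", "/path/to/ojdbc8.jar", opts)
```

Keeping the connection details in a single dictionary means there is one place to edit when the credentials or connection string change. In practice you may prefer to read the password from an environment variable rather than hard-coding it.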
Navigate to the scripts directory and run the script:
cd scripts
python load_data.py