
ehsanestaji/Loan-Application-Analysis-using-PySpark


📊 Loan Application Analysis using PySpark

📌 Description

This repository contains an end-to-end analysis of loan applications using PySpark. It includes data manipulation, feature engineering, and binary classification models.

📚 Table of Contents

  1. 📋 Data Overview
  2. 📁 Project Structure
  3. 🔨 Usage

📋 Data Overview

The data for this project comes from the Kaggle competition Home Credit Default Risk, whose goal is to predict how capable each applicant is of repaying a loan.

application_train.csv

  • Number of Entries: 307,511
  • Number of Columns: 122
  • Column Types (pandas dtypes): float64 (65), int64 (41), object (16)

📄 Sample Data

| SK_ID_CURR | TARGET | NAME_CONTRACT_TYPE | CODE_GENDER | FLAG_OWN_CAR | ... |
|---|---|---|---|---|---|
| 100002 | 1 | Cash loans | M | N | ... |
| 100003 | 0 | Cash loans | F | N | ... |
| 100004 | 0 | Revolving loans | M | Y | ... |
| 100006 | 0 | Cash loans | F | N | ... |
| 100007 | 0 | Cash loans | M | N | ... |

πŸ“ Project Structure

πŸ““ Notebooks

  • Income-Spark.ipynb: Main Jupyter Notebook for the project.

πŸ“ Sections in Notebook

  1. Import Essential Libraries: Libraries like os and pandas are imported.
  2. Initialize PySpark Configuration: The Spark Configuration and Context are initialized.
  3. Import PySpark and Initialize: PySpark library is imported and Spark Session is initialized.

πŸ’» Code Snippets

  • Importing essential libraries

    import os
    import pandas as pd
  • Importing the PySpark configuration and context classes

    from pyspark import SparkConf, SparkContext
  • Importing PySpark and the SparkSession class

    import pyspark
    from pyspark.sql import SparkSession

πŸ”¨ Usage

To run the Jupyter Notebook, execute:

jupyter notebook Loan-Application-PySpark.ipynb
