📊 Loan Application Analysis using PySpark

📌 Description

This repository contains an end-to-end analysis of loan applications using PySpark. It includes data manipulation, feature engineering, and binary classification models.

📚 Table of Contents

📋 Data Overview
📁 Project Structure
🔨 Usage

📋 Data Overview

The data for this project is sourced from the Kaggle competition Home Credit Default Risk. The goal of the competition is to predict each applicant's ability to repay a loan.

application_train.csv

Number of Entries: 307,511
Number of Columns: 122
Column Types: float64 (65), int64 (41), object (16)
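
The figures above can be checked once the file is loaded into Spark. The following is a minimal sketch, not the notebook's own code: the helper names `load_applications` and `summarize` are hypothetical, and the default path assumes `application_train.csv` sits in the working directory. Spark reports its own type names (such as `int`, `double`, and `string`), so the per-type counts will be labeled differently from the pandas-style types listed above.

```python
from collections import Counter


def load_applications(spark, path="application_train.csv"):
    """Read the CSV with a header row and let Spark infer column types.

    `spark` is an active SparkSession; `path` is an assumed location.
    """
    return spark.read.csv(path, header=True, inferSchema=True)


def summarize(df):
    """Return (row count, column count, number of columns per Spark type)."""
    # df.dtypes is a list of (column name, type name) pairs.
    type_counts = Counter(dtype for _name, dtype in df.dtypes)
    return df.count(), len(df.columns), dict(type_counts)
```

With an active session named `spark`, `summarize(load_applications(spark))` returns the row count, the column count, and a dictionary of columns per type. On the full training file the first two values would be expected to match the 307,511 entries and 122 columns reported here.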

📄 Sample Data

SK_ID_CURR	TARGET	NAME_CONTRACT_TYPE	CODE_GENDER	FLAG_OWN_CAR	...
100002	1	Cash loans	M	N	...
100003	0	Cash loans	F	N	...
100004	0	Revolving loans	M	Y	...
100006	0	Cash loans	F	N	...
100007	0	Cash loans	M	N	...

📁 Project Structure

📓 Notebooks

Loan-Application-PySpark.ipynb: Main Jupyter Notebook for the project.

📝 Sections in Notebook

Import Essential Libraries: General-purpose libraries such as os and pandas are imported.
Initialize PySpark Configuration: The Spark configuration and context are initialized.
Import PySpark and Initialize: The PySpark library is imported and a Spark session is initialized.

💻 Code Snippets

Importing essential libraries
```
import os
import pandas as pd
```

Initializing PySpark Configuration
```
from pyspark import SparkConf, SparkContext
```

Initializing Spark Session
```
import pyspark
from pyspark.sql import SparkSession
```
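
The snippets above show only the imports. The following is a rough sketch of how the configuration and session might then be created. The notebook's actual settings are not shown here, so the application name, the `local[*]` master, and the shuffle-partition count are all assumptions, and `spark_settings` and `build_session` are hypothetical helper names.

```python
# Assumed defaults for local experimentation; not taken from the notebook.
APP_NAME = "loan-application-analysis"
MASTER = "local[*]"


def spark_settings(app_name=APP_NAME, master=MASTER, shuffle_partitions=8):
    """Return the Spark settings as plain string key/value pairs."""
    return {
        "spark.app.name": app_name,
        "spark.master": master,
        "spark.sql.shuffle.partitions": str(shuffle_partitions),
    }


def build_session(settings=None):
    """Create a SparkSession from the settings, or reuse a running one."""
    # Imported here so the settings can be inspected without starting Spark.
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    if settings is None:
        settings = spark_settings()
    conf = SparkConf().setAll(list(settings.items()))
    return SparkSession.builder.config(conf=conf).getOrCreate()
```

Calling `spark = build_session()` starts a local session, or returns the existing one if a session is already running.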

🔨 Usage

To run the Jupyter Notebook, execute:

```
jupyter notebook Loan-Application-PySpark.ipynb
```
