Skip to content

ChunkyTortoise/finance-analytics-portfolio

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

119 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Finance Analytics Portfolio

Finance Analytics Portfolio

Cayman Roden · Data Analyst | Credit Risk, Market Analysis & BI Deliverables

Tableau Dashboard Power BI BigQuery LookML Evidence Report Excel Executive PDF

Live Demo CI Tests Python License


SQL and Python analysis on real 2022 CFPB HMDA mortgage data (1.2M+ rows) covering credit risk, fair lending, and financial risk - findings delivered as a 25-page interactive dashboard, written business memos, 8-tab Excel workbook, and 11 SQL query templates.

Three findings that would change a lender's decisions today:

  1. DTI ratio, not FICO, drives default among prime borrowers (720+ FICO) - tightening DTI thresholds reduces expected losses by $600K-$900K per $100M with no impact on approval volume
  2. Geographic HMDA analysis surfaces county-level fair lending disparities with $5-15M estimated regulatory penalty exposure
  3. Parametric VaR understates tail risk by 12-18% - GARCH volatility forecasts reveal $12-18M in unhedged exposure per $1B equity portfolio

Live Dashboard - no sign-up required

Credentials: 21 professional certifications · 1,831 training hours · 127 courses — view breakdown on the Credentials page


For Hiring Managers

If you're hiring a... Start here
Data Analyst SQL Case Study · Advanced SQL Patterns · BigQuery SQL Layer · Tableau Dashboard · HMDA Fair Lending Memo · Narrative Notebook
Business Analyst Business Requirements Doc · Data Model ERD · Case Study Memo · Data Dictionary · HMDA Memo
Analytics Engineer dbt layer (5 marts, 33 schema tests) · BigQuery SQL · Evidence.dev Report (dbt → BI as code) · ETL pipeline · 12 SQL files in sql/
Risk / Quant Analyst VaR Analysis · Stress Testing (CCAR/DFAST) · Credit Risk Explorer
BI Developer Tableau Dashboard · Power BI PBIP (14 DAX measures) · LookML Models · Evidence.dev · Google Sheets Export

Written deliverable: CASE_STUDY.md — bank memo format, VP-ready.


Analyst Deliverables

Deliverable Description Link
Tableau Public Interactive mortgage credit risk dashboard — approval patterns, FICO analysis, fair lending disparities Live Dashboard
Power BI Project 3-page PBIP report (Credit Risk, Sector Performance, Macro Dashboard) with DAX measures — Git-friendly text format powerbi/
LookML Models Looker semantic layer — lending, stocks, and FRED views with credit risk explore lookml/
Evidence.dev 5-page narrative BI report — credit risk, fair lending, portfolio returns, sector performance — built on dbt marts with SQL + Svelte charts Live Report
BigQuery SQL Layer 5 analytics queries: DDL with partitioning/clustering, credit risk analysis, portfolio analytics, fair lending compliance, CCAR stress testing bigquery/
Business Requirements Doc BA-format BRD: project charter, RACI, 7 business requirements with MoSCoW priority and acceptance criteria, functional requirements, traceability matrix BUSINESS_REQUIREMENTS.md
Data Model ERD Star schema Mermaid diagram — 4 source tables + 5 dbt marts, grain statements, relationship cardinality, design decisions DATA_MODEL.md
Advanced SQL Patterns 5 interview-ready patterns: PIVOT cross-tab, QUALIFY top-N, CUME_DIST percentile, self-join period-over-period, GROUPING SETS subtotals sql/advanced_patterns.sql
Excel Workbook 8-tab export: Executive Summary, Lending, Stocks, FRED, Risk Tiers, Cohorts, What-If export script
HMDA Fair Lending Memo 4-page analyst memo — approval disparities, income controls, race gap, regulatory context, and 3 actionable recommendations hmda-fair-lending-memo.pdf
Narrative Notebook DA-style analysis (60% markdown) — findings presented to a compliance team, not to a code reviewer notebook 16
Executive PDF 4-page chart-embedded brief — portfolio overview, credit risk, macro recommendations executive-brief.pdf
Google Sheets Cloud-native export with formatting — same 8-tab structure as Excel export script
Case Study Memo VP-ready bank memo format with dollar-impact findings CASE_STUDY.md

Tableau Dashboard

Where Mortgage Approvals Break Down: HMDA Credit Risk Analysis — Tableau Public

Interactive executive dashboard analyzing CFPB HMDA mortgage lending data. Answers: Which loan grades carry the highest default risk, and where are applicants being denied at disproportionate rates?

Key findings surfaced:

  • Grade G defaults at ~4× the rate of Grade A; sub-prime tail risk concentrated in FICO < 640 band
  • Denial rates vary significantly by income tier — income-disparity pattern visible across geographies
  • DTI and loan purpose are stronger denial predictors than loan amount alone

Tableau Dashboard Screenshot

Data pipeline: 9.5MB HMDA dataset prepared with Python + dbt before visualization. Export script: scripts/export_tableau_data.py. Spec: docs/TABLEAU.md.


Evidence.dev Report (BI as Code)

Finance Analytics Report — deployed on Vercel, built from dbt marts

Five-page narrative analytics report built with Evidence.dev — a markdown-based BI framework that compiles SQL queries and Svelte chart components into a static site. Each page combines live SQL against DuckDB marts with analyst-written narrative, alert callouts, and actionable recommendations.

Report pages:

Page What It Covers Link
Homepage Executive overview with key metrics, portfolio composition, and navigation to all analyses View
Credit Risk Default rates by loan grade, FICO/DTI distributions, risk tier classification, underwriting recommendations View
Fair Lending Disparate impact analysis (ECOA/FHA), DI ratios by income tier and ownership, rate spread disparities, regulatory risk flags View
Portfolio Returns Sharpe ratios by sector and ticker, risk-return scatter, cumulative return trends, top 20 holdings summary View
Sector Performance Monthly sector rankings, rotation trends, volatility regimes, efficiency frontier scatter View

What makes this different from the Streamlit dashboard: Evidence.dev pages are written like analyst memos — each finding has a narrative title, an explanatory paragraph, and a "So What?" section with numbered business recommendations. The SQL source queries include calculated columns (Sharpe ratios, DI ratios, momentum labels, risk flags) that power the charts directly, demonstrating analytics engineering alongside storytelling.

Stack: Evidence.dev + dbt-core + DuckDB | Source: evidence-report/ | Auto-deploys from GitHub via Vercel.


Methodology

  • Data: 50K synthetic lending records (seed=42, matched to Lending Club statistical properties) + live yfinance/FRED with synthetic fallback. See docs/DATA_GENERATION.md.
  • Models: scikit-learn (RF/GBM/LogReg), statsmodels (OLS, ARIMA), arch (GARCH), lifelines (Kaplan-Meier), shap (TreeExplainer)
  • Analytics layer: dbt-core + DuckDB (3 staging models, 5 marts, 33 schema tests). See SQL & Analytics Engineering.

Model governance: docs/model_card.md — SR 11-7 compliant.


Quick Start

git clone https://github.com/ChunkyTortoise/finance-analytics-portfolio
cd finance-analytics-portfolio
make setup   # pip install -r requirements.txt
make app     # streamlit run dashboard/app.py

Or with Docker:

docker build -t finance-analytics .
docker run -p 8501:8501 finance-analytics
# open http://localhost:8501

No API keys required — synthetic data fallback is built in.


Key Findings

Three findings that would change a lender's underwriting or risk decisions today:

  1. DTI beats FICO for high-credit borrowers. Among borrowers with FICO > 720, debt-to-income ratio explains ~38% of default variance that FICO-only screening misses. For a $100M portfolio, tightening DTI thresholds in the 720-760 band is the higher-ROI lever without declining additional applications. → Credit Risk Explorer

  2. Stagflation is harder on portfolios than severe recession. Grade G PD multiplier reaches 2.4x under stagflation vs 2.1x for severe recession — floating-rate exposure amplifies rate shocks beyond what unemployment-only stress models predict. → Stress Testing

  3. Parametric VaR understates tail risk by 12–18%. GARCH(1,1) conditional volatility forecasts outperform rolling-window estimates with 15% lower out-of-sample RMSE. For a $1B equity portfolio, the gap implies ~$12–18M in unhedged tail exposure at 99% confidence. → VaR Analysis


Flagship Pages

Page What It Demonstrates
Credit Risk Explorer Random Forest + SHAP explanations, 3-model comparison (LogReg/RF/GBM), ROC curves, grade transition matrix, Reg B framing
VaR Analysis Parametric/Historical/Monte Carlo VaR + ES, Kupiec + Christoffersen backtests, GARCH(1,1), horizon scaling, regime classification
Causal Inference DiD estimator, synthetic control, regression discontinuity, propensity score matching — causal vs correlational
Fair Lending Geography HMDA 2022 Colorado mortgage data, county-level approval rates, income-disparity scatter, choropleth
Stress Testing CCAR/DFAST macro scenarios, grade-level PD amplification, sensitivity heatmap, Markov regime switching
Credit Risk Explorer Fair Lending Geography
Credit Risk Explorer Fair Lending Geography

Architecture
graph LR
    subgraph Sources
        YF[yfinance<br/>15 tickers]
        FRED[FRED API<br/>7 indicators]
        SYN[Synthetic<br/>seed=42]
    end

    subgraph "ETL Pipeline"
        EX[extract.py]
        TR[transform.py]
        LD[load.py]
    end

    subgraph Storage
        SP[stocks.parquet]
        FP[fred.parquet]
        LP[lending.parquet]
    end

    subgraph "dbt Analytics"
        STG[3 staging views]
        MRT[5 mart tables]
    end

    subgraph "Analysis (26 modules)"
        CM[credit_models.py]
        NC[neural_credit.py]
        VA[var_analysis.py]
        ST[stress_testing.py]
        MORE[+19 modules]
    end

    subgraph Dashboard
        APP[Streamlit<br/>24 pages]
    end

    YF --> EX --> TR --> LD
    FRED --> EX
    SYN --> EX
    LD --> SP & FP & LP
    SP & FP & LP --> STG --> MRT
    SP & FP & LP --> CM & VA & ST & MORE
    CM & VA & ST & MORE --> APP
Loading

All Pages

24 pages (click to expand)
Page Description
Credit Risk Explorer FICO/DTI distributions, Random Forest default model, SHAP explanations, model comparison (LogReg/RF/GBM + ROC), grade transition matrix
VaR Analysis Parametric/Historical/MC VaR + ES, Component VaR, Kupiec + Christoffersen backtests, GARCH(1,1), horizon risk scaling, regime classification
Causal Inference DiD, synthetic control, regression discontinuity, propensity score matching
Fair Lending Geography HMDA 2022 Colorado: county approval rates, income-disparity scatter, choropleth
Stress Testing CCAR/DFAST macro scenarios, grade-level PD amplification, sensitivity heatmap, Markov regime switching
Executive Summary CFO-ready KPIs (S&P YTD, Sharpe, default rate, recession risk), key findings, Excel export
Market Overview Sector treemap, top movers, market breadth, S&P 500 moving average signals
Economic Dashboard FRED macro time series, yield curve, event study (CAR/AAR)
Portfolio Optimizer Efficient frontier, Monte Carlo simulation, factor attribution, rolling beta
Forecasting Prophet time series with trend/seasonality decomposition
Survival Analysis Kaplan-Meier curves by grade, cohort heatmap, vintage curves
A/B Testing Lab Power analysis, frequentist z-test, Bonferroni, Bayesian Beta-Binomial
SQL Case Study Gap-and-islands, recursive CTEs, cohort analysis — with business recommendations
SQL Playground Interactive DuckDB runner — 7 templates, schema browser, NL-to-SQL
Segmentation K-means clustering, PCA 2D projection, segment profiles
Data Quality Schema validation, missing value profiling, freshness checks
Model Monitoring PSI/CSI score drift, calibration curve, threshold optimization
Anomaly Detection Isolation Forest, Z-score, CUSUM structural break detection
Fairness & Bias Disparate impact, equalized odds, threshold mitigation — ECOA/FHA framing
Regression Analysis OLS (LPM), VIF, stepwise selection (AIC/BIC), residual diagnostics
Live Market Monitor Real-time simulated streaming market data, rolling VaR, volatility regime classification
News Sentiment VADER headline sentiment, daily aggregation, sentiment-return correlation, top terms
Interview Walkthrough Guided walkthrough of the portfolio's analytical decisions and methodology
Credentials & Expertise 21 professional certifications, provider breakdown, skills radar, "Applied" badges

SQL & Analytics Engineering

dbt Analytics

make dbt-run        # dbt run --target dev (DuckDB)
make dbt-test       # dbt test --target dev (33 schema tests)
make dbt-snowflake  # dbt run + test --target snowflake (requires env vars)

Staging (views): stg_stocks, stg_lending, stg_macro — source via {{ source_table() }} macro, dispatches to read_parquet() (DuckDB) or {{ source() }} (Snowflake)

Marts (tables): mart_default_rates, mart_portfolio_returns, mart_sector_performance, mart_credit_risk, mart_fair_lendingmin_by()/max_by() throughout for cross-database compatibility

See docs/SNOWFLAKE.md for free trial setup and bulk loading.


SQL Case Studies

11 standalone queries in sql/ — each framed as a business question:

File Business Question
cohort_defaults.sql How do default rates evolve by loan origination cohort over time?
cross_domain_correlation.sql Does rising unemployment predict rising default rates?
gap_islands.sql Which origination periods show activity gaps vs continuous volume?
monthly_default_trend.sql What is the month-over-month default trend by loan grade?
recursive_cte.sql How do defaults cascade through a portfolio over a multi-period horizon?
risk_decile_ntile.sql Which borrowers fall in the top-risk decile, and what do they share?
borrower_risk_tiers.sql Which borrowers belong in each risk tier based on FICO and DTI combined?
sector_momentum.sql Which sectors show sustained momentum vs mean-reversion?
sector_rankings.sql How do sectors rank by risk-adjusted return?
sessionization.sql What are the stage-to-stage conversion rates in the loan application funnel?
yoy_comparison.sql How do sector returns compare year-over-year?

Tests (1,522 across 59 files)
make test   # pytest tests/ -v --tb=short

1,522 tests across 60 files covering ETL, SQL analytics, cohort/survival analysis, A/B testing, forecasting, model comparison, PyTorch neural networks, LangChain SQL chains, segmentation, data quality, Excel export, KPI computations, model monitoring, anomaly detection, fairness/bias, stress testing (CCAR/DFAST), VaR/ES (Component VaR, Christoffersen test, horizon scaling), GARCH volatility, OLS regression, factor attribution, rolling beta, credit transition matrices, Markov regime switching, event studies, HMDA geographic analysis, Snowflake SQL compatibility, and Great Expectations data quality validation.


Tool Integrations
Tool What's built Docs
Tableau Public Interactive mortgage credit risk dashboard (approval/denial patterns, FICO analysis, fair lending) built on CFPB HMDA data prepared via Python/SQL/dbt pipeline Live Dashboard
Power BI 3-page PBIP report with DAX measures, Git-friendly format powerbi/
LookML Looker semantic layer with lending, stocks, and FRED explores lookml/
Snowflake Dual-target dbt (--target dev or --target snowflake), source_table() dispatch macro, bulk parquet loader docs/SNOWFLAKE.md
Apache Airflow TaskFlow DAG (ETL→GX→dbt→Tableau), Docker Compose, isolated deps airflow/README.md
Great Expectations 3 validation suites (lending: 12, stocks: 8, FRED: 6 expectations), Data Docs HTML output python -m great_expectations
Google Sheets Cloud-native export with formatting — 8-tab structure export script
Polars pandas vs Polars benchmark notebook (6 ops, LazyFrame, DuckDB Arrow zero-copy) notebooks/14_polars_comparison.ipynb

Tech Stack
Category Libraries
Data manipulation pandas, numpy, duckdb, pyarrow
Statistics & modeling scipy, scikit-learn, statsmodels, lifelines
Forecasting / volatility prophet, arch (GARCH, EGARCH)
ML explainability shap
Visualization plotly, matplotlib, seaborn
Data acquisition yfinance, fredapi
Dashboard streamlit, openpyxl
Analytics layer dbt-core, dbt-duckdb

For engineering details (FastAPI, Airflow, streaming, neural networks), see docs/TECHNICAL.md.


Limitations

  • Synthetic lending data: Generated to match real-world FICO/DTI/grade distributions (seed=42), not real borrower records.
  • Single period: 2019–2024 window covers COVID volatility but not a full credit cycle.
  • Univariate GARCH: Per-asset GARCH(1,1). Production would use DCC-GARCH for cross-asset correlation.
  • No transaction costs: Portfolio optimization assumes frictionless trading.

About

SQL and Python analysis on 1.2M+ CFPB HMDA mortgage records. Credit risk, fair lending, regulatory compliance. Deliverables: 21-page dashboard, Tableau, Power BI, Excel, 11 SQL case studies.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors