Cayman Roden · Data Analyst | Credit Risk, Market Analysis & BI Deliverables
SQL and Python analysis on real 2022 CFPB HMDA mortgage data (1.2M+ rows) covering credit risk, fair lending, and financial risk - findings delivered as a 25-page interactive dashboard, written business memos, 8-tab Excel workbook, and 11 SQL query templates.
Three findings that would change a lender's decisions today:
- DTI ratio, not FICO, drives default among prime borrowers (720+ FICO) - tightening DTI thresholds reduces expected losses by $600K-$900K per $100M with no impact on approval volume
- Geographic HMDA analysis surfaces county-level fair lending disparities with $5-15M estimated regulatory penalty exposure
- Parametric VaR understates tail risk by 12-18% - GARCH volatility forecasts reveal $12-18M in unhedged exposure per $1B equity portfolio
Live Dashboard - no sign-up required
Credentials: 21 professional certifications · 1,831 training hours · 127 courses — view breakdown on the Credentials page
| If you're hiring a... | Start here |
|---|---|
| Data Analyst | SQL Case Study · Advanced SQL Patterns · BigQuery SQL Layer · Tableau Dashboard · HMDA Fair Lending Memo · Narrative Notebook |
| Business Analyst | Business Requirements Doc · Data Model ERD · Case Study Memo · Data Dictionary · HMDA Memo |
| Analytics Engineer | dbt layer (5 marts, 33 schema tests) · BigQuery SQL · Evidence.dev Report (dbt → BI as code) · ETL pipeline · 12 SQL files in sql/ |
| Risk / Quant Analyst | VaR Analysis · Stress Testing (CCAR/DFAST) · Credit Risk Explorer |
| BI Developer | Tableau Dashboard · Power BI PBIP (14 DAX measures) · LookML Models · Evidence.dev · Google Sheets Export |
Written deliverable: CASE_STUDY.md — bank memo format, VP-ready.
| Deliverable | Description | Link |
|---|---|---|
| Tableau Public | Interactive mortgage credit risk dashboard — approval patterns, FICO analysis, fair lending disparities | Live Dashboard |
| Power BI Project | 3-page PBIP report (Credit Risk, Sector Performance, Macro Dashboard) with DAX measures — Git-friendly text format | powerbi/ |
| LookML Models | Looker semantic layer — lending, stocks, and FRED views with credit risk explore | lookml/ |
| Evidence.dev | 5-page narrative BI report — credit risk, fair lending, portfolio returns, sector performance — built on dbt marts with SQL + Svelte charts | Live Report |
| BigQuery SQL Layer | 5 analytics queries: DDL with partitioning/clustering, credit risk analysis, portfolio analytics, fair lending compliance, CCAR stress testing | bigquery/ |
| Business Requirements Doc | BA-format BRD: project charter, RACI, 7 business requirements with MoSCoW priority and acceptance criteria, functional requirements, traceability matrix | BUSINESS_REQUIREMENTS.md |
| Data Model ERD | Star schema Mermaid diagram — 4 source tables + 5 dbt marts, grain statements, relationship cardinality, design decisions | DATA_MODEL.md |
| Advanced SQL Patterns | 5 interview-ready patterns: PIVOT cross-tab, QUALIFY top-N, CUME_DIST percentile, self-join period-over-period, GROUPING SETS subtotals | sql/advanced_patterns.sql |
| Excel Workbook | 8-tab export: Executive Summary, Lending, Stocks, FRED, Risk Tiers, Cohorts, What-If | export script |
| HMDA Fair Lending Memo | 4-page analyst memo — approval disparities, income controls, race gap, regulatory context, and 3 actionable recommendations | hmda-fair-lending-memo.pdf |
| Narrative Notebook | DA-style analysis (60% markdown) — findings presented to a compliance team, not to a code reviewer | notebook 16 |
| Executive PDF | 4-page chart-embedded brief — portfolio overview, credit risk, macro recommendations | executive-brief.pdf |
| Google Sheets | Cloud-native export with formatting — same 8-tab structure as Excel | export script |
| Case Study Memo | VP-ready bank memo format with dollar-impact findings | CASE_STUDY.md |
Where Mortgage Approvals Break Down: HMDA Credit Risk Analysis — Tableau Public
Interactive executive dashboard analyzing CFPB HMDA mortgage lending data. Answers: Which loan grades carry the highest default risk, and where are applicants being denied at disproportionate rates?
Key findings surfaced:
- Grade G defaults at ~4× the rate of Grade A; sub-prime tail risk concentrated in FICO < 640 band
- Denial rates vary significantly by income tier — income-disparity pattern visible across geographies
- DTI and loan purpose are stronger denial predictors than loan amount alone
Data pipeline: 9.5MB HMDA dataset prepared with Python + dbt before visualization. Export script: scripts/export_tableau_data.py. Spec: docs/TABLEAU.md.
Finance Analytics Report — deployed on Vercel, built from dbt marts
Five-page narrative analytics report built with Evidence.dev — a markdown-based BI framework that compiles SQL queries and Svelte chart components into a static site. Each page combines live SQL against DuckDB marts with analyst-written narrative, alert callouts, and actionable recommendations.
Report pages:
| Page | What It Covers | Link |
|---|---|---|
| Homepage | Executive overview with key metrics, portfolio composition, and navigation to all analyses | View |
| Credit Risk | Default rates by loan grade, FICO/DTI distributions, risk tier classification, underwriting recommendations | View |
| Fair Lending | Disparate impact analysis (ECOA/FHA), DI ratios by income tier and ownership, rate spread disparities, regulatory risk flags | View |
| Portfolio Returns | Sharpe ratios by sector and ticker, risk-return scatter, cumulative return trends, top 20 holdings summary | View |
| Sector Performance | Monthly sector rankings, rotation trends, volatility regimes, efficiency frontier scatter | View |
What makes this different from the Streamlit dashboard: Evidence.dev pages are written like analyst memos — each finding has a narrative title, an explanatory paragraph, and a "So What?" section with numbered business recommendations. The SQL source queries include calculated columns (Sharpe ratios, DI ratios, momentum labels, risk flags) that power the charts directly, demonstrating analytics engineering alongside storytelling.
Stack: Evidence.dev + dbt-core + DuckDB | Source: evidence-report/ | Auto-deploys from GitHub via Vercel.
- Data: 50K synthetic lending records (seed=42, matched to Lending Club statistical properties) + live yfinance/FRED with synthetic fallback. See docs/DATA_GENERATION.md.
- Models: scikit-learn (RF/GBM/LogReg), statsmodels (OLS, ARIMA), arch (GARCH), lifelines (Kaplan-Meier), shap (TreeExplainer)
- Analytics layer: dbt-core + DuckDB (3 staging models, 5 marts, 33 schema tests). See SQL & Analytics Engineering.
Model governance: docs/model_card.md — SR 11-7 compliant.
git clone https://github.com/ChunkyTortoise/finance-analytics-portfolio
cd finance-analytics-portfolio
make setup # pip install -r requirements.txt
make app # streamlit run dashboard/app.pyOr with Docker:
docker build -t finance-analytics .
docker run -p 8501:8501 finance-analytics
# open http://localhost:8501No API keys required — synthetic data fallback is built in.
Three findings that would change a lender's underwriting or risk decisions today:
-
DTI beats FICO for high-credit borrowers. Among borrowers with FICO > 720, debt-to-income ratio explains ~38% of default variance that FICO-only screening misses. For a $100M portfolio, tightening DTI thresholds in the 720-760 band is the higher-ROI lever without declining additional applications. → Credit Risk Explorer
-
Stagflation is harder on portfolios than severe recession. Grade G PD multiplier reaches 2.4x under stagflation vs 2.1x for severe recession — floating-rate exposure amplifies rate shocks beyond what unemployment-only stress models predict. → Stress Testing
-
Parametric VaR understates tail risk by 12–18%. GARCH(1,1) conditional volatility forecasts outperform rolling-window estimates with 15% lower out-of-sample RMSE. For a $1B equity portfolio, the gap implies ~$12–18M in unhedged tail exposure at 99% confidence. → VaR Analysis
| Page | What It Demonstrates |
|---|---|
| Credit Risk Explorer | Random Forest + SHAP explanations, 3-model comparison (LogReg/RF/GBM), ROC curves, grade transition matrix, Reg B framing |
| VaR Analysis | Parametric/Historical/Monte Carlo VaR + ES, Kupiec + Christoffersen backtests, GARCH(1,1), horizon scaling, regime classification |
| Causal Inference | DiD estimator, synthetic control, regression discontinuity, propensity score matching — causal vs correlational |
| Fair Lending Geography | HMDA 2022 Colorado mortgage data, county-level approval rates, income-disparity scatter, choropleth |
| Stress Testing | CCAR/DFAST macro scenarios, grade-level PD amplification, sensitivity heatmap, Markov regime switching |
| Credit Risk Explorer | Fair Lending Geography |
|---|---|
![]() |
![]() |
Architecture
graph LR
subgraph Sources
YF[yfinance<br/>15 tickers]
FRED[FRED API<br/>7 indicators]
SYN[Synthetic<br/>seed=42]
end
subgraph "ETL Pipeline"
EX[extract.py]
TR[transform.py]
LD[load.py]
end
subgraph Storage
SP[stocks.parquet]
FP[fred.parquet]
LP[lending.parquet]
end
subgraph "dbt Analytics"
STG[3 staging views]
MRT[5 mart tables]
end
subgraph "Analysis (26 modules)"
CM[credit_models.py]
NC[neural_credit.py]
VA[var_analysis.py]
ST[stress_testing.py]
MORE[+19 modules]
end
subgraph Dashboard
APP[Streamlit<br/>24 pages]
end
YF --> EX --> TR --> LD
FRED --> EX
SYN --> EX
LD --> SP & FP & LP
SP & FP & LP --> STG --> MRT
SP & FP & LP --> CM & VA & ST & MORE
CM & VA & ST & MORE --> APP
24 pages (click to expand)
| Page | Description |
|---|---|
| Credit Risk Explorer | FICO/DTI distributions, Random Forest default model, SHAP explanations, model comparison (LogReg/RF/GBM + ROC), grade transition matrix |
| VaR Analysis | Parametric/Historical/MC VaR + ES, Component VaR, Kupiec + Christoffersen backtests, GARCH(1,1), horizon risk scaling, regime classification |
| Causal Inference | DiD, synthetic control, regression discontinuity, propensity score matching |
| Fair Lending Geography | HMDA 2022 Colorado: county approval rates, income-disparity scatter, choropleth |
| Stress Testing | CCAR/DFAST macro scenarios, grade-level PD amplification, sensitivity heatmap, Markov regime switching |
| Executive Summary | CFO-ready KPIs (S&P YTD, Sharpe, default rate, recession risk), key findings, Excel export |
| Market Overview | Sector treemap, top movers, market breadth, S&P 500 moving average signals |
| Economic Dashboard | FRED macro time series, yield curve, event study (CAR/AAR) |
| Portfolio Optimizer | Efficient frontier, Monte Carlo simulation, factor attribution, rolling beta |
| Forecasting | Prophet time series with trend/seasonality decomposition |
| Survival Analysis | Kaplan-Meier curves by grade, cohort heatmap, vintage curves |
| A/B Testing Lab | Power analysis, frequentist z-test, Bonferroni, Bayesian Beta-Binomial |
| SQL Case Study | Gap-and-islands, recursive CTEs, cohort analysis — with business recommendations |
| SQL Playground | Interactive DuckDB runner — 7 templates, schema browser, NL-to-SQL |
| Segmentation | K-means clustering, PCA 2D projection, segment profiles |
| Data Quality | Schema validation, missing value profiling, freshness checks |
| Model Monitoring | PSI/CSI score drift, calibration curve, threshold optimization |
| Anomaly Detection | Isolation Forest, Z-score, CUSUM structural break detection |
| Fairness & Bias | Disparate impact, equalized odds, threshold mitigation — ECOA/FHA framing |
| Regression Analysis | OLS (LPM), VIF, stepwise selection (AIC/BIC), residual diagnostics |
| Live Market Monitor | Real-time simulated streaming market data, rolling VaR, volatility regime classification |
| News Sentiment | VADER headline sentiment, daily aggregation, sentiment-return correlation, top terms |
| Interview Walkthrough | Guided walkthrough of the portfolio's analytical decisions and methodology |
| Credentials & Expertise | 21 professional certifications, provider breakdown, skills radar, "Applied" badges |
make dbt-run # dbt run --target dev (DuckDB)
make dbt-test # dbt test --target dev (33 schema tests)
make dbt-snowflake # dbt run + test --target snowflake (requires env vars)Staging (views): stg_stocks, stg_lending, stg_macro — source via {{ source_table() }} macro, dispatches to read_parquet() (DuckDB) or {{ source() }} (Snowflake)
Marts (tables): mart_default_rates, mart_portfolio_returns, mart_sector_performance, mart_credit_risk, mart_fair_lending — min_by()/max_by() throughout for cross-database compatibility
See docs/SNOWFLAKE.md for free trial setup and bulk loading.
11 standalone queries in sql/ — each framed as a business question:
| File | Business Question |
|---|---|
cohort_defaults.sql |
How do default rates evolve by loan origination cohort over time? |
cross_domain_correlation.sql |
Does rising unemployment predict rising default rates? |
gap_islands.sql |
Which origination periods show activity gaps vs continuous volume? |
monthly_default_trend.sql |
What is the month-over-month default trend by loan grade? |
recursive_cte.sql |
How do defaults cascade through a portfolio over a multi-period horizon? |
risk_decile_ntile.sql |
Which borrowers fall in the top-risk decile, and what do they share? |
borrower_risk_tiers.sql |
Which borrowers belong in each risk tier based on FICO and DTI combined? |
sector_momentum.sql |
Which sectors show sustained momentum vs mean-reversion? |
sector_rankings.sql |
How do sectors rank by risk-adjusted return? |
sessionization.sql |
What are the stage-to-stage conversion rates in the loan application funnel? |
yoy_comparison.sql |
How do sector returns compare year-over-year? |
Tests (1,522 across 59 files)
make test # pytest tests/ -v --tb=short1,522 tests across 60 files covering ETL, SQL analytics, cohort/survival analysis, A/B testing, forecasting, model comparison, PyTorch neural networks, LangChain SQL chains, segmentation, data quality, Excel export, KPI computations, model monitoring, anomaly detection, fairness/bias, stress testing (CCAR/DFAST), VaR/ES (Component VaR, Christoffersen test, horizon scaling), GARCH volatility, OLS regression, factor attribution, rolling beta, credit transition matrices, Markov regime switching, event studies, HMDA geographic analysis, Snowflake SQL compatibility, and Great Expectations data quality validation.
Tool Integrations
| Tool | What's built | Docs |
|---|---|---|
| Tableau Public | Interactive mortgage credit risk dashboard (approval/denial patterns, FICO analysis, fair lending) built on CFPB HMDA data prepared via Python/SQL/dbt pipeline | Live Dashboard |
| Power BI | 3-page PBIP report with DAX measures, Git-friendly format | powerbi/ |
| LookML | Looker semantic layer with lending, stocks, and FRED explores | lookml/ |
| Snowflake | Dual-target dbt (--target dev or --target snowflake), source_table() dispatch macro, bulk parquet loader |
docs/SNOWFLAKE.md |
| Apache Airflow | TaskFlow DAG (ETL→GX→dbt→Tableau), Docker Compose, isolated deps | airflow/README.md |
| Great Expectations | 3 validation suites (lending: 12, stocks: 8, FRED: 6 expectations), Data Docs HTML output | python -m great_expectations |
| Google Sheets | Cloud-native export with formatting — 8-tab structure | export script |
| Polars | pandas vs Polars benchmark notebook (6 ops, LazyFrame, DuckDB Arrow zero-copy) | notebooks/14_polars_comparison.ipynb |
Tech Stack
| Category | Libraries |
|---|---|
| Data manipulation | pandas, numpy, duckdb, pyarrow |
| Statistics & modeling | scipy, scikit-learn, statsmodels, lifelines |
| Forecasting / volatility | prophet, arch (GARCH, EGARCH) |
| ML explainability | shap |
| Visualization | plotly, matplotlib, seaborn |
| Data acquisition | yfinance, fredapi |
| Dashboard | streamlit, openpyxl |
| Analytics layer | dbt-core, dbt-duckdb |
For engineering details (FastAPI, Airflow, streaming, neural networks), see docs/TECHNICAL.md.
- Synthetic lending data: Generated to match real-world FICO/DTI/grade distributions (seed=42), not real borrower records.
- Single period: 2019–2024 window covers COVID volatility but not a full credit cycle.
- Univariate GARCH: Per-asset GARCH(1,1). Production would use DCC-GARCH for cross-asset correlation.
- No transaction costs: Portfolio optimization assumes frictionless trading.



