Credit RP Diagnostic
A regulatory-focused diagnostic tool that quantifies feature redundancy in credit risk models using sparse random projections. It estimates the effective dimensionality of a credit dataset to flag over-parameterization and to support model validation under the Basel IRB and IFRS 9 frameworks.
The Over-Parameterization Problem
Credit risk models used in Basel IRB and IFRS 9 frameworks must demonstrate stability and predictive power. A common failure mode is feature bloat: models incorporate dozens of features, many of which are highly correlated or redundant, creating instability under stress testing and regulatory scrutiny. This diagnostic estimates how many dimensions of the feature set actually carry independent predictive signal.
Sparse Random Projection Approach
Rather than traditional dimensionality reduction (PCA, autoencoders), this project uses sparse random projections, a computationally efficient method that approximately preserves pairwise distances while projecting high-dimensional data into lower-dimensional spaces. The approach:
Train a balanced logistic regression classifier on the full feature set (59 credit features post-processing) using the Home Credit Default Risk dataset (300k samples, binary classification).
Apply sparse random projection to shrink the feature space step by step from the full feature count (k = 59) down to 1, measuring AUC at each target dimension. The "elbow" point where AUC plateaus indicates the intrinsic dimensionality.
Run the projection pipeline 20 times with different random seeds and average the AUC curves. This stabilizes results against seed-dependent variance.
Calculate the redundancy ratio as (total features − intrinsic dimensions) / total features. This ratio is the compliance metric reported to regulators.
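The four steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it runs on synthetic data from `make_classification` rather than the Home Credit dataset, uses 5 seeds instead of 20 to keep it quick, and the helper names (`auc_at_dim`, `sweep`, `intrinsic_dim`, `redundancy_ratio`) are hypothetical. The elbow rule in particular, "smallest dimension whose mean AUC is within `tol` of the best mean AUC", is an assumption, since the exact plateau criterion is not stated here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.random_projection import SparseRandomProjection


def auc_at_dim(X_tr, X_te, y_tr, y_te, n_components, seed):
    """Project to n_components dimensions, fit the classifier, return test AUC."""
    rp = SparseRandomProjection(n_components=n_components, random_state=seed)
    Z_tr = rp.fit_transform(X_tr)
    Z_te = rp.transform(X_te)
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(Z_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(Z_te)[:, 1])


def sweep(X_tr, X_te, y_tr, y_te, n_seeds=20):
    """Return the swept dimensions (k down to 1) and the seed-averaged AUC curve."""
    dims = np.arange(X_tr.shape[1], 0, -1)
    aucs = np.array([
        [auc_at_dim(X_tr, X_te, y_tr, y_te, int(d), seed) for d in dims]
        for seed in range(n_seeds)
    ])
    return dims, aucs.mean(axis=0)


def intrinsic_dim(dims, mean_auc, tol=0.01):
    """Assumed elbow rule: smallest dimension within tol of the best mean AUC."""
    dims = np.asarray(dims)
    mean_auc = np.asarray(mean_auc)
    within_tol = mean_auc >= mean_auc.max() - tol
    return int(dims[within_tol].min())


def redundancy_ratio(total_features, intrinsic):
    """(total features - intrinsic dimensions) / total features."""
    return (total_features - intrinsic) / total_features


# Synthetic stand-in for the credit data: 20 features, imbalanced binary target.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, n_redundant=10,
    weights=[0.9, 0.1], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

dims, mean_auc = sweep(X_tr, X_te, y_tr, y_te, n_seeds=5)
k_star = intrinsic_dim(dims, mean_auc)
ratio = redundancy_ratio(X.shape[1], k_star)
print(f"intrinsic dimension ~ {k_star}, redundancy ratio = {ratio:.2f}")
```

The tolerance `tol` drives the reported intrinsic dimension, so in practice it would need to be fixed and documented alongside the redundancy ratio.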
Diagnostic Findings
Model Complexity Audit
Regulatory Verdict: PASS (well-calibrated dimensionality). The model is not over-parameterized: redundancy is modest and the effective feature space closely matches the intrinsic structure of the data. This result helps counter concerns that the model fits spurious correlations or benefits from hidden test-set leakage.
The diagnostic assumes that redundancy among features is not deliberately engineered. If features are derived from the same underlying source (e.g., multiple ratios computed from the same balance sheet), correlation is expected and reflects data structure, not model bloat.
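To illustrate the caveat about features sharing a source, the sketch below builds two ratios from the same synthetic "debt" figure and shows that they correlate strongly while an unrelated feature does not. All column names, distributions, and the 0.7 flagging threshold are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5000

# Hypothetical balance-sheet quantities (synthetic, log-normal).
debt = rng.lognormal(mean=10.0, sigma=0.8, size=n)
income = rng.lognormal(mean=10.5, sigma=0.3, size=n)
assets = rng.lognormal(mean=11.0, sigma=0.3, size=n)

features = pd.DataFrame({
    "debt_to_income": debt / income,   # shares the numerator with the next ratio
    "debt_to_assets": debt / assets,
    "unrelated": rng.normal(size=n),   # independent of the balance sheet
})

# Rank correlation is robust to the heavy tails of ratio features.
corr = features.corr(method="spearman")

# Flag pairs whose absolute correlation exceeds an illustrative threshold.
threshold = 0.7
cols = list(features.columns)
flagged = [
    (a, b, float(corr.loc[a, b]))
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) >= threshold
]
print(corr.round(2))
print(flagged)
```

A pair flagged this way is a candidate for a documented "same source" explanation rather than evidence of model bloat on its own.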
Output & Artifacts
The project is implemented as a self-contained Jupyter notebook (single-file execution). It includes:
- Data loading and preprocessing: Home Credit dataset preparation, feature engineering
- Baseline model training: Logistic regression with balanced class weighting
- Projection pipeline: Sparse RP dimensionality sweeps with ensemble averaging
- Visualization: AUC curves, dimensionality elbow plots, redundancy heatmaps
- Compliance report: Structured output summarizing regulatory findings
- Dependencies: scikit-learn, pandas, numpy, matplotlib
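As a rough idea of what a structured compliance output could look like, the sketch below serializes the headline numbers to JSON. The `ComplexityReport` class, its field names, the `max_redundancy` threshold, and the "PASS"/"REVIEW" labels are all hypothetical. The `intrinsic_dim` value is a placeholder, not a result from the notebook; only the feature count (59) and the seed count (20) come from the description above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ComplexityReport:
    total_features: int
    intrinsic_dim: int
    n_seeds: int
    max_redundancy: float  # hypothetical pass threshold, not from the notebook

    @property
    def redundancy_ratio(self) -> float:
        # (total features - intrinsic dimensions) / total features
        return (self.total_features - self.intrinsic_dim) / self.total_features

    @property
    def verdict(self) -> str:
        return "PASS" if self.redundancy_ratio <= self.max_redundancy else "REVIEW"

    def to_json(self) -> str:
        # asdict() covers the stored fields; derived values are added explicitly.
        payload = asdict(self)
        payload["redundancy_ratio"] = round(self.redundancy_ratio, 4)
        payload["verdict"] = self.verdict
        return json.dumps(payload, indent=2)


# intrinsic_dim=50 and max_redundancy=0.25 are placeholders for illustration only.
report = ComplexityReport(
    total_features=59, intrinsic_dim=50, n_seeds=20, max_redundancy=0.25
)
print(report.to_json())
```

Keeping the threshold inside the report makes the pass criterion visible to whoever reviews the output, rather than leaving it implicit in the notebook.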
Basel IRB & IFRS 9 Alignment
This diagnostic directly addresses Basel III IRB validation requirements, which mandate:
- Stability testing: Models must remain stable under economic stress (addressed by redundancy reduction)
- Predictive power: Metrics like AUC must be documented and defended (quantified here)
- Feature justification: Every feature must have a documented business and statistical rationale (this tool justifies exclusion of redundant ones)
- Transparency: Model complexity must be communicated to risk committees and regulators (the redundancy ratio is the key metric)
Similarly, IFRS 9 Expected Credit Loss (ECL) models rely on stable, well-calibrated PD estimates. Over-parameterized models are prone to degrade in out-of-sample periods and regulatory backtests. This diagnostic provides quantitative evidence on whether the model is appropriately parameterized.