Conversation

@MaxGhenis
Contributor

Summary

  • Adds impute_student_loan_balance() function that estimates outstanding student loan balances based on plan type and years since graduation
  • Adds load_was_student_loan_data() helper for extracting Student Loans Company (SLC) debt from Wealth and Assets Survey (WAS) Round 7 data
  • Integrates into the dataset creation pipeline
  • Adds unit tests for balance calculation logic

Implementation Details

Balance estimates by plan type:

  • Plan 1 (pre-2012): £15k base with 3% annual decay
  • Plan 2 (2012-2023): £45k base with 2% annual decay
  • Plan 5 (2023+): £25k (new loans, minimal repayment)

Totals are scaled to match SLC admin statistics (£294bn as of March 2025).
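
The plan-level rules above can be sketched roughly as follows. This is an illustration rather than the PR's actual code: it assumes "annual decay" means compounding, i.e. base × (1 − rate)^years, and the names `PLAN_PARAMS`, `estimate_balance`, and `scale_to_total` are hypothetical.

```python
# Hypothetical sketch. Base amounts and rates come from the list above;
# the compounding form of the decay is an assumption.
PLAN_PARAMS = {
    "PLAN_1": (15_000, 0.03),  # (base balance in GBP, annual decay rate)
    "PLAN_2": (45_000, 0.02),
    "PLAN_5": (25_000, 0.00),  # new loans: no decay applied
}


def estimate_balance(plan, years_since_graduation):
    """Return base * (1 - decay) ** years for a known plan, else 0."""
    if plan not in PLAN_PARAMS:
        return 0.0
    base, decay = PLAN_PARAMS[plan]
    years = max(years_since_graduation, 0)
    return base * (1 - decay) ** years


def scale_to_total(balances, weights, target_total):
    """Rescale balances so their weighted sum equals target_total."""
    weighted_sum = sum(b * w for b, w in zip(balances, weights))
    if weighted_sum == 0:
        return list(balances)
    factor = target_total / weighted_sum
    return [b * factor for b in balances]
```

With survey weights, `scale_to_total(balances, weights, 294e9)` would apply a single uniform factor so the weighted total matches the SLC figure.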

Test plan

  • Unit tests pass for balance calculation logic
  • Tests verify scaling factors work correctly
  • Integration testing with full dataset pipeline

Closes #238

🤖 Generated with Claude Code

MaxGhenis and others added 7 commits November 30, 2025 23:49
Adds impute_student_loan_balance() function that:
- Estimates balance based on plan type and years since graduation
- Plan 1: £15k base with 3% annual decay
- Plan 2: £45k base with 2% annual decay
- Plan 5: £25k (new loans)
- Scales totals to match SLC admin statistics (£294bn)

Also adds load_was_student_loan_data() helper for extracting SLC debt
from WAS Round 7 (Tot_LosR7_aggr - Tot_los_exc_SLCR7_aggr).
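
As a rough illustration of the subtraction above, the sketch below derives SLC debt for one WAS household record. The two column names come from the commit message; the function name `slc_debt`, the dict-based record, and the clipping at zero are assumptions.

```python
def slc_debt(row):
    """SLC debt = total loans minus loans excluding SLC, for one WAS record.

    `row` is any mapping holding the two WAS Round 7 aggregates.
    Clipping negative differences to zero is an assumption, not
    something the commit message states.
    """
    total_loans = row["Tot_LosR7_aggr"]
    loans_excluding_slc = row["Tot_los_exc_SLCR7_aggr"]
    return max(total_loans - loans_excluding_slc, 0.0)
```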

Closes #238

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- Replace the crude scaling approach with a QRF (quantile regression forest) model trained on WAS data
- Add generate_was_student_loan_table() to prepare training data
- Add save_student_loan_model() and create_student_loan_model() helpers
- Impute household-level SLC debt, then allocate it to individuals with loans
- Calibration to admin totals will happen in the main calibration step
- Update tests to reflect the new allocation-based approach

The QRF approach is consistent with other imputations (wealth, consumption)
and allows proper calibration rather than crude scaling.
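
The allocation step described above (household-level debt assigned to individuals with loans) might look like the sketch below. The commit does not say how debt is split, so the equal split among loan holders is an assumption, as are the function name and the record layout.

```python
from collections import defaultdict


def allocate_household_debt(people, household_debt):
    """Split each household's imputed debt equally among its loan holders.

    people: list of dicts with "person_id", "household_id", "has_loan".
    household_debt: dict mapping household_id to imputed SLC debt.
    Returns a dict mapping person_id to allocated balance.
    Households with no loan holders receive no allocation.
    """
    holders = defaultdict(list)
    for person in people:
        if person["has_loan"]:
            holders[person["household_id"]].append(person["person_id"])

    balances = {person["person_id"]: 0.0 for person in people}
    for household_id, person_ids in holders.items():
        share = household_debt.get(household_id, 0.0) / len(person_ids)
        for person_id in person_ids:
            balances[person_id] = share
    return balances
```

An equal split keeps the household total intact; a weighted split (for example by reported repayment) would be a drop-in change to the `share` line.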

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds calibration targets for student loans to the loss function:
- Total outstanding balance (£294bn in 2025, from SLC)
- Total annual repayments (£5.6bn in 2025, from DfE/OBR)
- Number of borrowers with balance (~9.4m)
- Number of people making repayments (~3.5m)

These targets will be used during calibration to adjust weights
to match admin statistics from SLC, DfE, and OBR.
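
To make the role of these targets concrete, here is a minimal sketch of how four such targets could enter a loss as squared relative errors. The target values are the ones listed above; the dictionary keys, function names, record layout, and the squared-relative-error form are assumptions and may not match loss.py.

```python
# Target values from the commit message; key names are hypothetical.
TARGETS = {
    "balance_total": 294e9,    # total outstanding balance (GBP)
    "repayment_total": 5.6e9,  # total annual repayments (GBP)
    "borrower_count": 9.4e6,   # people with a positive balance
    "repayer_count": 3.5e6,    # people making repayments
}


def weighted_aggregates(records, weights):
    """Weighted totals and counts for the four target quantities."""
    agg = dict.fromkeys(TARGETS, 0.0)
    for record, weight in zip(records, weights):
        agg["balance_total"] += weight * record["balance"]
        agg["repayment_total"] += weight * record["repayment"]
        agg["borrower_count"] += weight * (record["balance"] > 0)
        agg["repayer_count"] += weight * (record["repayment"] > 0)
    return agg


def calibration_loss(records, weights):
    """Sum of squared relative errors against TARGETS."""
    agg = weighted_aggregates(records, weights)
    return sum((agg[key] / TARGETS[key] - 1) ** 2 for key in TARGETS)
```

Relative errors put pound totals and head counts on the same scale, so no single target dominates purely because of its units.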

Sources:
- SLC: gov.uk/government/statistics/student-loans-in-england-2024-to-2025
- DfE forecasts: gov.uk/government/statistics/student-loan-forecasts-for-england
- OBR: obr.uk/forecasts-in-depth/tax-by-tax-spend-by-spend/student-loans/

Closes #237

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
WAS severely undercounts student loan debt (£33bn weighted vs £294bn admin),
making the QRF approach unreliable. Instead:

- Assign balances based on plan type using SLC admin averages
- Plan 1: £10k base with 2% annual decay
- Plan 2: £45k base with 1% annual decay
- Plan 4: £13k base with 2% annual decay
- Plan 5: £15k (new loans)

Note: FRS only captures ~3.75m repayers vs 9.4m borrowers in admin data.
Calibration targets in loss.py will adjust weights to match admin totals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Allows sampling from different parts of the conditional distribution,
which is useful when the source data undercounts and the upper tail is wanted.

Tested on student loan balances, but the WAS undercount is too severe (a maximum
of £15bn vs the £294bn target): even q=0.99 cannot compensate for missing observations.
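
The limitation noted above follows from how quantile sampling works: a quantile of an observed sample can never exceed the largest observed value. The sketch below uses a plain empirical quantile as a stand-in for the conditional distribution a QRF would estimate; the function name and the interpolation rule are assumptions for illustration.

```python
def empirical_quantile(values, q):
    """Linearly interpolated quantile of a sample, for 0 <= q <= 1."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    # Interpolate between the two nearest order statistics.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])
```

Raising `q` moves the draw toward the upper tail, but the result stays at or below `max(values)`, so balances that are absent from the training data cannot be recovered this way.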

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The QRF model was predicting very different student loan debt rates
between WAS and FRS despite similar income/household composition.
Adding HRP age band as a predictor dramatically improves results:

Before (without age): FRS at q=0.99 predicted £14.5bn
After (with age band): FRS at q=0.5 predicts £30.3bn (close to WAS £33.4bn)

Changes:
- Add hrp_age_band to STUDENT_LOAN_PREDICTORS
- Add age_to_band() function to convert ages to WAS-style bands (2-8)
- Add get_frs_predictors() to extract household-level predictors from FRS
- Include age band in WAS data extraction
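
A possible shape for the age banding is sketched below. The commit only says that bands run from 2 to 8; the cut points used here (16-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75+) and the clamping of younger ages into band 2 are assumptions, so the PR's age_to_band() may differ.

```python
from bisect import bisect_right

# Assumed lower bounds of bands 3..8; band 2 covers everything below 25.
BAND_LOWER_BOUNDS = [25, 35, 45, 55, 65, 75]


def age_to_band(age):
    """Map an age in years to an assumed WAS-style band code (2-8)."""
    return 2 + bisect_right(BAND_LOWER_BOUNDS, age)
```

Under these assumed cut points, `age_to_band(24)` gives 2 and `age_to_band(75)` gives 8.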

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Enhance student loan balance imputation with additional predictors:

1. Add tenure_type: a strong predictor, since mortgaged owners (8.3%) and
   outright owners (1.3%) have very different debt rates

2. Add hrp_employed: employment status separates employed (7.3%)
   from retired (0.5%) households

3. Use FRS reported repayments to identify loan holders: FRS captures
   ~4.35m repayers vs admin ~3.8m, providing good coverage
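
Group-level debt rates like those quoted for tenure and employment can be computed as a weighted share of households with positive SLC debt. The sketch below assumes that definition; the function name and the record fields "weight" and "slc_debt" are hypothetical.

```python
from collections import defaultdict


def debt_rate_by_group(households, group_key):
    """Weighted share of households with positive SLC debt, per group.

    households: list of dicts with "weight", "slc_debt", and group_key.
    Returns a dict mapping each group value to a rate between 0 and 1.
    """
    weight_with_debt = defaultdict(float)
    weight_total = defaultdict(float)
    for household in households:
        group = household[group_key]
        weight_total[group] += household["weight"]
        if household["slc_debt"] > 0:
            weight_with_debt[group] += household["weight"]
    return {
        group: weight_with_debt[group] / total
        for group, total in weight_total.items()
        if total > 0
    }
```

A large gap in this rate between groups is what makes a variable such as tenure_type a useful predictor.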

Results improved significantly:
- WAS at q=0.5: £29.2bn (actual: £33.4bn)
- FRS at q=0.5: £28.3bn (much closer alignment with WAS)

Also adds tests for age_to_band() and tenure mappings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Impute student loan balance from WAS to FRS

2 participants