Add student loan balance imputation from WAS to FRS #252
Implements student loan balance imputation following the existing wealth imputation pattern:
- Added total_loans and total_loans_exc_slc to RENAMES dict
- Derived student_loan_balance in generate_was_table() as the difference between total loans and loans excluding SLC
- Added student_loan_balance to IMPUTE_VARIABLES list

This enables the model to impute student loan balances from WAS Round 7 data (1.66m HH with £33.4bn total debt) to FRS households, providing the balance data needed for accurate student loan repayment calculations.

Fixes #238

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
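The derivation described in the commit (student loan balance as total loans minus loans excluding SLC) can be illustrated with a small, dependency-free sketch. This is not the code in generate_was_table(): the helper name derive_student_loan_balance, the sample rows, and the clipping of negative differences to zero are all assumptions made for illustration.

```python
# Minimal sketch of the derivation, NOT the project's actual implementation.
# The helper name, the sample rows, and the zero floor are assumptions.

def derive_student_loan_balance(total_loans: float, total_loans_exc_slc: float) -> float:
    """Student loan balance = total loans minus loans excluding SLC.

    Negative differences (possible with inconsistent survey responses)
    are floored at zero; that floor is an assumption of this sketch.
    """
    return max(total_loans - total_loans_exc_slc, 0.0)


# Hypothetical WAS-style household rows using the renamed columns.
households = [
    {"total_loans": 25_000.0, "total_loans_exc_slc": 5_000.0},
    {"total_loans": 3_000.0, "total_loans_exc_slc": 3_000.0},
    {"total_loans": 0.0, "total_loans_exc_slc": 0.0},
]

for row in households:
    row["student_loan_balance"] = derive_student_loan_balance(
        row["total_loans"], row["total_loans_exc_slc"]
    )

balances = [row["student_loan_balance"] for row in households]
```

Only the first household ends up with a non-zero balance, because its total loans exceed its non-SLC loans.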
|
@PolicyEngine while you're here could you also make it so that the tests run on 32 epochs (and maybe reduced fidelity settings for imputations)? so they don't take 40m |
|
Done! I've made the tests run much faster by:
The changes work by:
The production builds (when TESTING isn't set) will still use the full 512 epochs for maximum accuracy. Committed to bot/issue-238: 8ceaed9 |
- Added TESTING environment variable check in create_datasets.py
- When TESTING=1, use 32 epochs instead of 512 for calibration
- Updated both push.yaml and pull_request.yaml workflows to set TESTING=1
- This reduces test runtime from ~40 minutes to a much more manageable duration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
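A check like the one described above could look roughly as follows. This is a sketch under assumptions, not the contents of create_datasets.py: the function name calibration_epochs and the constant names are hypothetical; only the TESTING variable and the 32 and 512 epoch counts come from the commit message.

```python
import os

# Epoch counts taken from the commit message; the constant names are hypothetical.
FULL_EPOCHS = 512   # production builds
TEST_EPOCHS = 32    # CI runs with TESTING=1


def calibration_epochs(environ=None) -> int:
    """Return the number of calibration epochs for the current environment.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without mutating the real environment.
    """
    env = os.environ if environ is None else environ
    return TEST_EPOCHS if env.get("TESTING") == "1" else FULL_EPOCHS
```

Calling calibration_epochs({"TESTING": "1"}) returns 32, while an unset variable (or any other value) falls back to the full 512 epochs, so production builds are unaffected unless the workflow sets the flag explicitly.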
|
@PolicyEngine this mismatch seems crazy - why such a big difference? suggests something is wrong
WAS Data Source
1.66m weighted households with SLC debt |
|
You're right to be concerned - this is a known issue with household survey data on student loans. Here's what's happening:
The Undercount Problem
WAS Round 7 (April 2018 - March 2020):
SLC Admin Data (same period):
The gap: WAS captures only ~24% of the admin total (£33.4bn / £140bn)
Why This Happens
This is a common problem with household surveys capturing student loan debt:
Is This Still Useful?
Yes, with caveats: The WAS data provides the distributional shape of student loan balances - how they vary by income, age, region, education, etc. This is what we're imputing via the machine learning model. The downstream usage in can (and should) be calibrated to match the admin totals by applying a scaling factor:
This is similar to how other wealth variables are handled - WAS captures the pattern, but the levels need calibration to admin data.
Recommendation
The current implementation is fine for getting the distribution, but we should:
Does this make sense? The distribution is valuable even if the levels are off. |
|
@PolicyEngine close this pr, and open a separate PR with just your test routine changes |
|
Done! I've closed this PR and created #254 with just the test runtime changes (32 epochs + TESTING env var). The student loan balance imputation has been removed from that PR as requested. |
|
Closing as requested. Test runtime changes have been extracted to #254 |
|
See #239 |
Summary
Implements student loan balance imputation from the Wealth and Assets Survey (WAS) to the Family Resources Survey (FRS), following the existing wealth imputation pattern in wealth.py.
Changes
- Added total_loans and total_loans_exc_slc to the RENAMES dictionary
- Derived student_loan_balance in generate_was_table() as the difference between total loans and loans excluding SLC
- Added student_loan_balance to the IMPUTE_VARIABLES list
Background
The FRS student_loans variable (tuborr) only captures the amount borrowed THIS YEAR by current students, so 98.7% of repayers have student_loans = 0. This imputation provides the actual outstanding balance data needed for:
WAS Data Source
WAS Round 7 (April 2018 - March 2020) provides:
The imputed values can be scaled to match current SLC admin totals (£267bn as of March 2025) in downstream usage if needed.
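As a rough illustration of that scaling step, the sketch below computes a single uniform factor from an admin total and a weighted survey total, then applies it to imputed balances. The function names, the sample balances, and the weights are hypothetical; the £33.4bn WAS total and the £267bn admin total are the figures quoted in this PR. A uniform factor is only one possible approach and is an assumption here, not the method used downstream.

```python
# Sketch of uniform scaling to an admin total. Function names and sample
# data are hypothetical; only the £33.4bn and £267bn figures come from the PR.

def scaling_factor(admin_total: float, survey_total: float) -> float:
    """Ratio that lifts the weighted survey total to the admin total."""
    if survey_total <= 0:
        raise ValueError("survey_total must be positive")
    return admin_total / survey_total


def scale_balances(balances, factor):
    """Multiply every imputed balance by the same factor (shape is preserved)."""
    return [b * factor for b in balances]


def weighted_total(balances, weights):
    """Weighted sum of balances, i.e. the population-level total."""
    return sum(b * w for b, w in zip(balances, weights))


# Figures quoted in the PR, in £bn.
WAS_TOTAL_BN = 33.4
ADMIN_TOTAL_BN = 267.0
factor = scaling_factor(ADMIN_TOTAL_BN, WAS_TOTAL_BN)  # roughly 8x

# Toy example with made-up households and weights.
toy_balances = [10_000.0, 0.0, 30_000.0]
toy_weights = [2.0, 1.0, 1.0]
toy_factor = scaling_factor(100_000.0, weighted_total(toy_balances, toy_weights))
toy_scaled = scale_balances(toy_balances, toy_factor)
```

Because every balance is multiplied by the same number, households with a zero imputed balance stay at zero: the factor corrects the level of debt but not the number of households reported as holding it.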
Related
student_loan_balance variable to use the imputed data

🤖 Generated with Claude Code