-
Notifications
You must be signed in to change notification settings - Fork 2
Move all randomness to data package for deterministic country package #203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
birth_year should be calculated from age and period in the model, not stored as static data in the dataset. This allows birth_year to properly update in multi-year projections. With static birth_year in the dataset: - 2026: birth_year stays 2006-2023 (based on 2023 survey) - 2029: birth_year stays 2006-2023 (incorrect) By calculating birth_year = period.year - age: - 2026: birth_year becomes 2009-2026 (correct for 2026) - 2029: birth_year becomes 2012-2029 (correct for 2029) This fix is required for PolicyEngine/policyengine-uk#1352 to work correctly and ensure two-child limit cost projections increase over time as expected. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Re-add dividends to calibration set
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Remove birth_year from FRS dataset generation
Relax childcare test pass condition
Relax childcare tests by 100 percent
This change moves random number generation from policyengine-uk into the dataset generation, following the pattern established in policyengine-us-data. Changes: - Add random seed generation in FRS dataset for 11 independent random decisions (4 person-level, 4 benunit-level, 3 household-level seeds) - Update SPI dataset to use seeded generator for age assignment - Update income imputation to use seeded generator for age assignment - Update capital gains imputation to use seeded generator for quantile sampling - Update childcare assumptions to use seeded generator All random generation now uses np.random.default_rng(seed=100) for full reproducibility across dataset builds. Each seed corresponds to a specific independent random decision to avoid artificial correlations between unrelated stochastic processes. Related: policyengine-uk PR (must be merged after this)
a22bf4e to
d3266af
Compare
1b78434 to
0720fc5
Compare
This change moves ALL random number generation from policyengine-uk into the dataset generation in policyengine-uk-data. The country package is now a purely deterministic rules engine. ## Key Changes ### policyengine-uk-data: - Add take-up rate YAML parameter files in `parameters/take_up/` - Generate all stochastic decisions in FRS dataset using these rates - Generate boolean would_claim variables directly in dataset - Generate random draws for variables that need them (tie-breaking, etc.) - Use seeded RNG (seed=100) for full reproducibility ### Stochastic variables generated: **Take-up decisions (boolean):** - would_claim_child_benefit - child_benefit_opts_out - would_claim_pc (Pension Credit) - would_claim_uc (Universal Credit) - would_claim_marriage_allowance - would_claim_tfc (Tax-Free Childcare) - would_claim_extended_childcare - would_claim_universal_childcare - would_claim_targeted_childcare **Other stochastic variables (boolean):** - household_owns_tv - would_evade_tv_licence_fee - main_residential_property_purchased_is_first_home - is_disabled_for_benefits (based on reported benefits) **Random draws (float [0,1)):** - is_higher_earner_random_draw (for tie-breaking) - attends_private_school_random_draw (for income-conditional probability) ## Trade-offs **IMPORTANT**: Take-up rates can no longer be adjusted dynamically via policy reforms or in the web app. They are fixed in the microdata. This is an acceptable trade-off for the cleaner architecture of keeping the country package purely deterministic. To adjust take-up rates, the microdata must be regenerated. Related: policyengine-uk PR (must be merged after this)
0720fc5 to
c8e385d
Compare
371eeb6 to
26ccdd1
Compare
Contributor
Author
|
Closing stale PR - recreated fresh as #246 to avoid merge conflicts |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR moves ALL random number generation from policyengine-uk into the dataset generation in policyengine-uk-data. The country package is now a purely deterministic rules engine.
Changes
New take-up rate parameters
Added YAML parameter files in
policyengine_uk_data/parameters/take_up/:FRS dataset generation
Stochastic variables generated
Take-up decisions (boolean):
Other stochastic variables (boolean):
Random draws (float [0,1)):
Trade-offs
IMPORTANT: Take-up rates can no longer be adjusted dynamically via policy reforms or in the web app. They are fixed in the microdata at generation time. This is an acceptable trade-off for the cleaner architecture of keeping the country package purely deterministic.
To adjust take-up rates for analysis, the microdata must be regenerated with updated parameter values.
Test Plan
Related PRs
🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com