@MaxGhenis
Contributor

@MaxGhenis MaxGhenis commented Oct 5, 2025

Summary

This PR moves ALL random number generation from policyengine-uk into the dataset generation in policyengine-uk-data. The country package is now a purely deterministic rules engine.

⚠️ MERGE ORDER: This PR (#203) must be merged BEFORE the companion policyengine-uk PR #1355

Changes

New take-up rate parameters

Added YAML parameter files in policyengine_uk_data/parameters/take_up/:

  • child_benefit.yaml (0.97 → 0.89 over time)
  • child_benefit_opts_out_rate.yaml (0.23)
  • pension_credit.yaml (0.7)
  • universal_credit.yaml (0.55)
  • marriage_allowance.yaml (1.0 - full take-up)
  • tax_free_childcare.yaml (0.586)
  • extended_childcare.yaml (0.812)
  • universal_childcare.yaml (0.563)
  • targeted_childcare.yaml (0.597)
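
As a rough illustration of how a time-varying rate such as the one in child_benefit.yaml might be resolved for a given year, here is a minimal sketch. It assumes the rate changes in steps; the list layout, the `rate_for_year` helper, and the years 2020 and 2025 are hypothetical placeholders. Only the endpoint rates 0.97 and 0.89 come from this PR.

```python
from bisect import bisect_right

# Hypothetical in-memory stand-in for child_benefit.yaml: (start_year, rate) pairs.
# The rates 0.97 and 0.89 are from the PR; the years are placeholders.
CHILD_BENEFIT_TAKE_UP = [(2020, 0.97), (2025, 0.89)]


def rate_for_year(values, year):
    """Return the most recent rate whose start year is <= year."""
    starts = [start for start, _ in values]
    index = bisect_right(starts, year) - 1
    if index < 0:
        raise ValueError(f"no rate defined for {year}")
    return values[index][1]


print(rate_for_year(CHILD_BENEFIT_TAKE_UP, 2023))
print(rate_for_year(CHILD_BENEFIT_TAKE_UP, 2026))
```

The lookup runs once at dataset generation time, so the rate that applies to a simulated year is baked into the microdata rather than evaluated by the country package.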

FRS dataset generation

  • Load take-up rates from YAML parameter files
  • Generate all stochastic boolean decisions (would_claim_*, etc.)
  • Generate random draws for tie-breaking and conditional probabilities
  • Use seeded RNG (seed=100) for full reproducibility
  • Seed all other random processes too (SPI age assignment, income imputation, capital gains imputation, childcare assumptions)
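
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `TAKE_UP_RATES` dict stands in for the values loaded from YAML, `generate_take_up` is a hypothetical helper, and a single unit count is used for every variable even though the real variables live at different entity levels.

```python
import numpy as np

# Rates as listed in the PR description; the real build reads them from YAML.
TAKE_UP_RATES = {
    "would_claim_pc": 0.7,
    "would_claim_uc": 0.55,
    "would_claim_marriage_allowance": 1.0,
    "would_claim_tfc": 0.586,
}


def generate_take_up(n_units, rates, seed=100):
    """Draw one independent boolean column per take-up decision (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # rng.random returns floats in [0, 1), so `< rate` is True with probability `rate`.
    return {name: rng.random(n_units) < rate for name, rate in rates.items()}


columns = generate_take_up(100_000, TAKE_UP_RATES)
for name, values in columns.items():
    print(name, round(float(values.mean()), 3))
```

Because the generator is seeded, calling `generate_take_up` again with the same arguments yields identical columns, which is what makes dataset builds reproducible.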

Stochastic variables generated

Take-up decisions (boolean):

  • would_claim_child_benefit
  • child_benefit_opts_out
  • would_claim_pc (Pension Credit)
  • would_claim_uc (Universal Credit)
  • would_claim_marriage_allowance
  • would_claim_tfc (Tax-Free Childcare)
  • would_claim_extended_childcare
  • would_claim_universal_childcare
  • would_claim_targeted_childcare

Other stochastic variables (boolean):

  • household_owns_tv (96% rate)
  • would_evade_tv_licence_fee (6% rate)
  • main_residential_property_purchased_is_first_home (30% rate)

Random draws (float [0,1)):

  • higher_earner_tie_break (for tie-breaking in income comparisons)
  • attends_private_school_random_draw (for income-conditional probability)
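
To show why these two are stored as raw draws rather than booleans, here is a minimal sketch of how a rules engine could consume a stored draw without calling an RNG itself. The probability schedule in `private_school_probability` is invented for illustration and is not the model's actual schedule; both function names are hypothetical.

```python
import numpy as np


def private_school_probability(household_income):
    """Hypothetical income-conditional probability; NOT the model's real schedule."""
    return np.where(household_income > 100_000, 0.30, 0.05)


def attends_private_school(random_draw, household_income):
    # Deterministic given the stored draw: no RNG is called here.
    return random_draw < private_school_probability(household_income)


draws = np.array([0.02, 0.20, 0.20, 0.90])        # stored in the dataset
incomes = np.array([30_000, 30_000, 150_000, 150_000])
result = attends_private_school(draws, incomes)
print(result)  # [ True False  True False]
```

Keeping the draw in the data lets the probability depend on simulated income, which can change under a reform, while the outcome for a given household stays reproducible.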

Trade-offs

IMPORTANT: Take-up rates can no longer be adjusted dynamically via policy reforms or in the web app. They are fixed in the microdata at generation time. This is an acceptable trade-off for the cleaner architecture of keeping the country package purely deterministic.

To adjust take-up rates for analysis, the microdata must be regenerated with updated parameter values.

Test Plan

  • FRS dataset generation completes successfully
  • All stochastic variables are generated correctly
  • All tests pass
  • Companion policyengine-uk PR #1355 passes all tests after this is merged

Related PRs

  • policyengine-uk: #1355 (must be merged AFTER this)

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

nikhilwoodruff and others added 22 commits October 1, 2025 11:26
birth_year should be calculated from age and period in the model,
not stored as static data in the dataset. This allows birth_year to
properly update in multi-year projections.

With static birth_year in the dataset:
- 2026: birth_year stays 2006-2023 (based on 2023 survey)
- 2029: birth_year stays 2006-2023 (incorrect)

By calculating birth_year = period.year - age:
- 2026: birth_year becomes 2009-2026 (correct for 2026)
- 2029: birth_year becomes 2012-2029 (correct for 2029)

This fix is required for PolicyEngine/policyengine-uk#1352 to work
correctly and ensure two-child limit cost projections increase over
time as expected.
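
The arithmetic above, as a small sketch. The function names are hypothetical, and the 0 to 17 age range is inferred from the year ranges quoted in this message.

```python
def birth_year(period_year: int, age: int) -> int:
    """Birth year implied by a person's age in the simulated period."""
    return period_year - age


def birth_year_range(period_year, min_age=0, max_age=17):
    """(earliest, latest) birth year for people aged min_age..max_age."""
    return birth_year(period_year, max_age), birth_year(period_year, min_age)


for year in (2023, 2026, 2029):
    print(year, birth_year_range(year))
```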

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove birth_year from FRS dataset generation
This change moves random number generation from policyengine-uk into the
dataset generation, following the pattern established in policyengine-us-data.

Changes:
- Add random seed generation in FRS dataset for 11 independent random decisions
  (4 person-level, 4 benunit-level, 3 household-level seeds)
- Update SPI dataset to use seeded generator for age assignment
- Update income imputation to use seeded generator for age assignment
- Update capital gains imputation to use seeded generator for quantile sampling
- Update childcare assumptions to use seeded generator

All random generation now uses np.random.default_rng(seed=100) for full
reproducibility across dataset builds.

Each seed corresponds to a specific independent random decision to avoid
artificial correlations between unrelated stochastic processes.
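
A minimal sketch of the correlation problem that separate draws avoid. It is illustrative only and does not reproduce the per-entity seed layout in this commit; the two rates are the UC and Pension Credit values from the PR description, applied here to one synthetic population for simplicity.

```python
import numpy as np

rng = np.random.default_rng(seed=100)
n = 50_000

# Shared draw: one uniform number reused for two decisions.
shared = rng.random(n)
uc_shared = shared < 0.55
pc_shared = shared < 0.70

# Independent draws: a fresh uniform number per decision.
uc_indep = rng.random(n) < 0.55
pc_indep = rng.random(n) < 0.70

# Share of UC claimants who also claim PC under each scheme.
share_nested_shared = float(pc_shared[uc_shared].mean())
share_nested_indep = float(pc_indep[uc_indep].mean())
print(share_nested_shared, round(share_nested_indep, 3))
```

With the shared draw, every unit below 0.55 is also below 0.70, so the two decisions are perfectly nested. With independent draws the conditional share stays close to the unconditional 0.70 rate.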

Related: policyengine-uk PR (must be merged after this)
@MaxGhenis MaxGhenis force-pushed the migrate-random-to-data branch 2 times, most recently from 1b78434 to 0720fc5 Compare October 5, 2025 21:20
This change moves ALL random number generation from policyengine-uk into the
dataset generation in policyengine-uk-data. The country package is now a
purely deterministic rules engine.

## Key Changes

### policyengine-uk-data:
- Add take-up rate YAML parameter files in `parameters/take_up/`
- Generate all stochastic decisions in FRS dataset using these rates
- Generate boolean would_claim variables directly in dataset
- Generate random draws for variables that need them (tie-breaking, etc.)
- Use seeded RNG (seed=100) for full reproducibility

### Stochastic variables generated:
**Take-up decisions (boolean):**
- would_claim_child_benefit
- child_benefit_opts_out
- would_claim_pc (Pension Credit)
- would_claim_uc (Universal Credit)
- would_claim_marriage_allowance
- would_claim_tfc (Tax-Free Childcare)
- would_claim_extended_childcare
- would_claim_universal_childcare
- would_claim_targeted_childcare

**Other stochastic variables (boolean):**
- household_owns_tv
- would_evade_tv_licence_fee
- main_residential_property_purchased_is_first_home
- is_disabled_for_benefits (based on reported benefits)

**Random draws (float [0,1)):**
- is_higher_earner_random_draw (for tie-breaking)
- attends_private_school_random_draw (for income-conditional probability)

## Trade-offs

**IMPORTANT**: Take-up rates can no longer be adjusted dynamically via policy
reforms or in the web app. They are fixed in the microdata. This is an
acceptable trade-off for the cleaner architecture of keeping the country
package purely deterministic. To adjust take-up rates, the microdata must be
regenerated.

Related: policyengine-uk PR (must be merged after this)
@MaxGhenis
Contributor Author

Closing stale PR - recreated fresh as #246 to avoid merge conflicts

@MaxGhenis MaxGhenis closed this Dec 3, 2025


4 participants