The random() function means that local area microsimulations will never match their calibration #412

@baogorek

core's random() function (lines 308-348) is used for 3 variables in policyengine-us and 15 variables in policyengine-uk. Two inputs determine the seed for each person:

  1. Entity ID (f"{population.entity.key}_id")
  2. Call count: How many times random() has been called in the simulation

The seed formula:
seed = int(abs(id * 100 + population.simulation.count_random_calls))

Example:

  • Person 5, 1st call to random() → seed = int(abs(5 * 100 + 1)) = 501
  • Person 5, 2nd call to random() → seed = int(abs(5 * 100 + 2)) = 502
  • Person 7, 3rd call to random() → seed = int(abs(7 * 100 + 3)) = 703
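
To make the arithmetic concrete, here is a minimal sketch of the seed formula. The helper names `seed_for` and `draw` are hypothetical, and drawing through `np.random.default_rng` is an assumption for illustration; core's actual random() may draw its numbers differently.

```python
import numpy as np


def seed_for(entity_id: int, call_count: int) -> int:
    """Seed formula quoted above: entity ID times 100 plus the call count."""
    return int(abs(entity_id * 100 + call_count))


def draw(entity_id: int, call_count: int) -> float:
    """One uniform draw for one entity (assumed generator, not core's own code)."""
    rng = np.random.default_rng(seed_for(entity_id, call_count))
    return float(rng.random())


# The three example seeds from the list above.
example_seeds = [seed_for(5, 1), seed_for(5, 2), seed_for(7, 3)]
```

The same (ID, call count) pair always gives the same draw, and changing either input gives a different seed.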

For context, the multiplication by 100 has caused integer overflow problems in the past.

In the local area calibration case, where donor households must have their state_fips swapped, re-keying with new person_id values is unavoidable. Because the random() function is linked to person_id, the final Microsimulation from local area calibration will never match the matrix times the weights.
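
A rough sketch of why re-keying matters, under stated assumptions: `random_draws` is a hypothetical stand-in for core's random(), the draws come from `np.random.default_rng`, and adding 1,000 to each ID is an invented re-keying scheme. Only the seed formula is taken from this issue.

```python
import numpy as np


def random_draws(person_ids, call_count):
    """Hypothetical stand-in for random(): one uniform draw per person."""
    seeds = [int(abs(pid * 100 + call_count)) for pid in person_ids]
    return np.array([np.random.default_rng(s).random() for s in seeds])


original_ids = np.array([5, 7, 9])
rekeyed_ids = original_ids + 1_000  # invented re-keying for donor copies

draws_before = random_draws(original_ids, call_count=1)
draws_after = random_draws(rekeyed_ids, call_count=1)

# Same people, same call count, but different IDs give different draws.
draws_changed = not np.allclose(draws_before, draws_after)
```

Nothing about the people changed except their IDs, yet every stochastic outcome tied to random() is redrawn.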

For instance, here is how snap in policyengine-us relates to the random function:

snap
  └── snap_gross_income
        └── snap_unearned_income (uses `adds`)
              └── ssi (SSI benefit amount)
                    └── is_ssi_eligible
                          └── meets_ssi_resource_test
                                └── random()  ← stochastic eligibility

What this means is that a household with $2000 in snap that was assigned a weight of 150, partially because of this snap value, might end up in the final microsimulation with $1800 in snap but still a weight of 150. Then, when we run Microsimulation.calculate('snap').sum(), we don't match the value from X @ w in the calibration. Whether the mismatch comes from a bug in the construction of the very complex X or from random snap is very difficult to tell.
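
A toy version of that mismatch, using the $2000, $1800, and 150 figures from the paragraph above. The second household and its weight of 80 are made up for illustration, and the arrays stand in for one row of X and the weighted sum the Microsimulation would report.

```python
import numpy as np

weights = np.array([150.0, 80.0])  # second weight is hypothetical

# snap values as they entered the calibration matrix X
snap_at_calibration = np.array([2000.0, 0.0])
# snap values after re-keying changed the random draw for household 1
snap_after_rekeying = np.array([1800.0, 0.0])

calibrated_total = snap_at_calibration @ weights  # what X @ w targets
simulated_total = snap_after_rekeying @ weights   # what the final sum gives
gap = calibrated_total - simulated_total
```

The weights are identical in both totals, so the whole gap comes from the changed snap value.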

Recommendation: Use seeds stored in the microdata like the SNAP take-up seed, and remove random from core.

SNAP's "take-up seed" works quite differently. (policyengine-us/policyengine_us/variables/gov/usda/snap/snap_take_up_seed.py)

Here's the SNAP takeup mechanism:

File 1: snap_take_up_seed.py (lines 1-8)
  class snap_take_up_seed(Variable):
      value_type = float
      entity = SPMUnit
      label = "Randomly assigned seed for SNAP take-up"
      definition_period = YEAR
      # No formula

File 2: takes_up_snap_if_eligible.py (lines 10-13)
  def formula(spm_unit, period, parameters):
      seed = spm_unit("snap_take_up_seed", period)
      takeup_rate = parameters(period).gov.usda.snap.takeup_rate
      return seed < takeup_rate

Here, the snap_take_up_seed is defined in cps.py (line 230) in policyengine-us-data:

  data["snap_take_up_seed"] = generator.random(len(data["spm_unit_id"]))

This approach works better with local area calibration because the seed becomes linked to the household as a sort of property. We could define one seed value per person, household, etc. (really every unit), and anything random could depend on it. Reproducibility would also be much simpler.
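
A minimal sketch of the stored-seed idea, assuming a plain dict of arrays in place of the real dataset. The generator seed, the four spm_unit_id values, the 0.8 take-up rate, and the `takes_up` helper are all hypothetical; only the `generator.random(len(...))` pattern and the `seed < takeup_rate` comparison come from the files quoted above.

```python
import numpy as np

generator = np.random.default_rng(seed=0)  # hypothetical seed value

# Seeds are drawn once and stored alongside the microdata.
data = {"spm_unit_id": np.array([1, 2, 3, 4])}
data["snap_take_up_seed"] = generator.random(len(data["spm_unit_id"]))


def takes_up(dataset, takeup_rate):
    """Take-up decision depends only on the stored seed, never on the ID."""
    return dataset["snap_take_up_seed"] < takeup_rate


takeup_rate = 0.8  # hypothetical rate, not the actual parameter value
before = takes_up(data, takeup_rate)

# Re-key the units: the IDs change, the stored seeds travel with the rows.
rekeyed = dict(data, spm_unit_id=data["spm_unit_id"] + 1_000)
after = takes_up(rekeyed, takeup_rate)
```

Because the decision reads the stored seed rather than the ID, re-keying leaves every take-up outcome unchanged.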
