Description
core's random() function (lines 308-348) is used for 3 variables in policyengine-us and 15 variables in policyengine-uk. Two inputs determine the seed for each person:
- Entity ID (f"{population.entity.key}_id")
- Call count: How many times random() has been called in the simulation
The seed formula:
seed = int(abs(id * 100 + population.simulation.count_random_calls))
Example:
- Person 5, 1st call to random() → seed = int(abs(5 * 100 + 1)) = 501
- Person 5, 2nd call to random() → seed = int(abs(5 * 100 + 2)) = 502
- Person 7, 3rd call to random() → seed = int(abs(7 * 100 + 3)) = 703
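As a quick check of the arithmetic, here is a minimal sketch of the seed formula as a standalone function. The name `random_seed` is hypothetical and not part of core; only the formula itself comes from the description above. The call counter is simulation-wide, which is why the example for person 7 uses a count of 3.

```python
def random_seed(entity_id: int, call_count: int) -> int:
    """Seed formula quoted above: int(abs(id * 100 + call_count))."""
    return int(abs(entity_id * 100 + call_count))


# (entity ID, simulation-wide call count) pairs from the examples above
examples = [(5, 1), (5, 2), (7, 3)]
seeds = [random_seed(entity_id, count) for entity_id, count in examples]
print(seeds)  # [501, 502, 703]
```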
For context, the multiplication by 100 has caused integer overflow problems in the past.
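One way such an overflow can arise is sketched below. It assumes the IDs are held in a 32-bit NumPy integer array, which is a guess for illustration and not necessarily how the past problems occurred. The ID value is hypothetical.

```python
import numpy as np

# Hypothetical large entity ID stored as a 32-bit integer
ids = np.array([30_000_000], dtype=np.int32)

# 30,000,000 * 100 exceeds the int32 maximum (2,147,483,647),
# so the array multiplication silently wraps around
wrapped = ids * 100

# Widening to 64 bits before multiplying avoids the wraparound
safe = ids.astype(np.int64) * 100

print(wrapped[0], safe[0])
```

The wrapped value is negative, so the abs() in the seed formula would hide the problem rather than fix it.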
In the local area calibration case, where donor households must have their state_fips swapped, re-keying with new person_id values is unavoidable. Because the random() function is linked to person_id, the final Microsimulation from local area calibration will never match the matrix times the weights.
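The effect of re-keying can be sketched as follows. This assumes each seed is fed to its own NumPy generator, which is an assumption about core's internals made for illustration. The helper `draw` and the new ID are hypothetical.

```python
import numpy as np


def draw(entity_id: int, call_count: int) -> float:
    """Hypothetical stand-in for one person's random() draw."""
    seed = int(abs(entity_id * 100 + call_count))
    return float(np.random.default_rng(seed).random())


original = draw(5, 1)          # person keyed with ID 5
repeated = draw(5, 1)          # same ID and call count: same draw
rekeyed = draw(1_000_005, 1)   # same person under a hypothetical new ID

print(original == repeated)  # the draw is reproducible for a fixed ID
print(original == rekeyed)   # but it changes once the ID changes
```

Nothing about the person changed except the key, yet any variable downstream of the draw can now take a different value.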
For instance, here is how snap in policyengine-us relates to the random function:
snap
└── snap_gross_income
    └── snap_unearned_income (uses `adds`)
        └── ssi (SSI benefit amount)
            └── is_ssi_eligible
                └── meets_ssi_resource_test
                    └── random() ← stochastic eligibility
What this means is that a household with $2000 in snap that was assigned a weight of 150, partially due to this snap value, might end up in the final microsimulation with $1800 in snap but still a weight of 150. Then when we run Microsimulation.calculate('snap').sum(), we don't match the values from X @ w in the calibration. Whether that is because of a bug in the construction of the very complex X or because of random snap is very difficult to tell.
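A toy version of this mismatch, using the $2000 / $1800 / weight 150 figures from the paragraph above. The other two households and the single-row X are made up for illustration. The real X has many more targets and columns.

```python
import numpy as np

# SNAP per household at calibration time (first value from the text,
# the other two are hypothetical)
snap_at_calibration = np.array([2000.0, 0.0, 500.0])

# Calibrated weights (150 from the text, the rest hypothetical)
w = np.array([150.0, 80.0, 200.0])

# One-target calibration matrix: a single row holding SNAP amounts
X = snap_at_calibration.reshape(1, -1)
calibrated_total = float((X @ w)[0])

# After re-keying, the first household's SNAP changes but its weight does not
snap_after_rekey = np.array([1800.0, 0.0, 500.0])
simulated_total = float(snap_after_rekey @ w)

gap = calibrated_total - simulated_total
print(calibrated_total, simulated_total, gap)  # 400000.0 370000.0 30000.0
```

The gap of 30000 is exactly the $200 change times the weight of 150, but in a real run it would be mixed in with every other household's changes and with any genuine errors in X.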
Recommendation: Use seeds stored in the microdata like the SNAP take-up seed, and remove random from core.
SNAP's "take-up seed" works quite differently. (policyengine-us/policyengine_us/variables/gov/usda/snap/snap_take_up_seed.py)
Here's the SNAP takeup mechanism:
File 1: snap_take_up_seed.py (lines 1-8)
class snap_take_up_seed(Variable):
    value_type = float
    entity = SPMUnit
    label = "Randomly assigned seed for SNAP take-up"
    definition_period = YEAR
    # No formula
File 2: takes_up_snap_if_eligible.py (lines 10-13)
def formula(spm_unit, period, parameters):
    seed = spm_unit("snap_take_up_seed", period)
    takeup_rate = parameters(period).gov.usda.snap.takeup_rate
    return seed < takeup_rate
Here, the snap_take_up_seed is defined in cps.py (line 230) in policyengine-us-data:
data["snap_take_up_seed"] = generator.random(len(data["spm_unit_id"]))
This approach works better with local area calibration because the seed becomes linked to the household as a sort of property. We could define one seed value per person, household, etc. (really every unit), and anything random could depend on it. Reproducibility would also be much simpler.
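A minimal sketch of why a stored seed survives re-keying, in plain NumPy rather than the policyengine APIs. The generator seed, the number of units, the take-up rate of 0.8, the ID offset, and the helper `takes_up` are all hypothetical values and names chosen for illustration.

```python
import numpy as np

generator = np.random.default_rng(seed=0)  # hypothetical fixed seed

# Toy dataset: the seed is drawn once and stored next to the IDs
data = {"spm_unit_id": np.arange(1, 6)}
data["snap_take_up_seed"] = generator.random(len(data["spm_unit_id"]))

takeup_rate = 0.8  # hypothetical rate, not the real parameter value


def takes_up(dataset: dict, rate: float) -> np.ndarray:
    """Take-up depends only on the stored seed, never on the ID."""
    return dataset["snap_take_up_seed"] < rate


before = takes_up(data, takeup_rate)

# Re-key every unit, as local area calibration would
data["spm_unit_id"] = data["spm_unit_id"] + 1_000_000
after = takes_up(data, takeup_rate)

print(np.array_equal(before, after))  # True
```

Because the seed travels with the record as data, changing the IDs leaves every take-up decision untouched.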