Conversation

@kassyray
Member

@kassyray kassyray commented Nov 18, 2025

This pull request introduces configuration-driven filtering of diseases during the preprocessing step of the pipeline. The main change is the ability to specify diseases to ignore via a config file, which then filters both the vaccine reference data and the records enriched during preprocessing. This improves flexibility and control over which diseases are included in downstream analysis.

Configuration and Filtering Enhancements:

  • Added support for loading an ignore_diseases list from the config file (parameters.yaml) in run_step_2_preprocess, and used it to filter out specified diseases from the vaccine reference before further processing.
  • Updated the preprocessing pipeline to pass the ignore_diseases parameter through to the build_preprocess_result and enrich_grouped_records functions, allowing ignored diseases to be excluded during enrichment.
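As a rough illustration of the two bullets above, the following is a minimal sketch, not the code in this PR. It assumes the config from parameters.yaml has already been parsed into a dict and that the vaccine reference maps each vaccine name to a list of diseases. The helper names `load_ignore_diseases` and `filter_vaccine_reference` are hypothetical.

```python
from typing import Any


def load_ignore_diseases(config: dict[str, Any]) -> set[str]:
    """Return the normalized ignore list (hypothetical helper).

    A missing, null, or empty `ignore_diseases` key yields an empty set,
    so the filter is a no-op unless a config explicitly opts in.
    """
    raw = config.get("ignore_diseases") or []
    return {str(name).strip().casefold() for name in raw if str(name).strip()}


def filter_vaccine_reference(
    vaccine_reference: dict[str, list[str]],
    ignore_diseases: set[str],
) -> dict[str, list[str]]:
    """Drop ignored diseases from an assumed {vaccine: [diseases]} mapping.

    Vaccines left with no diseases after filtering are removed entirely.
    """
    filtered: dict[str, list[str]] = {}
    for vaccine, diseases in vaccine_reference.items():
        kept = [d for d in diseases if d.strip().casefold() not in ignore_diseases]
        if kept:
            filtered[vaccine] = kept
    return filtered


# Illustrative data only; the real reference file may look different.
config = {"ignore_diseases": ["Varicella"]}
reference = {
    "MMRV": ["Measles", "Mumps", "Rubella", "Varicella"],
    "Var": ["Varicella"],
    "Tdap": ["Tetanus", "Diphtheria", "Pertussis"],
}
filtered_reference = filter_vaccine_reference(reference, load_ignore_diseases(config))
print(filtered_reference)
```

With this sample data, "Var" is dropped and "MMRV" keeps only its three remaining diseases. Treating a missing key as an empty set keeps the behaviour unchanged for any configuration that does not list diseases to ignore.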

Disease Enrichment Logic:

  • Modified the enrich_grouped_records function to accept ignore_diseases and remove any diseases specified in this list from the enrichment results.
  • Adjusted the disease lookup and enrichment logic to ensure that records with vaccines not mapped in the filtered reference are removed, and improved normalization of vaccine names.
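The enrichment behaviour described above could look roughly like the sketch below. It is an assumption-laden illustration, not the actual enrich_grouped_records implementation: the record shape (a dict with a "vaccine" key), the {vaccine: [diseases]} reference shape, and the names `normalize_vaccine_name` and `enrich_records_sketch` are all hypothetical.

```python
from typing import Any, Iterable, Optional


def normalize_vaccine_name(name: str) -> str:
    """Collapse whitespace and case-fold so lookups tolerate formatting noise."""
    return " ".join(name.split()).casefold()


def enrich_records_sketch(
    records: Iterable[dict[str, Any]],
    vaccine_reference: dict[str, list[str]],
    ignore_diseases: Optional[Iterable[str]] = None,
) -> list[dict[str, Any]]:
    """Attach diseases to each record, skipping unmapped or fully ignored ones."""
    ignored = {d.strip().casefold() for d in (ignore_diseases or [])}
    # Build the lookup once, keyed by normalized vaccine name.
    lookup = {normalize_vaccine_name(v): ds for v, ds in vaccine_reference.items()}

    enriched: list[dict[str, Any]] = []
    for record in records:
        diseases = lookup.get(normalize_vaccine_name(record["vaccine"]))
        if diseases is None:
            continue  # vaccine is not mapped in the reference: drop the record
        kept = [d for d in diseases if d.strip().casefold() not in ignored]
        if not kept:
            continue  # every mapped disease is ignored: drop the record
        enriched.append({**record, "diseases": kept})
    return enriched


# Illustrative data only.
sample_reference = {
    "MMRV": ["Measles", "Mumps", "Rubella", "Varicella"],
    "Var": ["Varicella"],
}
sample_records = [
    {"date": "2020-01-01", "vaccine": "  mmrv "},
    {"date": "2021-06-15", "vaccine": "Var"},
    {"date": "2022-03-09", "vaccine": "Unknown"},
]
result = enrich_records_sketch(sample_records, sample_reference, ["varicella"])
print(result)
```

In this sketch `ignore_diseases` defaults to `None`, so callers that do not pass an ignore list still get every mapped disease. Only records with unmapped vaccines are dropped in that case.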

Preprocessing Pipeline Integration:

  • Ensured that the filtered vaccine reference and ignore list are consistently used throughout the preprocessing pipeline, including in artifact creation and enrichment.

These changes make the pipeline more configurable and robust by allowing easy exclusion of unwanted diseases via configuration.

@kassyray kassyray requested a review from jangevaare November 19, 2025 21:19
@codecov

codecov bot commented Nov 19, 2025

Codecov Report

❌ Patch coverage is 41.66667% with 14 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pipeline/orchestrator.py | 25.00% | 8 Missing and 1 partial ⚠️ |
| pipeline/preprocess.py | 58.33% | 3 Missing and 2 partials ⚠️ |


@kassyray
Member Author

Regarding this PR: the code pushed is specific to a feature request from a PHU. I think the current logic would break the requirements of other PHUs.

We need to work out logic that handles the various use cases and update the function accordingly.

… from the immunization histories of clients. This needs to be an optional request and should not yet be pushed to main

2 participants