Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion config/parameters.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
bundling:
bundle_size: 100
group_by: null
group_by: school
chart_diseases_header:
- Diphtheria
- Tetanus
Expand All @@ -21,6 +21,8 @@ encryption:
enabled: false
password:
template: '{date_of_birth_iso_compact}'
ignore_diseases:
- Other
ignore_agents:
- RSVAb
- VarIg
Expand Down
Empty file removed phu_templates/.gitkeep
Empty file.
111 changes: 32 additions & 79 deletions phu_templates/README.md
Original file line number Diff line number Diff line change
@@ -1,103 +1,56 @@
# PHU Templates Directory
# WDGPH VIPER Template Library

This directory contains Public Health Unit (PHU) specific template customizations.
The VIPER Template Library is a private repository that provides a centralized, version-controlled collection of reusable templates that are used across provinical Public Health VIPER workflows.

## Usage
This includes Typst document templates, YAML configuration schemas, and other standardized assets reuqired by the VIPER pipeline.

Each PHU should create a subdirectory here with their organization-specific templates:
By storing these artifacts in a dedicated private repostiroy and consuming them as a Git submodule, we ensure the following:

```
phu_templates/
├── my_phu/
│ ├── en_template.py (required for English output)
│ ├── fr_template.py (required for French output)
│ ├── conf.typ (required)
│ └── assets/ (optional - only if templates reference assets)
│ ├── logo.png (optional)
│ └── signature.png (optional)
```

## Running with PHU Templates
* Reproducibiltiy: downstream pipelines reference an exact, pinned version
* Consistency: all PHUs use the same reviewed and approved templates
* Governance: updates are traceable, reviewable, and versioned
* Modularity: templates remain independent from pipeline logic
* Security: PHU-specific assets remain internal and controlled

To use a PHU-specific template, specify the template name with `--template`:
This repository is intended for **internal** use and is included as a **submodule** inside downstream pipelienes under:

```bash
# Generate English notices
uv run viper students.xlsx en --template my_phu

# Generate French notices
uv run viper students.xlsx fr --template my_phu
```
/pipeline/phu_templates
```

This will load templates from `phu_templates/my_phu/`.

## Template File Requirements

### Core Requirements (Always Required)

- `conf.typ` - Typst configuration and utility functions

### Language-Specific Requirements (Based on Output Language)

- `en_template.py` - Required only if generating English notices (`--language en`)
- Must define `render_notice()` function
- Consulted only when `--language en` is specified

- `fr_template.py` - Required only if generating French notices (`--language fr`)
- Must define `render_notice()` function
- Consulted only when `--language fr` is specified

**Note:** A PHU may provide templates for only one language. If a user requests a language your template does not support, the pipeline will fail with a clear error message. If you only support one language, only include that template file (e.g., only `en_template.py`).

### Asset Requirements (Based on Template Implementation)

Assets in the `assets/` directory are **optional** and depend entirely on your template implementation:

- `assets/logo.png` - Only required if your `en_template.py` or `fr_template.py` references a logo
- `assets/signature.png` - Only required if your `en_template.py` or `fr_template.py` references a signature
- Other files - Any additional assets (e.g., `assets/header.png`, `assets/seal.pdf`) may be included and referenced in your templates
Pipeline developers are encouraged to update the submodule when template changes are required, ensuring changes are reviewed and versioned before integration.

**Note:** If your template references an asset (e.g., `include "assets/logo.png"` in Typst), that asset **must** exist. The pipeline will fail with a clear error if a referenced asset is missing.
## Directory Structure

## Creating a PHU Template
Templates are stored inside a folder named using the standard PHU acronym.

If your PHU supports both English and French:
Within each PHU folder, templates and assets can follow the following structure:

```bash
cp -r templates/ phu_templates/my_phu/
```
<PHU_ACRONYM>/
├── assets/
│ ├── logo.png
│ └── signature.png
├──en_template.py
├──fr_template.py
```

Then customize:
- Replace `assets/logo.png` with your PHU logo
- Replace `assets/signature.png` with your signature
- Modify `en_template.py` and `fr_template.py` as needed
- Adjust `conf.typ` for organization-specific styling
## Using the Submodule

### Testing Your Template
Downstream VIPER pipelines consume this repository as a Git submodule, allowing each pipeline to reference a specific, version-controlled snapshot of the templates.

```bash
# Test English generation
uv run viper students.xlsx en --template my_phu
### Adding the Submodule (Initial Setup)

# Test French generation (if you provided fr_template.py)
uv run viper students.xlsx fr --template my_phu
```
If the pipeline does not yet include the template library:

If a language template is missing:
```
FileNotFoundError: Template module not found: /path/to/phu_templates/my_phu/fr_template.py
Expected fr_template.py in /path/to/phu_templates/my_phu
git submodule add <PRIVATE_REPO_URL> phu_templates
git submodule update --init --recursive
```

If an asset referenced by your template is missing:
This will create:

```
FileNotFoundError: Logo not found: /path/to/phu_templates/my_phu/assets/logo.png
phu_templates/ # Points to this template library
```

## Git Considerations

**Important:** PHU-specific templates are excluded from version control via `.gitignore`.

- Templates in this directory will NOT be committed to the main repository
- Each PHU should maintain their templates in their own fork or separate repository
- The `README.md` file and `.gitkeep` are the only tracked files in this directory
30 changes: 29 additions & 1 deletion pipeline/orchestrator.py
Original file line number Diff line number Diff line change
Expand Up @@ -209,6 +209,7 @@ def run_step_2_preprocess(
output_dir: Path,
language: str,
run_id: str,
config_dir: Path,
) -> int:
"""Step 2: Preprocessing.

Expand All @@ -231,12 +232,38 @@ def run_step_2_preprocess(
df = preprocess.check_addresses_complete(df)

# Load configuration
config = load_config(config_dir / "parameters.yaml")
ignore_diseases = config.get("ignore_diseases", {})

vaccine_reference_path = preprocess.VACCINE_REFERENCE_PATH
vaccine_reference = json.loads(vaccine_reference_path.read_text(encoding="utf-8"))

if ignore_diseases:
ignore_set = set(ignore_diseases)

filtered_reference = {}

for vaccine, agents in vaccine_reference.items():

# Remove ignored diseases from agent list
cleaned_agents = [a for a in agents if a not in ignore_set]

# Keep only if something remains
if cleaned_agents:
filtered_reference[vaccine] = cleaned_agents

print(f"Ignored diseases: {', '.join(sorted(ignore_set))}")
print(
f"Filtered vaccine reference to {len(filtered_reference)} vaccines "
f"from {len(vaccine_reference)} total."
)

else:
filtered_reference = vaccine_reference

# Build preprocessing result
result = preprocess.build_preprocess_result(
df, language, vaccine_reference, preprocess.REPLACE_UNSPECIFIED
df, language, filtered_reference, preprocess.REPLACE_UNSPECIFIED, ignore_diseases
)

# Write artifact
Expand Down Expand Up @@ -574,6 +601,7 @@ def main() -> int:
output_dir,
args.language,
run_id,
config_dir
)
step_duration = time.time() - step_start
step_times.append(("Preprocessing", step_duration))
Expand Down
99 changes: 53 additions & 46 deletions pipeline/preprocess.py
Original file line number Diff line number Diff line change
Expand Up @@ -652,63 +652,69 @@ def enrich_grouped_records(
vaccine_reference: Dict[str, Any],
language: str,
chart_diseases_header: List[str] | None = None,
ignore_diseases: List[str] | None = None,
) -> List[Dict[str, Any]]:
"""Enrich grouped vaccine records with disease information.

If chart_diseases_header is provided, diseases not in the list are
collapsed into the "Other" category.
enriched = []
ignore_set = set(ignore_diseases or [])

Parameters
----------
grouped : List[Dict[str, Any]]
Grouped vaccine records with date_given and vaccine list.
vaccine_reference : Dict[str, Any]
Map of vaccine codes to disease names.
language : str
Language code for logging.
chart_diseases_header : List[str], optional
List of diseases to include in chart. Diseases not in this list
are mapped to "Other".

Returns
-------
List[Dict[str, Any]]
Enriched records with date_given, vaccine, and diseases fields.
"""
enriched: List[Dict[str, Any]] = []
for item in grouped:
vaccines = [
v.replace("-unspecified", "*").replace(" unspecified", "*")
for v in item["vaccine"]
]
diseases: List[str] = []
for vaccine in vaccines:
ref = vaccine_reference.get(vaccine, vaccine)
if isinstance(ref, list):
diseases.extend(ref)
else:
diseases.append(ref)

# Collapse diseases not in chart to "Other"
# Rule 1 — If ANY vaccine does not exist, drop entire row
if any(v not in vaccine_reference for v in item["vaccine"]):
continue

final_vaccines = []
final_diseases = []

for vaccine in item["vaccine"]:

disease_mapping = vaccine_reference.get(vaccine)

# Rule 2 — If mapping is "Other", skip this vaccine entirely
if disease_mapping == ["Other"] or disease_mapping == "Other":
continue

# Add vaccine to valid list
final_vaccines.append(vaccine)

# Add disease mapping
if isinstance(disease_mapping, list):
final_diseases.extend(disease_mapping)

# Rule 3 — HepB or HPV → also append "Other"
if any(d in ["Hepatitis B", "HPV"] for d in disease_mapping):
final_diseases.append("Other")

# If no vaccines remain, drop entire row
if not final_vaccines:
continue

# Rule 4 — Remove ignored diseases
final_diseases = [d for d in final_diseases if d not in ignore_set]

# Rule 5 — Collapse diseases not in header → "Other"
if chart_diseases_header:
filtered_diseases: List[str] = []
has_unmapped = False
for disease in diseases:
if disease in chart_diseases_header:
filtered_diseases.append(disease)
else:
has_unmapped = True
if has_unmapped and "Other" not in filtered_diseases:
filtered_diseases.append("Other")
diseases = filtered_diseases
allowed = set(chart_diseases_header)
final_diseases = [
d if d in allowed else "Other"
for d in final_diseases
]

vaccines_normalized = [
v.replace("-unspecified", "*").replace(" unspecified", "*")
for v in final_vaccines
]


enriched.append(
{
"date_given": item["date_given"],
"vaccine": vaccines,
"diseases": diseases,
"vaccine": vaccines_normalized,
"diseases": final_diseases,
}
)

return enriched


Expand All @@ -717,6 +723,7 @@ def build_preprocess_result(
language: str,
vaccine_reference: Dict[str, Any],
replace_unspecified: List[str],
ignore_diseases
) -> PreprocessResult:
"""Process and normalize client data into structured artifact.

Expand Down Expand Up @@ -790,7 +797,7 @@ def build_preprocess_result(
]
received_grouped = process_received_agents(row.IMMS_GIVEN, replace_unspecified) # type: ignore[attr-defined]
received = enrich_grouped_records(
received_grouped, vaccine_reference, language, chart_diseases_header
received_grouped, vaccine_reference, language, chart_diseases_header, ignore_diseases=None
)

postal_code = row.POSTAL_CODE if row.POSTAL_CODE else "Not provided" # type: ignore[attr-defined]
Expand Down
2 changes: 1 addition & 1 deletion templates/conf.typ
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@
let table_content = align(center)[
#table(
columns: columns,
rows: (envelope_window_height),
rows: auto,
inset: font_size,
col1_content,
table.vline(stroke: vline_stroke),
Expand Down
Loading
Loading