Skip to content

Conversation

@jangevaare
Copy link
Member

@jangevaare jangevaare commented Oct 29, 2025

This needs additional testing, but I believe it's at a point now for review and comment.

Lots of changes, lots to discuss. This refactor is aimed at supporting all of our v0.2 milestone features. It has not focused on performance. I anticipate because of the focus on the single-client pdf compilation that pdf compilation will be slower overall, but performance testing is required to determine how much slower.

Because of how significant the changes are I don't think reviewing the files changed in github interface will be the way to go, but would rather recommend checking out the branch and trying run code and tests.

Documentation:

  • Cherry-picked docs/BRANCHING.md
  • Added docs/DOCUMENTATION_STANDARDS.md and docs/TESTING_STANDARDS.md
  • Added AGENTS.md for more consistent AI recommendations when used for development
  • numpy style strings used for functions
  • Module level docstrings included with module contracts and real world outcomes

Organization:

  • Scripts has been split to "pipeline" and "template" directories.
  • All shell scripts are now python, including run_pipeline.sh which is now orchestrator.py (Issue Explore Orchestration in Python? #7)
  • Pipeline is meant to be interacted with with uv run viper <input file> <language code>
  • Informative messages when pipeline steps used directly rather than via orchestrator.py/run viper
  • Entire pipeline built around per-client artifact creation and compilation. The exception is the initial preprocessed file creation
  • config/parameters.yaml is now the main place for configuration, and configuratons documented in config/README.md. Now favouring configuration more over CLI arguments generally
  • Cherry-picked python templates (PR refactor: add Python scripts for generating immunization notice templ… #38)
  • outputs organized into (An attempt at Change output file path naming convention #31)
    • output/artifacts/qr_codes, output/artifacts_typst
    • output/logs (just pre-processing for now)
    • output/metadata (batch manifests, pdf validation results - just page counts for now)
    • output/pdf_combined
    • output/pdf_individual

Batching

  • Batching occurs after pdf creation in an isolated step
  • Batching is disabled if encryption is used
  • Batching possible by sequence, as well as school + sequence and school board + sequence
  • Managed via config/parameters.yaml

Language & Dates & Standards

  • Languages now in ISO 639-1, and validated agaisnt an enum (Issue Update Language Flags to ISO 639-1 Standard #36)
  • Language specific files receive a prefix of the language code
  • Language stored in client data
  • Dates now in ISO 8601
  • babel package is used to format dates for notice use in target language
  • config/disease_normalization.json, is used in conjunction with language specific files within config/translation for display in overdue list and in immunization history chart, which reference normalized disease names.

Data processing

  • Diseases not selected for display in immunization history chart collapsed into "Other" category
  • If a School ID and Board ID is not provided, these will be generated using a short hash of the board and school name
  • Warning if board or school name missing
  • Warning if multiple clients per client ID

QR Codes & Encryption

  • Merged from @eswarchandravidyasagar 's feature branches (PR Qr codes #14 and Add PDF encryption functionality #21)
  • Full templating of both supported, available template fields validated agaisnt TemplateField enum. Shared utilities help implement.
  • Both managed via config/parameters.yaml
  • Unencrypted files immediately replaced with encrypted versions when enabled

Dependency/Project management

PR #60

  • Change pdf page counting to a more generic pdf validation step (retain pdf page counting check)
  • Example of using invisible markers in typst to support more specific pdf validation
  • Configuration of validation rules (disabled/warn/error) in parameters.yaml

PR #62

  • PDF batching is now PDF bundling
  • Bundling can be used at the same time as encryption - this is to support QA workflows, while still running the pipeline with digital notice delivery in mind
  • Clean up of non-encrypted PDFs now happens during the existing clean up stage, after optional batching has occurred
  • Renamed parameters surrounding clean up to improve clarity around what is happening before a run vs. after
  • before a pipeline run, we can clear the whole output directory, minus logs, which are retained
  • after a pipeline run we can clear non-encrypted pdfs (if batching AND/OR encryption is used), and remove output/artifacts directory

Question

  1. In the future could we accept language code in a column of the input data, rather than as a CLI argument
  2. Can we enable batching based on language, length of pages, and combined batching criteria (e.g. school + language)
  3. Can we check early in pipeline for pressence of required files for desired languages?
  4. How can we best incorporate various versions of notices, beyond language differences (e.g. versions specific for satellite office locations, or support A/B testing to improve communications)
  5. How can we do performance testing
  6. How can we implement parallel processing for PDF creation and PDF encryption particularly; can we share utilities?
    7. Can we transform counting pdf pages to a more fullsome PDF validation, using approaches like in @TiaTuinstra's testing work, and also add that to a metadata output? Addressed via Pdf validation #60
  7. Does it make sense to include data quality module directly in pipeline?
  8. Added docs were mainly to assist development; let's review and decide what make sense to keep/update, or drop going forward
    10. We may want to support batch-unencrypted + singular-encrypted outputs within the same run to facilitate PHN review Addressed via PR Bundle + Encrypt #62

Missing features
1. Page check is just number of pdf pages, beyond going to 3 pages, there is not a unique error for this, or routing of the malformed PDF elsewhere. I understand this may be a regression and we need to implement. Addressed via #60
2. The tests are questionable, we should plan to review more closely. I didn't think it was worth highlighting that there were 400+ tests now as I can't say I manually reviewed them all
3. Logging work @kassyray has begun on not present here (#37), but think there's a great opportunity to add!

eswarchandravidyasagar and others added 30 commits October 6, 2025 20:55
- Added QR code generation utility function in utils.py
- Modified preprocess.py to generate QR codes for each client
- Updated French and English template scripts to include QR codes
- Modified conf.typ to display QR codes in PDF notices
- QR codes contain client ID, name, DOB, and school information
- QR codes are saved as PNG files and embedded in PDFs
- Added QR code column to client_info_tbl_fr and client_info_tbl_en functions
- QR codes are displayed in the top-right corner of immunization notices
- Uses image.decode to render base64 encoded QR code images
- Maintains responsive layout with proper column widths
…ality; update template columns for better layout in English and French versions.
…for subsequent steps but slated for removal. Also add tests.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… moving from .sh to .py scripts where possible as working through pipeline.
PDF batching clean up, remove deprecated Pypdf2

More clean up
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jangevaare jangevaare merged commit 404e93c into main Nov 5, 2025
1 of 2 checks passed
@kassyray kassyray deleted the refactor/single-client-pdfs branch November 6, 2025 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants