Collaborator

@rw251 rw251 commented Nov 28, 2025

  1. The APCS report now gives primary diagnosis counts in addition to the counts from the "all diagnoses" field.
  2. A new output does a similar thing for ONS deaths: one count for the primary cause of death, and another for all the supplementary cause of death codes.
  3. A validation script reports any financial years or ICD10 codes that don't match our expected regex. These might be OK, but it should be a quick way to spot anything unusual, and it eliminates the possibility of outputting erroneous patient identifiers from the large all diagnoses field.

- Now gives separate counts for occurrences in primary diagnosis and the all diagnoses field
Query to count the occurrences of ICD10 codes in the ONS death data. Counts the primary diagnosis, and the contributing factors separately. Rounds counts to 10, suppresses values <15, and excludes type 1 opt outs.
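
The rounding and suppression rule described above could be sketched in Python roughly as follows. This is an illustration only, not the query in this PR: the function name `round_and_suppress` is hypothetical, and it assumes that suppression is applied to the raw count before rounding and that halves round up.

```python
from typing import Optional


def round_and_suppress(count: int, threshold: int = 15, base: int = 10) -> Optional[int]:
    """Return None for counts below the threshold, else round to the nearest `base`.

    Assumptions (not confirmed by the PR): the threshold is tested against the
    raw count, and values exactly halfway between multiples round up.
    """
    if count < threshold:
        return None  # suppressed
    # Integer arithmetic avoids Python's round-half-to-even behaviour.
    return ((count + base // 2) // base) * base


examples = {n: round_and_suppress(n) for n in (3, 14, 15, 24, 25, 104)}
print(examples)
```

Returning `None` rather than `0` keeps suppressed values distinguishable from genuine zero counts in any downstream table.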
It occurs to me that we might get unexpected data. This might be hard for the output checkers to validate, so let's add a report to help us:

- The financial_year column in apcs might contain unexpected data, as I don't think anyone has used it before. It should be in the format "202425" or maybe "2024-25"; anything else gets reported.
- The ICD10 codes in ONS have, I think, already been validated, but I'm not 100% sure, and the ones in the all_diagnoses field probably haven't been. So we check each one against a regex and report on those that don't match. Some of these may be valid, in which case we can update the regex. But this removes the risk that patient-identifiable data somehow appears in that field.
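
As a rough sketch of the two checks described above (not the validation script from this PR), something like the following would flag unexpected values. `ICD10_PATTERN` is the regex quoted later in the review; `FINANCIAL_YEAR_PATTERN`, the helper names, and the extra check that the two-digit suffix is the year following the four-digit start are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: accepts "202425" or "2024-25".
FINANCIAL_YEAR_PATTERN = re.compile(r"(\d{4})-?(\d{2})")
# Pattern as quoted in the review of this PR.
ICD10_PATTERN = re.compile(r"^[A-Z][0-9]{2}\.?[0-9X]?[0-9]?$")


def is_expected_financial_year(value: str) -> bool:
    """True if value looks like YYYYyy or YYYY-yy with yy the following year."""
    m = FINANCIAL_YEAR_PATTERN.fullmatch(value.strip())
    if m is None:
        return False
    start, end = m.groups()
    # Assumption: the suffix must be the year after the start year.
    return (int(start) + 1) % 100 == int(end)


def is_expected_icd10(value: str) -> bool:
    """True if value matches the expected ICD10 shape."""
    return ICD10_PATTERN.match(value.strip()) is not None


def unexpected_values(values, predicate):
    """Return the distinct values that fail the predicate, sorted for reporting."""
    return sorted({v for v in values if not predicate(v)})


print(unexpected_values(["202425", "2024-25", "2024/25", "202426"], is_expected_financial_year))
print(unexpected_values(["A00", "A00.1", "1234567890"], is_expected_icd10))
```

Anything returned by `unexpected_values` is only a candidate for review: a value may be legitimate, in which case the pattern gets widened, as the description says.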

@Jongmassey Jongmassey left a comment

Couple of minor nits.

Also, having re-read the docs, there's a secondary diagnosis as well - we should probably take the opportunity to include this for completeness. The information as to how often primary/secondary appear in "all" or not is very useful on its own!
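
To illustrate the kind of tally this suggests, here is a minimal Python sketch that counts how often the primary and secondary codes also appear in the "all diagnoses" value. It is hypothetical throughout: the delimiter format of the all diagnoses field is not stated in this PR, so the sketch simply treats any run of letters, digits and dots as one code, and the function names and row layout are made up for the example.

```python
import re
from collections import Counter


def split_codes(all_diagnoses: str) -> set:
    """Split an all-diagnoses string into codes.

    Assumption: codes are separated by arbitrary punctuation or whitespace.
    """
    return set(re.findall(r"[A-Z0-9.]+", all_diagnoses.upper()))


def tally_presence(rows) -> Counter:
    """Count whether primary/secondary codes occur in the all-diagnoses field.

    `rows` is an iterable of (primary, secondary, all_diagnoses) strings.
    """
    tally = Counter()
    for primary, secondary, all_diagnoses in rows:
        codes = split_codes(all_diagnoses)
        for label, code in (("primary", primary), ("secondary", secondary)):
            code = code.strip().upper()
            if not code:
                continue  # nothing recorded for this position
            suffix = "in_all" if code in codes else "not_in_all"
            tally[f"{label}_{suffix}"] += 1
    return tally


rows = [
    ("E119", "I10", "E119,I10,J849"),
    ("E119", "I10", "J849"),
    ("E119", "", "E119 ,J849"),
]
print(tally_presence(rows))
```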

# - A000 or A00X (4 chars without dot)
# - A00.00 or A00.0X (6 chars with dot)
# - A0000 or A000X (5 chars without dot)
ICD10_PATTERN = re.compile(r"^[A-Z][0-9]{2}\.?[0-9X]?[0-9]?$")

This seems to correctly match A00X0 and A00.X0 so I think the comment is wrong?
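
A quick way to check this is to run the quoted pattern against the forms listed in the code comment and the two forms mentioned here. The snippet below is only a check of the regex as quoted; the list of test codes is chosen for illustration.

```python
import re

# Pattern exactly as quoted in the diff above.
ICD10_PATTERN = re.compile(r"^[A-Z][0-9]{2}\.?[0-9X]?[0-9]?$")

codes = ["A000", "A00X", "A00.00", "A0000", "A00X0", "A00.X0", "A00.0X", "A000X"]
results = {code: bool(ICD10_PATTERN.match(code)) for code in codes}

for code, matched in results.items():
    print(f"{code:7} {'match' if matched else 'no match'}")
```

With the pattern as written, the `X` is only accepted in the first position after the optional dot, so `A00X0` and `A00.X0` match while `A00.0X` and `A000X` (the forms named in the comment) do not.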

Der_Financial_Year,
apcs.APCS_Ident,
apcs.Der_Financial_Year,
LTRIM(RTRIM(der.Spell_Primary_Diagnosis)) as primary_diagnosis,

der.Spell_Secondary_Diagnosis too for good measure?

Our docs say:

Code indicating secondary diagnosis. This is a single code giving the first listed secondary diagnosis, but there may other secondary diagnoses listed in the all_diagnoses field below.

@rw251 rw251 merged commit b8442e9 into main Dec 2, 2025
1 check passed

3 participants