@majiayu000

Description

This PR addresses two related issues with MHC (Mental Health Catalogue) matching:

  1. Issue #7 - Remove empty items from MHC: Empty or whitespace-only MHC questions are now skipped during matching. This prevents invalid items from being matched to user questions.

  2. Issue #8 - Don't match to MHC items if similarity is too low: Added a new mhc_min_similarity parameter that filters out matches whose similarity falls below a threshold. The default is 0.0 for backward compatibility, but it can be set higher (e.g., 0.5) to filter out unrelated matches.

Changes

  • Modified matcher.py to build a mask of valid MHC questions and skip empty/whitespace items
  • Added mhc_min_similarity parameter to both match_instruments_with_function and match_instruments
  • Added comprehensive unit tests for the new functionality
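
To illustrate the two changes above, here is a minimal sketch of the masking and thresholding idea. It is not the code in matcher.py: the helper name best_mhc_matches, the layout of the similarity matrix (one row per user question, one column per MHC item), and the use of >= for the threshold comparison are all assumptions made for this example.

```python
from typing import List, Optional, Sequence

import numpy as np


def best_mhc_matches(
    similarity: Sequence[Sequence[float]],
    mhc_texts: Sequence[Optional[str]],
    min_similarity: float = 0.0,
) -> List[Optional[int]]:
    """Hypothetical helper: pick the best MHC item index per question.

    Assumes `similarity` has shape (n_questions, n_mhc_items).
    Returns None for a question when no valid item reaches the threshold.
    """
    sim = np.asarray(similarity, dtype=float)

    # Mask of valid MHC items: non-empty after stripping whitespace.
    valid = np.array([bool(t and t.strip()) for t in mhc_texts], dtype=bool)

    # Nothing to match against: every question gets None.
    if sim.size == 0 or not valid.any():
        return [None] * sim.shape[0]

    # Invalid columns are pushed to -inf so argmax can never select them.
    masked = np.where(valid[np.newaxis, :], sim, -np.inf)
    best = masked.argmax(axis=1)
    best_scores = masked[np.arange(sim.shape[0]), best]

    # Assumption: a match is kept when its score is >= the threshold.
    return [
        int(idx) if score >= min_similarity else None
        for idx, score in zip(best, best_scores)
    ]
```

With a threshold of 0.0 the sketch keeps any non-negative best score, which is how it approximates the backward-compatible default; whether the real implementation treats negative similarities the same way is not stated in this PR.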

Testing

  • All 108 existing tests pass
  • Added 5 new tests specifically for MHC filtering:
    • test_empty_mhc_questions_are_skipped
    • test_whitespace_only_mhc_questions_are_skipped
    • test_low_similarity_no_match
    • test_high_similarity_match
    • test_threshold_filters_unrelated

Usage

# Default behavior (backward compatible)
match_response = match_instruments([instrument], mhc_questions=mhc_questions, ...)

# With similarity threshold to filter unrelated matches
match_response = match_instruments([instrument], mhc_questions=mhc_questions, 
                                   mhc_min_similarity=0.5, ...)

Fixes #7, #8

…ydata#7, harmonydata#8)

- Skip MHC questions with empty or whitespace-only text when finding
  matches, ensuring invalid items don't get matched
- Add mhc_min_similarity parameter to filter out matches below a
  threshold (default 0.0 for backward compatibility)
- Add unit tests for empty MHC filtering and similarity threshold

Signed-off-by: majiayu000 <1835304752@qq.com>
@woodthom2
Contributor

Thanks @majiayu000, please give me a couple of weeks to check this over.
