@SFJohnson24 (Collaborator) commented Nov 5, 2025

neg.json
Rule_underscores.json
pos.json

This PR updates the get_codelist_attributes operator for rule CG0288. The operation now handles blank rows correctly and returns CT only for rows whose CT package matches one of the CT packages given in the operation params. To test, run the rule attached above; it has an exists clause that the editor test data does not.

Key changes include:

Refactoring and Enhancements to Codelist Attribute Extraction:

  • Refactored _get_codelist_attributes in get_codelist_attributes.py to use a new row-wise mapping approach for determining CT package names, supporting both pandas and Dask datasets, and mapping packages to codelist sets more efficiently.
  • Introduced helper methods for extracting codes by attribute (e.g., term code, codelist code, preferred term) from CT package metadata, replacing the previous generic logic with more precise and extensible functions.
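
As a rough illustration of the row-wise mapping described above, here is a minimal pandas-only sketch. It is not the engine's code: the metadata layout (`CT_METADATA`), the column name `ct_version`, and the function names `codes_for_attribute` and `codelist_attributes_per_row` are all hypothetical, and Dask support is left out.

```python
import pandas as pd

# Hypothetical CT metadata keyed by package name. The real structure used by
# the engine may differ; the codes below are placeholders, not real CT codes.
CT_METADATA = {
    "sdtmct-2023-12-15": [
        {"codelist_code": "CL1", "term_code": "T1", "preferred_term": "Yes"},
        {"codelist_code": "CL1", "term_code": "T2", "preferred_term": "No"},
    ],
}


def codes_for_attribute(terms, attribute):
    """Collect one attribute (e.g. 'term_code') across all terms of a package."""
    return {term[attribute] for term in terms if attribute in term}


def codelist_attributes_per_row(df, version_col, ct_packages, attribute,
                                package_type="sdtmct"):
    """Return, for each row, the set of codes from that row's CT package."""
    # Build the code set once per requested package, not once per row.
    package_sets = {
        name: codes_for_attribute(CT_METADATA[name], attribute)
        for name in ct_packages
        if name in CT_METADATA
    }

    def row_package(version):
        # Blank or missing versions have no package.
        if pd.isna(version) or str(version).strip() == "":
            return None
        return f"{package_type}-{version}"

    names = df[version_col].map(row_package)
    # Rows without a matching requested package get an empty set.
    return names.map(lambda name: package_sets.get(name, set()))
```

With a `ct_version` column containing a known version, a blank string, a missing value, and a version that was not requested, only the first row receives a non-empty set. Building the package-to-set mapping up front keeps the per-row work to a dictionary lookup.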

Improvements to Controlled Terminology Package Handling:

  • Improved CT package loading by parsing package names to extract type and version, ensuring the right CT package data is loaded before metadata access.
  • Removed the now-unused ct_package parameter in favor of the plural ct_packages everywhere, simplifying parameter handling and reducing confusion.
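
To make the name parsing above concrete, the following sketch splits a package name into type and version. It assumes names of the form `<type>-<YYYY-MM-DD>` (for example `sdtmct-2023-12-15`); the `CTPackage` tuple and `parse_ct_package` function are hypothetical names, not the ones used in the PR.

```python
import re
from typing import NamedTuple


class CTPackage(NamedTuple):
    package_type: str
    version: str


# Assumed naming convention: a type, a hyphen, then an ISO-style date.
# The type itself may contain hyphens (e.g. "define-xmlct").
_PACKAGE_RE = re.compile(
    r"^(?P<type>[a-z][a-z0-9-]*)-(?P<version>\d{4}-\d{2}-\d{2})$"
)


def parse_ct_package(name: str) -> CTPackage:
    """Split a CT package name into its type and version parts."""
    match = _PACKAGE_RE.match(name.strip().lower())
    if match is None:
        raise ValueError(f"Unrecognized CT package name: {name!r}")
    return CTPackage(match["type"], match["version"])
```

Failing loudly on an unrecognized name is a design choice for this sketch; a caller could equally skip such packages.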

Test Suite Updates:

  • Updated unit tests for get_codelist_attributes to match the new CT package structure, including submission_lookup and detailed term metadata, ensuring tests reflect the new logic and data model.
  • Cleaned up test imports by removing unused imports.

Bug Fixes and Data Handling:

  • Improved handling of set and Series types in is_contained_by to ensure compatibility with pandas isin and vectorized operations.
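
One way such set and Series handling could look is sketched below. This is an assumption about the general idea, not the actual is_contained_by implementation: a single set is passed to pandas `isin`, while a Series of per-row collections is checked element by element.

```python
import pandas as pd


def is_contained_by(target: pd.Series, comparator) -> pd.Series:
    """Return a boolean Series: is each target value in its comparator?"""
    if isinstance(comparator, pd.Series):
        # Per-row comparison: each row carries its own allowed values.
        # Rows are paired by position, so both Series must share an order.
        return pd.Series(
            [
                isinstance(allowed, (set, frozenset, list, tuple))
                and value in allowed
                for value, allowed in zip(target, comparator)
            ],
            index=target.index,
            dtype=bool,
        )
    if isinstance(comparator, (set, frozenset)):
        # Normalize to a list so isin always receives a plain list-like.
        comparator = list(comparator)
    # One shared collection for every row: vectorized membership test.
    return target.isin(comparator)
```

For example, `is_contained_by(pd.Series(["Y", "N", "X"]), {"Y", "N"})` flags the first two rows, while passing a Series of per-row sets lets blank rows carry an empty set and evaluate to False.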

These changes collectively make the codebase more robust, easier to maintain, and ready for future extensions in codelist and controlled terminology processing.

@SFJohnson24 SFJohnson24 marked this pull request as ready for review November 6, 2025 01:04

@RamilCDISC (Collaborator) left a comment

The PR updates the CT package lookup in the engine, as discussed in the connected issue.
The PR was validated by:

  1. Reviewing the PR for any unwanted changes or comments.
  2. Reviewing the updated logic in accordance with the AC.
  3. Ensuring all unit and regression tests pass.
  4. Ensuring all unit or regression tests are updated if required.
  5. Ensuring the schema is updated.
  6. Running manual tests on the dev editor using the positive dataset.
  7. Running manual tests using the negative dataset.
  8. Ensuring compatibility with other operators.

@RamilCDISC RamilCDISC merged commit 13ee441 into main Nov 10, 2025
18 of 19 checks passed
@RamilCDISC RamilCDISC deleted the CG0288 branch November 10, 2025 23:00
alexfurmenkov pushed a commit that referenced this pull request Nov 13, 2025
* attribute incorrectly named

* tests, operator working

* docs

---------

Co-authored-by: RamilCDISC <113539111+RamilCDISC@users.noreply.github.com>