[Feature] Implement KeyClass (Weak Supervision) for Clinical Notes #1107

@ShuqingZou

Hi PyHealth Team,

I am Shuqing Zou, an MSCS student at USF. My current work and internship focus on heuristic-based label generation and clinical risk prediction from unstructured EHR text, which aligns closely with the KeyClass pipeline.

Given this overlap, I would love to implement KeyClass from scratch for PyHealth. Since this is a non-trivial feature, I'd like to propose a phased approach to keep the PRs easy to review:

  • Phase 1: Data processing pipeline for MIMIC-III discharge summaries (including the TF-IDF filtering step).
  • Phase 2: Integrate weak supervision logic (Labeling Functions + Snorkel Label Model).
  • Phase 3: End-to-end self-training classifier (e.g., via BERT).
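
To make the Phase 2 idea concrete, here is a minimal sketch of keyword-based labeling functions. It is only an illustration under stated assumptions: the class keywords, the `make_keyword_lf` and `majority_vote` helpers, and the label IDs are all hypothetical, and a plain majority vote stands in for the Snorkel Label Model, which would instead learn per-function accuracies.

```python
from collections import Counter

ABSTAIN = -1  # conventional "no vote" value in weak supervision

# Hypothetical keywords per class; a real pipeline would derive these
# from class descriptions rather than hard-coding them.
CLASS_KEYWORDS = {
    0: ["pneumonia", "infiltrate"],
    1: ["fracture", "dislocation"],
}


def make_keyword_lf(keyword, label):
    """Build a labeling function that votes `label` when `keyword` appears."""
    def lf(text):
        return label if keyword in text.lower() else ABSTAIN
    return lf


LABELING_FUNCTIONS = [
    make_keyword_lf(keyword, label)
    for label, keywords in CLASS_KEYWORDS.items()
    for keyword in keywords
]


def majority_vote(text):
    """Aggregate labeling-function votes; abstain on no votes or a tie.

    This is a simple stand-in for a learned label model.
    """
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    top = Counter(votes).most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN  # tie between the two most-voted classes
    return top[0][0]
```

The resulting weak labels (with abstains dropped) would then serve as training targets for the Phase 3 classifier.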

I read the contributing guidelines and am ready to branch off develop. Could you let me know if anyone is currently working on this? If not, I can start putting together a draft PR for Phase 1!

Thanks!
