[Feature] Implement KeyClass (Weak Supervision) for Clinical Notes #1107

Hi PyHealth Team,

I am Shuqing Zou, an MSCS student at USF. My current work and internship focus on heuristic-based label generation and clinical risk prediction from unstructured EHR text, which aligns perfectly with the KeyClass pipeline.

Because of this strong alignment, I would love to implement KeyClass from scratch for PyHealth. Since this is a non-trivial feature, I'd like to propose a phased approach to keep the PRs easy to review:

- Phase 1: Data processing pipeline for MIMIC-III discharge summaries (including the TF-IDF filtering step).
- Phase 2: Integrate weak supervision logic (Labeling Functions + Snorkel Label Model).
- Phase 3: End-to-end self-training classifier (e.g., via BERT).
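
As a rough illustration of the Phase 2 idea (not the final design), keyword-based labeling functions could look like the sketch below. The class ids, keywords, and the `make_keyword_lf` and `majority_vote` helpers are all hypothetical, and a plain majority vote stands in for the Snorkel Label Model so the example has no dependencies.

```python
from collections import Counter

ABSTAIN = -1  # returned when a labeling function has no opinion

# Hypothetical class id -> keywords mapping, for illustration only.
# In the proposed pipeline these would be chosen per task after Phase 1.
CLASS_KEYWORDS = {
    0: ["heart", "cardiac", "myocardial"],
    1: ["pneumonia", "respiratory", "lung"],
}


def make_keyword_lf(label, keyword):
    """Build a labeling function that votes `label` if `keyword` occurs in the note."""
    def lf(note):
        return label if keyword in note.lower() else ABSTAIN
    return lf


# One labeling function per (class, keyword) pair.
LFS = [
    make_keyword_lf(label, kw)
    for label, keywords in CLASS_KEYWORDS.items()
    for kw in keywords
]


def majority_vote(note):
    """Aggregate labeling-function votes; abstain on no votes or on a tie.

    This is a simple stand-in for a learned label model, not Snorkel itself.
    """
    votes = [v for v in (lf(note) for lf in LFS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    (top_label, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN
    return top_label
```

A learned label model would replace `majority_vote` by weighting each labeling function according to its estimated accuracy, and the resulting weak labels would feed the Phase 3 classifier.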

I read the contributing guidelines and am ready to branch off `develop`. Could you let me know if anyone is currently working on this? If not, I can start putting together a draft PR for Phase 1!

Thanks!
