Hi PyHealth Team,
I am Shuqing Zou, an MSCS student at USF. My current work and internship focus on heuristic-based label generation and clinical risk prediction from unstructured EHR text, which aligns perfectly with the KeyClass pipeline.
Given this overlap, I would like to implement KeyClass from scratch for PyHealth. Since this is a non-trivial feature, I propose a phased approach to keep the PRs easy to review:
- Phase 1: Data processing pipeline for MIMIC-III discharge summaries (including the TF-IDF filtering step).
- Phase 2: Integrate weak supervision logic (Labeling Functions + Snorkel Label Model).
- Phase 3: End-to-end self-training classifier (e.g., via BERT).
I read the contributing guidelines and am ready to branch off develop. Could you let me know if anyone is currently working on this? If not, I can start putting together a draft PR for Phase 1!
Thanks!