
Promptehr pr integration #878

Draft
jalengg wants to merge 40 commits into sunlabuiuc:master from jalengg:promptehr-pr-integration

Conversation

jalengg commented Mar 2, 2026

No description provided.

jalengg added 30 commits March 1, 2026 01:41
Also: accept BartConfig object as bart_config_name for tiny test models.
Guard drive.mount() with os.path.isdir('/content/drive/MyDrive') check
so re-running the cell does not raise ValueError: Mountpoint must not
already contain files.
…scade

Wrap cardiology_detect (scipy), EEG_abnormal/events (mne), sleep_staging
variants (mne), and temple_university_EEG_tasks (mne) in try/except so
that pyhealth.tasks import does not fail in Colab where numpy 2.x breaks
scipy._lib._util. Mirrors the identical fix in halo-pr-528.
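The guard pattern described above can be sketched as follows. This is a minimal illustration, not the PR's actual code; the module name in the `import` is deliberately fake to stand in for the scipy/mne-backed task modules:

```python
# Optional-dependency import guard: the package import must not fail
# just because one backend (scipy, mne, ...) is broken or absent.
try:
    import mne_like_missing_dep  # stand-in for a task module needing mne/scipy
    cardiology_detect = mne_like_missing_dep
except ImportError:
    # Leave the name defined but None so `from pyhealth.tasks import ...`
    # succeeds at package-import time; callers check for None.
    cardiology_detect = None
```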
- When all 3 files exist in DATA_DIR (Drive-backed), print sizes and
  skip upload entirely — mirrors HALO notebook UX
- Normalize uploaded filenames via shutil.copy so Colab's duplicate
  rename (e.g. ADMISSIONS (1).csv) maps to canonical name in Drive
- Keep idempotent drive.mount() guard from previous fix
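The filename normalization can be sketched like this; the regex and helper name are illustrative (the actual notebook cell may differ), but the behavior matches the commit message — mapping Colab's duplicate-rename back to the canonical name before copying into Drive:

```python
import re

def canonical_name(uploaded_name: str) -> str:
    # Colab renames duplicate uploads, e.g. "ADMISSIONS (1).csv".
    # Strip the " (N)" suffix just before the extension.
    return re.sub(r" \(\d+\)(?=\.[^.]+$)", "", uploaded_name)

# The notebook then copies to the canonical path (paths hypothetical):
# shutil.copy(uploaded_name, os.path.join(DATA_DIR, canonical_name(uploaded_name)))
```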
Wrap ChestXray14Dataset, COVID19CXRDataset (PIL/torchvision), SleepEDFDataset,
TUABDataset, TUEVDataset (mne) in try/except so datasets/__init__ does not
fail when optional deps are absent. TUABDataset was the immediate cause:
tuab.py imports EEGAbnormalTUAB from pyhealth.tasks, which is now silently
absent when mne is unavailable. Mirrors identical guards in halo-pr-528.
- Add --force-reinstall to pip install so Colab never loads a stale
  cached build that lacks the try/except import guards
- Switch to subprocess.run with returncode check (mirrors HALO pattern)
- Update preamble: last_modified 2026-03-03, commit 394e128
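The `subprocess.run`-with-returncode-check install step looks roughly like this (a sketch of the pattern, not the notebook's exact cell; the package argument is commented out as a placeholder):

```python
import subprocess
import sys

def run_pip(*args: str) -> None:
    # HALO-style install step: run pip as a subprocess and fail loudly
    # on a nonzero return code instead of silently continuing.
    result = subprocess.run(
        [sys.executable, "-m", "pip", *args],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"pip {' '.join(args)} failed:\n{result.stderr}")

# run_pip("install", "--force-reinstall", "pyhealth")  # as described above
```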
Wrap biot (einops), cnn/graph_torchvision/torchvision/vision_embedding
(PIL/torchvision), grasp (sklearn→scipy cascade), molerec/safedrug (rdkit),
tfm_tokenizer (einops), transformers_model/text_embedding/sdoh (transformers)
in try/except — mirrors halo-pr-528. Also removes duplicate medlink import.
- tuab.py, tuev.py: wrap task imports in try/except (= None fallback)
  so TUABDataset/TUEVDataset load cleanly when mne is unavailable.
  Mirrors halo-pr-528 commit b1470ad.
- Notebook preamble: restructured to match HALO layout (What You'll
  Need / How It Works / Important Notes / References); removed
  'Why PromptEHR is different from HALO' section per user request.
- Timestamp: 2026-03-04 08:37:50 UTC
…clobber

--force-reinstall reinstalls all transitive deps, which could downgrade
scipy back to the old Colab binary. Installing scipy>=1.14 in a second
pip call after PyHealth ensures it is the final version on disk when
s4-dataset later triggers the transformers→sklearn→scipy import chain.
PIL._typing._Ink moved between Pillow versions; --force-reinstall can
leave the package in an inconsistent state. Pinning Pillow>=10.4.0 in
the post-PyHealth upgrade step ensures consistent PIL internals.
…ded each session

Root cause: s2-config called os.makedirs(DATA_DIR) before Drive was mounted,
creating a local /content/drive/MyDrive directory. The s3-upload guard then
saw isdir('/content/drive/MyDrive') == True and skipped drive.mount(), so
all file checks ran against an empty local path.

Fix:
- s2-config: skip makedirs in Colab (Drive not yet mounted)
- s3-upload: use os.path.ismount('/content/drive') guard (checks actual
  filesystem mount, not directory existence); makedirs after mount
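A sketch of the corrected guard, under the assumptions stated in the fix (the `data_dir` name is hypothetical; the notebook's real path may differ):

```python
import os

def ensure_drive(drive_root: str = "/content/drive", in_colab: bool = True) -> str:
    # os.path.ismount checks the actual FUSE mount, not mere directory
    # existence, so a stray local /content/drive/MyDrive can't fool it.
    if in_colab and not os.path.ismount(drive_root):
        from google.colab import drive  # only importable inside Colab
        drive.mount(drive_root)
    # makedirs only after the mount is confirmed (or when outside Colab)
    data_dir = os.path.join(drive_root, "MyDrive", "promptehr_data")  # hypothetical
    os.makedirs(data_dir, exist_ok=True)
    return data_dir
```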
… state

PyHealth --force-reinstall can leave numpy/scipy in a mixed state where
Python files and compiled .so extensions are from different versions,
causing 'cannot import name _center from numpy._core.umath'.

Fix: add --force-reinstall and explicit numpy~=2.2.0 to the post-PyHealth
pip upgrade step, guaranteeing all numpy/scipy files are from consistent
versions that support each other and numpy 2.x.
- Add FeatureProcessor import (was in __all__ but never imported)
- Remove LabelProcessor from __all__ (class does not exist)
- Guard ImageProcessor/TimeImageProcessor with (ImportError, RuntimeError)
  to catch broken Pillow installs that raise RuntimeError, not ImportError
- Build __all__ dynamically so guarded processors are only listed when
  their imports succeed
- Change numpy~=2.2.0 → numpy>=2.0.0 in notebook post-install to avoid
  hard ceiling at <2.3 that would downgrade as Colab numpy advances
- pyproject.toml: numpy~=2.2.0 → numpy>=2.0.0 (removes <2.3 ceiling;
  prevents downgrade when Colab has numpy 2.3.x, which was the root
  cause of the recurring _center ImportError)
- s1-setup: remove --force-reinstall and numpy from post-install step;
  use --upgrade instead (force-reinstall of scipy force-reinstalls numpy
  transitively, creating mixed-version compiled/Python state)
- s3-upload: drive.mount(..., force_remount=True) to handle stale FUSE
  mount state that raised "Mountpoint must not already contain files"
… install

Same pattern as HALO's scipy fix (b80f837): PyHealth install may
partially upgrade Pillow (via torch→torchvision→Pillow cascade),
leaving mixed .py/.so files. Force-reinstall only Pillow (--no-deps)
before it gets imported so all files come from one version.
jalengg force-pushed the promptehr-pr-integration branch from 9b5cfa5 to 732d207 on March 4, 2026 19:22
jalengg added 9 commits March 4, 2026 13:37
transformers 4.53+ eagerly imports loss_utils → image_utils →
torchvision → PIL, even for non-vision models like BART. In Colab,
Pillow is in a mixed-version state that can't be fixed by pip
(system-managed files). Fix: temporarily remove torchvision from
sys.modules during the BART import so transformers skips the vision
chain entirely. PromptEHR only needs BART, not vision functionality.
No PromptEHR task uses icustays, and most users don't have the file.
HuggingFace Trainer moves bart_model to GPU but doesn't move the
parent PromptEHR module. self.device (from _dummy_param) stays CPU
while bart_model is on GPU, causing RuntimeError during generation.
transformers defaults to beam search, which fails under batched
inference on our single-token encoder input. PromptEHR uses
nucleus/greedy sampling, not beam search.
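The fix amounts to overriding the generation defaults; a hedged sketch, with illustrative values (the exact kwargs in the PR may differ):

```python
# Force sampling instead of transformers' beam-search default.
gen_kwargs = {
    "do_sample": True,   # nucleus/greedy sampling, as PromptEHR expects
    "num_beams": 1,      # disable beam search entirely
    "top_p": 0.95,       # hypothetical nucleus threshold
}
# outputs = bart_model.generate(input_ids, **gen_kwargs)
```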
BART generate() always starts output with decoder_start_token_id
(BOS=1). The ported code treated BOS as a stop token (break),
causing decode_tokens to return empty visits for every patient.

Original pehr_scratch/generate.py::parse_sequence_to_visits uses
continue to skip BOS — this was a porting bug. Fix from
promptehr-port branch commit 97f6a7b.
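The corrected decode loop looks roughly like this (token ids and the function shape are illustrative, modeled on the `parse_sequence_to_visits` behavior described above):

```python
def parse_sequence_to_visits(token_ids, bos_id=1, eos_id=2, sep_id=3):
    # BART's generate() always emits decoder_start_token_id (BOS) first;
    # it must be skipped, not treated as a stop token.
    visits, current = [], []
    for tok in token_ids:
        if tok == bos_id:
            continue  # the porting bug was `break` here -> empty visits
        if tok == eos_id:
            break
        if tok == sep_id:
            visits.append(current)  # visit boundary
            current = []
        else:
            current.append(tok)
    if current:
        visits.append(current)
    return visits
```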