Also: accept BartConfig object as bart_config_name for tiny test models.
Guard drive.mount() with os.path.isdir('/content/drive/MyDrive') check
so re-running the cell does not raise ValueError: Mountpoint must not
already contain files.
…scade: Wrap cardiology_detect (scipy), EEG_abnormal/events (mne), sleep_staging variants (mne), and temple_university_EEG_tasks (mne) in try/except so that the pyhealth.tasks import does not fail in Colab, where numpy 2.x breaks scipy._lib._util. Mirrors the identical fix in halo-pr-528.
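The guard pattern can be factored as a small helper, shown here as a sketch. The real fix wraps each from-import in its own try/except inside pyhealth/tasks/__init__.py; the usage line below is hypothetical.

```python
import importlib

def guarded_import(module_name):
    """Import a module by name, returning None when any of its optional
    dependencies are missing, instead of crashing the parent package import."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# Hypothetical usage inside pyhealth/tasks/__init__.py:
# cardiology_detect = guarded_import("pyhealth.tasks.cardiology_detect")
```

Downstream code then checks for None before using the symbol, rather than assuming the import succeeded.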
- When all 3 files exist in DATA_DIR (Drive-backed), print sizes and skip upload entirely (mirrors the HALO notebook UX)
- Normalize uploaded filenames via shutil.copy so Colab's duplicate rename (e.g. ADMISSIONS (1).csv) maps to the canonical name in Drive
- Keep the idempotent drive.mount() guard from the previous fix
Wrap ChestXray14Dataset, COVID19CXRDataset (PIL/torchvision), SleepEDFDataset, TUABDataset, TUEVDataset (mne) in try/except so datasets/__init__ does not fail when optional deps are absent. TUABDataset was the immediate cause: tuab.py imports EEGAbnormalTUAB from pyhealth.tasks, which is now silently absent when mne is unavailable. Mirrors identical guards in halo-pr-528.
- Add --force-reinstall to pip install so Colab never loads a stale cached build that lacks the try/except import guards
- Switch to subprocess.run with a returncode check (mirrors the HALO pattern)
- Update preamble: last_modified 2026-03-03, commit 394e128
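A minimal sketch of the subprocess.run-with-returncode-check pattern; the commented pip invocation is a placeholder, since the actual repo URL and branch are not shown here:

```python
import subprocess
import sys

def run_checked(cmd):
    """Run a command, raising on nonzero exit instead of failing silently
    (mirrors the HALO install pattern)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed ({result.returncode}):\n{result.stderr}")
    return result.stdout

# Hypothetical notebook usage:
# run_checked([sys.executable, "-m", "pip", "install", "--force-reinstall",
#              "git+https://github.com/<org>/PyHealth.git@<branch>"])
```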
Wrap biot (einops), cnn/graph_torchvision/torchvision/vision_embedding (PIL/torchvision), grasp (sklearn→scipy cascade), molerec/safedrug (rdkit), tfm_tokenizer (einops), transformers_model/text_embedding/sdoh (transformers) in try/except — mirrors halo-pr-528. Also removes duplicate medlink import.
- tuab.py, tuev.py: wrap task imports in try/except (= None fallback) so TUABDataset/TUEVDataset load cleanly when mne is unavailable. Mirrors halo-pr-528 commit b1470ad. - Notebook preamble: restructured to match HALO layout (What You'll Need / How It Works / Important Notes / References); removed 'Why PromptEHR is different from HALO' section per user request. - Timestamp: 2026-03-04 08:37:50 UTC
…clobber: --force-reinstall reinstalls all transitive deps, which could downgrade scipy back to the old Colab binary. Installing scipy>=1.14 in a second pip call, after PyHealth, ensures it is the final version on disk when s4-dataset later triggers the transformers→sklearn→scipy import chain.
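Sketched as a two-step install (the repo URL and branch are placeholders, not taken from the PR):

```shell
# Step 1: install PyHealth; --force-reinstall may drag scipy back to the
# old Colab binary via transitive deps.
pip install -q "git+https://github.com/<org>/PyHealth.git@<branch>"
# Step 2: a separate call afterwards, so scipy>=1.14 is the last version
# written to disk before anything imports it.
pip install -q --upgrade "scipy>=1.14"
```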
PIL._typing._Ink moved between Pillow versions; --force-reinstall can leave the package in an inconsistent state. Pinning Pillow>=10.4.0 in the post-PyHealth upgrade step ensures consistent PIL internals.
…y (PIL/torchvision unavailable)
…ded each session
Root cause: s2-config called os.makedirs(DATA_DIR) before Drive was mounted,
creating a local /content/drive/MyDrive directory. The s3-upload guard then
saw isdir('/content/drive/MyDrive') == True and skipped drive.mount(), so
all file checks ran against an empty local path.
Fix:
- s2-config: skip makedirs in Colab (Drive not yet mounted)
- s3-upload: use os.path.ismount('/content/drive') guard (checks actual
filesystem mount, not directory existence); makedirs after mount
… state: PyHealth --force-reinstall can leave numpy/scipy in a mixed state where Python files and compiled .so extensions come from different versions, causing 'cannot import name _center from numpy._core.umath'. Fix: add --force-reinstall and an explicit numpy~=2.2.0 to the post-PyHealth pip upgrade step, guaranteeing that all numpy/scipy files come from consistent, mutually compatible versions that support numpy 2.x.
- Add FeatureProcessor import (was in __all__ but never imported)
- Remove LabelProcessor from __all__ (the class does not exist)
- Guard ImageProcessor/TimeImageProcessor with (ImportError, RuntimeError) to catch broken Pillow installs that raise RuntimeError, not ImportError
- Build __all__ dynamically so guarded processors are only listed when their imports succeed
- Change numpy~=2.2.0 → numpy>=2.0.0 in the notebook post-install to avoid a hard ceiling at <2.3 that would force a downgrade as Colab's numpy advances
- pyproject.toml: numpy~=2.2.0 → numpy>=2.0.0 (removes the <2.3 ceiling; prevents a downgrade when Colab ships numpy 2.3.x, which was the root cause of the recurring _center ImportError)
- s1-setup: remove --force-reinstall and numpy from the post-install step; use --upgrade instead (force-reinstalling scipy force-reinstalls numpy transitively, creating a mixed-version compiled/Python state)
- s3-upload: drive.mount(..., force_remount=True) to handle stale FUSE mount state that raised "Mountpoint must not already contain files"
… install: Same pattern as HALO's scipy fix (b80f837): the PyHealth install may partially upgrade Pillow (via the torch→torchvision→Pillow cascade), leaving mixed .py/.so files. Force-reinstall only Pillow (--no-deps) before it is first imported, so all files come from one version.
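Sketched as a single pip step; the >=10.4.0 pin comes from the earlier Pillow commit in this PR:

```shell
# Reinstall only Pillow, without touching its dependents, immediately after
# the PyHealth install and before anything imports PIL:
pip install -q --force-reinstall --no-deps "Pillow>=10.4.0"
```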
Force-pushed 9b5cfa5 to 732d207.
transformers 4.53+ eagerly imports loss_utils → image_utils → torchvision → PIL, even for non-vision models like BART. In Colab, Pillow is in a mixed-version state that can't be fixed by pip (system-managed files). Fix: temporarily remove torchvision from sys.modules during the BART import so transformers skips the vision chain entirely. PromptEHR only needs BART, not vision functionality.
No PromptEHR task uses icustays, and most users don't have the file.
HuggingFace Trainer moves bart_model to the GPU but does not move the parent PromptEHR module. self.device (read from _dummy_param) stays on the CPU while bart_model is on the GPU, causing a RuntimeError during generation.
transformers defaults to beam search, which fails under batched inference on our single-token encoder input. PromptEHR uses nucleus/greedy sampling, not beam search.
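The settings implied above, expressed as generate() kwargs. The keys are standard transformers generation parameters, but the top_p value is illustrative, not taken from the PR:

```python
# Disable the beam-search default and take the sampling path instead:
gen_kwargs = dict(
    num_beams=1,     # explicitly no beam search
    do_sample=True,  # nucleus sampling (greedy when do_sample=False)
    top_p=0.95,      # illustrative nucleus cutoff
)
```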
BART generate() always starts output with decoder_start_token_id (BOS=1). The ported code treated BOS as a stop token (break), causing decode_tokens to return empty visits for every patient. Original pehr_scratch/generate.py::parse_sequence_to_visits uses continue to skip BOS — this was a porting bug. Fix from promptehr-port branch commit 97f6a7b.
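A minimal, self-contained sketch of the continue-vs-break fix; the token ids and the visit separator are illustrative, not the real PromptEHR vocabulary:

```python
BOS, EOS, VISIT_SEP = 1, 2, 3  # illustrative special-token ids

def parse_sequence_to_visits(token_ids):
    """Split a generated token sequence into per-visit code lists."""
    visits, current = [], []
    for tok in token_ids:
        if tok == BOS:
            continue  # skip BOS; `break` here returned empty visits (the bug)
        if tok == EOS:
            break     # EOS genuinely ends the sequence
        if tok == VISIT_SEP:
            visits.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        visits.append(current)
    return visits
```

Because generate() always emits decoder_start_token_id first, treating BOS as a stop token truncates every sequence at position zero; skipping it recovers the visits.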