
feat: expand python-agent-driver to 102 pip packages #80

Merged: danbugs merged 2 commits into main from feat/expand-packages on May 22, 2026


danbugs (Contributor) commented May 22, 2026

Summary

  • Expand shipped pip packages from 18 to 102, covering data science, web, NLP, dev tools, and more
  • Remove edgartools: its transitive dependency pyarrow pulls in concurrent.futures.thread, which crashes Unikraft
  • Add docx and pptx to the pre-import warmup list so that importing them at runtime costs nothing
  • Increase default heap from 1.5 GiB to 2.5 GiB to accommodate the larger rootfs
  • Add pydoc stub for pyarrow compatibility
  • Add docs/python-packages.md listing all supported packages with import names
  • Add CI smoke test step and improve pandas benchmark reliability
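Several of the new packages are imported under a name that differs from their pip distribution name, which is why the docs page lists import names separately. The sketch below is a hypothetical helper, not the contents of docs/python-packages.md: `IMPORT_NAMES` covers only a handful of well-known mismatches among packages named in this PR, and everything else falls back to the common lowercase, hyphen-to-underscore convention.

```python
# Hypothetical lookup: pip distribution name (lowercased) -> import name,
# limited to packages from this PR whose import name is not obvious.
IMPORT_NAMES = {
    "beautifulsoup4": "bs4",
    "pyyaml": "yaml",
    "pillow": "PIL",
    "python-dateutil": "dateutil",
    "python-dotenv": "dotenv",
    "python-docx": "docx",
    "python-pptx": "pptx",
    "markdown-it-py": "markdown_it",
    "scikit-learn": "sklearn",
    "gitpython": "git",
    "google-api-python-client": "googleapiclient",
    "fpdf2": "fpdf",
    "odfpy": "odf",
    "pypdf2": "PyPDF2",
}


def import_name(dist: str) -> str:
    """Return the module name to import for a pip distribution name."""
    key = dist.lower()
    # Default convention: lowercase, with hyphens turned into underscores.
    return IMPORT_NAMES.get(key, key.replace("-", "_"))
```

For example, `import_name("scikit-learn")` gives `"sklearn"` and `import_name("slack-sdk")` gives `"slack_sdk"`.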

Test plan

  • pyhl setup --force completes without crash
  • pyhl run -c "import pandas; print(pandas.DataFrame({'a':[1,2,3]}).describe())" works
  • Benchmark CI passes on both Linux and Windows
  • Spot-check new packages: import duckdb, import polars, import sklearn
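The spot-check imports in the test plan can be scripted. This is a sketch that assumes only that a failed or crashed import produces a non-zero exit status; `check_imports` and `SPOT_CHECK` are hypothetical names. Each import runs in a fresh interpreter, so a hard crash (as opposed to an ordinary `ImportError`) is attributed to the one module that caused it instead of aborting the whole check.

```python
import subprocess
import sys
from typing import Dict, Sequence, Tuple

# Modules from the test plan's spot-check step (import names, not pip names).
SPOT_CHECK = ("pandas", "duckdb", "polars", "sklearn")


def check_imports(
    modules: Sequence[str],
    runner: Tuple[str, ...] = (sys.executable, "-c"),
    timeout: float = 120.0,
) -> Dict[str, int]:
    """Import each module in its own process and return its exit code.

    0 means the import succeeded; anything else (including a negative code
    for a signal on POSIX) means it failed or crashed.
    """
    results = {}
    for name in modules:
        proc = subprocess.run(
            [*runner, f"import {name}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        results[name] = proc.returncode
    return results


if __name__ == "__main__":
    failures = {m: rc for m, rc in check_imports(SPOT_CHECK).items() if rc != 0}
    print("all imports OK" if not failures else f"failed: {failures}")
```

Passing `runner=("pyhl", "run", "-c")` would route each import through the sandbox instead of the host interpreter; that assumes `pyhl` is on PATH and exits non-zero when the guest code fails, which this PR does not state.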

Copilot AI review requested due to automatic review settings May 22, 2026 19:01

Copilot AI left a comment


Pull request overview

This PR expands the python-agent-driver image to include a much larger set of preinstalled Python packages, updates warmup pre-imports to reduce per-run import overhead for a few additional modules, and increases the VM heap/memory defaults to accommodate the larger rootfs. It also adds documentation enumerating supported packages and their import names.

Changes:

  • Increased sandbox heap sizing used by pyhl install/setup and pydriver-run to 2.5 GiB.
  • Expanded the pre-import warmup module list (in both the Rust --help text and the guest warmup loop) and broadened the set of packages the Docker build installs into the driver rootfs.
  • Added docs/python-packages.md to document supported packages and import names.
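The guest warmup loop lives in C (hl_pydriver.c), but the idea is easy to show in Python: import each listed module once before the snapshot is taken, so that later imports find it already in `sys.modules` and skip loading it from disk. `warm_up` and `WARMUP_MODULES` are illustrative names, not the driver's actual code.

```python
import importlib
import sys
from typing import Iterable, List

# Illustrative warmup list; the real list is defined in hl_pydriver.c and pyhl.rs.
WARMUP_MODULES = ["docx", "pptx"]


def warm_up(modules: Iterable[str]) -> List[str]:
    """Import each module once and return the names that failed to import.

    Successfully imported modules stay cached in sys.modules, so a later
    `import` of the same name in this process is a dictionary lookup.
    """
    failed = []
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception:  # ImportError, or any error raised at import time
            failed.append(name)
    return failed
```

Collecting failures instead of stopping at the first one lets a warmup step report every missing package in one pass.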

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

  • host/src/pyhl.rs: Increases heap size used during install warmup/snapshot creation.
  • host/src/bin/pyhl.rs: Expands preimport list, updates --help text, and increases heap size during pyhl setup.
  • host/src/bin/pydriver_run.rs: Aligns heap size with the new 2.5 GiB default for running the driver.
  • examples/python-agent-driver/Justfile: Updates the example memory setting value.
  • examples/python-agent-driver/hl_pydriver.c: Expands guest-side warmup pre-import list to match pyhl’s list.
  • examples/python-agent-driver/Dockerfile: Installs a significantly larger set of Python packages into the rootfs.
  • docs/python-packages.md: Adds documentation listing supported packages and import names.


Comment thread docs/python-packages.md Outdated
@@ -0,0 +1,150 @@
# Python package support

The python-agent-driver ships CPython 3.12 with the full standard library and 103 third-party pip packages.
Contributor Author


Fixed — updated wording to 'explicitly installed top-level packages'.

Comment thread host/src/bin/pyhl.rs Outdated
Comment on lines +123 to +137
Additional packages shipped in the rootfs (pay import cost on \
first use): aiohttp, altair, APScheduler, bandit, bokeh, \
boto3, builtwith, celery, coverage, distro, docx2txt, duckdb, \
edgartools, exchange-calendars, fabric, Faker, fastapi, \
feedparser, fpdf2, gensim, gitpython, \
google-api-python-client, hypercorn, hypothesis, loguru, \
markdown, markdownify, mutagen, networkx, nltk, \
numpy-financial, odfpy, paramiko, pdfplumber, pdfrw, pexpect, \
pipdeptree, platformdirs, plotly, polars, praw, pycountry, \
pydub, pyflakes, pygments, pylint, PyPDF2, pytest, \
pytest-asyncio, pytest-cov, pyxlsb, qrcode, radon, rapidfuzz, \
rarfile, reportlab, rope, ruff, schedule, scikit-learn, scipy, \
scrapy, send2trash, slack-sdk, srt, statsmodels, svgwrite, \
sympy, textblob, trafilatura, tweepy, typer, uvicorn, vulture, \
watchdog, websockets, wordcloud, xlrd, xlsxwriter.\n\n\
Contributor Author


Acknowledged — deduplicating CLI help text into a shared source is a good idea but out of scope for this PR.
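One way to act on the deduplication idea later would be to generate the help paragraph from a single package list. This is a minimal sketch in Python for illustration only: the real help text is a Rust string literal in host/src/bin/pyhl.rs, and `SHIPPED` (truncated here) and `help_paragraph` are hypothetical names.

```python
import textwrap
from typing import Sequence

# Hypothetical single source of truth for the shipped-package list (truncated).
SHIPPED = ["aiohttp", "altair", "APScheduler", "bandit", "bokeh", "boto3"]


def help_paragraph(packages: Sequence[str], width: int = 64) -> str:
    """Render the 'additional packages' help paragraph, wrapped to `width`."""
    body = ", ".join(sorted(packages, key=str.lower)) + "."
    text = (
        "Additional packages shipped in the rootfs "
        "(pay import cost on first use): " + body
    )
    # break_on_hyphens=False keeps names such as google-api-python-client whole.
    return textwrap.fill(text, width=width, break_on_hyphens=False)
```

Sorting case-insensitively keeps names like `APScheduler` and `Faker` in the same alphabetical order the current help text uses.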

Comment on lines 12 to +33
 FROM python:3.12-slim AS deps
 RUN pip install --target=/deps --no-cache-dir \
 tqdm pyyaml jinja2 beautifulsoup4 tabulate click tenacity \
 python-dotenv pypdf openpyxl markdown-it-py pydantic pillow \
-lxml cryptography python-dateutil numpy pandas
+lxml cryptography python-dateutil numpy pandas \
+python-docx python-pptx \
+chardet charset-normalizer \
+requests httpx aiohttp \
+feedparser markdown markdownify \
+Faker pycountry \
+loguru schedule send2trash \
+duckdb polars \
+xlrd xlsxwriter pyxlsb odfpy \
+pdfplumber pdfrw PyPDF2 \
+qrcode svgwrite \
+rapidfuzz \
+networkx sympy \
+pydub srt mutagen \
+plotly altair bokeh \
+statsmodels scikit-learn scipy \
+wordcloud \
+nltk textblob gensim \
Contributor Author


Fixed — added --prefer-binary to avoid sdist fallbacks.
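With dozens of names on one `pip install` line, duplicates that differ only in case or separator (for example `PyYAML` vs `pyyaml`) are easy to miss. A hedged sketch of a lint that groups names by their PEP 503 normalized form; `find_duplicates` is a hypothetical helper that expects the package portion of the RUN line and skips any token starting with `-`. Note that `pypdf` and `PyPDF2`, both installed here, are distinct projects and normalize to different names.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalize(name: str) -> str:
    """PEP 503 name normalization: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_duplicates(install_args: str) -> Dict[str, List[str]]:
    """Return normalized names that appear more than once, with their spellings."""
    groups: Dict[str, List[str]] = defaultdict(list)
    # Dockerfile line continuations ("\") are treated as whitespace.
    for token in install_args.replace("\\", " ").split():
        if token.startswith("-"):
            continue  # skip flags such as --no-cache-dir or --prefer-binary
        groups[normalize(token)].append(token)
    return {name: seen for name, seen in groups.items() if len(seen) > 1}
```

An empty result means every requested project appears exactly once.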

 kernel := ".unikraft/build/python-agent-driver-hyperlight_hyperlight-x86_64"
 initrd := "python-agent-driver-initrd.cpio"
-memory := "1Gi"
+memory := "2560Mi"
Contributor Author


Pre-existing — the memory variable was already unused before this PR. Leaving as-is for now.
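For reference, the new Justfile value is exactly the new heap default: 2560 MiB / 1024 = 2.5 GiB, or 2,684,354,560 bytes. A small hypothetical parser for the `Ki`/`Mi`/`Gi` suffixes used in the Justfile makes that arithmetic checkable; it does not reflect how any tool in this repo reads the value (per the reply above, the variable is currently unused).

```python
import re

# Binary (IEC) suffixes, as in the Justfile values "1Gi" and "2560Mi".
_UNITS = {"": 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_quantity(value: str) -> int:
    """Convert a size such as '2560Mi' to bytes (hypothetical helper)."""
    m = re.fullmatch(r"(\d+)(Ki|Mi|Gi)?", value.strip())
    if m is None:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2) or ""]
```

`parse_quantity("2560Mi")` equals `int(2.5 * 1024**3)`, and the old heap default of 1.5 GiB corresponds to `"1536Mi"`.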

danbugs force-pushed the feat/expand-packages branch from 99cb3b7 to 31f2a74 on May 22, 2026 19:41

github-actions (bot) left a comment


Linux Benchmarks

Benchmark suite      | Current: fd6265c | Previous: 7162172 | Ratio
---------------------|------------------|-------------------|------
hello_world (median) | 20 ms            | 20 ms             | 1.00
pandas (median)      | 110 ms           | 100 ms            | 1.10
density (per VM)     | 11 MB            | 7 MB              | 1.57
snapshot (disk)      | 656 MiB          | 385 MiB           | 1.70

This comment was automatically generated by workflow using github-action-benchmark.
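The Ratio column is current / previous, so values above 1 mean the new build is slower or larger. A sketch that recomputes the ratios from the Linux numbers and flags entries over a threshold; the 1.5 cutoff is an arbitrary example, not the benchmark workflow's configured alert threshold.

```python
from typing import Dict


def ratios(current: Dict[str, float], previous: Dict[str, float]) -> Dict[str, float]:
    """Current / previous for each benchmark present in both runs, 2 decimals."""
    return {k: round(current[k] / previous[k], 2) for k in current if k in previous}


def regressions(r: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Benchmarks whose ratio exceeds `threshold` (higher is worse here)."""
    return {k: v for k, v in r.items() if v > threshold}


# Linux numbers (units: ms, ms, MB, MiB) for fd6265c and 7162172.
linux_now = {"hello_world": 20, "pandas": 110, "density": 11, "snapshot": 656}
linux_prev = {"hello_world": 20, "pandas": 100, "density": 7, "snapshot": 385}

if __name__ == "__main__":
    r = ratios(linux_now, linux_prev)
    print(r)
    print("over 1.5x:", sorted(regressions(r, 1.5)))
```

With these inputs, only the per-VM density and snapshot size exceed 1.5x, which matches the larger rootfs rather than slower execution.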

danbugs added 2 commits May 22, 2026 22:29
- Add 88 new pip packages to Dockerfile (removed edgartools due to
  pyarrow pulling in concurrent.futures.thread which crashes Unikraft)
- Bump heap to 2.5 GiB across all entry points to accommodate the
  larger rootfs
- Pre-import docx and pptx during warmup for zero-cost access
- Add pydoc stub for pyarrow compatibility
- Update --help to list all shipped packages
- Add docs/python-packages.md reference

Signed-off-by: danbugs <danilochiarlone@gmail.com>

- Add smoke test step before benchmarks to catch pyhl run failures early
- Add explicit pandas smoke check before timing loop
- Redirect stdout to /dev/null in timing runs to isolate timing output

Signed-off-by: danbugs <danilochiarlone@gmail.com>
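The second commit's approach (a smoke run before timing, with stdout discarded during timed runs) can be sketched as follows. `median_ms` is a hypothetical helper, not the CI script; because it times any command, the same function could time a `pyhl run -c ...` invocation.

```python
import statistics
import subprocess
import sys
import time
from typing import Sequence


def median_ms(cmd: Sequence[str], runs: int = 5) -> float:
    """Return the median wall-clock time of `cmd` in milliseconds.

    One untimed smoke run comes first, so a broken command fails fast with
    CalledProcessError instead of being timed. Every run sends stdout to
    DEVNULL so the program's own output never mixes with the timing report.
    """
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)  # smoke check
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Times bare interpreter start-up; replace the command to time something else.
    print(f"{median_ms([sys.executable, '-c', 'pass']):.1f} ms")
```

Reporting the median rather than the mean keeps one slow outlier run (for example, a cold disk cache) from skewing the result.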
danbugs force-pushed the feat/expand-packages branch from 6dbc7a8 to fd6265c on May 22, 2026 22:29
danbugs changed the title from "feat: expand python-agent-driver to 103 pip packages" to "feat: expand python-agent-driver to 102 pip packages" on May 22, 2026

github-actions (bot) left a comment


Windows Benchmarks

Benchmark suite      | Current: fd6265c | Previous: 7162172 | Ratio
---------------------|------------------|-------------------|------
hello_world (median) | 352 ms           | 234 ms            | 1.50
pandas (median)      | 1075 ms          | 699 ms            | 1.54
density (per VM)     | 10 MB            | 6 MB              | 1.67
snapshot (disk)      | 663 MiB          | 392 MiB           | 1.69

This comment was automatically generated by workflow using github-action-benchmark.

danbugs merged commit 581b8d8 into main on May 22, 2026
79 checks passed
danbugs deleted the feat/expand-packages branch on May 22, 2026 22:48