Merged
12 changes: 11 additions & 1 deletion .github/workflows/benchmarks.yml
@@ -170,6 +170,15 @@ jobs:
if: steps.kvm_check.outputs.available == 'true'
run: pyhl setup --from src-dir --force

+- name: "Smoke: verify pyhl run works"
+if: steps.kvm_check.outputs.available == 'true'
+run: |
+echo "--- snapshot and system info ---"
+ls -lh .pyhl/
+free -h
+echo "--- smoke test ---"
+pyhl run -c "print('smoke ok')" || { echo "FAIL: pyhl run exited $?"; exit 1; }

- name: "Perf: hello world (15 runs)"
if: steps.kvm_check.outputs.available == 'true'
run: |
@@ -210,9 +219,10 @@ jobs:
df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))
print(df.describe())
PYEOF
+pyhl run /tmp/pandas_bench.py > /dev/null || { echo "FAIL: pandas smoke check failed"; exit 1; }
times=()
for i in $(seq 1 10); do
-ms=$( { /usr/bin/time -f "%e" pyhl run /tmp/pandas_bench.py; } 2>&1 | tail -1 )
+ms=$( { /usr/bin/time -f "%e" pyhl run /tmp/pandas_bench.py > /dev/null; } 2>&1 )
ms_int=$(echo "$ms * 1000" | bc | cut -d. -f1)
times+=($ms_int)
echo " run $i: ${ms_int}ms"
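
The timing loop in this hunk takes the elapsed seconds printed by `/usr/bin/time -f "%e"` and turns them into whole milliseconds via `bc` and `cut`. A minimal Python sketch of the same conversion follows. `seconds_to_ms` and `summarize` are hypothetical helper names that do not appear in the workflow, and the sample values are made up for illustration.

```python
from decimal import Decimal
from statistics import median


def seconds_to_ms(elapsed: str) -> int:
    """Convert a '%e'-style seconds string (e.g. '0.42') to whole milliseconds.

    Truncates the fractional part, like `echo "$ms * 1000" | bc | cut -d. -f1`.
    Decimal avoids binary floating-point rounding surprises.
    """
    return int(Decimal(elapsed.strip()) * 1000)


def summarize(times_ms):
    """Return min, median, and max of a non-empty list of timings in ms."""
    return {"min": min(times_ms), "median": median(times_ms), "max": max(times_ms)}


if __name__ == "__main__":
    samples = ["0.42", "0.39", "0.45"]  # illustrative values, not measured
    times = [seconds_to_ms(s) for s in samples]
    print(summarize(times))
```

Parsing fails loudly (`decimal.InvalidOperation`) if the captured string contains anything besides the number, which is a useful property when stderr from the timed command could leak into the capture.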
150 changes: 150 additions & 0 deletions docs/python-packages.md
@@ -0,0 +1,150 @@
# Python package support

The python-agent-driver ships CPython 3.12 with the full standard library and 102 explicitly installed top-level pip packages (transitive dependencies are also included).

## Standard library

The complete CPython 3.12 standard library is included. Commonly used modules:

| Module | Description |
|--------|-------------|
| `ast` | Abstract syntax tree |
| `cProfile` | Deterministic profiling |
| `csv` | CSV file reading/writing |
| `datetime` | Date and time types |
| `email` | Email handling |
| `filecmp` | File and directory comparison |
| `fnmatch` | Unix filename pattern matching |
| `glob` | Unix-style pathname expansion |
| `imaplib` | IMAP4 protocol client |
| `json` | JSON encoder/decoder |
| `os` | OS interfaces |
| `pathlib` | Object-oriented filesystem paths |
| `platform` | Platform identification |
| `profile` | Python profiler |
| `re` | Regular expressions |
| `shutil` | High-level file operations |
| `socket` | Low-level networking |
| `ssl` | TLS/SSL wrapper |
| `subprocess` | Subprocess management (shimmed) |
| `tempfile` | Temporary files and directories |

Other standard library modules (`collections`, `itertools`, `functools`, `hashlib`, `struct`, `threading`, `typing`, `urllib`, `xml`, `zipfile`, `tarfile`, `sqlite3`, etc.) are also available.
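
To confirm that a particular standard library module is present before relying on it, a script can probe for it without importing it. The sketch below uses `importlib.util.find_spec`; `missing_modules` is a hypothetical helper name, and it is only meant for top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # An empty list means every probed module was found.
    print(missing_modules(["json", "sqlite3", "zipfile"]))
```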

## Pre-imported packages (zero import cost)

These packages are imported during the `pyhl setup` warm-up, so they are already in `sys.modules` when your code runs and `import` reduces to a dictionary lookup.

| Package | Import name |
|---------|-------------|
| beautifulsoup4 | `bs4` |
| click | `click` |
| cryptography | `cryptography` |
| Jinja2 | `jinja2` |
| lxml | `lxml` |
| markdown-it-py | `markdown_it` |
| numpy | `numpy` |
| openpyxl | `openpyxl` |
| pandas | `pandas` |
| Pillow | `PIL` |
| pydantic | `pydantic` |
| pypdf | `pypdf` |
| python-dateutil | `dateutil` |
| python-docx | `docx` |
| python-dotenv | `dotenv` |
| python-pptx | `pptx` |
| PyYAML | `yaml` |
| tabulate | `tabulate` |
| tenacity | `tenacity` |
| tqdm | `tqdm` |
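
One way to see which of these modules are already loaded is to inspect `sys.modules` before importing anything. This is a minimal sketch; `preloaded` is a hypothetical helper name, and the output depends on the environment the script runs in.

```python
import sys


def preloaded(names):
    """Map each import name to whether it is already present in sys.modules."""
    return {name: name in sys.modules for name in names}


if __name__ == "__main__":
    # Import names taken from the table above.
    for name, loaded in preloaded(["numpy", "pandas", "yaml", "PIL"]).items():
        print(f"{name}: {'pre-imported' if loaded else 'not loaded yet'}")
```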

## Shipped packages (import cost on first use)

These packages are in the rootfs but not pre-imported. The first `import` pays the usual module load cost.

| Package | Import name |
|---------|-------------|
| aiohttp | `aiohttp` |
| altair | `altair` |
| APScheduler | `apscheduler` |
| bandit | `bandit` |
| bokeh | `bokeh` |
| boto3 | `boto3` |
| builtwith | `builtwith` |
| celery | `celery` |
| chardet | `chardet` |
| charset-normalizer | `charset_normalizer` |
| coverage | `coverage` |
| distro | `distro` |
| docx2txt | `docx2txt` |
| duckdb | `duckdb` |
| exchange-calendars | `exchange_calendars` |
| fabric | `fabric` |
| Faker | `faker` |
| fastapi | `fastapi` |
| feedparser | `feedparser` |
| fpdf2 | `fpdf` |
| gensim | `gensim` |
| gitpython | `git` |
| google-api-python-client | `googleapiclient` |
| hypercorn | `hypercorn` |
| httpx | `httpx` |
| hypothesis | `hypothesis` |
| loguru | `loguru` |
| markdown | `markdown` |
| markdownify | `markdownify` |
| mutagen | `mutagen` |
| networkx | `networkx` |
| nltk | `nltk` |
| numpy-financial | `numpy_financial` |
| odfpy | `odf` |
| paramiko | `paramiko` |
| pdfplumber | `pdfplumber` |
| pdfrw | `pdfrw` |
| pexpect | `pexpect` |
| pipdeptree | `pipdeptree` |
| platformdirs | `platformdirs` |
| plotly | `plotly` |
| polars | `polars` |
| praw | `praw` |
| pycountry | `pycountry` |
| pydub | `pydub` |
| pyflakes | `pyflakes` |
| pygments | `pygments` |
| pylint | `pylint` |
| PyPDF2 | `PyPDF2` |
| pytest | `pytest` |
| pytest-asyncio | `pytest_asyncio` |
| pytest-cov | `pytest_cov` |
| pyxlsb | `pyxlsb` |
| qrcode | `qrcode` |
| radon | `radon` |
| rapidfuzz | `rapidfuzz` |
| rarfile | `rarfile` |
| reportlab | `reportlab` |
| requests | `requests` |
| rope | `rope` |
| ruff | `ruff` |
| schedule | `schedule` |
| scikit-learn | `sklearn` |
| scipy | `scipy` |
| scrapy | `scrapy` |
| send2trash | `send2trash` |
| slack-sdk | `slack_sdk` |
| srt | `srt` |
| statsmodels | `statsmodels` |
| svgwrite | `svgwrite` |
| sympy | `sympy` |
| textblob | `textblob` |
| trafilatura | `trafilatura` |
| tweepy | `tweepy` |
| typer | `typer` |
| typing-extensions | `typing_extensions` |
| uvicorn | `uvicorn` |
| vulture | `vulture` |
| watchdog | `watchdog` |
| websockets | `websockets` |
| wordcloud | `wordcloud` |
| xlrd | `xlrd` |
| xlsxwriter | `xlsxwriter` |
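
The first-use cost can be measured directly by timing an import twice: the first call loads the module, the second is served from `sys.modules`. This is a rough sketch rather than a benchmark harness; `time_import` is a hypothetical helper name, and `colorsys` is only a stand-in so the example runs anywhere.

```python
import importlib
import time


def time_import(name):
    """Import `name` twice; return (first_seconds, cached_seconds, module)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    first = time.perf_counter() - start

    start = time.perf_counter()
    again = importlib.import_module(name)  # served from sys.modules
    cached = time.perf_counter() - start

    assert again is module
    return first, cached, module


if __name__ == "__main__":
    # Swap in a shipped package such as "scipy" when running inside the sandbox.
    first, cached, _ = time_import("colorsys")
    print(f"first: {first * 1000:.2f} ms, cached: {cached * 1000:.3f} ms")
```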
33 changes: 31 additions & 2 deletions examples/python-agent-driver/Dockerfile
@@ -10,10 +10,38 @@ ARG BASE=ghcr.io/hyperlight-dev/hyperlight-unikraft/python-base:latest

# Stage 1: Python deps (same as python-agent).
FROM python:3.12-slim AS deps
-RUN pip install --target=/deps --no-cache-dir \
+RUN pip install --target=/deps --no-cache-dir --prefer-binary \
tqdm pyyaml jinja2 beautifulsoup4 tabulate click tenacity \
python-dotenv pypdf openpyxl markdown-it-py pydantic pillow \
-lxml cryptography python-dateutil numpy pandas
+lxml cryptography python-dateutil numpy pandas \
+python-docx python-pptx \
+chardet charset-normalizer \
+requests httpx aiohttp \
+feedparser markdown markdownify \
+Faker pycountry \
+loguru schedule send2trash \
+duckdb polars \
+xlrd xlsxwriter pyxlsb odfpy \
+pdfplumber pdfrw PyPDF2 \
+qrcode svgwrite \
+rapidfuzz \
+networkx sympy \
+pydub srt mutagen \
+plotly altair bokeh \
+statsmodels scikit-learn scipy \
+wordcloud \
+nltk textblob gensim \
+fastapi uvicorn hypercorn \
+typer pygments platformdirs distro \
+pytest pytest-cov pytest-asyncio coverage hypothesis \
+ruff pylint pyflakes bandit vulture radon rope \
+websockets \
+fpdf2 reportlab \
+APScheduler celery \
+numpy-financial docx2txt pipdeptree watchdog rarfile \
+boto3 google-api-python-client slack-sdk praw tweepy \
+scrapy trafilatura builtwith exchange-calendars \
+paramiko fabric pexpect gitpython
RUN set -eux; \
find /deps -maxdepth 1 -type d -name '*.dist-info' -exec rm -rf {} + 2>/dev/null || true; \
find /deps \( -type d -name tests -o -type d -name test \) -exec rm -rf {} + 2>/dev/null || true; \
@@ -41,6 +69,7 @@ RUN PY_INC=$(python3.12 -c 'import sysconfig; print(sysconfig.get_path("include"
FROM ${BASE} AS rootfs
COPY --from=deps /deps /usr/local/lib/python3.12/site-packages
COPY --from=driver-build /src/hl_pydriver /bin/hl_pydriver
+COPY pydoc_stub.py /usr/local/lib/python3.12/pydoc.py

# Stage 4: pack CPIO.
FROM alpine:3.20 AS cpio
2 changes: 1 addition & 1 deletion examples/python-agent-driver/Justfile
@@ -14,7 +14,7 @@ export DOCKER_BUILDKIT := "0"

kernel := ".unikraft/build/python-agent-driver-hyperlight_hyperlight-x86_64"
initrd := "python-agent-driver-initrd.cpio"
-memory := "1Gi"
+memory := "2560Mi"
image := "python-agent-driver-hyperlight"
script := "../python-agent/agent.py"

3 changes: 2 additions & 1 deletion examples/python-agent-driver/hl_pydriver.c
@@ -252,7 +252,8 @@ static void py_initialize_once(void)
" 'numpy', 'pandas', 'pydantic', 'yaml', 'jinja2',"
" 'bs4', 'tabulate', 'click', 'tenacity', 'tqdm',"
" 'openpyxl', 'pypdf', 'markdown_it', 'PIL', 'lxml',"
-" 'cryptography', 'dateutil', 'dotenv'):\n"
+" 'cryptography', 'dateutil', 'dotenv',"
+" 'docx', 'pptx'):\n"
" try:\n"
" importlib.import_module(_mod)\n"
" except Exception as _e:\n"
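
The string embedded in `py_initialize_once` runs a warm-up loop inside the interpreter: each listed module is imported once, and a failure is caught instead of aborting startup. Written as ordinary Python, the pattern looks roughly like this. `warm_imports` is a hypothetical name, and what happens in the `except` branch is an assumption, since the hunk ends before showing how the driver uses `_e`.

```python
import importlib


def warm_imports(names):
    """Import each module once; return (loaded, failed) without raising."""
    loaded, failed = [], {}
    for name in names:
        try:
            importlib.import_module(name)
            loaded.append(name)
        except Exception as exc:  # same broad catch as the driver's loop
            failed[name] = repr(exc)  # assumption: record the error and continue
    return loaded, failed


if __name__ == "__main__":
    ok, bad = warm_imports(["json", "docx", "pptx"])
    print("loaded:", ok)
    print("failed:", bad)
```

Catching `Exception` rather than only `ImportError` means a package that fails during its own initialization does not take the whole warm-up down with it.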
11 changes: 11 additions & 0 deletions examples/python-agent-driver/pydoc_stub.py
@@ -0,0 +1,11 @@
"""Minimal pydoc stub — provides getdoc() for pyarrow's vendored docscrape."""

def getdoc(obj):
    try:
        doc = obj.__doc__
    except AttributeError:
        return ''
    if not doc:
        return ''
    import re
    return re.sub('^ *\n', '', doc.rstrip())
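
A quick check of how the stub behaves, with the function reproduced so the snippet is self-contained. Note that the stub only strips trailing whitespace and one leading blank line; it does not normalize indentation the way the standard library's `pydoc.getdoc` does. `documented` and `undocumented` are throwaway names used for illustration.

```python
import re


def getdoc(obj):
    # Same logic as the stub in this PR.
    try:
        doc = obj.__doc__
    except AttributeError:
        return ''
    if not doc:
        return ''
    return re.sub('^ *\n', '', doc.rstrip())


def documented():
    """
    Hello.
    """


def undocumented():
    pass


if __name__ == "__main__":
    print(repr(getdoc(documented)))    # leading blank line removed, indentation kept
    print(repr(getdoc(undocumented)))  # empty string when there is no docstring
```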
2 changes: 1 addition & 1 deletion host/src/bin/pydriver_run.rs
@@ -44,7 +44,7 @@ fn main() -> Result<()> {
let t_evolve = Instant::now();
let mut sandbox = Sandbox::builder(&kernel)
.initrd_file(&initrd)
-.heap_size(2 * 1024 * 1024 * 1024)
+.heap_size(5 * 512 * 1024 * 1024)
.build()?;
eprintln!(
"[timing] evolve={:.1}ms",
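
The new heap size is written as `5 * 512 * 1024 * 1024` bytes, which works out to 2560 MiB (2.5 GiB) and matches the `2560Mi` value set in the Justfile. The arithmetic, spelled out as a small check (the variable names are illustrative only):

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

new_heap = 5 * 512 * 1024 * 1024          # value introduced by this change
old_driver_heap = 2 * 1024 * 1024 * 1024  # previous value in pydriver_run.rs
old_setup_heap = 3 * 512 * 1024 * 1024    # previous value in pyhl setup/install

print(new_heap // MIB)         # 2560
print(new_heap / GIB)          # 2.5
print(old_driver_heap // MIB)  # 2048
print(old_setup_heap // MIB)   # 1536
```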
34 changes: 22 additions & 12 deletions host/src/bin/pyhl.rs
@@ -89,6 +89,8 @@ const PREIMPORTED_MODULES: &[&str] = &[
"cryptography",
"dateutil",
"dotenv",
+"docx",
+"pptx",
];

/// Build the long-about blurb shown by `pyhl --help`. Lists the
@@ -114,9 +116,23 @@ fn long_about() -> String {
}
s.push_str(
"\n\n\
-Other third-party packages shipped in the rootfs still work — \
-they just pay the usual import cost on first access. Packages \
-not in the rootfs will raise ModuleNotFoundError.",
+Additional packages shipped in the rootfs (pay import cost on \
+first use): aiohttp, altair, APScheduler, bandit, bokeh, \
+boto3, builtwith, celery, chardet, charset-normalizer, \
+coverage, distro, docx2txt, duckdb, \
+exchange-calendars, fabric, Faker, fastapi, feedparser, \
+fpdf2, gensim, gitpython, google-api-python-client, \
+hypercorn, httpx, hypothesis, loguru, markdown, markdownify, \
+mutagen, networkx, nltk, numpy-financial, odfpy, paramiko, \
+pdfplumber, pdfrw, pexpect, pipdeptree, platformdirs, plotly, \
+polars, praw, pycountry, pydub, pyflakes, pygments, pylint, \
+PyPDF2, pytest, pytest-asyncio, pytest-cov, pyxlsb, qrcode, \
+radon, rapidfuzz, rarfile, reportlab, requests, rope, ruff, \
+schedule, scikit-learn, scipy, scrapy, send2trash, slack-sdk, \
+srt, statsmodels, svgwrite, sympy, textblob, trafilatura, \
+tweepy, typer, uvicorn, vulture, watchdog, websockets, \
+wordcloud, xlrd, xlsxwriter.\n\n\
+Packages not in the rootfs will raise ModuleNotFoundError.",
);
s
}
@@ -435,7 +451,7 @@ fn cmd_setup(args: SetupArgs) -> Result<()> {
{
let mut builder = Sandbox::builder(&dst_kernel)
.initrd_file(&dst_initrd)
-.heap_size(3 * 512 * 1024 * 1024);
+.heap_size(5 * 512 * 1024 * 1024);
for p in &setup_preopens {
builder = builder.preopen(p.clone());
}
@@ -577,18 +593,12 @@ fn cmd_run(args: RunArgs) -> Result<()> {
listen_ports.is_some(),
)?;

-let initrd = home.join(INITRD_FILE);
-
let t_load = Instant::now();
-let initrd_ref = if initrd.is_file() {
-Some(initrd.as_path())
-} else {
-None
-};
+let initrd = home.join(INITRD_FILE);
let mut sandbox = Sandbox::from_snapshot_file_configured(
&snapshot,
&run_preopens,
-initrd_ref,
+Some(initrd.as_path()),
network.as_ref(),
listen_ports.as_ref(),
)?;
9 changes: 2 additions & 7 deletions host/src/pyhl.rs
@@ -197,7 +197,7 @@ pub fn install(opts: &InstallOptions<'_>) -> Result<InstallReport> {
{
let mut builder = Sandbox::builder(&dst_kernel)
.initrd_file(&dst_initrd)
-.heap_size(3 * 512 * 1024 * 1024);
+.heap_size(5 * 512 * 1024 * 1024);
for p in opts.mounts {
builder = builder.preopen(p.clone());
}
@@ -275,15 +275,10 @@ impl Runtime {
);
}
let initrd = home.join(INITRD_FILE);
-let initrd_ref = if initrd.is_file() {
-Some(initrd.as_path())
-} else {
-None
-};
let sandbox = Sandbox::from_snapshot_file_configured(
&snap,
mounts,
-initrd_ref,
+Some(initrd.as_path()),
network,
listen_ports,
)?;