
Conversation

@JessieeeNotLi commented Dec 15, 2025

Summary by CodeRabbit

  • New Features

    • Added new benchmark suites: Monte Carlo Pi estimation, BERT language inference, image classification, recommendation systems, linear algebra operations (matmul, AXPY, Jacobi2D, Cholesky, LU, SpMV), and scientific computing (channel flow, compute operations) using the JAX framework.
    • Implemented GitHub Actions workflow for automated code quality checks.
  • Chores

    • Migrated from CircleCI to GitHub Actions for CI/CD.
    • Updated benchmark dependencies and configurations.
    • Standardized code formatting across benchmark files.


McLavish and others added 30 commits October 30, 2025 08:53
sync: image-classification requirements + add 605.lu benchmark

- Resolve conflicts in 413.image-classification/python/requirements*
- Drop py3.6/py3.7 variants removed upstream; keep/update 3.8–3.11
- Add new 600.linearalgebra/605.lu benchmark (config, input, function, reqs)
- Rename local_deployment.tmp -> 600.linearalgebra/605.lu/config.json
- Update local_deployment.json; add out_benchmark*.json; update out_storage.json
Russellpang and others added 28 commits November 16, 2025 02:15
Removed configuration details for MinIO and ScyllaDB.
Removed sensitive SSH private key from eval command.
Merge benchmarks by Russell and Yuxuan into Development
This document provides detailed instructions for running the GPU Cache Latency benchmark, including system requirements, build steps, and example output.
This readme provides detailed instructions for running the GPU Cache Latency benchmark, including system requirements, build steps, and example commands.

coderabbitai bot commented Dec 15, 2025

Walkthrough

This PR transitions CI from CircleCI to GitHub Actions, updates repository configuration, normalizes code formatting across benchmark files to use double quotes and consistent line wrapping, adds configuration for development tools (VSCode, pre-commit, mypy), updates the benchmarks-data submodule URL, and introduces thirteen new benchmark implementations across multiple categories: Monte Carlo Pi, BERT language inference, image classification, recommendation systems, scientific computing (JAX-based), and linear algebra operations.

Changes

  • CI and Workflow Configuration (.circleci/config.yml, .github/workflows/lint.yml)
    Removed the entire CircleCI v2.1 setup, including the Python orb, linting/test jobs, and workflow definitions. Added a new GitHub Actions "Lint" workflow triggering on push/PR, running Black and Flake8 checks with artifact uploads.
  • Repository Configuration (.gitmodules, benchmarks-data)
    Updated the benchmarks-data submodule URL from spcl/serverless-benchmarks-data to McLavish/serverless-benchmarks-data-dphpc and updated the submodule commit reference.
  • Development Tools Configuration (.mypy.ini, .pre-commit-config.yaml, .vscode/settings.json)
    Added a mypy section for the docker.* namespace, local pre-commit hooks for flake8/black targeting sebs/ and benchmarks/, and VSCode workspace settings for Python formatting with Black and Flake8.
  • Existing Benchmark Style Normalization (benchmarks/000.microbenchmarks/*/, benchmarks/100.webapps/*/, benchmarks/200.multimedia/*/, benchmarks/300.utilities/{311.compression,320.monte-carlo-pi}/, benchmarks/400.inference/411.image-recognition/, benchmarks/500.scientific/*/)
    Normalized string literals from single to double quotes, reformatted function signatures across multiple lines, updated dictionary key formatting, and adjusted whitespace. Functional behavior unchanged; formatting and style updates only.
  • New Benchmark: Monte Carlo Pi (benchmarks/300.utilities/320.monte-carlo-pi/config.json, input.py, python/function.py, python/requirements.txt)
    Added a complete benchmark with configuration, an input generator returning sampling parameters, and a PyTorch-based Pi estimator with GPU support, CUDA device selection, batched sampling, and timing measurements.
  • New Benchmark: BERT Language Inference (benchmarks/400.inference/412.language-bert/config.json, input.py, python/function.py, python/init.sh, python/package.sh, python/requirements.txt*)
    Added an ONNX-based BERT-tiny language model inference benchmark with model/text upload functions, lazy model initialization, a tokenization pipeline, softmax prediction mapping, and a CUDA ExecutionProvider requirement. Includes version-specific requirements files.
  • New Benchmark: Image Classification (benchmarks/400.inference/413.image-classification/config.json, input.py, python/function.py, python/imagenet_class_index.json, python/init.sh, python/package.sh, python/requirements.txt*)
    Added a ResNet-50 ONNX-based image classification benchmark with model/image upload, ImageNet label mapping, preprocessing (resize/center-crop), inference with top-1/top-5 predictions, caching, and per-stage timing. Version-specific and ARM-specific requirements.
  • New Benchmark: Recommendation System (benchmarks/400.inference/413.recommendation/config.json, input.py, python/function.py, python/init.sh, python/package.sh, python/requirements.txt*)
    Added a tiny DLRM recommendation model benchmark with model download/caching, batch preparation, embedding/MLP inference, device management, and detailed timing measurements across the download, computation, and model phases.
  • New Benchmarks: Scientific Computing, JAX-based (benchmarks/500.scientific/5xx.channel_flow_jax_npbench/, 5xx.compute_jax_npbench/, 5xx.deep_learning_resnet_jax_npbench/)
    Added three JAX-based scientific benchmarks: a 2D channel flow solver with Navier–Stokes and periodic BCs, a matrix computation with JAX JIT, and a lightweight ResNet-like block with convolution/batch norm. Each includes a config, input generator, and requirements.
  • New Benchmarks: Linear Algebra, PyTorch CUDA (benchmarks/600.linearalgebra/{601.matmul,602.axpy,603.jacobi2d,604.cholesky,605.lu,606.spmv}/)
    Added six PyTorch-based CUDA linear algebra benchmarks: matrix multiplication (GEMM), scalar-vector product (AXPY), 2D Jacobi iteration, Cholesky decomposition, LU factorization, and sparse matrix-vector product. Each includes a config, input generator, kernel functions, and a torch 2.4.1 requirement.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas requiring extra attention:

  • New benchmark implementations (benchmarks/400.inference/412.language-bert/python/function.py, 413.image-classification/python/function.py, 413.recommendation/python/function.py): Verify model loading, caching mechanisms, timing accuracy, device handling (CUDA), and error paths for inference pipelines
  • JAX-based scientific benchmarks (benchmarks/500.scientific/5xx.*/python/function.py): Validate JIT compilation, boundary conditions, numerical correctness, and device_get conversions
  • PyTorch CUDA kernels (benchmarks/600.linearalgebra/*/python/function.py): Ensure CUDA event timing accuracy, synchronization points, memory management, and reproducibility with seeding
  • CI transition (.circleci/config.yml removal + .github/workflows/lint.yml addition): Confirm GitHub Actions workflow completeness, artifact handling, and coverage parity with removed CircleCI config
  • Package optimization scripts (package.sh files): Verify correctness of torch.zip creation logic, conditional platform/version checks, and size reporting accuracy

Poem

🐰 From CircleCI to Actions we run,
New benchmarks bloom under GPU sun,
PyTorch and JAX compute with flair,
Double quotes everywhere, formatting care!
Images, BERT, DLRM take flight,
Linear algebra burns fast and bright!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 2.50%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'Jessie/monte carlo' is vague and does not clearly describe the substantive changes in the pull request, which include CI/CD migration, code formatting standardization, and multiple new benchmarks across inference and scientific computing domains. Consider a more descriptive title such as 'Add Monte Carlo PI, inference, and scientific benchmarks with CI/CD migration' to better communicate the scope and purpose of the changes.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


@coderabbitai bot left a comment

Actionable comments posted: 14

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
benchmarks/300.utilities/311.compression/python/function.py (1)

20-44: Harden temporary directory handling and address S108 warning

The main concern in this block is how the temporary working directory is created and managed:

  • Line 26 builds download_path as "/tmp/{}-{}".format(key, uuid.uuid4()) and then calls os.makedirs(download_path).
  • Static analysis (Ruff S108) correctly flags this as a probable insecure temp dir pattern: you are composing a path in /tmp manually and including a value (key) that may ultimately come from external input.
  • The directory is never explicitly cleaned up, so in warm containers you can accumulate data under /tmp over many invocations.

Even if uuid.uuid4() makes the directory name hard to guess, this is still weaker than using the standard library’s safe temp-dir helpers, and mixing user-controlled strings into filesystem paths is something to avoid unless you are certain of constraints.

I recommend switching to tempfile.TemporaryDirectory, which will:

  • Create the directory securely.
  • Automatically clean it up at the end of the block.
  • Address the S108 warning and improve your security posture.

A minimal refactor preserving your current behavior would look like:

@@
-import datetime
-import os
-import shutil
-import uuid
+import datetime
+import os
+import shutil
+import uuid
+import tempfile
@@
 def handler(event):
@@
-    bucket = event.get("bucket").get("bucket")
-    input_prefix = event.get("bucket").get("input")
-    output_prefix = event.get("bucket").get("output")
-    key = event.get("object").get("key")
-    download_path = "/tmp/{}-{}".format(key, uuid.uuid4())
-    os.makedirs(download_path)
-
-    s3_download_begin = datetime.datetime.now()
-    client.download_directory(bucket, os.path.join(input_prefix, key), download_path)
-    s3_download_stop = datetime.datetime.now()
-    size = parse_directory(download_path)
-
-    compress_begin = datetime.datetime.now()
-    shutil.make_archive(os.path.join(download_path, key), "zip", root_dir=download_path)
-    compress_end = datetime.datetime.now()
-
-    s3_upload_begin = datetime.datetime.now()
-    archive_name = "{}.zip".format(key)
-    archive_size = os.path.getsize(os.path.join(download_path, archive_name))
-    key_name = client.upload(
-        bucket, os.path.join(output_prefix, archive_name), os.path.join(download_path, archive_name)
-    )
-    s3_upload_stop = datetime.datetime.now()
+    bucket = event.get("bucket").get("bucket")
+    input_prefix = event.get("bucket").get("input")
+    output_prefix = event.get("bucket").get("output")
+    key = event.get("object").get("key")
+
+    with tempfile.TemporaryDirectory(dir="/tmp") as download_path:
+        s3_download_begin = datetime.datetime.now()
+        client.download_directory(bucket, os.path.join(input_prefix, key), download_path)
+        s3_download_stop = datetime.datetime.now()
+        size = parse_directory(download_path)
+
+        compress_begin = datetime.datetime.now()
+        shutil.make_archive(os.path.join(download_path, key), "zip", root_dir=download_path)
+        compress_end = datetime.datetime.now()
+
+        s3_upload_begin = datetime.datetime.now()
+        archive_name = "{}.zip".format(key)
+        archive_size = os.path.getsize(os.path.join(download_path, archive_name))
+        key_name = client.upload(
+            bucket,
+            os.path.join(output_prefix, archive_name),
+            os.path.join(download_path, archive_name),
+        )
+        s3_upload_stop = datetime.datetime.now()

Optional follow-ups (good to consider, but not strictly required for this PR):

  • If event["object"]["key"] can contain path separators or .., you may want to either validate/sanitize it before using it in a local filesystem path or restrict it to os.path.basename(key) for the archive name.
  • The nested .get() calls (event.get("bucket").get("bucket")) don’t actually make the code more robust; if the outer key is missing, you still get AttributeError on .get of None. If you want defensive parsing, consider defaults (event.get("bucket", {}).get("bucket")) or just use indexing if the shape is guaranteed.

Based on static analysis hints (Ruff S108).
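
For reference, the auto-cleanup behavior of tempfile.TemporaryDirectory can be seen in a standalone sketch (function name and payload below are illustrative, not from the benchmark):

```python
import os
import shutil
import tempfile

def compress_payload(files):
    """Illustrative stand-in for the handler's download/compress steps."""
    # TemporaryDirectory creates the directory securely and removes it
    # (and everything inside) when the with-block exits.
    with tempfile.TemporaryDirectory(dir="/tmp") as workdir:
        src = os.path.join(workdir, "in")
        os.makedirs(src)
        for name, content in files.items():  # "download" phase
            with open(os.path.join(src, name), "w") as f:
                f.write(content)
        archive = shutil.make_archive(os.path.join(workdir, "payload"), "zip", root_dir=src)
        size = os.path.getsize(archive)
    # workdir and the archive are gone here; no manual cleanup needed
    return size

print(compress_payload({"a.txt": "hello", "b.txt": "world"}))
```

Note the archive base is placed next to, not inside, the directory being zipped, which avoids the archive trying to include itself.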

benchmarks/000.microbenchmarks/020.network-benchmark/python/function.py (1)

11-57: Fix undefined key on repeated failures and consider safer temp-file handling

If Line 35’s consecutive_failures == 5 condition trips, you break out of the loop with consecutive_failures == 5, skip the if consecutive_failures != 5: block, and then hit return {"result": key} on Line 57 with key never defined. That will throw at runtime on repeated timeouts.

A minimal fix is to initialize key before the if and still return something sensible when all attempts fail:

-    if consecutive_failures != 5:
-        with open("/tmp/data.csv", "w", newline="") as csvfile:
+    key = None
+    if consecutive_failures != 5:
+        with open("/tmp/data.csv", "w", newline="") as csvfile:
             writer = csv.writer(csvfile, delimiter=",")
             writer.writerow(["id", "client_send", "client_rcv"])
             for row in times:
                 writer.writerow(row)
 
         client = storage.storage.get_instance()
         filename = "results-{}.csv".format(request_id)
         key = client.upload(output_bucket, os.path.join(output_prefix, filename), "/tmp/data.csv")
 
     return {"result": key}

Separately, both Line 47 and Line 55 rely on a fixed /tmp/data.csv path. For concurrent invocations in the same runtime, that’s prone to clobbering. If this benchmark may run concurrently, consider using a per-request filename (e.g., include request_id) or tempfile.NamedTemporaryFile(delete=False) to avoid collisions.
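
A sketch of the per-request temp-file approach with tempfile.NamedTemporaryFile (function and file names here are illustrative):

```python
import csv
import os
import tempfile

def write_results(request_id, rows):
    # A unique file per invocation avoids clobbering a shared /tmp/data.csv
    # under concurrency; delete=False keeps the file for the later upload call.
    with tempfile.NamedTemporaryFile(
        mode="w", newline="", suffix="-{}.csv".format(request_id), delete=False
    ) as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        writer.writerow(["id", "client_send", "client_rcv"])
        writer.writerows(rows)
        return csvfile.name

path = write_results("req-123", [[1, 0.01, 0.02]])
print(path)      # unique per request
os.unlink(path)  # caller removes the file once the upload is done
```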

benchmarks/000.microbenchmarks/030.clock-synchronization/python/function.py (1)

38-41: Logic mismatch: consecutive_failures check uses wrong value.

The loop breaks when consecutive_failures == 7 (line 38), but the post-loop check uses consecutive_failures != 5 (line 62). This condition will always be true when the connection fails, potentially causing the code to attempt writing an empty CSV. The check should use 7 to match the break condition.

-    if consecutive_failures != 5:
+    if consecutive_failures != 7:

Also applies to: 62-73
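
One way to prevent this kind of drift (not part of the suggested diff above) is a single named constant used by both the break and the post-loop check; a minimal self-contained sketch:

```python
MAX_CONSECUTIVE_FAILURES = 7  # single source of truth for both checks

def gave_up(attempts):
    # attempts: iterable of booleans, True = successful connection
    # (a stand-in for the benchmark's socket send/receive loop)
    consecutive_failures = 0
    for ok in attempts:
        if ok:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == MAX_CONSECUTIVE_FAILURES:
                break
    # this post-loop check can no longer drift out of sync with the break
    return consecutive_failures == MAX_CONSECUTIVE_FAILURES

print(gave_up([True, False, False]))  # → False
print(gave_up([False] * 7))           # → True
```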

benchmarks/500.scientific/504.dna-visualisation/python/function.py (1)

1-49: Harden temporary file handling and tidy lint issues

Two concrete improvements here:

  1. Safer /tmp path and file handling (Ruff S108)
    download_path = "/tmp/{}".format(key) takes key verbatim and open(download_path, "r").read() leaves the handle for GC. This is inconsistent with patterns already established in similar handlers; several other benchmarks sanitize the path and use context managers:
-    key = event.get("object").get("key")
-    download_path = "/tmp/{}".format(key)
+    key = event.get("object").get("key")
+    safe_name = os.path.basename(key)
+    download_path = os.path.join("/tmp", safe_name)
@@
-    data = open(download_path, "r").read()
+    with open(download_path, "r") as f:
+        data = f.read()
  2. Executable file without shebang (Ruff EXE002)
    This file is marked executable in git (100755) but lacks a shebang. Since this is a library-style module, drop the executable bit rather than adding a shebang.

The rest of the handler (download/transform/upload timing and return structure) looks consistent with other benchmarks.
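
The basename-based sanitization suggested above, as a tiny standalone helper (the helper name is hypothetical):

```python
import os

def safe_download_path(key, base="/tmp"):
    # Strip any directory components from the externally supplied key
    # before composing a local filesystem path.
    safe_name = os.path.basename(key)
    if not safe_name:
        raise ValueError("Key has no usable filename: {!r}".format(key))
    return os.path.join(base, safe_name)

print(safe_download_path("genome.fasta"))      # → /tmp/genome.fasta
print(safe_download_path("../../etc/passwd"))  # → /tmp/passwd
```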

♻️ Duplicate comments (5)
benchmarks/000.microbenchmarks/010.sleep/input.py (1)

1-11: Same unused-argument pattern as server-reply input

This file follows the same structure as 040.server-reply/input.py: only size is used, others are present to satisfy the harness signature. Any Ruff ARG001 warnings here can be handled the same way (underscore prefix or # noqa: ARG001 on the function) if you decide to keep the linter strict.

benchmarks/400.inference/413.image-classification/python/requirements.txt.3.10 (1)

1-4: Dependencies look consistent with other Python 3.9+ requirements files.

The dependency versions are consistent with other requirements files for Python 3.9+ versions. The verification of torch/torchvision compatibility requested in requirements.txt.arm.3.9 applies here as well.

benchmarks/400.inference/413.image-classification/python/requirements.txt.3.12 (1)

1-4: Dependencies are consistent across Python 3.9+ versions.

The dependency versions match other requirements files for Python 3.9+ versions, maintaining good consistency across the benchmark configurations.

benchmarks/600.linearalgebra/605.lu/input.py (1)

1-7: Same pattern as other linear algebra inputs; apply the same Ruff fix

This follows the same size_generators + {"size", "seed"} pattern as 603.jacobi2d/input.py and looks correct.

You’ll see the same ARG001 unused-arg hints here; the same remedy applies (function-level # noqa: ARG001 or _-prefixed param names) if you want a clean lint run.

benchmarks/600.linearalgebra/606.spmv/input.py (1)

1-7: SPMV config looks good; same note on unused arguments

The size/seed/density structure is clear and appropriate for SPMV.

To avoid repeated ARG001 warnings for the unused parameters in generate_input, you can mirror the approach used in other files (function-level # noqa: ARG001 or _-prefixed parameter names) without changing the external interface.
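
A minimal sketch of the underscore-prefix remedy (signature mirrors the harness interface used by these input.py files; the returned dict is illustrative):

```python
# Parameters prefixed with "_" are intentionally unused: they satisfy the
# harness call signature while silencing Ruff's unused-argument (ARG001) hint.
def generate_input(
    _data_dir, size, _benchmarks_bucket, _input_paths, _output_paths, _upload_func, _nosql_func
):
    return {"size": size, "seed": 42}

print(generate_input(None, 1024, None, None, None, None, None))  # → {'size': 1024, 'seed': 42}
```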

🟠 Major comments (15)
benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/function.py-84-90 (1)

84-90: Missing fallback when "size" key is absent causes NameError.

If event doesn't contain "size", the variables N, W, H, C1, C2 remain undefined, causing a NameError on line 94. Other benchmarks in the project have the same pattern, but consider adding validation or default values.

+def handler(event):
+    if "size" not in event:
+        raise ValueError("Missing required 'size' key in event")
+
+    size = event["size"]
+    N = size["N"]
+    W = size["W"]
+    H = size["H"]
+    C1 = size["C1"]
+    C2 = size["C2"]
-def handler(event):
-
-    if "size" in event:
-        size = event["size"]
-        N = size["N"]
-        W = size["W"]
-        H = size["H"]
-        C1 = size["C1"]
-        C2 = size["C2"]

Committable suggestion skipped: line range outside the PR's diff.

benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/requirements.txt-1-1 (1)

1-1: Pin the JAX version for benchmark reproducibility.

Unpinned dependencies can introduce breaking changes and make benchmark results non-reproducible across different installations or time periods. For scientific benchmarks, version stability is essential.

Apply this diff to pin to a specific version:

-jax[cuda12]
+jax[cuda12]==0.4.35

Note: Verify the appropriate version for your use case. To check the latest stable version, run:

#!/bin/bash
curl -s https://pypi.org/pypi/jax/json | jq -r '.info.version'
benchmarks/400.inference/413.image-classification/python/package.sh-1-4 (1)

1-4: Add shebang and fix shell script robustness issues.

The script is missing a shebang line, and PACKAGE_DIR is assigned but never used (you reference $1 directly throughout). Additionally, cd commands should handle failures to prevent operations in the wrong directory.

+#!/bin/bash
+set -euo pipefail
+
 # Stripping package code is based on https://github.com/ryfeus/lambda-packs repo

 PACKAGE_DIR=$1
 echo "Original size $(du -sh $1 | cut -f1)"

Also consider using $PACKAGE_DIR instead of $1 throughout, or remove the unused variable.

benchmarks/400.inference/412.language-bert/python/package.sh-1-7 (1)

1-7: Add shebang and handle cd failures.

Same issues as 413.image-classification/python/package.sh: missing shebang, unused PACKAGE_DIR variable, and cd commands without error handling.

+#!/bin/bash
+set -euo pipefail
+
 # Stripping package code is based on https://github.com/ryfeus/lambda-packs repo

 PACKAGE_DIR=$1
 echo "Original size $(du -sh $1 | cut -f1)"

 CUR_DIR=$(pwd)
-cd $1
+cd "$1" || exit 1
benchmarks/500.scientific/5xx.channel_flow_jax_npbench/python/requirements.txt-1-1 (1)

1-1: Pin the JAX version for reproducibility.

The unpinned JAX dependency can lead to non-reproducible benchmark results across different installations or over time.

Apply this diff to pin the JAX version:

-jax[cuda12]
+jax[cuda12]==0.4.35

Note: Verify the appropriate JAX version for your CUDA 12 setup and update accordingly.

benchmarks/400.inference/413.image-classification/python/init.sh-10-10 (1)

10-10: Quote shell variables to prevent word splitting and globbing.

Unquoted variables ${path} and ${DIR} can cause failures if they contain spaces or special characters.

Apply this diff:

-cp ${path} ${DIR}
+cp "${path}" "${DIR}"
benchmarks/100.webapps/120.uploader/python/function.py-17-19 (1)

17-19: Unsafe chained .get() calls could raise AttributeError.

If event.get("bucket") returns None, the subsequent .get("bucket") call will raise an AttributeError. The same issue applies to lines 18-19.

Apply this diff to safely handle missing keys:

-    bucket = event.get("bucket").get("bucket")
-    output_prefix = event.get("bucket").get("output")
-    url = event.get("object").get("url")
+    bucket_info = event.get("bucket", {})
+    bucket = bucket_info.get("bucket")
+    output_prefix = bucket_info.get("output")
+    url = event.get("object", {}).get("url")
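
The defensive-parsing variant can be exercised standalone (the event shape below is illustrative):

```python
def parse_event(event):
    # Defaulting to {} makes the chained lookup safe: a missing top-level
    # key yields None values instead of AttributeError on None.get(...).
    bucket_info = event.get("bucket", {})
    return {
        "bucket": bucket_info.get("bucket"),
        "output": bucket_info.get("output"),
        "url": event.get("object", {}).get("url"),
    }

print(parse_event({"bucket": {"bucket": "b1", "output": "out/"}, "object": {"url": "http://x"}}))
print(parse_event({}))  # → {'bucket': None, 'output': None, 'url': None}
```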
benchmarks/600.linearalgebra/603.jacobi2d/python/function.py-41-53 (1)

41-53: Dead code: seed variable is set but never used.

Same issue as in 604.cholesky/python/function.py - the seed variable is assigned but initialize_torch() doesn't accept a seed parameter. Remove the dead code or add seed support for consistency.

 def handler(event):
-
     size = event.get("size")
-    if "seed" in event:
-        import random
-
-        random.seed(event["seed"])
-
-        seed = event.get("seed", 42)
-        seed = int(seed)

     matrix_generating_begin = datetime.datetime.now()
benchmarks/600.linearalgebra/604.cholesky/python/function.py-34-45 (1)

34-45: Dead code: seed variable is set but never used.

The seed variable is assigned inside the if block but initialize_torch() in this file doesn't accept a seed parameter. Either remove the dead code or add seed support to initialize_torch() for consistency with other benchmarks (e.g., 602.axpy, 607.fw).

 def handler(event):
     size = event.get("size")
-    if "seed" in event:
-        import random
-
-        random.seed(event["seed"])
-
-        seed = event.get("seed", 42)
-        seed = int(seed)

     gen_begin = datetime.datetime.now()
     A = initialize_torch(size, dtype=torch.float32, device="cuda")

Alternatively, if deterministic initialization is desired, add a seed parameter to initialize_torch() similar to the pattern in benchmarks/600.linearalgebra/607.fw/python/function.py.

benchmarks/600.linearalgebra/605.lu/python/function.py-51-59 (1)

51-59: Same issues as 601.matmul: potential UnboundLocalError and unused seed.

The handler has the same problems noted in the matmul benchmark:

  1. If "size" is missing, size is None and line 62 fails.
  2. seed is extracted but never used for reproducibility.

See the fix suggested for 601.matmul/python/function.py.

benchmarks/500.scientific/5xx.compute_jax_npbench/python/function.py-24-33 (1)

24-33: Potential UnboundLocalError if size key is missing.

If the event dict doesn't contain "size", variables M, N, and size remain undefined, causing a runtime error at line 33 or 59. Either raise an explicit error for missing input or provide defaults.

 def handler(event):
-
-    if "size" in event:
-        size = event["size"]
-        M = size["M"]
-        N = size["N"]
+    size = event.get("size")
+    if size is None:
+        raise ValueError("Missing required 'size' in event")
+    M = size["M"]
+    N = size["N"]
 
     generate_begin = datetime.datetime.now()
benchmarks/600.linearalgebra/601.matmul/python/function.py-32-44 (1)

32-44: Potential UnboundLocalError and unused seed variable.

Two issues:

  1. If "size" is missing from event, the size variable will be None and unpacking at line 44 will fail.
  2. The seed variable is assigned but never used—unlike 602.axpy which passes it to initialize_torch.
 def handler(event):
-
     size = event.get("size")
+    if size is None:
+        raise ValueError("Missing required 'size' in event")
+    
+    seed = 42
     if "seed" in event:
-        import random
-
-        random.seed(event["seed"])
-
         seed = event.get("seed", 42)
         seed = int(seed)
+        torch.manual_seed(seed)
+        torch.cuda.manual_seed_all(seed)
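
The reproducibility payoff of actually consuming the seed can be demonstrated with a stdlib stand-in (random.Random in place of torch's seeded generators; the torch.manual_seed calls in the diff above are the real fix):

```python
import random

def generate_matrix(n, seed=42):
    # Seeding the generator (the stdlib analogue of torch.manual_seed)
    # makes the "random" benchmark input identical across invocations.
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for _ in range(n)]

a = generate_matrix(3, seed=7)
b = generate_matrix(3, seed=7)
print(a == b)  # → True: same seed, same input matrix
```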
benchmarks/500.scientific/5xx.channel_flow_jax_npbench/python/function.py-239-246 (1)

239-246: NameError if "size" key is missing from event.

If "size" is not in the event, the variables ny, nx, nit, rho, nu, F, and size are never assigned, causing a NameError on line 250 or 276. Consider providing defaults or raising a descriptive error.

-    if "size" in event:
-        size = event["size"]
-        ny = size["ny"]
-        nx = size["nx"]
-        nit = size["nit"]
-        rho = size["rho"]
-        nu = size["nu"]
-        F = size["F"]
+    if "size" not in event:
+        raise ValueError("Event must contain 'size' with keys: ny, nx, nit, rho, nu, F")
+    size = event["size"]
+    ny = size["ny"]
+    nx = size["nx"]
+    nit = size["nit"]
+    rho = size["rho"]
+    nu = size["nu"]
+    F = size["F"]
benchmarks/400.inference/411.image-recognition/python/function.py-14-14 (1)

14-14: ZipFile not properly closed and unsafe extractall().

The ZipFile object should use a context manager to ensure proper cleanup. Additionally, extractall() can be vulnerable to path traversal (zip slip) attacks.

-    zipfile.ZipFile("function/torch.zip").extractall("/tmp/")
+    with zipfile.ZipFile("function/torch.zip") as zf:
+        for name in zf.namelist():
+            if name.startswith("/") or ".." in name:
+                raise ValueError(f"Unsafe path in archive: {name}")
+        zf.extractall("/tmp/")
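
The validation idea can be packaged as a reusable helper; this standalone sketch uses realpath-based containment, slightly stricter than the substring check in the diff (the helper name is hypothetical):

```python
import io
import os
import tempfile
import zipfile

def safe_extract_zip(archive, dest):
    dest_root = os.path.realpath(dest)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            target = os.path.realpath(os.path.join(dest_root, name))
            # containment check rejects absolute paths and ".." traversal
            if target != dest_root and not target.startswith(dest_root + os.sep):
                raise ValueError("Unsafe path in archive: {}".format(name))
        zf.extractall(dest_root)

# Demo with an in-memory archive standing in for function/torch.zip
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model/weights.bin", b"ok")
buf.seek(0)

with tempfile.TemporaryDirectory() as d:
    safe_extract_zip(buf, d)
    print(os.path.exists(os.path.join(d, "model", "weights.bin")))  # → True
```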
benchmarks/400.inference/413.image-classification/python/function.py-56-57 (1)

56-57: Unsafe tarfile.extractall() usage — path traversal vulnerability.

extractall() without validation can be exploited by malicious archives containing path traversal sequences. Since this archive is sourced from user-provided storage (line 53), validate member paths before extraction.

The Python 3.12 filter parameter should not be relied upon as sole mitigation—recent CVEs (June 2025) demonstrate bypasses in the filter implementation. Use manual path validation instead:

         with tarfile.open(archive_path, "r:gz") as tar:
+            for member in tar.getmembers():
+                if member.name.startswith("/") or ".." in member.name:
+                    raise ValueError(f"Unsafe path in archive: {member.name}")
             tar.extractall(MODEL_DIRECTORY)
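
The same guard for tarfile, as a self-contained sketch (the link-member check is an extra precaution beyond the diff above; the helper name is hypothetical):

```python
import io
import os
import tarfile
import tempfile

def safe_extract_tar(archive, dest):
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar.getmembers():
            # reject absolute paths and ".." path components
            if member.name.startswith("/") or ".." in member.name.split("/"):
                raise ValueError("Unsafe path in archive: {}".format(member.name))
            # extra precaution: refuse symlink/hardlink members entirely
            if member.issym() or member.islnk():
                raise ValueError("Link member in archive: {}".format(member.name))
        tar.extractall(dest)

# Demo: build a small .tar.gz in memory and extract it safely
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"model-bytes"
    info = tarfile.TarInfo("model/resnet50.onnx")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)

with tempfile.TemporaryDirectory() as d:
    safe_extract_tar(buf, d)
    print(os.path.exists(os.path.join(d, "model", "resnet50.onnx")))  # → True
```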
🟡 Minor comments (14)
benchmarks/400.inference/413.recommendation/python/package.sh-3-4 (1)

3-4: Use the assigned PACKAGE_DIR variable.

Line 3 assigns PACKAGE_DIR=$1, but Line 4 uses $1 directly instead of $PACKAGE_DIR. This makes the variable assignment pointless.

Apply this diff:

 PACKAGE_DIR=$1
-echo "DLRM GPU package size $(du -sh $1 | cut -f1)"
+echo "DLRM GPU package size $(du -sh "$PACKAGE_DIR" | cut -f1)"

Note: Also added quotes around the variable to handle paths with spaces.

.gitmodules-6-6 (1)

6-6: Update documentation reference to match the new submodule URL.

The new repository at https://github.com/McLavish/serverless-benchmarks-data-dphpc.git is accessible and the change is already reflected in install.py. However, docs/modularity.md still references the old https://github.com/spcl/serverless-benchmarks-data URL, which will confuse contributors. Update the documentation to point to the new repository URL for consistency.

benchmarks/400.inference/412.language-bert/python/requirements.txt.3.11-1-3 (1)

1-3: Consider upgrading onnxruntime-gpu to a more recent stable version.

Version 1.16.3 has known distribution and bug issues. While no specific CVE directly affects 1.16.3, related path-traversal vulnerabilities were patched in 1.16.2, and there are reported thread-safety and installation problems with 1.16.3. Upgrading to the latest stable release would address these maintenance and stability concerns.

The numpy 1.24.4 and tokenizers 0.13.3 versions have no published security advisories. The .3.11 filename extension is a deliberate naming convention used consistently across this project for Python version-specific requirements files.

benchmarks/500.scientific/504.dna-visualisation/input.py-9-20 (1)

9-20: Guard against missing .fasta files before using data

data is first assigned inside the for file in glob.glob(...): loop (Line 13) but is used after the loop to populate input_config["object"]["key"] (Line 17). If there are no *.fasta files in data_dir, data will be undefined and this function will crash.

Consider explicitly tracking whether any file was found, e.g.:

-def generate_input(
-    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
-):
-
-    for file in glob.glob(os.path.join(data_dir, "*.fasta")):
-        data = os.path.relpath(file, data_dir)
-        upload_func(0, data, file)
+def generate_input(
+    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
+):
+    data = None
+    for file in glob.glob(os.path.join(data_dir, "*.fasta")):
+        data = os.path.relpath(file, data_dir)
+        upload_func(0, data, file)
+
+    if data is None:
+        raise FileNotFoundError(f"No .fasta files found in {data_dir!r}")
 
     input_config = {"object": {}, "bucket": {}}
     input_config["object"]["key"] = data

This keeps current behavior when data exists and fails fast with a clearer error when the dataset is missing.

benchmarks/300.utilities/311.compression/input.py-17-31 (1)

17-31: Orphaned docstring not attached to the function.

The docstring at lines 18-26 is floating as a module-level string literal and is not associated with generate_input. It should be placed immediately after the function definition.

-
-"""
-    Generate test, small and large workload for compression test.
-
-    :param data_dir: directory where benchmark data is placed
-    :param size: workload size
-    :param input_buckets: input storage containers for this benchmark
-    :param output_buckets:
-    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
-"""
-
-
 def generate_input(
     data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
 ):
+    """Generate test, small and large workload for compression test.
+
+    :param data_dir: directory where benchmark data is placed
+    :param size: workload size
+    :param input_buckets: input storage containers for this benchmark
+    :param output_buckets:
+    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
+    """
benchmarks/100.webapps/120.uploader/input.py-18-25 (1)

18-25: Align parameter names with the standard convention across benchmarks.

This function uses input_buckets and output_buckets, while virtually all other benchmarks (compression, image-recognition, recommendation, thumbnailer, video-processing, linear-algebra, and others) use input_paths and output_paths. Additionally, input_buckets is not used in the function body. Consider renaming to match the standard convention for consistency, or document the intentional semantic difference if buckets are meant to convey different semantics than paths.

benchmarks/400.inference/413.image-classification/python/package.sh-28-32 (1)

28-32: Align torch packaging logic between the two benchmark scripts.

The scripts use fundamentally different approaches for conditional torch packaging despite similar structure. Script 413 uses environment variables (PLATFORM, PYTHON_VERSION) while 412 checks directory existence ([ -d "$1/${TORCH_DIR}" ]). Additionally, they zip different paths ($1/torch vs .python_packages/lib/site-packages/torch), use different working directories, and report sizes differently:

  • 413 does not cd into $1 before zipping and reports size from the original working directory
  • 412 cds into $1 before zipping and reports size from the package directory

This inconsistency in conditional logic and path handling makes the scripts harder to maintain. Standardize both scripts to use a consistent approach (prefer directory existence checks over environment variables, as 412 does—this is more robust) and ensure they handle paths and size reporting uniformly.

benchmarks/300.utilities/320.monte-carlo-pi/input.py-1-1 (1)

1-1: Stale/incorrect path in comment.

The comment references 000.microbenchmarks/060.monte-carlo-pi but this file is located at 300.utilities/320.monte-carlo-pi.

-# benchmarks/000.microbenchmarks/060.monte-carlo-pi/input.py
+# benchmarks/300.utilities/320.monte-carlo-pi/input.py
benchmarks/600.linearalgebra/601.matmul/python/function.py-56-62 (1)

56-62: Missing gpu_time in return—inconsistent with peer benchmarks.

The gpu_ms timing from CUDA events is computed but not returned. Other benchmarks in 600.linearalgebra (cholesky, lu, axpy, jacobi2d, fw) all include gpu_time in their measurement output.

     return {
-        # "result": result[0],
         "measurement": {
             "generating_time": matrix_generating_time,
             "compute_time": matmul_time,
+            "gpu_time": gpu_ms,
         },
     }
benchmarks/300.utilities/320.monte-carlo-pi/python/function.py-1-1 (1)

1-1: Incorrect file path in comment.

The comment references 000.microbenchmarks/060.monte-carlo-pi but the actual file path is 300.utilities/320.monte-carlo-pi.

-# benchmarks/000.microbenchmarks/060.monte-carlo-pi/python/function.py
+# benchmarks/300.utilities/320.monte-carlo-pi/python/function.py
benchmarks/400.inference/411.image-recognition/python/function.py-29-29 (1)

29-29: File handle not closed.

Same issue as the other benchmark file.

-class_idx = json.load(open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r"))
+with open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r") as f:
+    class_idx = json.load(f)
benchmarks/400.inference/413.image-classification/python/function.py-18-18 (1)

18-18: File handle not closed.

Using open() without a context manager leaves the file handle open. Use a with statement.

-class_idx = json.load(open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r"))
+with open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r") as f:
+    class_idx = json.load(f)
benchmarks/400.inference/411.image-recognition/python/function.py-36-40 (1)

36-40: AttributeError if event keys are missing.

Unlike the 413.image-classification handler, this uses .get("bucket").get(...) without a default, which will raise AttributeError if the outer key is missing.

-    bucket = event.get("bucket").get("bucket")
-    input_prefix = event.get("bucket").get("input")
-    model_prefix = event.get("bucket").get("model")
-    key = event.get("object").get("input")
-    model_key = event.get("object").get("model")
+    bucket_info = event.get("bucket") or {}
+    object_info = event.get("object") or {}
+    bucket = bucket_info.get("bucket")
+    input_prefix = bucket_info.get("input")
+    model_prefix = bucket_info.get("model")
+    key = object_info.get("input")
+    model_key = object_info.get("model")
benchmarks/400.inference/413.image-classification/python/function.py-139-144 (1)

139-144: Potential AttributeError on missing event keys.

Chained .get().get() will raise AttributeError if the first .get() returns None. Use defensive access or validate the event structure.

-    bucket = event.get("bucket", {}).get("bucket")
-    input_prefix = event.get("bucket", {}).get("input")
-    model_prefix = event.get("bucket", {}).get("model")
-    key = event.get("object", {}).get("input")
-    model_key = event.get("object", {}).get("model")
+    bucket_info = event.get("bucket") or {}
+    object_info = event.get("object") or {}
+    bucket = bucket_info.get("bucket")
+    input_prefix = bucket_info.get("input")
+    model_prefix = bucket_info.get("model")
+    key = object_info.get("input")
+    model_key = object_info.get("model")
🧹 Nitpick comments (39)
benchmarks/000.microbenchmarks/010.sleep/python/function.py (1)

4-9: Formatting change looks good; consider validating sleep input

The switch to double quotes is fine and behavior-preserving. However, event.get("sleep") can return None (or a non‑numeric value), which will cause sleep() to raise a TypeError at runtime. If you want a clearer error and stricter contract for this microbenchmark, you could validate the parameter before calling sleep:

 def handler(event):

     # start timing
-    sleep_time = event.get("sleep")
-    sleep(sleep_time)
-    return {"result": sleep_time}
+    sleep_time = event.get("sleep")
+    if sleep_time is None:
+        raise ValueError("Missing 'sleep' key in event")
+
+    try:
+        sleep_time = float(sleep_time)
+    except (TypeError, ValueError) as exc:
+        raise ValueError("'sleep' must be a number") from exc
+
+    sleep(sleep_time)
+    return {"result": sleep_time}
benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/input.py (1)

8-17: Handle Ruff ARG001 unused-argument warnings in generate_input (optional)

The full parameter list is needed to match the common benchmark input interface, so leaving these arguments unused is fine. If you want to keep Ruff quiet, you could either (a) prefix unused parameters with _ or (b) add a # noqa: ARG001 on the function definition; both are purely cosmetic and don’t affect behavior.
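For reference, both options keep the shared signature intact; this is a hypothetical sketch, not the benchmark's actual code:

```python
# Option A: underscore-prefix parameters the interface requires but the body
# ignores (Ruff treats leading-underscore names as intentionally unused).
def generate_input(_data_dir, size, _benchmarks_bucket, _input_paths,
                   _output_paths, _upload_func, _nosql_func):
    return {"size": size}


# Option B: keep the original names and suppress the rule on the definition line.
def generate_input_noqa(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func):  # noqa: ARG001
    return {"size": size}


print(generate_input("data/", "small", None, None, None, None, None))
# {'size': 'small'}
```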

benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/function.py (3)

13-36: Consider using JAX's built-in convolution operations.

The manual implementation using nested lax.scan is functional but less efficient than JAX's optimized lax.conv_general_dilated or jax.scipy.signal.correlate2d. If the goal is benchmarking convolution performance rather than the scan pattern itself, built-in ops would better represent real-world usage.
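A minimal sketch of the built-in op, assuming the benchmark's NHWC input layout and an HWIO kernel layout (shapes here are illustrative, not the benchmark's):

```python
import jax
import jax.numpy as jnp

# Toy NHWC input and HWIO kernel.
x = jnp.ones((2, 8, 8, 3), dtype=jnp.float32)   # N, H, W, C_in
w = jnp.ones((3, 3, 3, 4), dtype=jnp.float32)   # KH, KW, C_in, C_out

# XLA-optimized convolution; VALID padding shrinks 8x8 to 6x6 with a 3x3 kernel.
out = jax.lax.conv_general_dilated(
    x, w,
    window_strides=(1, 1),
    padding="VALID",
    dimension_numbers=("NHWC", "HWIO", "NHWC"),
)
print(out.shape)  # (2, 6, 6, 4)
```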


68-79: Minor: Prefer explicit numpy dtype for numpy operations.

Using jnp.float32 with numpy's random generator works (since it resolves to numpy.float32), but using np.float32 or the string "float32" would be clearer and avoid confusion about library boundaries.

 def initialize(N, W, H, C1, C2):
     from numpy.random import default_rng
+    import numpy as np

     rng = default_rng(42)

     # Input
-    input = rng.random((N, H, W, C1), dtype=jnp.float32)
+    input = rng.random((N, H, W, C1), dtype=np.float32)
     # Weights
-    conv1 = rng.random((1, 1, C1, C2), dtype=jnp.float32)
-    conv2 = rng.random((3, 3, C2, C2), dtype=jnp.float32)
-    conv3 = rng.random((1, 1, C2, C1), dtype=jnp.float32)
+    conv1 = rng.random((1, 1, C1, C2), dtype=np.float32)
+    conv2 = rng.random((3, 3, C2, C2), dtype=np.float32)
+    conv3 = rng.random((1, 1, C2, C1), dtype=np.float32)
     return (input, conv1, conv2, conv3)

109-112: Broad exception handling silently swallows errors.

Catching bare Exception and passing silently can mask real issues. Consider logging the exception or catching a more specific exception type (e.g., jax.errors.TracerArrayConversionError).

     try:
         results = jax.device_get(results)
-    except Exception:
-        pass
+    except (AttributeError, TypeError):
+        # results may already be a host array
+        pass
benchmarks/400.inference/413.image-classification/python/requirements.txt.3.11 (1)

1-4: Consider pinning all versions for benchmark reproducibility.

The mixed versioning strategy (>= for numpy/pillow, == for torch/torchvision) may reduce benchmark reproducibility. Additionally, verify that torch 2.5.1 and torchvision 0.20.1 are compatible with each other and with Python 3.11.

Consider pinning specific versions for numpy and pillow:

-numpy>=2.0
-pillow>=10.0
+numpy==2.1.3
+pillow==11.0.0
 torch==2.5.1
 torchvision==0.20.1

To verify torch/torchvision compatibility:

Are PyTorch 2.5.1 and torchvision 0.20.1 compatible with each other?
benchmarks/400.inference/412.language-bert/python/requirements.txt.3.10 (1)

1-3: No security vulnerabilities found; consider updating to newer versions.

Verification confirms that numpy 1.24.4, onnxruntime-gpu 1.16.3, and tokenizers 0.13.3 have no known CVEs and support Python 3.10 (numpy requires >=3.8). However, significantly newer versions are available: numpy 2.3.5, onnxruntime-gpu 1.23.2, and tokenizers 0.22.1. While the current pinned versions are not vulnerable, updating to newer releases would improve compatibility and maintenance.

.pre-commit-config.yaml (1)

1-30: Pre-commit hook setup looks consistent and well-scoped

Local flake8/black hooks with pinned versions and files: ^(sebs/|benchmarks/) are a solid setup; nothing blocking here. Only minor nit: the flake8 (project env) name is slightly misleading given language: python (pre-commit venv), but that’s purely cosmetic.

benchmarks/000.microbenchmarks/040.server-reply/input.py (1)

1-11: generate_input matches suite pattern; consider handling unused-arg linting

The size_generators mapping and {"sleep": size_generators[size]} shape are consistent with other microbenchmark inputs. All parameters except size are intentionally unused here, so tools like Ruff (ARG001) will complain. If you plan to enforce those checks, consider either prefixing unused parameters with _ or adding a # noqa: ARG001 on the function definition, rather than changing the signature shape expected by the harness.

benchmarks/400.inference/413.image-classification/python/package.sh (1)

16-17: Use null-delimited find/xargs for filenames with special characters.

Files with spaces or special characters will cause issues with the current approach.

-find -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
-find -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
+find . -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -print0 | xargs -0 -r strip
+find . -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -print0 | xargs -0 -r strip
.github/workflows/lint.yml (1)

3-5: Consider adding branch filters to reduce unnecessary CI runs.

The workflow triggers on all pushes and pull requests. You may want to filter to specific branches (e.g., master, main) to avoid running on every feature branch push.

 on:
-  push:
-  pull_request:
+  push:
+    branches: [master, main]
+  pull_request:
+    branches: [master, main]
benchmarks/400.inference/412.language-bert/python/package.sh (1)

25-34: Quote variables and handle cd failures in the torch zipping section.

-cd ${CUR_DIR}
+cd "${CUR_DIR}" || exit 1
 echo "Stripped size $(du -sh $1 | cut -f1)"

 TORCH_DIR=".python_packages/lib/site-packages/torch"
 if [ -d "$1/${TORCH_DIR}" ]; then
-	cd $1
-	zip -qr torch.zip ${TORCH_DIR}
-	rm -rf ${TORCH_DIR}
-	cd ${CUR_DIR}
+	cd "$1" || exit 1
+	zip -qr torch.zip "${TORCH_DIR}"
+	rm -rf "${TORCH_DIR}"
+	cd "${CUR_DIR}" || exit 1
 	echo "Torch-zipped size $(du -sh $1 | cut -f1)"
 fi
benchmarks/500.scientific/5xx.compute_jax_npbench/python/requirements.txt (1)

1-1: Consider pinning JAX and confirming CUDA12 is available where this runs.

Using jax[cuda12] without a version means future JAX releases (and CUDA/CuDNN constraints) can change behavior over time, which hurts reproducibility. It also assumes CUDA 12 drivers/libs are present on all runners that execute this benchmark. If that assumption doesn’t always hold, you may want a CPU-only installation for those environments instead, or at least pin a tested JAX version in a constraints file.
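For example, a pinned variant in the requirements file (the version number is illustrative; pick whichever release you actually validated):

```text
jax[cuda12]==0.4.35
```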

benchmarks/400.inference/413.image-classification/python/requirements.txt.3.9 (1)

1-4: Consider tightening numpy/pillow versions and verify PyTorch stack compatibility

You pin torch/torchvision exactly but allow numpy>=2.0 and pillow>=10.0. For a benchmark repo, you may want either exact pins or at least upper bounds (e.g., <3.0, <11.0) to avoid silent breakage from future major releases, and to keep results reproducible. Also please double‑check that torch==2.5.1/torchvision==0.20.1 and the chosen NumPy/Pillow versions are available and supported for your target Python 3.9 + CUDA/runtime combo.

benchmarks/400.inference/413.image-classification/python/imagenet_class_index.json (1)

1-1: Consider formatting JSON for readability.

The entire 1000-entry JSON is on a single line, making it difficult to read, diff, or maintain. Consider pretty-printing this file with proper indentation.

You can reformat using:

#!/bin/bash
# Pretty-print the JSON file
python -m json.tool benchmarks/400.inference/413.image-classification/python/imagenet_class_index.json > temp.json && mv temp.json benchmarks/400.inference/413.image-classification/python/imagenet_class_index.json
benchmarks/500.scientific/503.graph-bfs/input.py (1)

7-7: Consider adding KeyError handling for invalid size values.

If an invalid size is passed, size_generators[size] will raise a KeyError. Adding validation or a default would improve robustness.

-    return {"size": size_generators[size], "seed": 42}
+    if size not in size_generators:
+        raise ValueError(f"Invalid size '{size}'. Valid options: {list(size_generators.keys())}")
+    return {"size": size_generators[size], "seed": 42}
benchmarks/600.linearalgebra/604.cholesky/input.py (1)

7-7: Consider adding KeyError handling for invalid size values.

If an invalid size is passed, size_generators[size] will raise a KeyError. Adding validation improves robustness.

-    return {"size": size_generators[size], "seed": 42}
+    if size not in size_generators:
+        raise ValueError(f"Invalid size '{size}'. Valid options: {list(size_generators.keys())}")
+    return {"size": size_generators[size], "seed": 42}
benchmarks/500.scientific/5xx.channel_flow_jax_npbench/input.py (1)

17-17: Consider adding KeyError handling for invalid size values.

If an invalid size is passed, size_generators[size] will raise a KeyError. Adding validation improves robustness.

-    return {"size": size_generators[size]}
+    if size not in size_generators:
+        raise ValueError(f"Invalid size '{size}'. Valid options: {list(size_generators.keys())}")
+    return {"size": size_generators[size]}
benchmarks/600.linearalgebra/601.matmul/input.py (1)

7-7: Consider adding KeyError handling for invalid size values.

If an invalid size is passed, size_generators[size] will raise a KeyError. Adding validation improves robustness.

-    return {"size": size_generators[size], "seed": 42}
+    if size not in size_generators:
+        raise ValueError(f"Invalid size '{size}'. Valid options: {list(size_generators.keys())}")
+    return {"size": size_generators[size], "seed": 42}
benchmarks/400.inference/413.image-classification/python/init.sh (1)

1-10: Consider adding error handling for missing files.

The script doesn't check if imagenet_class_index.json exists before attempting to copy it, which could lead to silent failures.

 #!/bin/bash
+set -e
 
 DIR=$1
 VERBOSE=$2
 SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
 path="${SCRIPT_DIR}/imagenet_class_index.json"
+
+if [ ! -f "${path}" ]; then
+  echo "Error: ${path} not found" >&2
+  exit 1
+fi
+
 if [ "$VERBOSE" = true ]; then
   echo "Update ${DIR} with json ${path}"
 fi
 cp "${path}" "${DIR}"
benchmarks/400.inference/411.image-recognition/input.py (1)

10-15: Tighten loop var naming and consider silencing unused-arg lints

  • Line 10: dirs is never used; Ruff’s B007 is valid. Renaming to _dirs (or _) keeps intent clear and silences the warning:
-    for root, dirs, files in os.walk(data_dir):
+    for root, _dirs, files in os.walk(data_dir):
  • Lines 29-31: size, output_paths, and nosql_func are unused but are part of the common generate_input interface. If you want Ruff clean while keeping the signature, you can either prefix them with _ or add a function-level noqa:
-def generate_input(
-    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
-):
+def generate_input(  # noqa: ARG001
+    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
+):

Optionally, if you intend to exercise more than the first image, you might later extend the ["object"]["input"] entry beyond input_images[0][0], but that's non-blocking here.

benchmarks/600.linearalgebra/603.jacobi2d/input.py (1)

1-7: Interface looks good; consider addressing unused-arg warnings

The size_generators lookup and {"size": ..., "seed": 42} shape match other linear algebra benchmarks and look fine.

To keep Ruff happy without changing the public interface, you can mark the function as intentionally ignoring most parameters:

-def generate_input(
-    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
-):
+def generate_input(  # noqa: ARG001
+    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
+):

(or alternatively prefix unused parameters with _).

benchmarks/100.webapps/110.dynamic-html/input.py (1)

1-9: Input generator is fine; align unused-arg handling with the rest

Behavior here is simple and clear: random_len comes from size_generators[size] and a fixed "username" is set; that matches patterns in other webapp benchmarks.

As with other generate_input helpers, most parameters are unused. To avoid repeated Ruff ARG001 warnings while keeping the shared interface, consider:

-def generate_input(
-    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
-):
+def generate_input(  # noqa: ARG001
+    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
+):
benchmarks/600.linearalgebra/602.axpy/input.py (1)

1-7: AXPY input helper matches the common pattern; consider a shared approach to ARG001

This matches the other linear algebra inputs (matmul, jacobi2d, etc.): sizes via size_generators and a fixed seed, which looks correct.

Since this same generate_input signature is reused across many benchmarks and most arguments are intentionally unused, you might standardize how you silence Ruff, e.g.:

  • Add # noqa: ARG001 on each generate_input definition, or
  • Configure Ruff to ignore ARG001 for these files/patterns if you prefer to keep signatures untouched, or
  • Prefix the unused parameters with _.

Any of these will keep the interface consistent while avoiding noisy lints.

benchmarks/500.scientific/5xx.compute_jax_npbench/input.py (1)

1-17: Consistent NPBench sizing; same minor unused-arg consideration

The "test"/"small"/"large" shapes for M/N align with the other JAX NPBench inputs, and generate_input returning {"size": ...} is consistent.

As with other generate_input functions, the extra parameters are unused by design. If Ruff ARG001 is enforced here too, consider a function-level # noqa: ARG001 or _-prefixing the unused parameters to keep lint noise down without changing the call contract.

benchmarks/400.inference/412.language-bert/python/function.py (3)

62-66: Consider supporting CPU fallback for environments without CUDA.

Requiring CUDAExecutionProvider will cause the function to fail in CPU-only environments. If this is intentional for GPU benchmarking, consider documenting this constraint in the config or adding a more informative error message.
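One way to keep the benchmark runnable on CPU-only hosts is a provider-selection helper; this is a sketch, `select_providers` is a hypothetical name, and the commented session line assumes an existing model path:

```python
def select_providers(available):
    """Prefer CUDA when present, otherwise fall back to CPU."""
    if "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


# With onnxruntime installed, wiring it up would look like:
#   import onnxruntime as ort
#   providers = select_providers(ort.get_available_providers())
#   session = ort.InferenceSession(model_path, providers=providers)

print(select_providers(["CPUExecutionProvider"]))
# ['CPUExecutionProvider']
print(select_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```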


134-144: Add strict=True to zip() for safer iteration.

Using strict=True ensures sentences and probabilities have matching lengths, catching potential mismatches early.

-    for sentence, probs in zip(sentences, probabilities):
+    for sentence, probs in zip(sentences, probabilities, strict=True):

107-119: Missing validation for required event keys.

If bucket, model_prefix, text_prefix, or text_key are None, the function will fail with an unclear error during download. Consider adding explicit validation.

 def handler(event):
     bucket = event.get("bucket", {}).get("bucket")
     model_prefix = event.get("bucket", {}).get("model")
     text_prefix = event.get("bucket", {}).get("text")
     text_key = event.get("object", {}).get("input")
+
+    if not all([bucket, model_prefix, text_prefix, text_key]):
+        raise ValueError("Missing required event keys: bucket, model, text, or input")

     download_begin = datetime.datetime.now()
benchmarks/600.linearalgebra/602.axpy/python/function.py (1)

47-48: Prefix unused variable with underscore.

The y_out result is unused. Prefix it to signal intentional discard.

-    y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
+    _y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
benchmarks/600.linearalgebra/604.cholesky/python/function.py (1)

49-50: Prefix unused variable with underscore.

The L result from kernel_cholesky is unused. Prefix it to signal intentional discard.

-    L, gpu_ms = kernel_cholesky(A)
+    _L, gpu_ms = kernel_cholesky(A)
benchmarks/600.linearalgebra/603.jacobi2d/python/function.py (2)

56-63: Misleading variable names: matmul_begin/matmul_end for Jacobi iteration.

The variable names suggest matrix multiplication, but this is a Jacobi 2D stencil operation. Rename for clarity.

-    matmul_begin = datetime.datetime.now()
+    compute_begin = datetime.datetime.now()
     A_out, B_out, gpu_ms = kernel_jacobi2d(A, B, iters=50)
-    matmul_end = datetime.datetime.now()
+    compute_end = datetime.datetime.now()

     matrix_generating_time = (matrix_generating_end - matrix_generating_begin) / datetime.timedelta(
         microseconds=1
     )
-    matmul_time = (matmul_end - matmul_begin) / datetime.timedelta(microseconds=1)
+    compute_time = (compute_end - compute_begin) / datetime.timedelta(microseconds=1)

57-58: Prefix unused variables with underscores.

A_out and B_out are unused. Prefix them to signal intentional discard.

-    A_out, B_out, gpu_ms = kernel_jacobi2d(A, B, iters=50)
+    _A_out, _B_out, gpu_ms = kernel_jacobi2d(A, B, iters=50)
benchmarks/400.inference/413.image-classification/input.py (2)

18-27: Misplaced docstring—move inside the function.

The docstring appears as a standalone string literal at module level rather than as the function's actual docstring. Move it inside generate_input immediately after the def line.

-"""
-    Generate test, small and large workload for compression test.
-
-    :param data_dir: directory where benchmark data is placed
-    :param size: workload size
-    :param input_buckets: input storage containers for this benchmark
-    :param output_buckets:
-    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
-"""
-
-
 def generate_input(
     data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
 ):
-
+    """
+    Generate test, small and large workload for compression test.
+
+    :param data_dir: directory where benchmark data is placed
+    :param size: workload size
+    :param input_buckets: input storage containers for this benchmark
+    :param output_buckets:
+    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
+    """
     # upload model

8-15: Prefer os.path.join for path consistency.

Line 13 uses string concatenation (prefix + "/" + file) while line 14 correctly uses os.path.join. This inconsistency could cause issues on non-Unix systems and reduces readability.

 def upload_files(data_root, data_dir, upload_func):
-
-    for root, dirs, files in os.walk(data_dir):
+    for root, _, files in os.walk(data_dir):
         prefix = os.path.relpath(root, data_root)
         for file in files:
-            file_name = prefix + "/" + file
+            file_name = os.path.join(prefix, file)
             filepath = os.path.join(root, file)
             upload_func(0, file_name, filepath)
benchmarks/500.scientific/5xx.compute_jax_npbench/python/function.py (1)

48-51: Silent exception swallowing obscures failures.

The bare except Exception: pass hides potential issues. If device_get fails for reasons other than "already on host," the error is silently ignored. Consider catching a more specific exception or logging when an exception occurs.

     try:
         results = jax.device_get(results)
-    except Exception:
-        pass
+    except (TypeError, AttributeError):
+        # results already on host or not a device array
+        pass
benchmarks/600.linearalgebra/605.lu/python/function.py (1)

18-26: Python loops defeat GPU acceleration.

_kernel_lu uses explicit Python for loops with element-wise tensor indexing. This executes on CPU with GPU round-trips, negating any CUDA performance benefit. For GPU-accelerated LU, use torch.linalg.lu_factor or similar.

-def _kernel_lu(B: torch.Tensor) -> torch.Tensor:
-    n = B.shape[0]
-    for i in range(n):
-        for j in range(i):
-            B[i, j] = B[i, j] - (B[i, :j] @ B[:j, j])
-            B[i, j] = B[i, j] / B[j, j]
-        for j in range(i, n):
-            B[i, j] = B[i, j] - (B[i, :i] @ B[:i, j])
-    return B
+def _kernel_lu(A: torch.Tensor) -> torch.Tensor:
+    LU, pivots = torch.linalg.lu_factor(A)
+    return LU

If the custom algorithm is intentional for educational purposes, consider documenting this and using CPU tensors to avoid the GPU overhead.

benchmarks/400.inference/413.recommendation/python/function.py (1)

121-131: Add strict=True to zip for safety.

If payloads and scores have mismatched lengths due to a bug, the current code silently truncates. Using strict=True (Python 3.10+) would catch this.

-    for req, score in zip(payloads, scores):
+    for req, score in zip(payloads, scores, strict=True):
benchmarks/400.inference/411.image-recognition/python/function.py (1)

55-55: Deprecated pretrained parameter.

The pretrained parameter is deprecated in recent torchvision versions. Consider using weights=None instead.

-        model = resnet50(pretrained=False)
+        model = resnet50(weights=None)
benchmarks/500.scientific/5xx.channel_flow_jax_npbench/python/function.py (1)

265-268: Bare except clause suppresses all errors.

Catching Exception and silently passing can hide bugs. If device_get isn't needed for this result type, consider removing the try-except or logging the exception.

     try:
         results = jax.device_get(results)
-    except Exception:
-        pass
+    except TypeError:
+        # results is already a Python scalar, not a JAX array
+        pass

Comment on lines +9 to +17
"""
Generate test, small and large workload for thumbnailer.
:param data_dir: directory where benchmark data is placed
:param size: workload size
:param input_buckets: input storage containers for this benchmark
:param output_buckets:
:param upload_func: upload function taking three params(bucket_idx, key, filepath)
'''
def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func):
for file in glob.glob(os.path.join(data_dir, '*.mp4')):
"""
⚠️ Potential issue | 🔴 Critical

Fix docstring: incorrect benchmark name and outdated parameter documentation.

The docstring was not properly updated from the thumbnailer benchmark. It references "thumbnailer" and documents parameters as input_buckets / output_buckets, but the actual function uses input_paths and output_paths.

Apply this diff to correct the docstring:

-"""
-    Generate test, small and large workload for thumbnailer.
-
-    :param data_dir: directory where benchmark data is placed
-    :param size: workload size
-    :param input_buckets: input storage containers for this benchmark
-    :param output_buckets:
-    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
-"""
+"""
+    Generate test, small and large workload for video processing.
+
+    :param data_dir: directory where benchmark data is placed
+    :param size: workload size
+    :param benchmarks_bucket: bucket name for benchmark data
+    :param input_paths: input storage paths for this benchmark
+    :param output_paths: output storage paths
+    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
+    :param nosql_func: NoSQL function (currently unused)
+"""
🤖 Prompt for AI Agents
In benchmarks/200.multimedia/220.video-processing/input.py around lines 9 to 17,
the docstring still refers to the "thumbnailer" benchmark and lists parameters
input_buckets/output_buckets which are outdated; update the docstring to
reference the video-processing benchmark (or the correct benchmark name),
replace input_buckets/output_buckets with input_paths/output_paths, and ensure
the param descriptions match the actual function signature (data_dir, size,
input_paths, output_paths, upload_func) and their meanings.

Comment on lines +42 to +44
with tarfile.open(archive_path, "r:gz") as tar:
tar.extractall(MODEL_DIRECTORY)
os.remove(archive_path)
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -name "function.py" -path "*412.language-bert*" | head -5

Repository: spcl/serverless-benchmarks

Length of output: 196


🏁 Script executed:

cat -n "benchmarks/400.inference/412.language-bert/python/function.py" | sed -n '35,50p'

Repository: spcl/serverless-benchmarks

Length of output: 974


🌐 Web query:

Python tarfile extractall filter parameter security best practice

💡 Result:

Short answer — treat tarfile.extractall(filter=...) as unsafe for untrusted archives until patched. Do not rely on the built‑in "data"/"tar" filters alone; instead validate/sanitize members yourself, extract into a temporary directory, and enforce path/symlink checks. See Python security advisories and CVE notes. [2][3][0]

Recommended minimal best practices

  • Avoid extracting untrusted tarballs when possible; prefer not accepting them at all. [3]
  • If you must extract, extract to an empty temp directory and only move vetted files into place after checks. [3][0]
  • Do not rely solely on the built‑in filter strings ("data"/"tar"); they have had bypasses (CVE advisories). [0][3][6]
  • Skip/deny special members: absolute paths, path traversal ("../"), hardlinks, symlinks, device nodes, and members that would change metadata outside target. [2][8]
  • Verify final extracted path stays inside intended directory using realpath/resolve checks before writing. [2]

Minimal safe extraction pattern (concept)

  • Open tar file (prefer streaming).
  • For each TarInfo member:
    • compute dest_path = (target_dir / member.name).resolve()
    • if not str(dest_path).startswith(str(target_dir.resolve()) + os.sep): skip/raise (prevents traversal).
    • if member.issym() or member.isdev() or member.islnk(): skip/deny.
    • for regular files: extract to a temp file, set safe permissions, then atomically rename into place.
  • After all members vetted and extracted to temp dir, move accepted files into final destination.
    (These steps implement the checks the filter parameter was intended to help with but must be done explicitly.) [2][8][0]

Example (short pseudo/Python sketch — adapt for your use):

  • Use tarfile.open(..., "r:*") and iterate tar.getmembers(). For each member:
    • target = (target_dir / member.name).resolve()
    • if not str(target).startswith(str(target_dir.resolve()) + os.sep): continue # skip traversal
    • if member.issym() or member.isdev() or member.islnk(): continue # skip dangerous types
    • if member.isreg(): with tar.extractfile(member) as src, write to a temp file, fsync, then rename.
      This pattern enforces path and type checks before creating filesystem entries. [2][8]

Patching and monitoring

  • Track Python security releases and apply patches that fix tarfile filter bypasses; CVEs were published for these issues. If a Python release explicitly fixes the issue, upgrade. Meanwhile, implement the manual checks above. [3][0][5]

Sources

  • Python tarfile docs (extraction filters, defaults). [2]
  • NVD / CVE advisory for tarfile extraction filter bypass. [3]
  • Public writeups/CVE summaries describing filter bypasses and recommended mitigations. [0][6]
  • PEP 706 (design of filter parameter / how filters are intended to work). [8]

If you want, I can produce a ready-to-run safe-extract Python function (with tests) that implements the checks above and handles symlinks/permissions correctly.


Add comprehensive validation to tarfile.extractall() to prevent path traversal and other extraction attacks.

Using extractall() without member validation allows malicious archives to write files outside the target directory via absolute paths, .. sequences, symlinks, or device files. Simply adding filter="data" is insufficient—security advisories document bypasses of the built-in filters.

The recommended approach is to extract to an empty temporary directory, validate each member (rejecting absolute paths, .. traversal, symlinks, device nodes), use realpath() checks to confirm extracted paths stay within the target directory, and move only validated files into place.

import tarfile
import tempfile
import os
import shutil
from pathlib import Path

def safe_extract_tar(archive_path, target_dir):
    with tempfile.TemporaryDirectory() as temp_dir:
        with tarfile.open(archive_path, "r:gz") as tar:
            for member in tar.getmembers():
                # Reject dangerous member types
                if member.issym() or member.isdev() or member.islnk():
                    continue

                # Validate member path stays within target
                target_path = (Path(target_dir) / member.name).resolve()
                if not str(target_path).startswith(str(Path(target_dir).resolve()) + os.sep):
                    continue  # Skip path traversal attempts

                # Extract to temp location first
                tar.extract(member, temp_dir)

        # Move validated files to target
        temp_path = Path(temp_dir)
        for item in temp_path.rglob("*"):
            rel_path = item.relative_to(temp_path)
            dest = Path(target_dir) / rel_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            if item.is_file():
                # shutil.move works across filesystems; Path.rename fails
                # when the temp dir and target live on different devices
                shutil.move(str(item), str(dest))
🧰 Tools
🪛 Ruff (0.14.8)

43-43: Uses of tarfile.extractall()

(S202)

🤖 Prompt for AI Agents
In benchmarks/400.inference/412.language-bert/python/function.py around lines
42-44, the current tarfile.extractall() usage is vulnerable to path traversal
and extraction attacks; replace it with a safe extraction flow: open the archive
into a temporary empty directory (use tempfile.TemporaryDirectory()), iterate
tar.getmembers(), skip/ignore dangerous members (absolute paths, members with
leading "..", symlinks, hard links, and device nodes), for each allowed member
compute the intended target Path(target_dir)/member.name and call resolve() to
ensure its realpath starts with the resolved target_dir path before extracting
that member into the temp dir, then after all validated members are extracted
move/rename only those validated files from the temp dir into MODEL_DIRECTORY
(creating parent dirs as needed); ensure proper cleanup and preserve file
permissions where appropriate.

echo "Original size $(du -sh $1 | cut -f1)"

CUR_DIR=$(pwd)
cd $1

⚠️ Potential issue | 🔴 Critical

Handle cd failure to prevent destructive operations in wrong directory.

If cd $1 fails, subsequent rm -rf and find -delete commands will execute in the current directory, potentially deleting unintended files.

-cd $1
+cd "$1" || exit 1
📝 Committable suggestion


Suggested change
cd $1
cd "$1" || exit 1
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 7-7: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🤖 Prompt for AI Agents
In benchmarks/400.inference/413.image-classification/python/package.sh around
line 7, the script unconditionally runs "cd $1" which, if it fails, allows
subsequent destructive rm -rf and find -delete commands to run in the wrong
directory; update the script to quote the argument and guard the cd by testing
its exit status (e.g., if ! cd "$1"; then print an error to stderr and exit with
non‑zero status) or enable safe failure (set -e) before the cd so the script
exits immediately on failure.
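
The guarded-cd pattern can be sketched as a small reusable function (a minimal sketch; `safe_clean` and the `*.pyc` pattern are illustrative, not part of package.sh):

```shell
# Hypothetical sketch: destructive cleanup runs only after cd has succeeded.
safe_clean() {
    local dir="$1"
    cd "$dir" || { echo "cannot cd to $dir" >&2; return 1; }
    # From here on we are provably inside "$dir".
    find . -name '*.pyc' -delete
}
```

Alternatively, `set -e` near the top of the script makes any failed `cd` abort the whole script immediately.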

Comment on lines +3 to +4
torch==2.5.1
torchvision==0.20.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

PyTorch 2.5.1 torchvision 0.20.1 compatibility version matrix

💡 Result:

Short answer: PyTorch 2.5.1 is compatible with torchvision 0.20.1 (and torchaudio 2.5.1). Installation commands and supported CUDA/ROCm/CPU wheels are listed on the PyTorch previous-versions page. [1][2]

Relevant references:

  • PyTorch docs — Previous versions (install commands for torch==2.5.1 + torchvision==0.20.1). [1]
  • PyTorch GitHub — PyTorch versions compatibility matrix (maps torch 2.5.x to torchvision 0.20.x). [2]

Would you like the exact pip/conda install commands for CUDA 11.8, 12.1, 12.4, ROCm, or CPU?


🌐 Web query:

PyTorch 2.5.1 ARM architecture support availability

💡 Result:

Short answer: Official PyTorch 2.5.1 builds exist for ARM64 in some vendor/distribution channels (e.g., AWS Deep Learning ARM64 AMIs include PyTorch 2.5.1), plus community/third‑party ARM builds; however, PyTorch upstream does not ship broad platform PyPI/Windows‑ARM wheels for every ARM target — support is mixed and often provided via vendor images, conda packages, or community builds. [1][2][3]

Sources:

  • AWS Deep Learning AMI release notes showing PyTorch 2.5.1 on ARM64 AMIs. [1]
  • PyTorch docs on Apple Silicon (M1/M2) support (macOS/MPS). [2]
  • Community third‑party ARM wheel project (example: maxisoft/pytorch-arm). [3]

🌐 Web query:

torchvision 0.20.1 ARM wheel available

💡 Result:

Yes — torchvision 0.20.1 has ARM wheels available:

  • manylinux (aarch64) and macOS ARM64 wheels listed for the 0.20.1 release. [1]
  • piwheels provides Raspberry Pi / aarch64 builds for 0.20.1. [2]
  • conda packages for linux-aarch64 and osx-arm64 are available from the PyTorch/Anaconda channels. [3]

References:
[1] Release file list for torchvision 0.20.1 (manylinux2014_aarch64, macosx_11_0_arm64). (simple-repository.app.cern.ch)
[2] piwheels entry showing torchvision 0.20.1 (2024-10-29). (piwheels.org)
[3] PyTorch/Anaconda package files including linux-aarch64 and osx-arm64 builds. (anaconda.org)

If you want, I can give the exact pip install command or a direct wheel URL for your platform.


PyTorch 2.5.1 is not officially available for ARM64 on PyPI.

While torch 2.5.1 and torchvision 0.20.1 are compatible, PyTorch 2.5.1 lacks official ARM64 wheels on PyPI. The upstream PyTorch project only provides vendor/community builds (e.g., AWS Deep Learning AMIs). torchvision 0.20.1 does have official manylinux2014_aarch64 and osx-arm64 wheels, but the torch dependency cannot be satisfied via pip on generic ARM systems. Consider using an ARM-compatible PyTorch version with available wheels, or specify an alternative installation method (e.g., conda, vendor-specific builds).

🤖 Prompt for AI Agents
In
benchmarks/400.inference/413.image-classification/python/requirements.txt.arm.3.9
around lines 3-4, the pinned torch==2.5.1 has no official ARM64 wheels on PyPI
so pip installs will fail on generic ARM systems; update the requirement to an
ARM-compatible PyTorch release that provides aarch64 wheels (e.g., a supported
manylinux2014_aarch64 build) or replace the pip pin with installation guidance:
either switch to a conda/Miniforge package line that installs torch for aarch64,
point to a vendor/community wheel URL, or add a comments/instructions block in
this file explaining the alternative installation method and the exact command
to install a compatible torch build for ARM64.
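
One way to encode that guidance in the file itself is a PEP 508 environment marker, so the PyPI pin only applies on platforms with official wheels (a sketch; the vendor index URL is a placeholder, not a real endpoint):

```
# Only pin torch from PyPI on platforms with official wheels.
torch==2.5.1; platform_machine != "aarch64"
torchvision==0.20.1
# On aarch64, install torch from a vendor or community wheel index instead, e.g.
#   pip install torch --extra-index-url <vendor-wheel-index-url>
```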

Comment on lines +47 to +51
def _select_device():
    if torch.cuda.is_available():
        return torch.device("cuda")
    raise RuntimeError("CUDA is not available")
    return torch.device("cpu")

⚠️ Potential issue | 🔴 Critical

Unreachable code after raise.

Line 51 (return torch.device("cpu")) is unreachable since line 50 unconditionally raises an exception. Either remove the dead code or restructure if CPU fallback was intended.

 def _select_device():
     if torch.cuda.is_available():
         return torch.device("cuda")
-    raise RuntimeError("CUDA is not available")
-    return torch.device("cpu")
+    # Fallback to CPU if CUDA unavailable
+    return torch.device("cpu")

Or if CUDA is truly required:

 def _select_device():
     if torch.cuda.is_available():
         return torch.device("cuda")
     raise RuntimeError("CUDA is not available")
📝 Committable suggestion


Suggested change
def _select_device():
    if torch.cuda.is_available():
        return torch.device("cuda")
    raise RuntimeError("CUDA is not available")
    return torch.device("cpu")
def _select_device():
    if torch.cuda.is_available():
        return torch.device("cuda")
    # Fallback to CPU if CUDA unavailable
    return torch.device("cpu")
🧰 Tools
🪛 Ruff (0.14.8)

50-50: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In benchmarks/400.inference/413.recommendation/python/function.py around lines
47 to 51, the return torch.device("cpu") on line 51 is unreachable because the
preceding raise RuntimeError("CUDA is not available") always aborts execution;
either remove the dead return or change the control flow so CPU is a valid
fallback (e.g., only raise when you truly require CUDA, otherwise return
torch.device("cpu") instead of raising), and ensure the function returns a
torch.device in all code paths.
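
The intended CPU-fallback control flow can be checked in isolation; this is a torch-free sketch (the `cuda_available` flag stands in for `torch.cuda.is_available()`, and `require_cuda` is an illustrative knob, not part of the benchmark):

```python
def select_device(cuda_available, require_cuda=False):
    """Return "cuda" when available, else fall back to "cpu" unless CUDA is mandatory."""
    if cuda_available:
        return "cuda"
    if require_cuda:
        # Only this branch should raise; it must not shadow the fallback.
        raise RuntimeError("CUDA is not available")
    return "cpu"
```

Either variant ensures every code path returns a device or raises deliberately, which is the property the dead `return` was masking.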

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

CVE-2025-32434 PyTorch torch.load vulnerability

💡 Result:

Summary — CVE-2025-32434

  • What it is: A Remote Command Execution (RCE) deserialization vulnerability in PyTorch’s torch.load when called with weights_only=True that can execute attacker-supplied code from a malicious .pt file. [1][2][3]
  • Affected versions: torch ≤ 2.5.1. Fixed in PyTorch 2.6.0. [1][2]
  • Severity: Critical (CVSS ~9.8). No privileges or user interaction required; remote/network attack vector when loading untrusted model files. [1][2]

Mitigation / recommended actions

  1. Upgrade PyTorch to 2.6.0 or later as soon as possible. [1][2]
  2. Never call torch.load (even with weights_only=True) on model files from untrusted sources. Treat .pt files as untrusted binary input. [1][3]
  3. If you must inspect untrusted models, do so in an isolated environment (sandbox, container, VM) or use tools that extract tensors without executing pickled objects. Consider file-format checks/conversion to known-safe formats before loading. [1][3]

References
[1] NVD CVE-2025-32434 entry. (nvd.nist.gov)
[2] PyTorch / GHSA advisory and vendor notes referenced in Ubuntu advisory. (ubuntu.com)
[3] Writeups / technical analysis (example: CVE news writeup). (cve.news)

If you want, I can: (a) fetch the official PyTorch advisory link, (b) show a minimal example exploit and a safe tensor-only loader pattern, or (c) scan your requirements file for vulnerable torch versions. Which would you like?


🌐 Web query:

PyTorch 2.4.1 security vulnerabilities CVE 2025

💡 Result:

Short answer — yes.

Known 2025 CVEs that affect PyTorch 2.4.1 (summary):

  • CVE-2025-32434 — Remote command execution when loading untrusted model files with torch.load(weights_only=True). Affects PyTorch up to 2.5.1 (so includes 2.4.1); patched in 2.6.0. [1][2]

  • CVE-2025-2953 — local Denial‑of‑Service in torch.mkldnn_max_pool2d; affects versions < 2.7.1-rc1 (so includes 2.4.1); patched in 2.7.1-rc1. [3]

Other 2025 CVEs reported against PyTorch (examples) target 2.6.0+ builds (e.g., CVE-2025-2148, CVE-2025-2149) and therefore do not apply to 2.4.1. [4][5]

Recommended action: upgrade to a patched PyTorch release (at minimum 2.6.0 to fix CVE-2025-32434 and 2.7.1-rc1 or later to fix CVE-2025-2953), and treat all model files from untrusted sources as hostile (validate/sandbox before loading). [1][3]

Sources:
[1] NVD / CVE-2025-32434 (PyTorch torch.load weights_only RCE).
[2] CVE news / writeups on CVE-2025-32434.
[3] GitHub Advisory / GHSA (CVE-2025-2953) and related NVD record.
[4] NVD CVE-2025-2148.
[5] NVD/CNA entries for other 2025 PyTorch CVEs.


🌐 Web query:

GHSA-53q9-r3pm-6pq6 PyTorch torch.load RCE

💡 Result:

Short summary

  • Advisory GHSA-53q9-r3pm-6pq6 / CVE-2025-32434: torch.load(..., weights_only=True) in PyTorch ≤ 2.5.1 can still lead to remote code execution (deserialization RCE). The issue is fixed in PyTorch 2.6.0. [1][2]

Impact

  • Loading a malicious .pt model with torch.load(weights_only=True) may execute arbitrary code; treat as critical. [1][2]

Mitigation / action

  • Upgrade PyTorch to >= 2.6.0. [1][2]
  • Do not load model files from untrusted sources; if you must, load them in an isolated/sandboxed environment (container/VM) or inspect/convert them offline. [1][3]


Upgrade torch to version 2.6.0 or later to address critical RCE vulnerability.

PyTorch versions ≤2.5.1 are vulnerable to CVE-2025-32434 (GHSA-53q9-r3pm-6pq6), a critical remote code execution vulnerability in torch.load with weights_only=True that was fixed in version 2.6.0. The vulnerability carries a CVSS score of 9.8, posing severe risks to systems using this benchmark suite.

Apply this fix:

-torch==2.4.1
+torch==2.6.0
📝 Committable suggestion


Suggested change
torch==2.4.1
torch==2.6.0
🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/601.matmul/python/requirements.txt lines 1-1,
the pinned torch version is 2.4.1 which is vulnerable; update the requirement to
a non-vulnerable release by changing the spec to at least 2.6.0 (for example use
"torch>=2.6.0,<3.0" or pin to a specific safe patch like "torch==2.6.0" or
later), then run the benchmark's CI/tests to ensure compatibility and update any
code that relies on deprecated APIs if tests fail.
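
The version floor the prompt describes can also be checked mechanically across requirement files; a minimal sketch (the `is_vulnerable` helper and the 2.6.0 floor are illustrative, not part of the repository):

```python
def is_vulnerable(pin, floor=(2, 6, 0)):
    """Return True if a requirements line pins torch below the patched floor."""
    name, _, version = pin.strip().partition("==")
    if name.strip() != "torch" or not version:
        return False  # not an exact torch pin
    parts = tuple(int(p) for p in version.strip().split("."))
    return parts < floor
```

Running such a check over every `requirements.txt*` in `benchmarks/` would flag all of the `torch==2.4.1` pins raised in this review at once.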

Comment on lines +32 to +43
def handler(event):
    size = event.get("size")
    if "seed" in event:
        import random

        random.seed(event["seed"])

        seed = event.get("seed", 42)
        seed = int(seed)

    gen_begin = datetime.datetime.now()
    alpha, x, y = initialize_torch(size, dtype=torch.float32, device="cuda", seed=seed)

⚠️ Potential issue | 🔴 Critical

Critical bug: seed variable is undefined when "seed" is not in event.

When "seed" is not present in the event, the code skips the if block but still references seed on line 43, causing an UnboundLocalError.

 def handler(event):
     size = event.get("size")
+    seed = 42  # default seed
     if "seed" in event:
         import random
-
         random.seed(event["seed"])
-
-        seed = event.get("seed", 42)
         seed = int(seed)

     gen_begin = datetime.datetime.now()
     alpha, x, y = initialize_torch(size, dtype=torch.float32, device="cuda", seed=seed)

Note: The random.seed() call appears unused since the function uses torch.manual_seed() in initialize_torch(). Consider removing it if Python's random module isn't used elsewhere.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/602.axpy/python/function.py around lines 32 to
43, the variable `seed` is only set inside the `if "seed" in event:` block but
is referenced later unconditionally, causing an UnboundLocalError when the event
has no seed; fix by ensuring `seed` is always defined (e.g., set seed =
int(event.get("seed", 42)) before or outside the if-block), remove the unused
`random.seed(event["seed"])` and `import random` if Python's random module is
not used elsewhere, or otherwise only call and import random when it is needed.
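
The fix described above amounts to deriving the seed unconditionally before it is used; a minimal sketch (the `parse_seed` helper and event keys are illustrative):

```python
def parse_seed(event, default=42):
    """Always return an int seed, whether or not the event supplies one."""
    return int(event.get("seed", default))
```

With this, `seed` is defined on every code path, and the benchmark can pass it straight to `torch.manual_seed()` without touching Python's `random` module.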

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check PyTorch version and security advisories

echo "Checking latest PyTorch version from PyPI..."
curl -s https://pypi.org/pypi/torch/json | jq -r '.info.version' 2>/dev/null || echo "Failed to fetch from PyPI"

echo -e "\nChecking available torch versions (last 20)..."
curl -s https://pypi.org/pypi/torch/json | jq -r '.releases | keys[]' | tail -20

Repository: spcl/serverless-benchmarks

Length of output: 288


🌐 Web query:

PyTorch 2.4.1 security vulnerabilities CVE

💡 Result:

Short answer — yes.

Known CVEs affecting PyTorch 2.4.1 (or earlier) include:

  • CVE-2024-48063 — RemoteModule deserialization RCE (affects torch <= 2.4.1). [1][3]
  • CVE-2025-32434 — RCE when loading models with torch.load(..., weights_only=True) (affects <= 2.5.1; 2.4.1 is vulnerable). [2]

Recommendation: upgrade to a patched release (CVE-2024-48063 was addressed in 2.5.0+; CVE-2025-32434 was patched in 2.6.0). Verify changelogs and your deployment (especially any use of torch.distributed/RemoteModule or torch.load with untrusted inputs) before upgrading. [1][2][3]

Sources:
[1] NVD — CVE-2024-48063.
[2] NVD — CVE-2025-32434.
[3] CVE Details / vulnerability listings for CVE-2024-48063.


🌐 Web query:

PyTorch latest version 2025 current release

💡 Result:

The latest stable PyTorch release as of December 15, 2025 is PyTorch 2.9 (released October 27, 2025). [1]



Upgrade PyTorch to 2.6.0 or later to address critical security vulnerabilities.

PyTorch 2.4.1 is outdated and contains two critical RCE vulnerabilities:

  • CVE-2024-48063 (RemoteModule deserialization RCE) — patched in 2.5.0+
  • CVE-2025-32434 (torch.load RCE with weights_only=True) — patched in 2.6.0

The latest version is 2.9.1. Update the pinned version to at least 2.6.0 (or later).

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/602.axpy/python/requirements.txt lines 1–1, the
pinned PyTorch version torch==2.4.1 contains critical RCE vulnerabilities;
update the requirement to a secure release (at least 2.6.0). Replace the exact
pin with a safe constraint such as torch>=2.6.0,<3.0 or pin to a specific
patched release (e.g., torch==2.9.1), then run your project's dependency
resolver/build/tests to ensure compatibility and update any lockfiles or CI
images that rely on the old version.

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n benchmarks/600.linearalgebra/604.cholesky/python/requirements.txt

Repository: spcl/serverless-benchmarks

Length of output: 89


🌐 Web query:

CVE-2025-32434 PyTorch torch.load vulnerability

💡 Result:

Short summary — CVE-2025-32434 (critical RCE)

  • What it is: a remote code‑execution vulnerability in PyTorch’s torch.load when called with weights_only=True; legacy/alternate deserialization paths could still execute pickled Python objects. [1][2]
  • Affected versions: PyTorch <= 2.5.1. Patched in 2.6.0. [1][2]
  • Impact: loading a malicious .pt/.pth file with torch.load(..., weights_only=True) on a vulnerable install can execute arbitrary commands (full RCE). [1][2]
  • Immediate mitigation: upgrade PyTorch to >= 2.6.0; do not load model artifacts from untrusted sources; if you must, inspect/convert files offline or load inside isolated/sandboxed environments (containers, VMs) and follow least privilege. [1][2]

Sources:
[1] PyTorch GitHub security advisory GHSA-53q9-r3pm-6pq6 (CVE-2025-32434).
[2] NVD entry for CVE-2025-32434.


🏁 Script executed:

# Find all requirements.txt files mentioning torch to check for consistency with other files
find . -name "requirements.txt" -type f | head -20 | xargs grep -l "torch" 2>/dev/null

Repository: spcl/serverless-benchmarks

Length of output: 556


🏁 Script executed:

grep "torch" ./benchmarks/300.utilities/320.monte-carlo-pi/python/requirements.txt ./benchmarks/400.inference/411.image-recognition/python/requirements.txt ./benchmarks/400.inference/413.image-classification/python/requirements.txt ./benchmarks/400.inference/413.recommendation/python/requirements.txt ./benchmarks/600.linearalgebra/601.matmul/python/requirements.txt ./benchmarks/600.linearalgebra/602.axpy/python/requirements.txt ./benchmarks/600.linearalgebra/603.jacobi2d/python/requirements.txt

Repository: spcl/serverless-benchmarks

Length of output: 1092


🌐 Web query:

CVE-2025-32434 CVSS score v4

💡 Result:

CVE-2025-32434 — CVSS v4.0 base score: 9.3 (CRITICAL). Vector: CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N. [1][2]

Sources: NVD and INCIBE. [1] NVD entry for CVE-2025-32434. [2] INCIBE advisory for CVE-2025-32434.


Upgrade torch to version 2.6.0 or later to address critical RCE vulnerability.

PyTorch versions ≤2.5.1 are vulnerable to CVE-2025-32434, a critical remote code execution vulnerability in torch.load that was fixed in version 2.6.0. The vulnerability carries a CVSS v4.0 score of 9.3, posing severe risks to systems using this benchmark suite.

-torch==2.4.1
+torch==2.6.0
📝 Committable suggestion


Suggested change
torch==2.4.1
torch==2.6.0
🧰 Tools
🪛 OSV Scanner (2.3.0)

[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2024-259)


[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2025-41)


[CRITICAL] 1-1: torch 2.4.1: PyTorch susceptible to local Denial of Service

(GHSA-3749-ghw9-m3mg)


[CRITICAL] 1-1: torch 2.4.1: PyTorch: torch.load with weights_only=True leads to remote code execution

(GHSA-53q9-r3pm-6pq6)


[CRITICAL] 1-1: torch 2.4.1: PyTorch Improper Resource Shutdown or Release vulnerability

(GHSA-887c-mr87-cxwp)

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/604.cholesky/python/requirements.txt around line
1, the project pins torch to 2.4.1 which is vulnerable (CVE-2025-32434); update
the requirement to torch>=2.6.0 (or a specific 2.6.x/2.7.x release) to ensure
the patched version is used, run your dependency install/test matrix to confirm
compatibility, and update any CI/environment constraints or lockfiles so the new
minimum version is enforced.

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and read its content
cat benchmarks/600.linearalgebra/605.lu/python/requirements.txt

Repository: spcl/serverless-benchmarks

Length of output: 82


🌐 Web query:

CVE-2025-32434 PyTorch torch.load RCE vulnerability

💡 Result:

Short summary

  • CVE-2025-32434 is a critical RCE in PyTorch’s torch.load when called with weights_only=True; an attacker-supplied .pt/.pth (pickle/tar) file can execute arbitrary code during load. Published Apr 17–18, 2025. [1][2]

Affected versions

  • torch / PyTorch <= 2.5.1. Patched in 2.6.0. [1][3]

Impact

  • Remote code execution on any system that loads an attacker-controlled model with torch.load(weights_only=True). Exploitation requires only loading the crafted file (no privileges or user interaction). [1][2]

Mitigation / recommended actions

  1. Upgrade to PyTorch 2.6.0 or later immediately (pip/conda upgrade your torch package). [1][3]
    • pip: pip install --upgrade torch
  2. Avoid loading model files from untrusted sources. Treat .pt/.pth/.tar files as untrusted pickles. [1][2]
  3. If you must inspect untrusted models, do so in isolated/sandboxed environments (containers, VMs) or convert/inspect files offline before loading. [2][3]
  4. Where feasible, validate model files or prefer secure, non-pickle formats and signature/verification workflows for third‑party models.

References

  • PyTorch GitHub security advisory (GHSA-53q9-r3pm-6pq6). [1]
  • NVD / CVE-2025-32434 entry. [2]
  • Ubuntu / INCIBE advisories summarizing affected versions and CVSS. [3][4]

🌐 Web query:

PyTorch 2.6.0 security vulnerability fix torch.load

💡 Result:

Short answer: Yes — a critical RCE in torch.load(weights_only=True) (CVE-2025-32434) affecting torch < 2.6.0 was disclosed in April 2025 and was patched in PyTorch 2.6.0. Upgrade to >=2.6.0 (or avoid loading untrusted .pt/.pth files) to mitigate. [1][2][3]

Details and links:

  • Vulnerability: torch.load with weights_only=True can be exploited for remote code execution (affects <=2.5.1). Disclosed Apr 17–18, 2025. [2][1]
  • Patched version: 2.6.0. Upgrade (pip): pip install --upgrade torch. [2][3]
  • Mitigations if you cannot upgrade immediately: do not load untrusted model files; use safer formats (e.g., safetensors) or isolate loading in a sandbox. [4][5]

Sources:
[1] GitHub Advisory / PyTorch security advisory (GHSA-53q9-r3pm-6pq6).
[2] CVE summary pages for CVE-2025-32434.
[3] IBM security bulletin noting the patch in 2.6.0.
[4] News coverage & vulnerability trackers recommending safetensors and upgrades.
[5] Vendor advisory / PyTorch security page.


Upgrade torch to version 2.6.0 or later to address critical RCE vulnerability.

PyTorch versions ≤2.5.1 are vulnerable to CVE-2025-32434, a critical remote code execution flaw in torch.load with weights_only=True that was patched in version 2.6.0. The vulnerability allows arbitrary code execution when loading attacker-controlled .pt/.pth files.

-torch==2.4.1
+torch==2.6.0
📝 Committable suggestion


Suggested change
torch==2.4.1
torch==2.6.0
🧰 Tools
🪛 OSV Scanner (2.3.0)

[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2024-259)


[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2025-41)


[CRITICAL] 1-1: torch 2.4.1: PyTorch susceptible to local Denial of Service

(GHSA-3749-ghw9-m3mg)


[CRITICAL] 1-1: torch 2.4.1: PyTorch: torch.load with weights_only=True leads to remote code execution

(GHSA-53q9-r3pm-6pq6)


[CRITICAL] 1-1: torch 2.4.1: PyTorch Improper Resource Shutdown or Release vulnerability

(GHSA-887c-mr87-cxwp)

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/605.lu/python/requirements.txt around lines 1 to
1, the pinned dependency torch==2.4.1 is vulnerable; update the requirement to
torch>=2.6.0 (or a specific safe release like 2.6.0 or later) to address
CVE-2025-32434, then regenerate any dependency lockfiles/virtualenvs, run the
test suite to ensure compatibility, and update any CI images or docs referencing
the old version.


5 participants