This repository was archived by the owner on Aug 15, 2025. It is now read-only.

aarch64 linux: torch.compile performance is 2x slower with the nightly torch wheel than with a wheel built with the 'build_aarch64_wheel.py' script #1774

@snadampal

For torchbench benchmarks with the dynamo backend, the aarch64 linux nightly wheel is 2x slower than the wheel I built using the pytorch/builder/build_aarch64_wheel.py script from the same pytorch commit.

The difference seems to come from the https://github.com/pytorch/builder/blob/main/aarch64_linux/aarch64_ci_build.sh script used for nightly builds. I suspect the cause is libomp.
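One way to check the libomp suspicion might be to run the same snippet under each wheel and compare the output. The sketch below is only an assumption about how to inspect this, not part of the builder scripts: it prints the torch version and commit, the parallel backend info reported by torch, and any OpenMP runtime libraries mapped into the process (read from `/proc/self/maps`, so Linux only). The helper names `openmp_libs_from_maps` and `report` are hypothetical.

```python
import os
import re

# Matches common OpenMP runtime file names: libgomp (GNU), libomp (LLVM), libiomp5 (Intel).
_OMP_NAME = re.compile(r"^lib(gomp|omp|iomp5)[-.]")


def openmp_libs_from_maps(maps_text):
    """Return sorted unique paths of OpenMP runtimes listed in /proc/<pid>/maps text."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname may be absent).
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue
        path = parts[5].strip()
        if _OMP_NAME.match(os.path.basename(path)):
            found.add(path)
    return sorted(found)


def report():
    """Print build and threading details for the torch wheel installed in this environment."""
    import torch  # imported here so the parser above works without torch installed

    print("torch version:", torch.__version__)
    print("git commit:   ", torch.version.git_version)
    print(torch.__config__.parallel_info())

    # Run a small op so the threading runtime is definitely loaded before inspecting maps.
    torch.mm(torch.randn(64, 64), torch.randn(64, 64))

    with open("/proc/self/maps") as f:
        libs = openmp_libs_from_maps(f.read())
    print("OpenMP runtimes loaded:")
    for lib in libs:
        print("  ", lib)


if __name__ == "__main__":
    try:
        report()
    except ImportError:
        print("torch is not installed in this environment")
```

If the two wheels report the same commit but load different OpenMP runtimes (or more than one runtime at once), that would support the suspicion; if they load the same runtime, the difference is more likely elsewhere in the build configuration.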

How to reproduce?

git clone https://github.com/pytorch/benchmark.git
cd benchmark

# apply this PR: https://github.com/pytorch/benchmark/pull/2187

# set OMP_NUM_THREADS=16 because I'm using a c7g.4xl instance

OMP_NUM_THREADS=16 python3 run_benchmark.py cpu --model hf_DistilBert --test eval --torchdynamo inductor --freeze_prepack_weights --metrics="latencies,cpu_peak_mem"
