PSAL-POSTECH · YWHyuk · May 22, 2026
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -19,7 +19,7 @@ If the issue occurs while running a Python workload or involves a simulator cras
 
 For example:
 ```
-python3 tests/test_add.py
+python3 tests/ops/elementwise/test_add.py
 ...
 [SpikeSimulator] cmd> spike --isa rv64gcv --varch=vlen:256,elen:64 --vectorlane-size=128 \
   -m0x80000000:0x1900000000,0x2000000000:0x1000000 \

diff --git a/.github/workflows/pytorchsim_test.yml b/.github/workflows/pytorchsim_test.yml
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -23,7 +23,7 @@ The pipeline runs in that order on every `torch.compile` invocation; you'll see
 | `TOGSim/` | C++ TOGSim source. `src/Simulator.cc`, `Core.cc`, `Dram.cc`, `Interconnect.cc`, `L2Cache.cc`, `Tile.cc`, `TileGraph.cc` are the core models. Externals: ramulator2, booksim, stonneCore, onnx, protobuf, spdlog, yaml-cpp |
 | `AsmParser/` | `tog_generator.py`, `onnx_utility.py` — TOG generation from ONNX/ASM |
 | `configs/` | TOGSim hardware configs (YAML). The default is `systolic_ws_128x128_c1_simple_noc_tpuv3.yml`. Naming pattern: `systolic_ws_<size>_c<cores>_<noc>_<target>.yml` |
-| `tests/` | ~36 op- and model-level tests. Subdirs `DeepSeek/`, `Diffusion/`, `Llama/`, `MLP/`, `Mixtral_8x7B/`, `MoE/`, `Yolov5/`, `Fusion/` for whole-model workloads |
+| `tests/` | Op- and model-level tests organized under `ops/<family>/` (elementwise, reduce, gemm, conv, attention, view, sort, sparsity, misc, fusion), `models/<name>/` (Llama, Mixtral8x7B, DeepSeek, Diffusion, MoE, MLP, MobileNet, Yolov5) plus single-file model tests (test_resnet, test_transformer, test_vit, test_mlp, test_single_perceptron), and `system/` (scheduler, eager, hetro, stonne, vectorops). Shared helper: `tests/_utils.py` |
 | `experiments/artifact/` | Paper reproduction scripts (`cycle_validation/run_cycle.sh`, `speedup/run_speedup.sh`) |
 | `scripts/` | One-off experiment runners (CompilerOpt, ILS, batch, chiplet, sparsity, stonne, end2end). `build_from_source.sh` builds gem5/llvm/spike |
 | `gem5_script/` | gem5 wrapper scripts called by `CycleSimulator` |
@@ -36,16 +36,16 @@ The pipeline runs in that order on every `torch.compile` invocation; you'll see
 Most tests follow the same pattern: build CPU reference, compile via `torch.compile` on `npu:0`, compare with `torch.allclose` (rtol=atol=1e-4). They all have `if __name__ == "__main__"` blocks.
 
 ```bash
-python tests/test_add.py        # vector add (smoke test, fastest)
-python tests/test_matmul.py     # GEMM
-python tests/test_mlp.py        # MLP forward + backward (training path)
-python tests/test_scheduler.py  # multi-tenant launch_model
-python tests/test_eager.py      # eager-fallback registration
+python tests/ops/elementwise/test_add.py  # vector add (smoke test, fastest)
+python tests/ops/gemm/test_matmul.py      # GEMM
+python tests/models/test_mlp.py           # MLP forward + backward (training path)
+python tests/system/test_scheduler.py     # multi-tenant launch_model
+python tests/system/test_eager.py         # eager-fallback registration
 ```
 
-Run a model from `tests/Llama/`, `tests/DeepSeek/`, etc. similarly.
+Run a model from `tests/models/Llama/`, `tests/models/DeepSeek/`, etc. similarly.
 
-**CI coverage:** the GitHub Actions workflow `.github/workflows/pytorchsim_test.yml` runs an **explicit allowlist** of `tests/*.py` files (~40 jobs, one Docker container per test). Adding a new file under `tests/` does *not* automatically gate PRs — register it in `pytorchsim_test.yml` if you want CI to exercise it. Conversely, files like `tests/test_gqa.py`, `tests/test_gqa_decode.py`, and `tests/test_eager.py` exist in the repo but are *not* in CI, so local validation is the only safety net for them.
+**CI coverage:** the GitHub Actions workflow `.github/workflows/pytorchsim_test.yml` runs an **explicit allowlist** of test files under `tests/` (~40 jobs, one Docker container per test). Adding a new file under `tests/` does *not* automatically gate PRs; register it in `pytorchsim_test.yml` if you want CI to exercise it. Conversely, files like `tests/ops/attention/test_gqa.py`, `tests/ops/attention/test_gqa_decode.py`, and `tests/system/test_eager.py` exist in the repo but are *not* in CI, so local validation is the only safety net for them.
 
 **For fast iteration** (skip functional check):
 ```bash
@@ -123,7 +123,7 @@ Conan deps for TOGSim: `boost/1.79.0`, `robin-hood-hashing/3.11.5`, `spdlog/1.11
 - **Adding a PyTorch device op:** `PyTorchSimDevice/csrc/aten/native/*` (Minimal/Extra split mirrors `torch_openreg`).
 - **TOGSim hardware model changes:** `TOGSim/src/{Core,Dram,Interconnect,L2Cache,Tile,TileGraph}.cc` + matching `include/*.h`.
 - **TOG generation:** `AsmParser/tog_generator.py` builds the raw graph and serializes it via `AsmParser/onnx_utility.py` to **ONNX, which is the on-disk TOG format** consumed by TOGSim.
-- **Eager fallback registration:** `torch.npu.register_eager_to_compile([...])` — see `tests/test_eager.py`.
+- **Eager fallback registration:** `torch.npu.register_eager_to_compile([...])` — see `tests/system/test_eager.py`.
 - **Per-run results:** `togsim_results/<YYYYMMDD_HHMMSS_<hash>>.log` (stats) and `.trace` (instruction trace). The path is also printed at the end of every run.
 - **Wrapper codegen path:** printed as `Wrapper Codegen Path = /tmp/torchinductor_<user>/<hash>/...py` — useful for inspecting generated kernel code and tensor names for `SRAM_BUFFER_PLAN_PATH`.
 

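Because CI runs an explicit allowlist, coverage of newly moved or added tests has to be checked by hand. A minimal sketch of such a check follows. `find_unregistered_tests` is a hypothetical helper, not part of PyTorchSim, and it assumes the workflow spells out each test by its repo-relative path (for example `tests/ops/gemm/test_matmul.py`) somewhere in its text. It does a plain substring search rather than parsing the YAML.

```python
import os


def find_unregistered_tests(repo_root, workflow_path):
    """List test_*.py files under <repo_root>/tests that the workflow never mentions.

    Hypothetical helper. Assumes the workflow names each test by its
    repo-relative path (e.g. "tests/ops/gemm/test_matmul.py"); it does a plain
    substring search instead of parsing the YAML.
    """
    with open(workflow_path, encoding="utf-8") as f:
        workflow_text = f.read()

    tests_dir = os.path.join(repo_root, "tests")
    missing = []
    for dirpath, _dirnames, filenames in os.walk(tests_dir):
        for name in filenames:
            if not (name.startswith("test_") and name.endswith(".py")):
                continue  # skip __init__.py, shared helpers, model definitions
            rel = os.path.relpath(os.path.join(dirpath, name), repo_root)
            rel = rel.replace(os.sep, "/")  # YAML paths use forward slashes
            if rel not in workflow_text:
                missing.append(rel)
    return sorted(missing)
```

When it is run from the repository root as `find_unregistered_tests(".", ".github/workflows/pytorchsim_test.yml")`, the result would be expected to include the attention and eager tests that the CI note lists as not covered. The substring match is deliberately loose: it also accepts absolute paths such as `/workspace/PyTorchSim/tests/...`.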
diff --git a/README.md b/README.md
@@ -40,15 +40,15 @@ PyTorchSim **supports**:
 |---|:-:|:-:|---|
 | ResNet-18 | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ | channel last format |
 | ResNet-50 | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ | channel last format |
-| MobileNet-v2 | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ | `tests/MobileNet/` (torchvision) |
-| YOLOv5 | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ | `tests/Yolov5/` |
+| MobileNet-v2 | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ | `tests/models/MobileNet/` (torchvision) |
+| YOLOv5 | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ | `tests/models/Yolov5/` |
 | BERT | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ |  |
 | GPT-2 | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ |  |
-| ViT | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ | `tests/test_vit.py` |
+| ViT | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ | `tests/models/test_vit.py` |
 | Mistral | <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" width="20"/> | ✅ | |
 | Stable-diffusion v1 | 🤗 | ✅ |  |
-| Llama 2/3 | 🤗 | ✅ | `tests/Llama/` (blocks & decode-style paths) |
-| DeepSeek-V3 (base) | 🤗 | ✅ | `tests/DeepSeek/` — several ops(e.g., gate ops) are not cycle-modeled |
+| Llama 2/3 | 🤗 | ✅ | `tests/models/Llama/` (blocks & decode-style paths) |
+| DeepSeek-V3 (base) | 🤗 | ✅ | `tests/models/DeepSeek/`; several ops (e.g., gate ops) are not cycle-modeled |
 | Llama-4 | 🤗 | ⏳ | In development |
 | Broader model support | — | ⏳ | In development |
 <!-- ## Requirements
@@ -104,7 +104,7 @@ The script clones each dep at the tag pinned in [`thirdparty/github-releases.jso
 ### Run Examples
 The `tests` directory contains several AI workload examples.
 ```bash
-python tests/test_matmul.py 
+python tests/ops/gemm/test_matmul.py 
 ```
 The result is written to `${TORCHSIM_LOG_PATH}/togsim_result/XXX.log`. The log file contains detailed core, memory, and interconnect stats.
 
@@ -201,7 +201,7 @@ optimizer.zero_grad()
 loss.backward()
 compiled_step()
 ```
-`tests/test_mlp.py` provides an example of MLP training.
+`tests/models/test_mlp.py` provides an example of MLP training.
 
 ## One TOGSim session, one continuous log
 
@@ -243,7 +243,7 @@ with TOGSimulator(config_path=config):
 Here `synchronize()` acts as a barrier: it does not return until every `launch_model` issued **above** it has finished in the simulator. The later pair of `launch_model` calls therefore runs only after those earlier models have fully completed—so the sync is the point in the timeline where **all preceding launches are done**.
 
 ```bash
-python tests/test_scheduler.py
+python tests/system/test_scheduler.py
 ```
 
 Use a TOGSim config(`.yml`) that defines **partitions** when mapping queues to cores, for example:

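Each run writes a new log, so a common first step is to pick the most recent one. The sketch below assumes log names begin with a `YYYYMMDD_HHMMSS` timestamp, which is the per-run pattern CLAUDE.md gives. Under that assumption, the lexicographically largest name is the newest run. `latest_togsim_log` is a hypothetical helper, and the caller supplies the result directory instead of having it hard-coded.

```python
import glob
import os


def latest_togsim_log(result_dir):
    """Return the newest *.log path in result_dir, or None if there is none.

    Hypothetical helper. Assumes log names start with a YYYYMMDD_HHMMSS
    timestamp, so the lexicographically largest file name is the newest run.
    """
    logs = glob.glob(os.path.join(result_dir, "*.log"))
    if not logs:
        return None
    # Compare by file name only so the directory prefix does not matter.
    return max(logs, key=os.path.basename)
```

For the location named in the README, a call would look like `latest_togsim_log(os.path.join(os.environ["TORCHSIM_LOG_PATH"], "togsim_result"))`. Sorting by name avoids relying on file mtimes, which can tie for runs that finish within the same second.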
diff --git a/scripts/sparsity_experiment/run.sh b/scripts/sparsity_experiment/run.sh
@@ -6,48 +6,48 @@ export TORCHSIM_FORCE_TIME_N=8
 
 OUTPUT_DIR="12GB"
 export TOGSIM_CONFIG="/workspace/PyTorchSim/configs/systolic_ws_8x8_c1_12G_simple_noc.yml"
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
 
 OUTPUT_DIR="24GB"
 export TOGSIM_CONFIG="/workspace/PyTorchSim/configs/systolic_ws_8x8_c1_24G_simple_noc.yml"
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
 
 OUTPUT_DIR="48GB"
 export TOGSIM_CONFIG="/workspace/PyTorchSim/configs/systolic_ws_8x8_c1_48G_simple_noc.yml"
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
 
 OUTPUT_DIR="12GB_2core"
 export TOGSIM_CONFIG="/workspace/PyTorchSim/configs/systolic_ws_8x8_c2_12G_simple_noc.yml"
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
 
 OUTPUT_DIR="24GB_2core"
 export TOGSIM_CONFIG="/workspace/PyTorchSim/configs/systolic_ws_8x8_c2_24G_simple_noc.yml"
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
 
 OUTPUT_DIR="48GB_2core"
 export TOGSIM_CONFIG="/workspace/PyTorchSim/configs/systolic_ws_8x8_c2_48G_simple_noc.yml"
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
-python3 /workspace/PyTorchSim/tests/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.0  > ${OUTPUT_DIR}/0.0
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.2  > ${OUTPUT_DIR}/0.2
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.4  > ${OUTPUT_DIR}/0.4
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.6  > ${OUTPUT_DIR}/0.6
+python3 /workspace/PyTorchSim/tests/ops/sparsity/test_sparsity.py --sparsity  0.8  > ${OUTPUT_DIR}/0.8
diff --git a/scripts/stonne_experiment/run.sh b/scripts/stonne_experiment/run.sh
@@ -2,8 +2,8 @@
 export TORCHSIM_FORCE_TIME_M=1024
 export TORCHSIM_FORCE_TIME_K=1024
 export TORCHSIM_FORCE_TIME_N=1024
-python3 ../../tests/test_hetro.py --M 1024 --N 1024 --K 1024 --sparsity 0.9 --config stonne_big_c1_simple_noc.yml --mode 0 > hetero/big_sparse.log
-python3 ../../tests/test_hetro.py --M 1024 --N 1024 --K 1024 --sparsity 0.9 --config systolic_ws_128x128_c1_simple_noc_tpuv3_half.yml --mode 1 > hetero/big.log
-python3 ../../tests/test_hetro.py --M 1024 --N 1024 --K 1024 --sparsity 0.9 --config heterogeneous_c2_simple_noc.yml --mode 2 > hetero/hetero.log
+python3 ../../tests/system/test_hetro.py --M 1024 --N 1024 --K 1024 --sparsity 0.9 --config stonne_big_c1_simple_noc.yml --mode 0 > hetero/big_sparse.log
+python3 ../../tests/system/test_hetro.py --M 1024 --N 1024 --K 1024 --sparsity 0.9 --config systolic_ws_128x128_c1_simple_noc_tpuv3_half.yml --mode 1 > hetero/big.log
+python3 ../../tests/system/test_hetro.py --M 1024 --N 1024 --K 1024 --sparsity 0.9 --config heterogeneous_c2_simple_noc.yml --mode 2 > hetero/hetero.log
 
 echo "All processes completed!"
diff --git a/scripts/stonne_experiment/run_trace.sh b/scripts/stonne_experiment/run_trace.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-SCRIPT="/workspace/PyTorchSim/tests/test_stonne.py"
+SCRIPT="/workspace/PyTorchSim/tests/system/test_stonne.py"
 
 SIZES=(32 64 128)
 SPARSITIES=(0.0 0.2 0.4 0.6 0.8)

diff --git a/tests/DeepSeek/test_deepseek_v3_base.py → .../models/DeepSeek/test_deepseek_v3_base.py b/tests/DeepSeek/test_deepseek_v3_base.py → .../models/DeepSeek/test_deepseek_v3_base.py
diff --git a/tests/Diffusion/test_diffusion.py → tests/models/Diffusion/test_diffusion.py b/tests/Diffusion/test_diffusion.py → tests/models/Diffusion/test_diffusion.py
diff --git a/tests/Llama/test_llama.py → tests/models/Llama/test_llama.py b/tests/Llama/test_llama.py → tests/models/Llama/test_llama.py
diff --git a/tests/MLP/test_mlp.py → tests/models/MLP/test_mlp.py b/tests/MLP/test_mlp.py → tests/models/MLP/test_mlp.py
diff --git a/tests/MLP/test_mlp_cpu.py → tests/models/MLP/test_mlp_cpu.py b/tests/MLP/test_mlp_cpu.py → tests/models/MLP/test_mlp_cpu.py
diff --git a/tests/Mixtral_8x7B/model.py → tests/models/Mixtral8x7B/model.py b/tests/Mixtral_8x7B/model.py → tests/models/Mixtral8x7B/model.py
diff --git a/tests/Mixtral_8x7B/test_attention.py → tests/models/Mixtral8x7B/test_attention.py b/tests/Mixtral_8x7B/test_attention.py → tests/models/Mixtral8x7B/test_attention.py
diff --git a/tests/MoE/test_moe.py → tests/models/MoE/test_moe.py b/tests/MoE/test_moe.py → tests/models/MoE/test_moe.py
diff --git a/tests/MoE/test_moe_cpu.py → tests/models/MoE/test_moe_cpu.py b/tests/MoE/test_moe_cpu.py → tests/models/MoE/test_moe_cpu.py
diff --git a/tests/MobileNet/test_mobilenet.py → tests/models/MobileNet/test_mobilenet.py b/tests/MobileNet/test_mobilenet.py → tests/models/MobileNet/test_mobilenet.py
diff --git a/tests/Yolov5/test_yolov5.py → tests/models/Yolov5/test_yolov5.py b/tests/Yolov5/test_yolov5.py → tests/models/Yolov5/test_yolov5.py
diff --git a/tests/Fusion/__init__.py → tests/models/__init__.py b/tests/Fusion/__init__.py → tests/models/__init__.py
diff --git a/tests/test_mlp.py → tests/models/test_mlp.py b/tests/test_mlp.py → tests/models/test_mlp.py
diff --git a/tests/test_resnet.py → tests/models/test_resnet.py b/tests/test_resnet.py → tests/models/test_resnet.py
diff --git a/tests/test_single_perceptron.py → tests/models/test_single_perceptron.py b/tests/test_single_perceptron.py → tests/models/test_single_perceptron.py
diff --git a/tests/test_transformer.py → tests/models/test_transformer.py b/tests/test_transformer.py → tests/models/test_transformer.py
diff --git a/tests/test_vit.py → tests/models/test_vit.py b/tests/test_vit.py → tests/models/test_vit.py
diff --git a/tests/ops/__init__.py b/tests/ops/__init__.py
diff --git a/tests/ops/attention/__init__.py b/tests/ops/attention/__init__.py
diff --git a/tests/test_gqa.py → tests/ops/attention/test_gqa.py b/tests/test_gqa.py → tests/ops/attention/test_gqa.py
diff --git a/tests/test_gqa_decode.py → tests/ops/attention/test_gqa_decode.py b/tests/test_gqa_decode.py → tests/ops/attention/test_gqa_decode.py
diff --git a/tests/test_sdpa.py → tests/ops/attention/test_sdpa.py b/tests/test_sdpa.py → tests/ops/attention/test_sdpa.py
diff --git a/tests/ops/conv/__init__.py b/tests/ops/conv/__init__.py
diff --git a/tests/test_cnn.py → tests/ops/conv/test_cnn.py b/tests/test_cnn.py → tests/ops/conv/test_cnn.py
diff --git a/tests/test_conv2d.py → tests/ops/conv/test_conv2d.py b/tests/test_conv2d.py → tests/ops/conv/test_conv2d.py
diff --git a/tests/test_group_conv.py → tests/ops/conv/test_group_conv.py b/tests/test_group_conv.py → tests/ops/conv/test_group_conv.py
diff --git a/tests/test_pool.py → tests/ops/conv/test_pool.py b/tests/test_pool.py → tests/ops/conv/test_pool.py
diff --git a/tests/ops/elementwise/__init__.py b/tests/ops/elementwise/__init__.py
diff --git a/tests/test_activation.py → tests/ops/elementwise/test_activation.py b/tests/test_activation.py → tests/ops/elementwise/test_activation.py
diff --git a/tests/test_add.py → tests/ops/elementwise/test_add.py b/tests/test_add.py → tests/ops/elementwise/test_add.py
diff --git a/tests/test_exponent.py → tests/ops/elementwise/test_exponent.py b/tests/test_exponent.py → tests/ops/elementwise/test_exponent.py
diff --git a/tests/test_transcendental.py → tests/ops/elementwise/test_transcendental.py b/tests/test_transcendental.py → tests/ops/elementwise/test_transcendental.py
diff --git a/tests/ops/fusion/__init__.py b/tests/ops/fusion/__init__.py
diff --git a/tests/Fusion/test_addmm_residual.py → tests/ops/fusion/test_addmm_residual.py b/tests/Fusion/test_addmm_residual.py → tests/ops/fusion/test_addmm_residual.py
diff --git a/tests/Fusion/test_attention_fusion.py → tests/ops/fusion/test_attention_fusion.py b/tests/Fusion/test_attention_fusion.py → tests/ops/fusion/test_attention_fusion.py
diff --git a/tests/Fusion/test_bmm_reduction.py → tests/ops/fusion/test_bmm_reduction.py b/tests/Fusion/test_bmm_reduction.py → tests/ops/fusion/test_bmm_reduction.py
diff --git a/tests/Fusion/test_conv_fusion.py → tests/ops/fusion/test_conv_fusion.py b/tests/Fusion/test_conv_fusion.py → tests/ops/fusion/test_conv_fusion.py
diff --git a/tests/Fusion/test_matmul_activation.py → tests/ops/fusion/test_matmul_activation.py b/tests/Fusion/test_matmul_activation.py → tests/ops/fusion/test_matmul_activation.py
diff --git a/tests/Fusion/test_matmul_reduction.py → tests/ops/fusion/test_matmul_reduction.py b/tests/Fusion/test_matmul_reduction.py → tests/ops/fusion/test_matmul_reduction.py
diff --git a/tests/Fusion/test_matmul_scalar.py → tests/ops/fusion/test_matmul_scalar.py b/tests/Fusion/test_matmul_scalar.py → tests/ops/fusion/test_matmul_scalar.py
diff --git a/tests/Fusion/test_matmul_vector.py → tests/ops/fusion/test_matmul_vector.py b/tests/Fusion/test_matmul_vector.py → tests/ops/fusion/test_matmul_vector.py
diff --git a/tests/Fusion/test_prologue_fusion.py → tests/ops/fusion/test_prologue_fusion.py b/tests/Fusion/test_prologue_fusion.py → tests/ops/fusion/test_prologue_fusion.py
diff --git a/tests/Fusion/test_transformer_fusion.py → tests/ops/fusion/test_transformer_fusion.py b/tests/Fusion/test_transformer_fusion.py → tests/ops/fusion/test_transformer_fusion.py
diff --git a/tests/ops/gemm/__init__.py b/tests/ops/gemm/__init__.py
diff --git a/tests/test_bmm.py → tests/ops/gemm/test_bmm.py b/tests/test_bmm.py → tests/ops/gemm/test_bmm.py
diff --git a/tests/test_matmul.py → tests/ops/gemm/test_matmul.py b/tests/test_matmul.py → tests/ops/gemm/test_matmul.py
diff --git a/tests/ops/misc/__init__.py b/tests/ops/misc/__init__.py
diff --git a/tests/test_expert_mask.py → tests/ops/misc/test_expert_mask.py b/tests/test_expert_mask.py → tests/ops/misc/test_expert_mask.py
diff --git a/tests/test_indirect_access.py → tests/ops/misc/test_indirect_access.py b/tests/test_indirect_access.py → tests/ops/misc/test_indirect_access.py
diff --git a/tests/ops/reduce/__init__.py b/tests/ops/reduce/__init__.py
diff --git a/tests/test_batchnorm.py → tests/ops/reduce/test_batchnorm.py b/tests/test_batchnorm.py → tests/ops/reduce/test_batchnorm.py
diff --git a/tests/test_layernorm.py → tests/ops/reduce/test_layernorm.py b/tests/test_layernorm.py → tests/ops/reduce/test_layernorm.py
diff --git a/tests/test_reduce.py → tests/ops/reduce/test_reduce.py b/tests/test_reduce.py → tests/ops/reduce/test_reduce.py
diff --git a/tests/test_softmax.py → tests/ops/reduce/test_softmax.py b/tests/test_softmax.py → tests/ops/reduce/test_softmax.py
diff --git a/tests/ops/sort/__init__.py b/tests/ops/sort/__init__.py
diff --git a/tests/test_sort.py → tests/ops/sort/test_sort.py b/tests/test_sort.py → tests/ops/sort/test_sort.py
diff --git a/tests/test_topk.py → tests/ops/sort/test_topk.py b/tests/test_topk.py → tests/ops/sort/test_topk.py
diff --git a/tests/ops/sparsity/__init__.py b/tests/ops/sparsity/__init__.py
diff --git a/tests/test_sparse_core.py → tests/ops/sparsity/test_sparse_core.py b/tests/test_sparse_core.py → tests/ops/sparsity/test_sparse_core.py
diff --git a/tests/test_sparsity.py → tests/ops/sparsity/test_sparsity.py b/tests/test_sparsity.py → tests/ops/sparsity/test_sparsity.py
@@ -7,9 +7,10 @@
 import torch
 import torch._dynamo
 import torch.utils.cpp_extension
-sys.path.append(os.environ.get('TORCHSIM_DIR', default='/workspace/PyTorchSim'))
-from test_transformer import EncoderBlock, test_result
-from test_mlp import MLP
+sys.path.insert(0, os.path.join(os.environ.get('TORCHSIM_DIR', default='/workspace/PyTorchSim'), 'tests'))
+from _pytorchsim_utils import test_result
+from models.test_transformer import EncoderBlock
+from models.test_mlp import MLP
 
 def apply_random_zero(tensor, zero_prob, block_size=8):
     if not 0 <= zero_prob <= 1:

diff --git a/tests/ops/view/__init__.py b/tests/ops/view/__init__.py
diff --git a/tests/test_cat.py → tests/ops/view/test_cat.py b/tests/test_cat.py → tests/ops/view/test_cat.py
diff --git a/tests/test_transpose2D.py → tests/ops/view/test_transpose2D.py b/tests/test_transpose2D.py → tests/ops/view/test_transpose2D.py
diff --git a/tests/test_transpose3D.py → tests/ops/view/test_transpose3D.py b/tests/test_transpose3D.py → tests/ops/view/test_transpose3D.py
diff --git a/tests/test_view3D_2D.py → tests/ops/view/test_view3D_2D.py b/tests/test_view3D_2D.py → tests/ops/view/test_view3D_2D.py
diff --git a/tests/system/__init__.py b/tests/system/__init__.py
diff --git a/tests/test_eager.py → tests/system/test_eager.py b/tests/test_eager.py → tests/system/test_eager.py
diff --git a/tests/test_hetro.py → tests/system/test_hetro.py b/tests/test_hetro.py → tests/system/test_hetro.py
@@ -3,10 +3,10 @@
 import torch
 import argparse
 
-sys.path.append(os.environ.get("TORCHSIM_DIR", default="/workspace/PyTorchSim"))
+sys.path.insert(0, os.path.join(os.environ.get("TORCHSIM_DIR", default="/workspace/PyTorchSim"), "tests"))
 
 from Simulator.simulator import TOGSimulator
-from test_stonne import sparse_matmul
+from system.test_stonne import sparse_matmul
 
 
 def custom_matmul(a, b):

diff --git a/tests/test_scheduler.py → tests/system/test_scheduler.py b/tests/test_scheduler.py → tests/system/test_scheduler.py
@@ -1,10 +1,13 @@
 import os
+import sys
 import torch
 from torchvision.models import resnet18 as model1
-from test_transformer import EncoderBlock as model2
-from Simulator.simulator import TOGSimulator
 
 base_path = os.environ.get('TORCHSIM_DIR', default='/workspace/PyTorchSim')
+sys.path.insert(0, os.path.join(base_path, 'tests'))
+sys.path.append(base_path)
+from models.test_transformer import EncoderBlock as model2
+from Simulator.simulator import TOGSimulator
 config = f'{base_path}/configs/systolic_ws_128x128_c2_simple_noc_tpuv3_partition.yml'
 
 target_model1 = model1().eval()

diff --git a/tests/test_stonne.py → tests/system/test_stonne.py b/tests/test_stonne.py → tests/system/test_stonne.py
diff --git a/tests/test_vectorops.py → tests/system/test_vectorops.py b/tests/test_vectorops.py → tests/system/test_vectorops.py
@@ -1,16 +1,21 @@
+import os
+import sys
+
 import torch
 
+sys.path.insert(0, os.path.join(os.environ.get("TORCHSIM_DIR", default="/workspace/PyTorchSim"), "tests"))
+
 if __name__ == "__main__":
     device = torch.device("npu:0")
-    
+
     # Target shape
     seq_list = [1,128,512,2048,8192]
     d_model = 768
-    from tests.test_add import test_vectoradd
-    from tests.test_activation import test_GeLU
-    from tests.test_reduce import test_reduce_sum2
-    from tests.test_layernorm import test_LayerNorm
-    from tests.test_softmax import test_softmax
+    from ops.elementwise.test_add import test_vectoradd
+    from ops.elementwise.test_activation import test_GeLU
+    from ops.reduce.test_reduce import test_reduce_sum2
+    from ops.reduce.test_layernorm import test_LayerNorm
+    from ops.reduce.test_softmax import test_softmax
     func_list = [test_vectoradd, test_GeLU, test_reduce_sum2, test_LayerNorm, test_softmax]
     for test_func in func_list:
         for seq in seq_list: