feat: add xlogy_outscalar_other base #588

voltjia wants to merge 15 commits into feat/torch-codegen
Frees the `infini::ops::Sigmoid` name for the auto-generated PyTorch operator class emitted by the upcoming `generate_torch_ops.py`.
For each entry in `scripts/torch_ops.yaml`, the script finds the matching `.out` variant in PyTorch's `native_functions.yaml` (fetched from GitHub on first invocation, cached under `generated/.cache/`), parses its schema, and emits an InfiniOps base class plus a PyTorch backend specialization at slot 8 that wraps `at::<op>_out`. Key strategies:
- Overload-aware lookup: prefers `<name>.out`, then any `<name>.<overload>_out`, picking the variant with the most tensor inputs (so `pow.Tensor_Tensor_out` wins over `pow.Tensor_Scalar_out`).
- Hidden-parameter pattern: optional types (`Scalar?`, `int[]?`, `ScalarType?`, `Generator?`, …), `bool` defaults, numeric `int`/`float` defaults, `int[N]=[]` defaults, and ATen enum symbols (`Mean`, `Sum`) are filtered from the user-facing API and substituted at the ATen call site. This unlocks reductions, scans, comparisons, losses, and multi-scalar activations from a single mechanism.
- Slot 8: reserved for PyTorch backends; native/vendor implementations use 0–7. This also avoids a partial-specialization-after-instantiation conflict with `Operator<Op>` at index 0.
- Per-op metadata (`generated/torch_ops_metadata.json`): records the full parameter list per op for the test harness, so adding a new op to the allowlist requires no code changes.
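A minimal Python sketch of the overload-aware lookup described above. The function name, the shape of the parsed `native_functions.yaml` entries, and the tensor-counting heuristic are assumptions for illustration, not the actual code in `generate_torch_ops.py`:

```python
def pick_out_overload(base_name, entries):
    """Choose the `.out` variant for `base_name`: prefer `<base>.out`,
    otherwise the `<base>.<overload>_out` entry with the most Tensor inputs.

    `entries` is assumed to be parsed native_functions.yaml records with a
    'func' schema string, e.g.
    'pow.Tensor_Scalar_out(Tensor self, Scalar exponent, *, Tensor(a!) out) -> Tensor(a!)'.
    Defaults containing commas are ignored for simplicity.
    """
    def tensor_input_count(schema):
        args = schema.split("(", 1)[1].rsplit(") ->", 1)[0]
        # Plain inputs read 'Tensor x' / 'Tensor? x'; outputs carry '(a!)' marks.
        return sum(1 for a in args.split(", ")
                   if a.startswith("Tensor ") or a.startswith("Tensor? "))

    best = None
    for entry in entries:
        name = entry["func"].split("(", 1)[0]
        if name == f"{base_name}.out":
            return entry                      # exact '.out' overload always wins
        if name.startswith(f"{base_name}.") and name.endswith("_out"):
            count = tensor_input_count(entry["func"])
            if best is None or count > best[0]:
                best = (count, entry)
    return best[1] if best else None
```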
…ources

When `WITH_TORCH=ON`, `src/CMakeLists.txt` runs the generator at configure time and globs `generated/torch/**/*.cc` into the `infiniops` target. `generated/` is added to the public include path so the emitted wrappers can include `"base/<op>.h"` and `"torch/<op>/<op>.h"`. `scripts/generate_wrappers.py` (the existing pybind binding generator) is taught to scan both `src/base/` and `generated/base/` so the auto-generated InfiniOps classes get Python bindings. The `__call__` lambda's `Self&` parameter is renamed to `op` to avoid colliding with ATen's typical `self` argument name.
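A rough sketch of the dual-directory header lookup this implies; `_find_base_header` is named in a later note, but this signature and the preference for `src/base/` over `generated/base/` are assumptions:

```python
from pathlib import Path

# Assumed search order: a reviewed hand-written header wins over a generated one.
_BASE_DIRS = (Path("src/base"), Path("generated/base"))

def _find_base_header(op_name):
    """Return the base header path for `op_name`, checking both source trees."""
    for base_dir in _BASE_DIRS:
        candidate = base_dir / f"{op_name}.h"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no base header found for {op_name!r}")
```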
A single parametrized test reads `generated/torch_ops_metadata.json`, builds inputs from the per-parameter info (tensor → `randn_strided` with op-specific shape if listed in `_TENSOR_SHAPES`, scalar → per-op or type default), runs the torch reference to discover output shape / dtype / arity, calls the InfiniOps wrapper, and compares each output tensor. No signature-kind classification — multi-output, ternary, multi-scalar, matrix, and everything in between fall out of the same code path. Per-op overrides live in flat dicts (`_TENSOR_SHAPES`, `_SCALAR_VALUES`). Vendor-specific runtime errors and bool outputs (InfiniOps `DataType` has no `kBool`) skip cleanly. `conftest.py` switches to `torch.allclose(..., equal_nan=True)` for floating outputs and `torch.equal` for bool/int outputs so domain violations producing matched NaNs and integer-output ops both work.
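A condensed sketch of that flow under the assumptions above; `_TENSOR_SHAPES` and `_SCALAR_VALUES` are named in the description, but the metadata keys (`kind`, `default`), the helper names, and the example override values are hypothetical:

```python
import json
import torch

# Illustrative per-op overrides; the real dicts live in the test module.
_TENSOR_SHAPES = {"adaptive_avg_pool2d": (1, 3, 8, 8)}
_SCALAR_VALUES = {"clamp": 0.5}

def load_metadata(path="generated/torch_ops_metadata.json"):
    with open(path) as f:
        return json.load(f)

def build_inputs(op_name, params, device="cpu"):
    """Build inputs from the recorded parameter list, assumed to look like
    [{"name": "self", "kind": "tensor"}, {"name": "alpha", "kind": "scalar", "default": 1.0}]."""
    shape = _TENSOR_SHAPES.get(op_name, (2, 3))
    args = []
    for p in params:
        if p["kind"] == "tensor":
            args.append(torch.randn(shape, device=device))
        else:
            args.append(_SCALAR_VALUES.get(op_name, p.get("default", 1.0)))
    return args

def check_op(op_name, torch_ref, infini_op, params):
    """Run the torch reference to discover output shape/dtype/arity,
    then compare the wrapper's outputs one by one."""
    args = build_inputs(op_name, params)
    expected = torch_ref(*args)
    expected = expected if isinstance(expected, tuple) else (expected,)
    actual = infini_op(*args)
    actual = actual if isinstance(actual, tuple) else (actual,)
    for exp, act in zip(expected, actual):
        if exp.is_floating_point():
            assert torch.allclose(exp, act, equal_nan=True)
        else:
            assert torch.equal(exp, act)
```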
- Generator: emit one wrapper per ATen overload (e.g. `pow.Tensor_Tensor_out`
→ `PowTensorTensor`, `pow.Tensor_Scalar_out` → `PowTensorScalar`,
`pow.Scalar_out` → `PowScalar`). Class name = base + overload (see the
naming sketch after this list).
- Add type support: `SymInt`, `SymInt[]`, `Tensor?`, `Tensor?[]`,
`str?`/`str` (when defaulted), `int[N]` / `SymInt[N]` with non-empty
defaults (replicated `{0,0,...}` for `int[N]=0`). Optional Tensor and
Tensor list optionals hardcode to `at::nullopt`.
- `is_testable` relaxed to "has at least one out tensor" — generators
like `arange.out` / `linspace.out` (no tensor input) are now in scope.
- Allowlist auto-discovered from the YAML: every base op name with at
least one parsable `.out` overload (390 names → 486 wrappers).
- Test: handle `int[N]` / `SymInt[N]` defaults via `_LIST_SIZE_RE`-driven
`_list_default`; pass `[0, 0, …]` of the right length. Per-op
`_TENSOR_SHAPES` and `_SCALAR_VALUES` overrides keyed by `aten_name`
(so all overloads of an op share the same overrides).
- Generators now wipe their output dirs (`generated/{base,torch,bindings,
src,include}/`) before regenerating, so files for ops we no longer emit
do not linger and break the next build.
- Filter `Tensor[]` outputs (`split_copy`, `unbind_copy`,
`split_with_sizes_copy`): would have emitted `at::<op>_out(at::Tensor,
...)` against the actual `at::TensorList` signature.
- Filter ops whose first non-out argument is not a Tensor
(`pow.Scalar_out`, generators like `arange`/`empty`): `Operator::Make`
dispatches on the first tensor's device, so these need a separate path.
- Spell out typed empty optionals (`c10::optional<at::Tensor>{}`,
`c10::optional<at::Scalar>{}`, …) instead of bare `at::nullopt`: the
latter is ambiguous on ops where overloads exist for both `optional<
Scalar>` and `optional<Tensor>` (e.g. `clamp_out`).
- Convert YAML single-quoted string defaults (`'none'`) to C++
double-quoted literals (`"none"`); the former parses as a char literal.
- `generate_wrappers.py::_find_vector_tensor_params` now uses the
shared `_find_base_header` helper, which checks `generated/base/`
alongside `src/base/` (was hard-coded to `src/base/`).
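A small sketch of the per-overload class naming from the first bullet of this list (class name = base + overload, with the `_out` marker dropped); the helper name and the capitalization rule are assumptions:

```python
def wrapper_class_name(aten_name):
    """Derive the generated class name from an ATen overload name:
    'pow.Tensor_Tensor_out'  -> 'PowTensorTensor'
    'pow.Scalar_out'         -> 'PowScalar'
    'pow.out'                -> 'Pow'
    'xlogy.OutScalar_Other'  -> 'XlogyOutscalarOther'
    """
    base, _, overload = aten_name.partition(".")
    overload = overload.removesuffix("_out").removesuffix("out")
    parts = [base] + [p for p in overload.split("_") if p]
    return "".join(p.capitalize() for p in parts)
```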
Test improvements:
- Skip ops whose tensors use a dtype InfiniOps does not enumerate
(`bool`, `complex64`, `complex128`, …); `DataTypeFromString` aborts
the process on these.
- Catch a wider exception set (`ValueError`, `IndexError`,
`NotImplementedError`) when the torch reference rejects our generic
random inputs (`adaptive_avg_pool2d` needs at least 3 dims, etc.).
- Skip non-deterministic ops (`bernoulli`, `normal`, `multinomial`,
`rand*`, `randperm`, `rrelu_with_noise`): independent draws diverge.
- Skip when the Python-facing function returns fewer outputs than the
ATen `_out` schema declares (`adaptive_max_pool2d` hides `indices`
behind `return_indices=True`).
- Add "Trying to resize storage that is not resizable" to the runtime
skip patterns: ATen kernels for some loss ops use `out` as
intermediate scratch and resize it before the final reduction; our
`from_blob` outputs are non-resizable.
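One way the wider exception set and the runtime-skip patterns from these test improvements could be wired up; `_RUNTIME_SKIP_PATTERNS`, `call_or_skip`, and the skip messages are illustrative rather than the harness's real names:

```python
import pytest

# Runtime-error substrings that indicate a known limitation, not a wrapper bug.
_RUNTIME_SKIP_PATTERNS = (
    "Trying to resize storage that is not resizable",
)

def call_or_skip(fn, args):
    """Call the torch reference or the InfiniOps wrapper, skipping the test
    when generic random inputs are rejected or a known limitation is hit."""
    try:
        return fn(*args)
    except (ValueError, IndexError, NotImplementedError) as exc:
        pytest.skip(f"reference rejected generic inputs: {exc}")
    except RuntimeError as exc:
        if any(pat in str(exc) for pat in _RUNTIME_SKIP_PATTERNS):
            pytest.skip(f"known runtime limitation: {exc}")
        raise
```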
Final state: 433 generated + 4 hand-written torch ops, full build
succeeds, `pytest tests/test_torch_ops.py --devices cpu` reports
1663 passed, 2234 skipped, 0 failed.
- Skip ops whose torch reference triggers a CUDA device-side assert on random fp32 inputs (`binary_cross_entropy` requires inputs in [0, 1]; pooling/conv ops divide by the `[0, 0]` placeholder kernel sizes our harness substitutes). The Python-side `RuntimeError` is catchable, but the CUDA context is left poisoned and every subsequent test errors at setup, which masks the rest of the suite.
- Skip ops whose reference produces a 0-element output: on cuda, `torch.empty_like(zero_numel)` returns a tensor whose `data_ptr()` is unregistered with the device, so the wrapper trips on "pointer resides on host memory".

Final state: `pytest tests/test_torch_ops.py` (cpu + cuda) reports 3263 passed, 4531 skipped, 0 failed.
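A tiny sketch of the 0-element-output skip described above; the helper name and skip message are hypothetical:

```python
import pytest

def skip_if_zero_numel(reference_out, device):
    """On cuda, a 0-element output's data_ptr() is not registered with the
    device, so the wrapper would reject it; skip rather than fail."""
    outputs = reference_out if isinstance(reference_out, tuple) else (reference_out,)
    if device == "cuda" and any(t.numel() == 0 for t in outputs):
        pytest.skip("0-element output: device pointer not registered")
```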
- Support the `str` C++ type (`std::string`) for required string params, unlocking `index_reduce`, `scatter_reduce`, `scatter_reduce_two`.
- Relax `_find_out_entries` so it also matches multi-output schemas whose overload name reflects an output tensor instead of `_out` (`kthvalue.values`, `mode.values`). Detection is now: name is `<op>.out`, ends in `_out`, or carries a `Tensor(<letter>!)` mutability annotation.
- Strip both the `_out` suffix and the `out_` prefix from the InfiniOps name derived from an overload (`div.out_mode` → `div_mode`, instead of `div_out_mode`).
- Add per-op test values for the new ops (`reduce` modes, `k`/`dim` for `kthvalue`/`mode`).
- `scripts/torch_ops.yaml`: list `kthvalue`, `mode`, `index_reduce`, `scatter_reduce`.

Final state: 447 generated ops (up from 433). `pytest tests/test_torch_ops.py` (cpu + cuda) reports 3353 passed, 4693 skipped, 0 failed.
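A sketch of the relaxed `.out` detection and the suffix/prefix stripping described above; the regex, helper names, and return conventions are assumptions:

```python
import re

_MUTABLE_OUT_RE = re.compile(r"Tensor\([a-z]!\)")

def looks_like_out_entry(base_name, schema):
    """Match `.out`-style overloads, including multi-output schemas whose
    overload name reflects an output tensor (kthvalue.values, mode.values)."""
    name = schema.split("(", 1)[0]
    if not name.startswith(f"{base_name}."):
        return False
    overload = name.split(".", 1)[1]
    return (overload == "out"
            or overload.endswith("_out")
            or bool(_MUTABLE_OUT_RE.search(schema)))

def infiniops_name(base_name, overload):
    """Strip both the '_out' suffix and the 'out_' prefix:
    ('div', 'out_mode') -> 'div_mode', not 'div_out_mode'."""
    overload = overload.removesuffix("_out").removeprefix("out_")
    return f"{base_name}_{overload}" if overload and overload != "out" else base_name
```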
xlogy_outscalar_other base (diff excerpt):

```cpp
namespace infini::ops {

class XlogyOutscalarOther : public Operator<XlogyOutscalarOther> {
```
Closing as superseded by the latest The torch codegen has been updated to drop ATen overload-name suffixes ( This PR added The original review feedback on this PR (naming, parameter exposure, scalar storage) has been incorporated directly into the codegen: the generated class uses the canonical name, stores visible scalars as members, and exposes default-valued bool / int / float parameters that were previously hidden. No action required.
Summary
- Adds the `xlogy_outscalar_other` base declaration in `src/base/xlogy_outscalar_other.h`.
- Torch codegen will reuse `src/base/xlogy_outscalar_other.h` instead of emitting `generated/base/xlogy_outscalar_other.h`.
- A follow-up formatting commit applies the member spacing required by `scripts/check_conventions.py`.

Motivation
This PR is part of the `feat/torch-codegen` base-header migration. The generated `XlogyOutscalarOther` base declaration is moved into `src/base` so code generation can reuse a reviewed hand-written header.

N/A: no linked issue.
Type of Change
- [x] `feat` - new feature / new operator / new platform
- [ ] `fix` - bug fix
- [ ] `perf` - performance improvement (no behavioral change)
- [ ] `refactor` - code restructuring without behavior change
- [ ] `test` - adding or fixing tests only
- [ ] `docs` - documentation only
- [ ] `build` / `ci` - build system or CI configuration
- [ ] `chore` - tooling, formatting, or other non-code changes
- [ ] Breaking change (`!` in the Conventional Commits prefix or a `BREAKING CHANGE:` footer)

Platforms Affected
- `WITH_CPU`
- `WITH_NVIDIA`
- `WITH_ILUVATAR`
- `WITH_METAX`
- `WITH_CAMBRICON`
- `WITH_MOORE`
- `WITH_ASCEND`
- `WITH_TORCH`

Test Results on Supported Platforms
`pytest` Result (all platforms): not run on `master` or `feat/torch-codegen`; base-header PR; no runtime implementation is added.

Full `pytest` output (optional)
Benchmark / Performance Impact
N/A. This PR only adds a base operator declaration for torch codegen reuse and does not add a runtime implementation.
Notes for Reviewers
- The PR targets `feat/torch-codegen`, not `master`.
- The diff against `feat/torch-codegen` contains only `src/base/xlogy_outscalar_other.h`.
- `clang-format` 21 passing on `src/base/xlogy_outscalar_other.h`; the follow-up formatting commit applies the class member spacing required by `scripts/check_conventions.py`.

Checklist
Title, Branch, and Commits
- PR title follows the Conventional Commits format (e.g. `feat(nvidia): …`, `fix(cuda/gemm): …`).
- Branch name is `codex/add-xlogy_outscalar_other-base`, like other PR branches targeting `feat/torch-codegen`; branch renaming is intentionally out of scope.
- The PR targets `feat/torch-codegen`, not `master`; no `master` rebase is required for this integration target.
- No `fixup!`/`squash!`/wip commits remain.

Scope and Design
- Scope follows CONTRIBUTING.md §Code/General.
- No debug `printf`/`std::cout`/`print(...)` left behind, or `TODO` without an owner and issue link.
- The change is limited to the `XlogyOutscalarOther` base operator declaration used by torch codegen.

General Code Hygiene (applies to all languages)
- (CONTRIBUTING.md §Code/General).
- (CONTRIBUTING.md §Code/General).
- (the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
- (CONTRIBUTING.md §Code/General).
- (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)
- `clang-format` (version 21, per `.github/workflows/clang-format.yml`) has been run against all modified `.h`, `.cc`, `.cuh`, and `.mlu` files; the diff is clean.
- `clang-tidy` was not run because this PR only adds a base declaration header for `feat/torch-codegen`; no runtime implementation is added.
- (CONTRIBUTING.md §C++).
- (CONTRIBUTING.md §C++).
- (CONTRIBUTING.md §C++).
- (CONTRIBUTING.md §C++).
- (CONTRIBUTING.md §C++).
- Only `src/base/xlogy_outscalar_other.h` is added, for torch codegen reuse; platform implementations are out of scope.
- No raw `new`/`delete`; RAII / smart pointers / existing allocators are used.

Python Specific (if Python files changed)
N/A: no Python files changed.
Testing
- `pytest` was intentionally not run because this PR targets `feat/torch-codegen`, not `master`, and only adds a reusable base header declaration.
- `tests/` coverage is required.
- Payload-returning test was added.
- `dtype`/`device` parameterization was added.

Build, CI, and Tooling
- This PR targets `feat/torch-codegen`, not `master`, and only adds a reusable base header declaration.
- `compile_commands.json` behavior was not changed.
- `clang-format` 21 passing on `src/base/xlogy_outscalar_other.h`.

Documentation
- `README.md`, `CONTRIBUTING.md`, and developer workflow are unchanged.
- `XlogyOutscalarOther` is an internal base declaration for torch codegen reuse; no user-facing documentation is required.

Security and Safety