33 changes: 33 additions & 0 deletions .claude/skills/fix/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
---
name: fix
description: Commit current changes, run Rust autofix/lint/format, run pallet-subtensor tests, amend with any fixes.
---

# Fix Skill

Create or reuse one commit, run the Rust fix pipeline in order, run unit tests, and fold all resulting changes into that same commit.

## Steps

1. Run /format
2. In a subagent (subagent_type: `general-purpose`, model: `sonnet`) run:
- `cargo test -p pallet-subtensor --lib` and capture full output
- If any tests fail, analyze the failures
- Read the failing test code AND the source code it tests
- Determine the root cause
- Apply fixes using Edit tools
- Re-run the tests to confirm the fix works
- After fixing, if there are further failures, repeat (up to 3 fix-and-retest cycles)
- Summarize:
- Which tests failed, if any
- What was fixed and how
- Whether all tests pass now
3. Amend commit with test fixes, if any, then /format
4. Run `git show -s` for user to review

## Important

- Do NOT run `scripts/fix_rust.sh` — let /format take care of it
- Do NOT skip any steps
- The test subagent must fix source code to make tests pass, NOT modify tests to make them pass (unless the test itself is clearly wrong)
- If the test subagent cannot fix all failures after 3 cycles, it must return the remaining failures so the main agent can report them to the user
25 changes: 25 additions & 0 deletions .claude/skills/format/SKILL.md
@@ -0,0 +1,25 @@
---
name: format
description: Commit current changes, run Rust autofix/lint/format, amend with any fixes.
---

# Format Skill

Create or reuse one commit, run the Rust fix pipeline in order, and fold all resulting changes into that same commit.

## Steps

1. Stage all changes and create a commit with a descriptive message summarizing the changes (skip if there is nothing to commit)
2. Do this:
a. Run `cargo check --workspace`
b. Run `cargo clippy --fix --workspace --all-features --all-targets --allow-dirty`
c. Run `cargo fix --workspace --all-features --all-targets --allow-dirty`
d. Run `cargo fmt --all`
e. Amend the commit with any changes
3. Run `git show -s` for user to review

## Important

- If a fix tool fails in step 2, stop and report the error to the user rather than continuing
- Do NOT run `scripts/fix_rust.sh` itself — run the individual commands listed above instead
- Do NOT skip any steps
77 changes: 77 additions & 0 deletions .claude/skills/ship/SKILL.md
@@ -0,0 +1,77 @@
---
name: ship
description: "Ship current branch end-to-end: run /fix, push, open/update PR, triage CI failures, then deliver review findings for approval."
---

# Ship Skill

Ship the branch through CI and review without force-pushes, and never apply review fixes without explicit user approval.

Run the following steps in a subagent to prevent context pollution. Have the subagent return a short summary to the main agent.

1. Run `/fix`
2. Push the branch to origin
3. Create a PR with a comprehensive description if none exists yet
- Update the description if PR exists already
- Add label `skip-cargo-audit` to the PR
4. Poll CI status in a loop:
- Run: `gh pr checks --json name,state,conclusion,link --watch --fail-fast 2>/dev/null || gh pr checks`
- If `--watch` is not available, poll manually every 90 seconds using `gh pr checks --json name,state,conclusion,link` until all checks have completed (no checks with state "pending" or conclusion "").
- **Ignore these known-flaky/irrelevant checks** — treat them as passing even if they fail:
- `validate-benchmarks` (benchmark CI — not relevant)
- Any `Contract E2E Tests` check that failed only due to a timeout (look for timeout in the failure link/logs)
- `cargo-audit`
5. **If there are real CI failures** (failures NOT in the ignore list above):
- For EACH distinct failing check, launch a **separate Task subagent** (subagent_type: `general-purpose`, model: `sonnet`) in parallel. Each subagent must:
- Fetch the failed check's logs: use `gh run view <run-id> --log-failed` or the check link to get failure details.
- Investigate the root cause by reading relevant source files.
- Return a **fix plan**: a description of what needs to change and in which files, with specific code snippets showing the fix.
- **Wait for all subagents** to return their fix plans.
6. **Aggregate and apply fixes**:
- Review all returned fix plans for conflicts or overlaps.
- Apply the fixes using Edit/Write tools.
- Invoke the /fix skill
- `git push`
7. **Re-check CI**: Go back to step 4 and poll again. Repeat the fix cycle up to **3 times**. If CI still fails after 3 rounds, report the remaining failures to the user and stop.
8. **Once CI is green** (or only ignored checks are failing), perform a thorough code review.
- **Launch a single Opus subagent** (subagent_type: `general-purpose`, model: `opus`) for the review:
- It must get the full PR diff: `git diff main...HEAD`.
- It must read every changed file in full.
- It must produce a numbered list of **issues** found, where each issue has:
- A unique sequential ID (e.g., `R-1`, `R-2`, ...).
- **Severity**: critical / major / minor / nit.
- **File and line(s)** affected.
- **Description** of the problem.
- The review must check for: correctness, safety (no panics, no unchecked arithmetic, no indexing), edge cases, naming, documentation gaps, test coverage, and adherence to Substrate/Rust best practices.
- Return the full list of issues.
9. **For each issue**, run fix designer then fix reviewer in sequence; run all issues concurrently with each other:
- **Fix designer** (subagent_type: `general-purpose`, model: `sonnet`): Given the issue description and relevant code context, design a concrete proposed fix with exact code changes (old code -> new code). Return the fix as a structured plan.
- **Fix reviewer** (subagent_type: `general-purpose`, model: `opus`): Given the issue description, the relevant code context, and the designer's proposed fix, the reviewer must check:
- Does the fix actually solve the issue?
- Does it introduce new problems?
- Is it the simplest correct fix?
- Return: approved / rejected with reasoning.

Implementation note: For each issue, first launch the fix designer. Once the fix designer for that issue returns, launch the fix reviewer for that issue. But all issues should be processed in parallel — i.e., launch all fix designers at once, then as each designer returns, launch its corresponding reviewer. You may batch reviewers if designers finish close together.

10. **Report to user**: Present a formatted summary:
```
## Code Review Results

### R-1: <title> [severity]
**File**: path/to/file.rs:42
**Issue**: <description>
**Proposed fix**: <summary of fix>
**Review**: Approved / Rejected — <reasoning>

### R-2: ...
```
Ask the user which fixes to apply (all approved ones, specific ones by ID, or none).

## Important Rules

- Never force-push. Always use regular `git push`.
- All CI polling must have a maximum total wall-clock timeout of 45 minutes. If CI hasn't finished by then, report current status and stop waiting.
- When fetching CI logs, use a subagent to isolate the relevant part. If `gh run view` output is very long, focus on the failed step output only.
- Do NOT apply code review fixes automatically — always present them for user approval first.
- Use HEREDOC syntax for PR body and commit messages to preserve formatting.
4 changes: 4 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,4 @@
- never use slice indexing like `arr[n..]` or `arr[i]`; use `.get(n..)`, `.get(i)` etc. instead to avoid panics (clippy::indexing_slicing)
- never use `*`, `+`, `-`, `/` for arithmetic; use `.saturating_mul()`, `.saturating_add()`, `.saturating_sub()`, `.saturating_div()` or checked variants instead (clippy::arithmetic_side_effects)
- if you are creating a PR to `subtensor` add a `skip-cargo-audit` label
- no `Co-Authored-By` in commits or attribution in PRs
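The first two rules map directly to clippy's `indexing_slicing` and `arithmetic_side_effects` lints. A minimal standalone sketch of the panic-free alternatives (illustrative values, not code from this repo):

```rust
fn main() {
    let arr = [10u64, 20, 30];

    // clippy::indexing_slicing: `arr[5]` would panic at runtime;
    // `.get` returns an Option instead.
    assert_eq!(arr.get(5), None);
    assert_eq!(arr.get(1), Some(&20));
    assert_eq!(arr.get(1..).map(|s| s.len()), Some(2));

    // clippy::arithmetic_side_effects: `u64::MAX + 1` overflows
    // (panicking in debug builds); saturating/checked variants never panic.
    assert_eq!(u64::MAX.saturating_add(1), u64::MAX);
    assert_eq!(5u64.saturating_sub(9), 0);
    assert_eq!(u64::MAX.checked_mul(2), None);
}
```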
3 changes: 2 additions & 1 deletion pallets/subtensor/src/extensions/subtensor.rs
@@ -243,7 +243,8 @@ where
*protocol,
*placeholder1,
*placeholder2,
),
)
.map(|_| ()),
0u64,
)
.map(|validity| (validity, (), origin.clone()))
5 changes: 4 additions & 1 deletion pallets/subtensor/src/macros/hooks.rs
@@ -170,7 +170,10 @@ mod hooks {
// Migrate fix bad hk swap
.saturating_add(migrations::migrate_fix_bad_hk_swap::migrate_fix_bad_hk_swap::<T>())
// Fix RootClaimed overclaim caused by single-subnet hotkey swap bug
.saturating_add(migrations::migrate_fix_root_claimed_overclaim::migrate_fix_root_claimed_overclaim::<T>());
.saturating_add(migrations::migrate_fix_root_claimed_overclaim::migrate_fix_root_claimed_overclaim::<T>())
// Remove orphaned axon/prometheus/certificate entries (follow-up to v1,
// accumulated while serve_axon checked registration on any network)
.saturating_add(migrations::migrate_remove_orphan_axon_prom_cert_v2::migrate_remove_orphan_axon_prom_cert_v2::<T>());
weight
}

@@ -0,0 +1,93 @@
use super::*;
use crate::HasMigrationRun;
use frame_support::{traits::Get, weights::Weight};
use scale_info::prelude::string::String;
use sp_std::collections::btree_set::BTreeSet;

/// Remove Axon, Prometheus, and NeuronCertificate entries for hotkeys that are not
/// registered on the respective subnet.
///
/// This is a follow-up to `migrate_remove_neuron_axon_cert_prom`. The bug in
/// `serve_axon` / `serve_prometheus` (checking registration on *any* network instead
/// of the target netuid) allowed new orphaned entries to accumulate after that first
/// migration ran. This migration clears those entries.
pub fn migrate_remove_orphan_axon_prom_cert_v2<T: Config>() -> Weight {
    let migration_name = b"migrate_remove_orphan_axon_prom_cert_v2".to_vec();
    let mut weight: Weight = T::DbWeight::get().reads(1);

    // Skip if already executed.
    if HasMigrationRun::<T>::get(&migration_name) {
        log::info!(
            target: "runtime",
            "Migration '{}' already run - skipping.",
            String::from_utf8_lossy(&migration_name)
        );
        return weight;
    }
    log::info!(
        target: "runtime",
        "Running migration '{}'",
        String::from_utf8_lossy(&migration_name)
    );

    for network in NetworksAdded::<T>::iter_keys() {
        weight.saturating_accrue(T::DbWeight::get().reads(1));

        let hotkeys = BTreeSet::from_iter(Uids::<T>::iter_key_prefix(network));
        weight.saturating_accrue(T::DbWeight::get().reads(hotkeys.len() as u64));

        // Axons
        let axons = Axons::<T>::iter_key_prefix(network).collect::<Vec<_>>();
        weight.saturating_accrue(T::DbWeight::get().reads(axons.len() as u64));
        let mut cleaned_axons: u32 = 0;
        for axon_hotkey in axons {
            if !hotkeys.contains(&axon_hotkey) {
                Axons::<T>::remove(network, &axon_hotkey);
                cleaned_axons = cleaned_axons.saturating_add(1);
            }
        }
        weight.saturating_accrue(T::DbWeight::get().writes(cleaned_axons as u64));

        // Prometheus
        let prometheus = Prometheus::<T>::iter_key_prefix(network).collect::<Vec<_>>();
        weight.saturating_accrue(T::DbWeight::get().reads(prometheus.len() as u64));
        let mut cleaned_prometheus: u32 = 0;
        for prometheus_hotkey in prometheus {
            if !hotkeys.contains(&prometheus_hotkey) {
                Prometheus::<T>::remove(network, &prometheus_hotkey);
                cleaned_prometheus = cleaned_prometheus.saturating_add(1);
            }
        }
        weight.saturating_accrue(T::DbWeight::get().writes(cleaned_prometheus as u64));

        // NeuronCertificates
        let certificates = NeuronCertificates::<T>::iter_key_prefix(network).collect::<Vec<_>>();
        weight.saturating_accrue(T::DbWeight::get().reads(certificates.len() as u64));
        let mut cleaned_certificates: u32 = 0;
        for certificate_hotkey in certificates {
            if !hotkeys.contains(&certificate_hotkey) {
                NeuronCertificates::<T>::remove(network, &certificate_hotkey);
                cleaned_certificates = cleaned_certificates.saturating_add(1);
            }
        }
        weight.saturating_accrue(T::DbWeight::get().writes(cleaned_certificates as u64));

        if cleaned_axons > 0 || cleaned_prometheus > 0 || cleaned_certificates > 0 {
            log::info!(
                target: "runtime",
                "Cleaned {cleaned_axons} axons, {cleaned_prometheus} prometheus, \
                {cleaned_certificates} neuron certificates for network {network}"
            );
        }
    }

    HasMigrationRun::<T>::insert(&migration_name, true);
    weight = weight.saturating_add(T::DbWeight::get().writes(1));

    log::info!(
        target: "runtime",
        "Migration '{}' completed successfully.",
        String::from_utf8_lossy(&migration_name)
    );

    weight
}
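The `HasMigrationRun` guard above is what makes this migration idempotent across repeated runtime upgrades. A simplified model of that run-once pattern, with pallet storage mocked as a plain `HashSet` (the `Storage` struct and the cleanup count are illustrative stand-ins, not the pallet's real types):

```rust
use std::collections::HashSet;

/// Mock of on-chain storage: `HasMigrationRun` modeled as a set of names.
struct Storage {
    has_migration_run: HashSet<Vec<u8>>,
    orphans_cleaned: u32,
}

/// Returns how many entries were cleaned on this invocation.
fn migrate_once(storage: &mut Storage) -> u32 {
    let name = b"migrate_remove_orphan_axon_prom_cert_v2".to_vec();
    // Skip if already executed, so repeated runtime upgrades are harmless.
    if storage.has_migration_run.contains(&name) {
        return 0;
    }
    // ... the actual cleanup work would go here; pretend 3 orphans found ...
    storage.orphans_cleaned = storage.orphans_cleaned.saturating_add(3);
    // Mark as done so every later run is a no-op.
    storage.has_migration_run.insert(name);
    3
}

fn main() {
    let mut storage = Storage {
        has_migration_run: HashSet::new(),
        orphans_cleaned: 0,
    };
    assert_eq!(migrate_once(&mut storage), 3); // first run does the work
    assert_eq!(migrate_once(&mut storage), 0); // second run is a no-op
    assert_eq!(storage.orphans_cleaned, 3);
}
```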
1 change: 1 addition & 0 deletions pallets/subtensor/src/migrations/mod.rs
@@ -38,6 +38,7 @@ pub mod migrate_rate_limiting_last_blocks;
pub mod migrate_remove_commitments_rate_limit;
pub mod migrate_remove_network_modality;
pub mod migrate_remove_old_identity_maps;
pub mod migrate_remove_orphan_axon_prom_cert_v2;
pub mod migrate_remove_stake_map;
pub mod migrate_remove_tao_dividends;
pub mod migrate_remove_total_hotkey_coldkey_stakes_this_interval;
53 changes: 17 additions & 36 deletions pallets/subtensor/src/subnets/serving.rs
@@ -70,8 +70,8 @@ impl<T: Config> Pallet<T> {
// We check the callers (hotkey) signature.
let hotkey_id = ensure_signed(origin)?;

// Validate user input
Self::validate_serve_axon(
// Validate user input and build the axon struct.
let mut prev_axon = Self::validate_serve_axon(
&hotkey_id,
netuid,
version,
@@ -90,24 +90,8 @@ impl<T: Config> Pallet<T> {
NeuronCertificates::<T>::insert(netuid, hotkey_id.clone(), certificate)
}

// We insert the axon meta.
let mut prev_axon = Self::get_axon_info(netuid, &hotkey_id);
// Record the block at insert time.
prev_axon.block = Self::get_current_block_as_u64();
prev_axon.version = version;
prev_axon.ip = ip;
prev_axon.port = port;
prev_axon.ip_type = ip_type;
prev_axon.protocol = protocol;
prev_axon.placeholder1 = placeholder1;
prev_axon.placeholder2 = placeholder2;

// Validate axon data with delegate func
let axon_validated = Self::validate_axon_data(&prev_axon);
ensure!(
axon_validated.is_ok(),
axon_validated.err().unwrap_or(Error::<T>::InvalidPort)
);

Axons::<T>::insert(netuid, hotkey_id.clone(), prev_axon);

// We deposit axon served event.
@@ -144,11 +128,8 @@ impl<T: Config> Pallet<T> {
/// - On successfully serving the axon info.
///
/// # Raises:
/// * 'MechanismDoesNotExist':
/// - Attempting to set weights on a non-existent network.
///
/// * 'NotRegistered':
/// - Attempting to set weights from a non registered account.
/// * 'HotKeyNotRegisteredInNetwork':
/// - Attempting to serve prometheus from a hotkey not registered on the target network.
///
/// * 'InvalidIpType':
/// - The ip type is not 4 or 6.
@@ -170,19 +151,19 @@ impl<T: Config> Pallet<T> {
// We check the callers (hotkey) signature.
let hotkey_id = ensure_signed(origin)?;

// Ensure the hotkey is registered on this specific network.
ensure!(
Self::is_hotkey_registered_on_network(netuid, &hotkey_id),
Error::<T>::HotKeyNotRegisteredInNetwork
);

// Check the ip signature validity.
ensure!(Self::is_valid_ip_type(ip_type), Error::<T>::InvalidIpType);
ensure!(
Self::is_valid_ip_address(ip_type, ip, false),
Error::<T>::InvalidIpAddress
);

// Ensure the hotkey is registered somewhere.
ensure!(
Self::is_hotkey_registered_on_any_network(&hotkey_id),
Error::<T>::HotKeyNotRegisteredInNetwork
);

// We get the previous prometheus info associated with this (netuid, hotkey).
let mut prev_prometheus = Self::get_prometheus_info(netuid, &hotkey_id);
let current_block: u64 = Self::get_current_block_as_u64();
@@ -328,10 +309,10 @@ impl<T: Config> Pallet<T> {
protocol: u8,
placeholder1: u8,
placeholder2: u8,
) -> Result<(), Error<T>> {
// Ensure the hotkey is registered somewhere.
) -> Result<AxonInfoOf, Error<T>> {
// Ensure the hotkey is registered on this specific network.
ensure!(
Self::is_hotkey_registered_on_any_network(hotkey_id),
Self::is_hotkey_registered_on_network(netuid, hotkey_id),
Error::<T>::HotKeyNotRegisteredInNetwork
);

@@ -351,8 +332,8 @@ impl<T: Config> Pallet<T> {
Error::<T>::ServingRateLimitExceeded
);

// Validate axon data with delegate func
prev_axon.block = Self::get_current_block_as_u64();
// Assemble and validate the updated axon state.
prev_axon.block = current_block;
prev_axon.version = version;
prev_axon.ip = ip;
prev_axon.port = port;
@@ -367,6 +348,6 @@ impl<T: Config> Pallet<T> {
axon_validated.err().unwrap_or(Error::<T>::InvalidPort)
);

Ok(())
Ok(prev_axon)
}
}
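The key refactor in this file is that `validate_serve_axon` now builds and returns the validated `AxonInfoOf` instead of `Ok(())`, so `serve_axon` stores exactly the struct that passed validation rather than rebuilding it separately. A standalone sketch of that "return what you validated" pattern (the `Axon` and `Error` types and the checks here are simplified stand-ins, not the pallet's actual definitions):

```rust
// Hypothetical, simplified types for illustration only.
#[derive(Debug, PartialEq)]
struct Axon {
    ip_type: u8,
    port: u16,
}

#[derive(Debug, PartialEq)]
enum Error {
    InvalidIpType,
    InvalidPort,
}

/// The validator constructs the value it checks and hands it back,
/// so the caller cannot accidentally store an unvalidated struct.
fn validate(ip_type: u8, port: u16) -> Result<Axon, Error> {
    if ip_type != 4 && ip_type != 6 {
        return Err(Error::InvalidIpType);
    }
    if port == 0 {
        return Err(Error::InvalidPort);
    }
    Ok(Axon { ip_type, port })
}

fn main() {
    // The caller only ever stores an axon that passed validation.
    assert_eq!(validate(4, 8080), Ok(Axon { ip_type: 4, port: 8080 }));
    assert_eq!(validate(7, 8080), Err(Error::InvalidIpType));
    assert_eq!(validate(6, 0), Err(Error::InvalidPort));
}
```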