
AI+engine heavy board performance optimizations #10507

Draft
MostCromulent wants to merge 1 commit into Card-Forge:master from MostCromulent:perf-fixes-for-pr

Conversation

@MostCromulent
Contributor

Caveat: Not a programmer, not a coder, so I can't verify this myself. However I'm presenting this for consideration by devs in case there are things here you want to adopt or cherrypick.

On https://github.com/MostCromulent/forge/tree/tokenperf I set Claude AI up on a loop profiling performance of 4-player AI vs AI games with token heavy board states to identify potential performance optimizations.

The loop went through a process of hypothesis, testing, and verification to confirm that no optimization caused a regression in current AI decision-making or engine results: outputs must be identical before and after each code change. This was verified by running the master code and the optimized code concurrently and comparing outputs.

This PR reflects six fixes identified as improving performance with no regression in outputs. They focus on early-exiting commonly called methods when they would produce redundant or empty results, plus an improved search algorithm.

These fixes together showed a ~33% improvement in AI batch testing time with 4-player precons and ~27% improvement with 4-player commander precons.

Detailed explanation and methodology here.


🤖 Generated with Claude Code

Engine and AI hot paths doing redundant work on cards that have no
possible source of the thing being computed, and one AI loop redoing
combat simulation for each of N identical tokens. Add O(1) fast-path
guards that skip the work when it provably produces nothing, and
batch the AI loop over interchangeable blockers.

- AiAttackController.notNeededAsBlockers: detect runs of identical-
  equivalence-class blockers; batch-release with one predict call and
  binary-search on MIN_VALUE. Singleton path unchanged.
- Card.getChangedCardTraitsList: skip the Iterables.concat over two
  empty TreeBasedTables (mirrors existing pattern in
  Card.getChangedCardTypes).
- Card.hasRemoveIntrinsic: skip the IterableUtil.any Stream when all
  three changedCardTypes* tables are empty.
- CardState.getReplacementEffects / getStaticAbilities / getTriggers:
  return FCollection.getEmpty() when no intrinsic, no split sides,
  no trait overlays, no trait-contributing keywords (and for REs
  no type / subtype / counter-based synthetics).
- KeywordCollection: lazily-computed trait bitmask with mutation
  invalidation, powering the empty-source checks above.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
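The lazily-computed trait bitmask with mutation invalidation described in the last bullet can be sketched roughly as below. This is a minimal illustration, not Forge's actual KeywordCollection API: the class name, constants, and the toy keyword-to-trait mapping are all invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordBitmaskSketch {
    // Illustrative trait-contribution bits (not Forge's real encoding).
    static final int CONTRIBUTES_TRIGGER = 1;       // keyword adds a Trigger
    static final int CONTRIBUTES_STATIC  = 1 << 1;  // keyword adds a StaticAbility
    static final int CONTRIBUTES_REPLACE = 1 << 2;  // keyword adds a ReplacementEffect

    private final List<String> keywords = new ArrayList<>();
    private int traitMask;          // cached OR of per-keyword contributions
    private boolean maskValid;      // invalidated on every mutation

    public void add(String keyword) {
        keywords.add(keyword);
        maskValid = false;          // any mutation invalidates the cache
    }

    public void remove(String keyword) {
        keywords.remove(keyword);
        maskValid = false;
    }

    /** O(1) when cached: does any keyword contribute the given trait kind? */
    public boolean contributes(int traitBit) {
        if (!maskValid) {
            int m = 0;
            for (String k : keywords) {
                m |= contributionOf(k);
            }
            traitMask = m;
            maskValid = true;
        }
        return (traitMask & traitBit) != 0;
    }

    // Toy classification standing in for real per-keyword metadata.
    private static int contributionOf(String keyword) {
        switch (keyword) {
            case "Exalted":     return CONTRIBUTES_TRIGGER;
            case "Totem armor": return CONTRIBUTES_REPLACE;
            default:            return 0;
        }
    }
}
```

The empty-source checks in CardState.getTriggers and friends can then ask `contributes(...)` in constant time instead of streaming over the keyword list on every call.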
@MostCromulent MostCromulent marked this pull request as draft April 24, 2026 04:41
@Hanmac Hanmac requested review from Agetian, Hanmac and tool4ever April 24, 2026 07:52
@Hanmac Hanmac added Game Mechanics AI General AI tag labels Apr 24, 2026
@Hanmac
Contributor

Hanmac commented Apr 24, 2026

The fast check methods currently might ignore hasRemoveIntrinsic

also, the order of the checks might need to be reversed relative to the layer order, to get an early return:

  • Layer 6 check for new abilities
  • Layer 4 check for remove Intrinsic, if true, return empty
  • Layer 3 check for text changes
  • Layer 1 check for default values (LeftState/RightState only if current state is default)
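The ordering Hanmac proposes could look roughly like the sketch below. This is a hypothetical illustration, not Forge's actual code: the method names, flags, and return types are stand-ins for the real layered-effect state.

```java
import java.util.Collections;
import java.util.List;

public class LayerOrderSketch {
    // Illustrative flags standing in for real layered-effect state.
    boolean hasNewAbilities;       // layer 6: something grants new abilities
    boolean removeIntrinsic;       // layer 4: intrinsic traits are removed
    boolean hasTextChanges;        // layer 3: text-changing effects apply
    boolean isDefaultState = true; // layer 1: LeftState/RightState is default
    List<String> intrinsic = List.of("intrinsicTrigger");

    /** Checks ordered layer 6 -> 4 -> 3 -> 1 so removeIntrinsic can short-circuit. */
    public List<String> getTriggersFast() {
        if (hasNewAbilities) return computeFull();           // can't skip: new traits exist
        if (removeIntrinsic) return Collections.emptyList(); // early empty result
        if (hasTextChanges) return computeFull();
        if (!isDefaultState) return computeFull();           // split sides in play
        return intrinsic; // no layered changes: return intrinsic traits directly
    }

    private List<String> computeFull() {
        // Stand-in for the full (slow) trait-computation path.
        return List.of("intrinsicTrigger", "layeredTrigger");
    }
}
```

The key point is that the layer-4 removeIntrinsic check only yields a safe early empty return after the layer-6 new-abilities check has ruled out granted traits.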

@tool4ever
Contributor

  • Fix 1: sounds ok, will test that later - though 100+ new lines there will make understanding the logic a bit more complicated 😅
  • Fix 2+3: should be safe
  • Fix 4-6: seems more messy, maybe it'd be better to extend the keywordsCache approach instead so it covers all traits - then instead of yes/no for updating you'll be able to reuse each List multiple times afterwards
    @Hanmac didn't we have an issue for that lying around somewhere?

@Hanmac
Contributor

Hanmac commented Apr 24, 2026

  • then instead of yes/no for updating you'll be able to reuse each List multiple times afterwards

i mean there is ICardTraitChanges, which passes the List through as parameter:

    default List<SpellAbility> applySpellAbility(List<SpellAbility> list) { return list;}
    default List<Trigger> applyTrigger(List<Trigger> list) { return list;}
    default List<ReplacementEffect> applyReplacementEffect(List<ReplacementEffect> list) { return list;}
    default List<StaticAbility> applyStaticAbility(List<StaticAbility> list) { return list;}

I might make KeywordCollection implement ICardTraitChanges, so it can iterate over its keywords internally
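That suggestion could be sketched as follows, assuming the interface shape quoted above. The Trigger type and the keyword-to-trigger mapping are simplified stand-ins for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordTraitChangesSketch {
    // Simplified mirror of the ICardTraitChanges defaults quoted above,
    // with String standing in for the real Trigger type.
    interface ICardTraitChanges {
        default List<String> applyTrigger(List<String> list) { return list; }
    }

    static class KeywordCollection implements ICardTraitChanges {
        private final List<String> keywords = new ArrayList<>();

        void add(String k) { keywords.add(k); }

        @Override
        public List<String> applyTrigger(List<String> list) {
            // The collection iterates its own keywords internally and
            // appends whatever triggers they contribute.
            for (String k : keywords) {
                if (k.equals("Exalted")) {
                    list.add("ExaltedTrigger"); // toy keyword -> trigger mapping
                }
            }
            return list;
        }
    }
}
```

The payoff is that callers treat the keyword collection like any other trait-changes layer instead of special-casing keyword-derived traits.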

     * Used by {@link #notNeededAsBlockers} to batch predict calls over
     * groups of identical tokens.
     */
    public static boolean isAiEquivalentBlocker(Card a, Card b) {
Contributor


  1. I don't think "indistinguishable" is strictly true, e.g. AiBlockController#changesPTWhenBlocked is based on a trigger check. Though covering only the more common cases seems enough for this heuristic
  2. However I'd still like to see some consideration for the opportunity cost introduced this way - yes people who encounter lots of token creatures currently complain, but some noise has to be taken into account. Therefore - for this to be worth it - the additional runtime of adding these attacker grouping checks for games with almost no tokens needs to be acceptable...
  3. I'm not even convinced this grouping is currently needed:
    isn't the remaining life way more often heavily influenced by the amount of remaining creatures (for chump blocking) or their total toughness (against tramplers)?
    So just always using binary search (maybe with a bias towards smaller boardstates) makes more sense? 🤔
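The always-binary-search idea in point 3 amounts to a monotone search for the largest releasable prefix of the sorted blocker list. A minimal sketch, assuming monotonicity (once releasing k blockers fails, releasing more also fails); `maxReleasable` and the predicate are illustrative stand-ins for Forge's actual combat-prediction call.

```java
import java.util.function.IntPredicate;

public class BinaryReleaseSketch {
    /**
     * Returns the largest k in [0, n] with ok.test(k) true, assuming ok is
     * monotone: releasing 0 blockers is always safe, and once releasing k
     * blockers fails the prediction, releasing k+1 fails too.
     */
    public static int maxReleasable(int n, IntPredicate ok) {
        int lo = 0, hi = n; // invariant: ok.test(lo) holds
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // bias up so the loop terminates
            if (ok.test(mid)) {
                lo = mid;      // releasing mid blockers is still safe
            } else {
                hi = mid - 1;  // too many released; search below mid
            }
        }
        return lo;
    }
}
```

This costs about log2(N) predict calls per decision regardless of board composition, which is the trade-off the benchmark below measures against the equivalence-class grouping.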

Contributor Author

@MostCromulent MostCromulent Apr 25, 2026


Have run some tests on this, results below:

Method:
Tested two variations against the linear baseline (the code prior to this PR, with no algorithm enabled):
- H003: the shipped fix in this PR — equivalence-class grouping + batch binary search in notNeededAsBlockers
- H003b: the proposed simplification — drop the equivalence-class detection entirely and just binary-search the whole sorted blocker list.

Built H003b on top of the testbed branch and ran (a) an in-flight verify that runs both algorithms on identical pre-decision state at every notNeededAsBlockers call and compares release sets; (b) bucketed per-call ms-by-blocker-count timing on a 20-game precon batch + 10-game Commander batch with three variants: linear baseline, shipped H003, and H003b.

Decision-making: 1,313 in-flight verify calls across both formats, 2 divergences (99.85% agreement). Where they diverge, H003b is strictly more conservative — declines an aggressive trade H003 takes. Both choices are reasonable AI play; the grouping isn't producing weird decisions.

Performance at 20 random precon games, fixed seed:

| Metric | Linear (no algorithm) | H003 (PR) | H003b (simplification) |
| --- | --- | --- | --- |
| Batch wall | 197.6 s | 131.0 s | 202.0 s |
| ms/call, N ≤ 5 (95% of decisions) | 68 ms | 29 ms | 42 ms |
| ms/call, N 6–15 (5% of decisions) | 198 ms | 203 ms | 144 ms |

Note: each of these tests kept the other fixes in this PR turned on; only H003/H003b was isolated.

The opportunity-cost concern — that grouping checks add runtime in non-token games — turns out to be empirically unfounded: H003 is ~34% faster in wall time than the same code without it, even in random precon games where token swarms don't appear. H003b is essentially tied with linear at the batch level (a slight regression); it only beats H003 on the rare wide-board calls.

Why: even N ≤ 5 "heterogeneous" precon boards routinely have partial duplicates — two of the same vanilla creature drawn off a 60-card deck, two basic-stat tokens from a single ETB. H003's grouping engages on those runs and batch-probes a 2-duplicate group in 1 predict call instead of 2. H003b can't recognise duplicates so it always does log(N) binary-search probes + up to N damage-trade walks, which on small boards comes out to more predict calls than even plain linear iteration. The grouping mechanism isn't gratuitous overhead; it's a net wall-saver even where there are no token swarms.

Recommendation: keep H003 as designed. The simpler H003b doesn't recover the wins H003 gets from batch-collapsing partial duplicates on small boards, and it's roughly tied with no-algorithm-at-all on the aggregate workload. The ~50 LOC of equivalence-class machinery is doing real work in real games, not just on token swarms.

The wall-clock times here aren't particularly helpful; the per-call times are the key measure. Wall times get noisy over a long test cycle (even with seeded starts) because a faster algorithm gives the AI more time before it hits its thinking-time cap, which lets it make a different decision; that decision then cascades into a different game state, which can mean a longer game.
