Skip to content

ANFEncoder: iterate to fixpoint to eliminate nested duplicates#1135

Open
tautschnig wants to merge 1 commit into
mainfrom
tautschnig/anf-fixedpoint
Open

ANFEncoder: iterate to fixpoint to eliminate nested duplicates#1135
tautschnig wants to merge 1 commit into
mainfrom
tautschnig/anf-fixedpoint

Conversation

@tautschnig
Copy link
Copy Markdown
Contributor

The ANF encoder extracts duplicated subexpressions into fresh variables, but the existing single-pass implementation can leave large duplicated sub-subexpressions behind.

Root cause: removeSubsumed drops candidate duplicates that are subexpressions of other (larger) candidate duplicates, to avoid creating redundant variable declarations. But this means that if only the outer expression appears at the top level, the inner dupes are hidden inside the lifted var declaration and never extracted.

Example (from PyAnalyzeLaurel benchmark check_storage_costs):

Original: assert Any_to_bool(Any_get(response, "Datapoints")) ...

After partial evaluation, Any_to_bool inlines its 7-branch body,
each branch referencing the argument. With Any_get also inlined
as an ite (is-DictStrAny response) (DictStrAny_get ...) (List_get ...), the Any_get expression ends up duplicated 62
times inside a single assert.

Old ANF output:
var $__anf.0 : bool := <9KB body with 62 duplicates of Any_get>;
assert $__anf.0

New ANF output (after iteration):
var $__anf.3 : Any := Any_get(response, "Datapoints");
var $__anf.0 : bool := <body using $__anf.3>;
assert $__anf.0

Effect: VC file size for VCs on one benchmark drops from 32KB to 11KB (~65% reduction), and another benchmark now completes verification at 36s where it previously hit a 60s timeout.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

The ANF encoder extracts duplicated subexpressions into fresh variables,
but the existing single-pass implementation can leave large duplicated
sub-subexpressions behind.

Root cause: `removeSubsumed` drops candidate duplicates that are
subexpressions of other (larger) candidate duplicates, to avoid
creating redundant variable declarations. But this means that if
only the outer expression appears at the top level, the inner dupes
are hidden inside the lifted var declaration and never extracted.

Example (from PyAnalyzeLaurel benchmark check_storage_costs):

  Original: assert Any_to_bool(Any_get(response, "Datapoints")) ...

  After partial evaluation, Any_to_bool inlines its 7-branch body,
  each branch referencing the argument. With Any_get also inlined
  as an `ite (is-DictStrAny response) (DictStrAny_get ...)
  (List_get ...)`, the Any_get expression ends up duplicated 62
  times inside a single assert.

  Old ANF output:
    var $__anf.0 : bool := <9KB body with 62 duplicates of Any_get>;
    assert $__anf.0

  New ANF output (after iteration):
    var $__anf.3 : Any := Any_get(response, "Datapoints");
    var $__anf.0 : bool := <body using $__anf.3>;
    assert $__anf.0

Effect: VC file size for VCs on one benchmark drops from 32KB to
11KB (~65% reduction), and another benchmark now completes
verification at 36s where it previously hit a 60s timeout.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the Core ANF encoder so it repeatedly extracts duplicated subexpressions until reaching a fixpoint, reducing duplication that can remain hidden inside newly introduced var declarations after a single extraction pass.

Changes:

  • Update anfEncodeBody to iterate until no further ANF encoder targets are found.
  • Adjust documentation to explain why fixpoint iteration is needed for nested-duplicate elimination.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 207 to +223
@@ -209,7 +217,10 @@ def anfEncodeBody (body : Statements) (startIdx : Nat) : Statements × Nat :=
) ([], ({} : Std.HashMap UInt64 (Expression.Expr × Expression.Expr)), startIdx)
-- Single pass: replace all targets at once
let body' := Statements.mapExprs (replaceExprs replacements) body
(revDecls.reverse ++ body', nextIdx)
let newBody := revDecls.reverse ++ body'
-- Iterate: the newly-inserted var declarations may themselves contain
-- duplicated subexpressions that `removeSubsumed` dropped in this pass.
anfEncodeBody newBody nextIdx
Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Clean, well-motivated change. The fixpoint iteration is a natural solution to the nested-duplicate problem, and the early-exit on targets.isEmpty guarantees termination in practice (each pass extracts at least one duplicate, strictly reducing the pool of non-leaf subexpressions).

One suggestion below about test coverage for the new iteration behavior.

partial def anfEncodeBody (body : Statements) (startIdx : Nat) : Statements × Nat :=
let targets := findANFEncoderTargets ((Statements.collectExprs body).flatMap collectSubexprs)
if targets.isEmpty then
(body, startIdx)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Nit: consider adding a unit test that exercises multi-pass extraction (e.g., an expression where a large duplicate contains a smaller duplicate that only becomes extractable after the first pass). The existing tests all resolve in a single iteration, so the fixpoint loop is currently untested at the unit level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants