Skip to content

Extract map_theoryt base class from arrayst#8992

Open
tautschnig wants to merge 8 commits into
diffblue:developfrom
tautschnig:map_theoryt
Open

Extract map_theoryt base class from arrayst#8992
tautschnig wants to merge 8 commits into
diffblue:developfrom
tautschnig:map_theoryt

Conversation

@tautschnig
Copy link
Copy Markdown
Collaborator

@tautschnig tautschnig commented Apr 29, 2026

Separate map-theoretic reasoning (index tracking, equality tracking, Ackermann constraints, read-over-weakeq, extensionality, CDCL(T) propagation, constraint counting) from array-specific encoding (with/if/of/comprehension constraints, lazy selects, 2D inlining).

Inheritance chain: arrayst -> map_theoryt -> equalityt -> prop_conv_solvert

Likely best reviewed commit-by-commit: the second commit introduces a lot of renaming, which impacts call sites. No (intended) functional changes in the second commit.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Apr 29, 2026
Copilot AI review requested due to automatic review settings April 29, 2026 17:28
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new map_theoryt base class to factor out map-/array-theoretic reasoning infrastructure from arrayst, adjusting the inheritance chain to arrayst -> map_theoryt -> equalityt -> prop_conv_solvert.

Changes:

  • Added map_theoryt (header + constructor TU) containing shared state and helpers for index/equality tracking, Ackermann constraints, and constraint counting.
  • Updated arrayst to inherit from map_theoryt and removed now-shared members from arrays.h.
  • Re-scoped several previously-arrayst helper methods to map_theoryt.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/solvers/flattening/map_theory.h Introduces the new base class API and shared state for map/array reasoning.
src/solvers/flattening/map_theory.cpp Adds map_theoryt constructor implementation.
src/solvers/flattening/arrays.h Switches arrayst to inherit from map_theoryt and removes moved members.
src/solvers/flattening/arrays.cpp Updates implementations to use map_theoryt for extracted functionality.
Comments suppressed due to low confidence (1)

src/solvers/flattening/arrays.cpp:47

  • Several map_theoryt method implementations (e.g., record_array_index, collect_indices, collect_arrays, add_array_constraint, etc.) are now defined in arrays.cpp. This undermines the stated goal of separating map-theoretic reasoning from array-specific encoding and makes linkage/dependencies less clear. Consider moving map_theoryt method definitions into map_theory.cpp and leaving arrays.cpp to only implement arrayst-specific behavior.
void map_theoryt::record_array_index(const index_exprt &index)
{
  // we are not allowed to put the index directly in the
  //   entry for the root of the equivalence class
  //   because this map is accessed during building the error trace
  std::size_t number=arrays.number(index.array());
  if(index_map[number].insert(index.index()).second)
    update_indices.insert(number);
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/solvers/flattening/arrays.h Outdated
Comment thread src/solvers/flattening/arrays.h Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 89.27445% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.58%. Comparing base (4a1cc53) to head (683aa56).

Files with missing lines Patch % Lines
src/solvers/flattening/map_theory.cpp 81.75% 27 Missing ⚠️
src/solvers/flattening/arrays.cpp 93.85% 7 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8992      +/-   ##
===========================================
+ Coverage    80.56%   80.58%   +0.01%     
===========================================
  Files         1707     1709       +2     
  Lines       189134   189103      -31     
  Branches        73       73              
===========================================
- Hits        152385   152380       -5     
+ Misses       36749    36723      -26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tautschnig tautschnig requested a review from TGWDB as a code owner April 30, 2026 12:05
@tautschnig tautschnig force-pushed the map_theoryt branch 5 times, most recently from 352d242 to b1fbf98 Compare May 26, 2026 11:11
tautschnig and others added 8 commits May 27, 2026 07:13
…ions

test_json.desc was added as KNOWNBUG in 2020 (commit ecb0a5d)
asserting an aspirational JSON shape that cbmc never produced. Two
flaws:

  1. The exit code assertion '^EXIT=0$' did not match cbmc's actual
     exit. main.c contains 'assert(a[index] == 1)' with a non-deterministic
     'index', which fails verification — cbmc exits with 10. (The two
     sibling tests, test.desc and test_xml.desc, exercise '--show-array-
     constraints' without '--json-ui' / with '--xml-ui'; cbmc rejects
     those combinations and exits 1, which they correctly assert.)

  2. The regex asserted list-shaped JSON values, e.g.
     '"arrayWith": [.*]', but the implementation has always emitted
     scalar counts: '"arrayWith": 2'. The test was therefore
     unmatchable on day one.

Fix both: assert '^EXIT=10$' and rewrite the regex to match scalar
counts ('\\d+'). Drop the KNOWNBUG tag so the JSON contract for
'--show-array-constraints' is now pinned by an actually-running test.

The test only applies to the default SAT backend: under
'--cprover-smt2' or '--incremental-smt2-solver z3' cbmc bypasses the
array theory entirely and emits no 'arrayConstraints' JSON block (and
'--cprover-smt2' returns a different exit code). Tag the test
'broken-cprover-smt-backend no-new-smt' so it is skipped under those
profiles, matching the convention used by other arrayst-dependent
tests such as Array_operations2 / Bitfields1 / Computed-Goto1.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Separate map-theoretic reasoning (index tracking, equality tracking,
Ackermann constraints, read-over-weakeq, extensionality, CDCL(T)
propagation, constraint counting) from array-specific encoding
(with/if/of/comprehension constraints, lazy selects, 2D inlining).

Inheritance chain: arrayst -> map_theoryt -> equalityt -> prop_conv_solvert

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
…array'

In a map theory the tracked expressions are 'keys' and their collection
per equivalence class is the 'domain'.  Rename identifiers, type
aliases, function names, and comments accordingly:

  index_sett          → key_sett
  index_mapt          → domain_mapt
  index_map           → domain_map
  update_indices      → update_keys
  collect_indices()   → collect_keys()
  update_index_map()  → update_domain_map()
  record_array_index()→ record_array_key()

Rename all map_theoryt members, methods, types, enum values, and
comments to use map-theory terminology instead of array-specific terms:

  Types:
    array_equalityt         → map_equalityt
    array_equalitiest       → map_equalitiest
    array_constraint_countt → map_constraint_countt

  Members:
    array_equalities        → map_equalities
    arrays (union-find)     → maps
    lazy_array_constraints  → lazy_constraints
    lazy_arrays             → lazy_dispatch
    get_array_constraints   → get_constraints
    array_constraint_count  → constraint_count

  Methods:
    record_array_equality   → record_equality
    record_array_let_binding→ record_let_binding
    record_array_key        → record_key
    collect_arrays          → collect_maps
    is_unbounded_array      → is_unbounded_map
    add_array_constraint    → add_map_constraint
    add_array_Ackermann_constraints → add_Ackermann_constraints
    add_array_constraints_equality  → add_map_equality_constraints
    display_array_constraint_count  → display_constraint_count
    finish_eager_conversion_arrays  → finish_eager_conversion_maps

  Enum values: ARRAY_* → MAP_*

C-level IR names (index_exprt, ID_index, etc.) are unchanged as they
refer to the array-indexing operation, not the map-theory concept.
Array-specific IR names (array_typet, index_exprt, ID_array,
array_comprehension_args, etc.) and arrayst-specific method names
(add_array_constraints, etc.) are unchanged.

Note that the JSON output of '--show-array-constraints --json-ui'
is intentionally NOT renamed: the user-visible strings stay
'arrayConstraints', 'arrayEquality', 'arrayWith', etc. The constraints
this option reports are recorded by 'arrayst' (not by 'map_theoryt'),
the CLI option name says 'array', and any consumer of the JSON output
shouldn't be forced to track an internal terminology change. The
mapping from internal MAP_* enum tags to 'arrayX' strings lives in
'enum_to_string'; a comment there explains the asymmetry.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Document all public and protected methods, key data members, enums,
and structs in map_theory.h with Doxygen comments following the CBMC
coding standard.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
A map is unbounded by definition; the unboundedness check that the
prior commits called 'is_unbounded_map' actually distinguishes between
arrays the array theory should manage and bounded arrays that the
flattening layer bit-blasts directly. That distinction is array-
specific and has no place on a map_theoryt base.

Rename the method back to 'is_unbounded_array' and move both its
declaration and its only override down to arrayst / boolbvt.

The same applies to 'collect_keys': it walks an expression looking
specifically for 'index_exprt' over array_typet operands, tracks
'array_comprehension_args' for ID_array_comprehension, and uses
'is_unbounded_array' to decide whether to record the index. None of
that is map-theoretic. Move both overloads of 'collect_keys' (and
'array_comprehension_args') from map_theoryt to arrayst.

The remaining map_theoryt primitives ('record_key',
'update_domain_map', 'add_Ackermann_constraints',
'add_map_equality_constraints') stay where they are: they are
genuinely about maps (key sets, equivalence classes, Ackermann
implications) and do not depend on array shape.

Side-effect: 'finish_eager_conversion_maps' becomes pure virtual on
map_theoryt, since its previous default body called the now-moved
'collect_keys'. Both existing overrides ('arrayst::add_array_constraints'
and 'bv_refinementt::finish_eager_conversion_maps') already provided
their own implementation, so no caller is affected.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical, behaviour-preserving tidying of the new map_theoryt class
and its arrayst subclass:

  * Drop dead arrayst::message_handler member. The base equalityt
    already takes a message_handler via the messaget log, so storing
    it again on arrayst was redundant; nothing read the field.

  * Drop dead incremental_cache flag and the expr_map cache it
    guarded. No code path ever set incremental_cache to true, so the
    cache was unused.

  * Fix typo 'contraint_type_string' -> 'constraint_type_string' in
    display_constraint_count.

  * Align map_theory.cpp's author tag with map_theory.h (Michael
    Tautschnig). The .cpp file was extracted in this PR; the
    leftover Daniel Kroening tag came from the arrays.cpp ancestry.

  * Add explicit '#include <map>' to map_theory.h. domain_mapt and
    map_constraint_countt are std::map aliases; the include only
    came in transitively via util/union_find.h.

  * Drop spurious 'virtual' on record_key and collect_maps. Neither
    is overridden in any subclass (arrayst doesn't override either),
    so the keyword is misleading and prevents devirtualisation.

  * Rename 'update_keys' -> 'dirty_classes'. The set holds
    equivalence-class numbers (returned by union_find::number()),
    not keys; the original 'update_indices' had the same flaw and
    the rename in this PR was the chance to fix it.

  * Rename 'lazy_dispatch' -> 'defer_constraints'. The boolean
    controls whether refine=true constraints passed to
    add_map_constraint are deferred for the refinement loop;
    'defer_constraints' makes that role obvious at the call sites.

  * Rename '_get_constraints' / 'get_constraints' constructor
    parameter and member to 'collect_constraint_stats'. The old
    name read like a getter and clashed visually with the
    neighbouring 'constraint_count'. Propagated through
    arrayst, boolbvt, bv_pointerst, bv_pointers_widet, and the
    one solver_factoryt call site that still used the old
    'get_array_constraints' shape.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
The two parallel enums on map_theoryt — lazy_typet (used to classify
deferred constraints in the refinement loop) and constraint_typet
(used for per-kind statistics in --show-array-constraints) — were
near-duplicates, differing only by the additional MAP_EQUALITY entry
in constraint_typet. Maintaining them in lock-step invited
forgotten-tag bugs, and every constraint-adding call site had to
mention both names:

  lazy_constraintt lazy(lazy_typet::MAP_X, body);
  add_map_constraint(lazy, refine);
  ++constraint_count[constraint_typet::MAP_X];

Replace both with a single map_constraint_kindt enum that carries
all nine kinds (MAP_EQUALITY is never enqueued lazily, but having it
in the same enum is harmless because lazy enqueueing is decided per
add_map_constraint call, not per enum value).

Add an inline helper

  void add_constraint(
    map_constraint_kindt kind,
    const exprt &body,
    bool refine = false);

that does the lazy_constraintt construction, the add_map_constraint
call, and the constraint_count bump in one go. Rewrite the eleven
call sites (ten in arrays.cpp's add_array_constraints_* helpers,
plus the Ackermann site in map_theory.cpp::add_Ackermann_constraints)
to use it. The MAP_EQUALITY counter in add_map_equality_constraints
stays as a direct constraint_count[]++ because that constraint is
asserted via prop.lcnf, not via add_map_constraint.

Net result: one enum to keep in sync, one call per constraint
site, and ~30 lines saved across the refactored files.

The user-visible JSON output of '--show-array-constraints' is
unaffected: enum_to_string still maps MAP_X to 'arrayX' strings.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
convert_let computes 'lowered_value' (the let's bound expression with
byte_update / byte_extract operators rewritten away by
lower_byte_operators) but then dropped it on the floor and passed
the *un*-lowered 'pair.second' to record_let_binding. If the
let-bound array value happens to contain a byte_update, the array
theory's collect_maps will see it and trip

  DATA_INVARIANT(false,
    "byte_update should be removed before collect_maps")

Fix: pass 'lowered_value' so the value handed to the array theory
matches the lowering already done at the call site.

The bug was introduced by commit 14105d5 ("Connect let-bound
arrays to originals in array theory"), which added the byte-operator
lowering computation but forgot to thread the result through to
record_let_binding.

Note on regression coverage: I attempted to construct a test case
that triggers the DATA_INVARIANT, but could not. The set of CBMC
input paths that produce an array-typed let_exprt (which is the
gate at line 78 above) is currently limited to:

  * SMT2 input parsed by smt2_parser.cpp — but SMT2 has no
    byte_update construct, so the let value cannot contain one.
  * value_set_dereference.cpp — produces let-bound *pointer*-typed
    values, not array-typed; the gate above filters them out.

So the bug is currently unreachable through standard CBMC entry
points, and the fix is defensive against a future code path that
might funnel byte-operator-bearing array expressions into a let.
The existing regression/smt2_solver/let-array/let-array.smt2 test
continues to exercise the convert_let array-typed branch and passes
unchanged with the fix.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants