Skip to content

NOMERGE: Python bindings proof of concept#9

Draft
gtrepta wants to merge 11 commits into
masterfrom
python
Draft

NOMERGE: Python bindings proof of concept#9
gtrepta wants to merge 11 commits into
masterfrom
python

Conversation

@gtrepta
Copy link
Copy Markdown
Contributor

@gtrepta gtrepta commented Feb 27, 2026

Here's a proof of concept for python bindings to rust kore structures, implementing Sorts.

A rough breakdown of additions and changes in the tree:

bindings/
|   # This crate holds the build system for the python module.
|   # It was generated using a tool called 'maturin'
└── kframework_py
    ├── Cargo.toml
    ├── pyproject.toml
    ├── src
    |   |   # Defines the python module strucuture, exports the classes,
    |   |   # and does some initialization.
    │   └── lib.rs
    ├── tests
    |   |   # These tests have examples of what using the sorts will look like!
    │   └── test_kore.py
kframework/src/
├── bindings
|   |   # The main implementation of the bindings. It's compiled behind a "python" feature flag.
|   |   # Notably it uses an arena to hold all of the allocations and manage caching.
│   └── python.rs
├── kore
|   |   # The Sort::App variant now holds Box<[Arc<Sort>]> for its arguments
|   |   # The boxed slice instead of a Vec helps ensure immutability
|   |   # The Arc pointers allow sharing of subtrees
│   ├── syntax.rs

A couple of notes:

There's no logic for de-allocating anything in the arena yet. I think when the reference count to any of the smart pointers it holds onto falls to 1, then that means the arena is the only one referencing the object and it can free the slot.

This probably isn't thread safe. PySort::create constructs objects and uses a mutable borrow of the arena to update it, if two threads do this at the same time then there will be a panic. We should be able to put the arena behind a mutex. Some consideration for that and holding the GIL is needed, listed here

Copy link
Copy Markdown
Collaborator

@tothtamas28 tothtamas28 Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, these changes are necessary for two reasons:

  1. Each bindings value (from python.rs) wraps a non-bindings value.
  2. Python objects are reference counted.

So for example

  1. A Sort::App might be contained in another Sort::App (in args), but also in a SortApp (as a wrapped non-bindings object).
  2. In order to share a SortApp, the contained Sort::App also has to be sharable.

Is this correct?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, at least as a first iteration, maybe a less powerful ownership model could be sufficient for our use cases. Something like this:

struct PySort {
    wrapped: &Sort  // or *const Sort, whatever works with PyO3
}

So basically, all bindings object is just a handle to a non-bindings object, without access to its internal structure. This still lets us implement methods on PySort like KORE text/json parsing and serialization.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, these changes are necessary for two reasons:

  1. Each bindings value (from python.rs) wraps a non-bindings value.
  2. Python objects are reference counted.
  1. In practical terms, yes. In reality it holds an index into an arena that holds the rust type.
  2. This is true but wasn't what made this change necessary.

So for example

  1. A Sort::App might be contained in another Sort::App (in args), but also in a SortApp (as a wrapped non-bindings object).
  2. In order to share a SortApp, the contained Sort::App also has to be sharable.

Is this correct?

  1. True
  2. Not necessarily, because the SortApp can be shared by incrementing the python object's reference count.

There is also a use case where a user constructs a SortApp using arguments to existing sorts:

SortApp('SortA', (s1, s2))

This is going to construct a rust Sort::App, and it's preferrable to get the pointers to the rust types contained by s1 and s2, instead of making a copy of them. If we're guaranteeing that these objects are immutable then that will be fine.

I wonder, at least as a first iteration, maybe a less powerful ownership model could be sufficient for our use cases. Something like this:

struct PySort {
    wrapped: &Sort  // or *const Sort, whatever works with PyO3
}

So basically, all bindings object is just a handle to a non-bindings object, without access to its internal structure. This still lets us implement methods on PySort like KORE text/json parsing and serialization.

This might work well enough... It's essentially what the PySort I implemented is, but with added logic to manage allocations. It took quite a bit of time to figure that out and maybe I was a little overzealous, but in the end I think it's a very good approach.

I can't remember now but I think I tried your suggestion early on and ran into some kind of issue, I'll time box an attempt and see what comes up.

Copy link
Copy Markdown
Contributor Author

@gtrepta gtrepta Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, here's a branch python_simple in which a PySort owns a boxed rust Sort. It was relatively easy to implement, I think I'm much more familiar with Pyo3 and the domain now than when I was initially trying to implement this.

We can go forward with that approach. I think there are some arguments to be made for using an arena but it's better to shelve it.

I would at least still like to change the rust structures to use Arc pointers for their children. It's a pretty common pattern to create a new term from an existing term that reuses subterms, and we can avoid performing copies of them pretty easily with that.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would at least still like to change the rust structures to use Arc pointers for their children. It's a pretty common pattern to create a new term from an existing term that reuses subterms, and we can avoid performing copies of them pretty easily with that.

This can be done in a separate PR for the whole syntax module.

When I implemented the syntax with unique pointers, I had a rewrite engine in mind that can modify subterms locally without copying the superterm. How is this implemented in the LLVM backend?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure how the LLVM backend does it.

I was thinking along the lines of how the pyk structures have let methods which allow deriving new terms that reuse existing parts of the old term.

Does the rewriter you were thinking of retain previous terms? It sounds like this would keep a single term and modify it on every rewrite. I'm looking for something that can keep a history of execution steps reasonably well, meaning multiple terms with small diffs between them.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like this would keep a single term and modify it on every rewrite.

Exactly. But I agree that the shared reference approach is also very reasonable.

keep a history of execution steps

For this, keeping a copy of each term along the trace is probably overkill though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep a history of execution steps

For this, keeping a copy of each term along the trace is probably overkill though.

Along every single execution step yes. Maybe in a debugger or in proof traces it could be useful. But for keeping KCFGs which have fewer nodes then I expect there would be some advantages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants