-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Shared/Java: Introduce a shared control flow reachability library and replace the Java Nullness implementation. #20367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new shared control flow reachability library and replaces the Java Nullness implementation with it. The shared library tracks a single variable through value-preserving SSA definitions and uses splitting on finally blocks and guard values to improve precision. The Java Nullness analysis is then updated to use this new approach, which provides more accurate tracking of null values through control flow paths.
Key changes:
- Added shared control flow reachability computation with finally block tracking
- Implemented integer range guard values and improved guard value intersection logic
- Simplified Java Nullness implementation to use the new shared library
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| shared/controlflow/codeql/controlflow/Guards.qll | Added integer range guard values and intersection logic for improved guards analysis |
| shared/controlflow/codeql/controlflow/ControlFlow.qll | New shared control flow reachability library with SSA tracking and finally block handling |
| java/ql/lib/semmle/code/java/controlflow/ControlFlow.qll | Java instantiation of the shared control flow library |
| java/ql/lib/semmle/code/java/controlflow/BasicBlocks.qll | Updated to use new successor type system from shared library |
| java/ql/lib/semmle/code/java/ControlFlowGraph.qll | Added successor type mapping for control flow edges |
| java/ql/lib/semmle/code/java/dataflow/Nullness.qll | Simplified nullness analysis to use new shared library |
| java/ql/test/query-tests/Nullness/*.java | Updated test cases reflecting precision improvements |
| java/ql/test/query-tests/Nullness/NullMaybe.expected | Expected test results with improved precision |
| TIntRange(int bound, Boolean upper) { | ||
| exists(ConstantExpr c | c.asIntegerValue() + [-1, 0, 1] = bound) and | ||
| bound != 2147483647 and | ||
| bound != -2147483648 | ||
| } or |
Copilot
AI
Sep 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The integer bounds 2147483647 and -2147483648 appear to be hardcoded limits for 32-bit integers. Consider using named constants or a more explicit representation to make the intent clearer and improve maintainability.
1c40266 to
209628f
Compare
|
Dca shows a bit too much precision regression, so taking this back into draft while I investigate. |
e534cb0 to
0b5bb6a
Compare
5ab759a to
b9f5f55
Compare
b9f5f55 to
2e075c8
Compare
f14ab6b to
e302616
Compare
| TValue(TAbstractSingleValue val, Boolean isVal) or | ||
| TIntRange(int bound, Boolean upper) { | ||
| exists(ConstantExpr c | c.asIntegerValue() + [-1, 0, 1] = bound) and | ||
| bound != 2147483647 and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Total cornercase / nit question: If c holds the maximum integer value of 2147483647 shouldn't we then have TIntRange(2147483646, _) and TIntRange(2147483647, _) (the latter is excluded)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For edge cases like TIntRange(2147483647, upper) we can easily run into trouble: If upper = false then this is likely a singleton value set, and quite unlikely as a useful bound. And with upper = true, the bound becomes trivial assuming that it applies to e.g. a Java int, but worse is that the default computation of its dual lower bound overflows in QL. Simply excluding the edge cases as valid bounds helps avoid that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment in the code.
| not irrelevantFinally(finally) and | ||
| bb1.getASuccessor(t) = bb2 and | ||
| n1 = getEnclosingAstNode(bb1.getLastNode()) and | ||
| n2 = getEnclosingAstNode(bb2.getNode(0)) and | ||
| inFinally(n1, finally) and | ||
| not inFinally(n2, finally) and | ||
| if t instanceof AbruptSuccessor then abrupt = true else abrupt = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be re-factored into helper predicate(s) and re-used?
The logic tries to capture whether we are on the "boundary" of a final block (the same goes for entersFinally).
Would it be possible to do something like
private predicate auxFinally(BasicBlock bb1, BasicBlock bb2, FinallyBlock finally) {
exists(AstNode n1, AstNode n2 |
not irrelevantFinally(finally) and
bb1.getASuccessor() = bb2 and
n1 = getEnclosingAstNode(bb1.getLastNode()) and
n2 = getEnclosingAstNode(bb2.getNode(0)) and
not inFinally(n1, finally) and
inFinally(n2, finally)
)
}
private boolean isAbrupt(BasicBlock bb1, BasicBlock bb2) {
exists(SuccessorType t |
bb1.getASuccessor(t) = bb2 and
if t instanceof AbruptSuccessor then result = true else result = false
)
}
Or maybe this is just overcomplicating things 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that there's a certain overlap in the logic between entersFinally and leavesFinally, but I'm not sure that I see a nice way to share that overlap - we can't really project n1 and n2 away before we get to the part that makes the predicates different, so your auxFinally above only applies to one of the cases. And then we're stuck with a "helper" predicate with a lot of columns. Also, the most restrictive part of these predicates are likely the inFinally part, which is what's different, so I expect a pipeline to either start there or join it early, and then a "helper" predicate refactor will possibly need to be inlined and that obscures reading with join orders in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, my bad, we can't have the getASuccessor in the helper predicate - then we definitely would need to inline to avoid a combinatorial explosion.
Also, thank you for thinking about it.
| * A stack of split values to track whether entered finally blocks have | ||
| * waiting completions. | ||
| */ | ||
| private class FinallyStack extends TFinallyStack { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to try and make a parameterized module (not in this PR) for constructing finite size lists/stacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly yes. Do we know of any other use-cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the other list like implementations I could find have more complicated constraints on the Cons like branch (typically involving the list itself), so maybe there are no cases where it can be re-used.
| class GuardValue { | ||
| string toString(); | ||
|
|
||
| GuardValue getDualValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A QlDoc for this predicate would be nice.
| } | ||
|
|
||
| /** An input configuration for control flow reachability. */ | ||
| signature module ConfigSig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice that the predicates are named similar to those of dataflow configurations.
michaelnebel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks plausible to me!
As always, very beautiful!
michaelnebel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks plausible to me. 🎉
| LocationSig Location, BB::CfgSig<Location> Cfg, | ||
| InputSig<Location, Cfg::ControlFlowNode, Cfg::BasicBlock> Input> | ||
| { | ||
| private module Cfg_ = Cfg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this alias needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because imports resolve to file modules instead of local names when there's an ambiguity - and this file sits next to Cfg.qll.
| * Holds if the value of `def` at `node` is a source for the reachability | ||
| * computation. | ||
| */ | ||
| predicate source(ControlFlowNode node, SsaDefinition def); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be named isSource instead like in data flow configs (and same for the other predicates)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a vacuum I think I prefer the shorter names, but I have no strong opinion here.
| /** | ||
| * Constructs a control flow reachability computation. | ||
| */ | ||
| module Flow<ConfigSig Config> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be called Local or perhaps LocalFlow instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll add a Global counterpart - so I prefer the simple Flow name here.
This introduces a new shared library for computing control flow reachability. The computation follows the value of a single variable through value-preserving SSA definitions (i.e. phi nodes, plus optionally uncertain writes). Splitting on finally blocks and other variables is performed to verify the validity of the computed path.
Two output predicates are exposed:
flow, which indicates a path from a source to a sink, andescapeFlow, which indicates that a source may escape in the sense that it circumvents all sinks.This new library is then instantiated for Java and used to replace the Java Nullness analysis. This generally results in a precision improvement, although there are a few feature gaps for which we may have regressions compared to the old implementation. These gaps are
Properly observing certain assertion calls related to nullness.x == y.x instanceof Foo.Commit-by-commit review is encouraged.