Skip to content

Conversation

@aschackmull
Copy link
Contributor

@aschackmull aschackmull commented Sep 4, 2025

This introduces a new shared library for computing control flow reachability. The computation follows the value of a single variable through value-preserving SSA definitions (i.e. phi nodes, plus optionally uncertain writes). Splitting on finally blocks and other variables is performed to verify the validity of the computed path.
Two output predicates are exposed: flow, which indicates a path from a source to a sink, and escapeFlow, which indicates that a source may escape in the sense that it circumvents all sinks.

This new library is then instantiated for Java and used to replace the Java Nullness analysis. This generally results in a precision improvement, although there are a few feature gaps for which we may have regressions compared to the old implementation. These gaps are

  • Properly observing certain assertion calls related to nullness.
  • Splitting based on correlated conditions of the form x == y.
  • Splitting based on correlated conditions of the form x instanceof Foo.

Commit-by-commit review is encouraged.

@aschackmull aschackmull requested a review from a team as a code owner September 4, 2025 13:07
Copilot AI review requested due to automatic review settings September 4, 2025 13:07
@github-actions github-actions bot added the Java label Sep 4, 2025
@@ -0,0 +1,54 @@
import java
private import codeql.controlflow.ControlFlow
private import semmle.code.java.dataflow.SSA as SSA

Check warning

Code scanning / CodeQL

Names only differing by case Warning

SSA is only different by casing from Ssa that is used elsewhere for modules.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new shared control flow reachability library and replaces the Java Nullness implementation with it. The shared library tracks a single variable through value-preserving SSA definitions and uses splitting on finally blocks and guard values to improve precision. The Java Nullness analysis is then updated to use this new approach, which provides more accurate tracking of null values through control flow paths.

Key changes:

  • Added shared control flow reachability computation with finally block tracking
  • Implemented integer range guard values and improved guard value intersection logic
  • Simplified Java Nullness implementation to use the new shared library

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
shared/controlflow/codeql/controlflow/Guards.qll Added integer range guard values and intersection logic for improved guards analysis
shared/controlflow/codeql/controlflow/ControlFlow.qll New shared control flow reachability library with SSA tracking and finally block handling
java/ql/lib/semmle/code/java/controlflow/ControlFlow.qll Java instantiation of the shared control flow library
java/ql/lib/semmle/code/java/controlflow/BasicBlocks.qll Updated to use new successor type system from shared library
java/ql/lib/semmle/code/java/ControlFlowGraph.qll Added successor type mapping for control flow edges
java/ql/lib/semmle/code/java/dataflow/Nullness.qll Simplified nullness analysis to use new shared library
java/ql/test/query-tests/Nullness/*.java Updated test cases reflecting precision improvements
java/ql/test/query-tests/Nullness/NullMaybe.expected Expected test results with improved precision

Comment on lines 211 to 215
TIntRange(int bound, Boolean upper) {
exists(ConstantExpr c | c.asIntegerValue() + [-1, 0, 1] = bound) and
bound != 2147483647 and
bound != -2147483648
} or
Copy link

Copilot AI Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The integer bounds 2147483647 and -2147483648 appear to be hardcoded limits for 32-bit integers. Consider using named constants or a more explicit representation to make the intent clearer and improve maintainability.

Copilot uses AI. Check for mistakes.
@aschackmull
Copy link
Contributor Author

Dca shows a bit too much precision regression, so taking this back into draft while I investigate.

@aschackmull aschackmull marked this pull request as draft September 5, 2025 09:55
@aschackmull aschackmull marked this pull request as ready for review September 12, 2025 13:45
TValue(TAbstractSingleValue val, Boolean isVal) or
TIntRange(int bound, Boolean upper) {
exists(ConstantExpr c | c.asIntegerValue() + [-1, 0, 1] = bound) and
bound != 2147483647 and
Copy link
Contributor

@michaelnebel michaelnebel Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total cornercase / nit question: If c holds the maximum integer value of 2147483647 shouldn't we then have TIntRange(2147483646, _) and TIntRange(2147483647, _) (the latter is excluded)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For edge cases like TIntRange(2147483647, upper) we can easily run into trouble: If upper = false then this is likely a singleton value set, and quite unlikely as a useful bound. And with upper = true, the bound becomes trivial assuming that it applies to e.g. a Java int, but worse is that the default computation of its dual lower bound overflows in QL. Simply excluding the edge cases as valid bounds helps avoid that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment in the code.

Comment on lines +233 to +239
not irrelevantFinally(finally) and
bb1.getASuccessor(t) = bb2 and
n1 = getEnclosingAstNode(bb1.getLastNode()) and
n2 = getEnclosingAstNode(bb2.getNode(0)) and
inFinally(n1, finally) and
not inFinally(n2, finally) and
if t instanceof AbruptSuccessor then abrupt = true else abrupt = false
Copy link
Contributor

@michaelnebel michaelnebel Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be re-factored into helper predicate(s) and re-used?
The logic tries to capture whether we are on the "boundary" of a final block (the same goes for entersFinally).
Would it be possible to do something like

private predicate auxFinally(BasicBlock bb1, BasicBlock bb2, FinallyBlock finally) {
    exists(AstNode n1, AstNode n2 |
      not irrelevantFinally(finally) and
      bb1.getASuccessor() = bb2 and
      n1 = getEnclosingAstNode(bb1.getLastNode()) and
      n2 = getEnclosingAstNode(bb2.getNode(0)) and
      not inFinally(n1, finally) and
      inFinally(n2, finally)
    )
  }

  private boolean isAbrupt(BasicBlock bb1, BasicBlock bb2) {
    exists(SuccessorType t |
      bb1.getASuccessor(t) = bb2 and
      if t instanceof AbruptSuccessor then result = true else result = false
    )
  }

Or maybe this is just overcomplicating things 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that there's a certain overlap in the logic between entersFinally and leavesFinally, but I'm not sure that I see a nice way to share that overlap - we can't really project n1 and n2 away before we get to the part that makes the predicates different, so your auxFinally above only applies to one of the cases. And then we're stuck with a "helper" predicate with a lot of columns. Also, the most restrictive part of these predicates are likely the inFinally part, which is what's different, so I expect a pipeline to either start there or join it early, and then a "helper" predicate refactor will possibly need to be inlined and that obscures reading with join orders in mind.

Copy link
Contributor

@michaelnebel michaelnebel Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, my bad, we can't have the getASuccessor in the helper predicate - then we definitely would need to inline to avoid a combinatorial explosion.
Also, thank you for thinking about it.

* A stack of split values to track whether entered finally blocks have
* waiting completions.
*/
private class FinallyStack extends TFinallyStack {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to try and make a parameterized module (not in this PR) for constructing finite size lists/stacks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly yes. Do we know of any other use-cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the other list like implementations I could find have more complicated constraints on the Cons like branch (typically involving the list itself), so maybe there are no cases where it can be re-used.

class GuardValue {
string toString();

GuardValue getDualValue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A QlDoc for this predicate would be nice.

}

/** An input configuration for control flow reachability. */
signature module ConfigSig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice that the predicates are named similar to those of dataflow configurations.

Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks plausible to me!
As always, very beautiful!

Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks plausible to me. 🎉

@aschackmull aschackmull merged commit 57e15b9 into github:main Sep 16, 2025
42 of 43 checks passed
@aschackmull aschackmull deleted the shared/controlflow branch September 16, 2025 08:44
LocationSig Location, BB::CfgSig<Location> Cfg,
InputSig<Location, Cfg::ControlFlowNode, Cfg::BasicBlock> Input>
{
private module Cfg_ = Cfg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this alias needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because imports resolve to file modules instead of local names when there's an ambiguity - and this file sits next to Cfg.qll.

* Holds if the value of `def` at `node` is a source for the reachability
* computation.
*/
predicate source(ControlFlowNode node, SsaDefinition def);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be named isSource instead like in data flow configs (and same for the other predicates)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a vacuum I think I prefer the shorter names, but I have no strong opinion here.

/**
* Constructs a control flow reachability computation.
*/
module Flow<ConfigSig Config> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called Local or perhaps LocalFlow instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'll add a Global counterpart - so I prefer the simple Flow name here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants