Skip to content

Conversation

@alexet
Copy link

@alexet alexet commented Sep 8, 2025

This adds overlay informed Dataflow.

The idea is that for non-diff-informed configurations we use the diff informed mechanism to filter the sources and sinks to the overlay and then merge those results with the base results. This can lead to FPs/FNs due to flow paths via the overlay but the accuracy seems well within what we are targetting for overlay mode and saves lots of time.

Unfortunately the merging code cannot instantiated with CPP analysis as it uses EquivalenceRelation internally. This means we need separate parametrised modules (as we can't just guard with a predicate).

The local results are also exposed for cases where the pure local results are desired (e.g. for regexes where we are only going to care about analysing new regexes in the overlay).

@alexet alexet force-pushed the alexet/overlay-informed-dataflow branch from 4ebfdd4 to 04e8ab6 Compare September 8, 2025 16:25
@alexet alexet force-pushed the alexet/overlay-informed-dataflow branch from 04e8ab6 to 5ee86e9 Compare September 12, 2025 17:27
private module Stage1 = ImplStage1<C>;

import Stage1::PartialFlow

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
}

predicate observeOverlayInformedIncrementalMode() { not Config::observeDiffInformedIncrementalMode() }
}

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
import MakeImplStage1<Location, DataFlowLang> as DataFlowInternalStage1

private module AddTaintDefaults<DataFlowInternal::FullStateConfigSig Config> implements
module AddTaintDefaults<DataFlowInternal::FullStateConfigSig Config> implements

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.

signature int speculationLimitSig();

module AddSpeculativeTaintSteps<

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
import DataFlowInternal::DefaultState<Config>
import Config

private predicate relevantState(Config::FlowState state) {

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
) {
Config::isAdditionalFlowStep(node1, node2) and model = "Config"
}

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
import DataFlowInternal::DefaultState<Config>
import Config

predicate isAdditionalFlowStep(

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
import AddTaintDefaults<Config0>
}

predicate isBarrierOut(DataFlowLang::Node node, FlowState state) {

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
import Config

predicate isAdditionalFlowStep(
DataFlowLang::Node node1, DataFlowLang::Node node2, string model

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
import AddTaintDefaults<Config0>
}

private module Stage1 = DataFlowInternalStage1::ImplStage1<C>;

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
@alexet alexet force-pushed the alexet/overlay-informed-dataflow branch 2 times, most recently from 14b7523 to c00e150 Compare September 19, 2025 15:15
@alexet alexet changed the title Dataflow: Diff informed dataflow. Dataflow: Overlay informed dataflow. Sep 19, 2025
@alexet alexet force-pushed the alexet/overlay-informed-dataflow branch from c00e150 to 1f99e98 Compare September 19, 2025 15:27
@alexet alexet requested a review from Copilot September 19, 2025 15:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements overlay-informed dataflow analysis capabilities for CodeQL. The changes introduce support for evaluating dataflow in overlay mode, where base and overlay database results can be merged, providing incremental analysis capabilities.

Key changes:

  • Added overlay-informed filtering logic that complements the existing diff-informed mode
  • Introduced new predicate variants for determining overlay evaluation context
  • Created overlay-aware modules that merge base and overlay results for dataflow and taint tracking

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
shared/util/codeql/util/AlertFiltering.qll Added predicates to check diff information availability and file inclusion in diff ranges
shared/dataflow/codeql/dataflow/internal/DataFlowImplStage1.qll Implemented overlay-informed filtering logic for sources and sinks with dual evaluation modes
shared/dataflow/codeql/dataflow/internal/DataFlowImpl.qll Added OverlayImpl module that merges base and overlay dataflow results
shared/dataflow/codeql/dataflow/TaintTracking.qll Refactored modules and added overlay-aware taint tracking implementations
shared/dataflow/codeql/dataflow/DataFlow.qll Split into core and overlay-specific modules with new overlay evaluation support
java/ql/lib/semmle/code/java/dataflow/internal/DataFlowImplSpecific.qll Added Java-specific overlay evaluation predicate implementation
Comments suppressed due to low confidence (1)

shared/dataflow/codeql/dataflow/TaintTracking.qll:1

  • The condition should use Config::observeOverlayInformedIncrementalMode() without negation to match the logic in the flow predicate above. Currently this will include local results when overlay mode is disabled, which contradicts the expected behavior.
/**

@alexet alexet force-pushed the alexet/overlay-informed-dataflow branch from 1f99e98 to 5a11aac Compare September 19, 2025 15:53
@alexet
Copy link
Author

alexet commented Sep 19, 2025

@aschackmull Can you have a quick look over this to check that the general codee organisation is reasonable. This is the least duplication I could come up with.

@alexet alexet force-pushed the alexet/overlay-informed-dataflow branch from 4e67b8f to 14f0860 Compare September 25, 2025 16:02
*
* This is a local predicate that only has results local to the overlay/base database.
*/
predicate flowLocal(Node source, Node sink) = forceLocal(Base::flow/2)(source, sink)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
predicate flowLocal(Node source, Node sink) = forceLocal(Base::flow/2)(source, sink)
private predicate flowLocal(Node source, Node sink) = forceLocal(Base::flow/2)(source, sink)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were purposefully exported so they could be used in places like usedAsRegex without needing a new force local. (the force local could have been one predicate back but I was careful).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You made both local versions private - I only meant for the binary version to be private. I think it makes sense to expose the unary flowToLocal, as that has a reasonable local semantics, whereas a the binary predicate really only makes sense as a global predicate.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I understand.

C++ has (had? it's been a while since I looked at that code) some cases where I think this would potentially be useful as they have Local dataflow done using global dataflow (they put a barrier at all calls, because they select source sink pairs using things like dominace, but use global dataflow because they want field flow). However I think that you are correct in that it is likely a trap and we can easily change it if we do find it useful.

@aschackmull
Copy link
Contributor

Can you have a quick look over this to check that the general codee organisation is reasonable. This is the least duplication I could come up with.

Yeah, looks reasonable for now. I guess the plan is that the duplication can go away once all languages have an overlay-local Node?

@alexet alexet force-pushed the alexet/overlay-informed-dataflow branch from 2a77ffa to 0f4474b Compare October 3, 2025 19:40
@alexet alexet marked this pull request as ready for review October 3, 2025 19:45
@alexet alexet requested a review from a team as a code owner October 3, 2025 19:45
@alexet alexet force-pushed the alexet/overlay-informed-dataflow branch 2 times, most recently from 2862860 to d9de670 Compare October 6, 2025 19:06
@alexet alexet force-pushed the alexet/overlay-informed-dataflow branch from d9de670 to 193cd46 Compare October 7, 2025 16:52
@alexet alexet added the no-change-note-required This PR does not need a change note label Oct 7, 2025
@alexet alexet merged commit 825d370 into main Oct 7, 2025
42 of 44 checks passed
@alexet alexet deleted the alexet/overlay-informed-dataflow branch October 7, 2025 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DataFlow Library Java no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants