-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Dataflow: Overlay informed dataflow. #20386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4ebfdd4 to
04e8ab6
Compare
04e8ab6 to
5ee86e9
Compare
| private module Stage1 = ImplStage1<C>; | ||
|
|
||
| import Stage1::PartialFlow | ||
|
|
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
| } | ||
|
|
||
| predicate observeOverlayInformedIncrementalMode() { not Config::observeDiffInformedIncrementalMode() } | ||
| } |
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
| import MakeImplStage1<Location, DataFlowLang> as DataFlowInternalStage1 | ||
|
|
||
| private module AddTaintDefaults<DataFlowInternal::FullStateConfigSig Config> implements | ||
| module AddTaintDefaults<DataFlowInternal::FullStateConfigSig Config> implements |
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
|
|
||
| signature int speculationLimitSig(); | ||
|
|
||
| module AddSpeculativeTaintSteps< |
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
| import DataFlowInternal::DefaultState<Config> | ||
| import Config | ||
|
|
||
| private predicate relevantState(Config::FlowState state) { |
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
| ) { | ||
| Config::isAdditionalFlowStep(node1, node2) and model = "Config" | ||
| } | ||
|
|
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
| import DataFlowInternal::DefaultState<Config> | ||
| import Config | ||
|
|
||
| predicate isAdditionalFlowStep( |
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
| import AddTaintDefaults<Config0> | ||
| } | ||
|
|
||
| predicate isBarrierOut(DataFlowLang::Node node, FlowState state) { |
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
| import Config | ||
|
|
||
| predicate isAdditionalFlowStep( | ||
| DataFlowLang::Node node1, DataFlowLang::Node node2, string model |
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
| import AddTaintDefaults<Config0> | ||
| } | ||
|
|
||
| private module Stage1 = DataFlowInternalStage1::ImplStage1<C>; |
Check warning
Code scanning / CodeQL
Data flow configuration module naming Warning
14b7523 to
c00e150
Compare
c00e150 to
1f99e98
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements overlay-informed dataflow analysis capabilities for CodeQL. The changes introduce support for evaluating dataflow in overlay mode, where base and overlay database results can be merged, providing incremental analysis capabilities.
Key changes:
- Added overlay-informed filtering logic that complements the existing diff-informed mode
- Introduced new predicate variants for determining overlay evaluation context
- Created overlay-aware modules that merge base and overlay results for dataflow and taint tracking
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| shared/util/codeql/util/AlertFiltering.qll | Added predicates to check diff information availability and file inclusion in diff ranges |
| shared/dataflow/codeql/dataflow/internal/DataFlowImplStage1.qll | Implemented overlay-informed filtering logic for sources and sinks with dual evaluation modes |
| shared/dataflow/codeql/dataflow/internal/DataFlowImpl.qll | Added OverlayImpl module that merges base and overlay dataflow results |
| shared/dataflow/codeql/dataflow/TaintTracking.qll | Refactored modules and added overlay-aware taint tracking implementations |
| shared/dataflow/codeql/dataflow/DataFlow.qll | Split into core and overlay-specific modules with new overlay evaluation support |
| java/ql/lib/semmle/code/java/dataflow/internal/DataFlowImplSpecific.qll | Added Java-specific overlay evaluation predicate implementation |
Comments suppressed due to low confidence (1)
shared/dataflow/codeql/dataflow/TaintTracking.qll:1
- The condition should use
Config::observeOverlayInformedIncrementalMode()without negation to match the logic in theflowpredicate above. Currently this will include local results when overlay mode is disabled, which contradicts the expected behavior.
/**
1f99e98 to
5a11aac
Compare
|
@aschackmull Can you have a quick look over this to check that the general codee organisation is reasonable. This is the least duplication I could come up with. |
4e67b8f to
14f0860
Compare
| * | ||
| * This is a local predicate that only has results local to the overlay/base database. | ||
| */ | ||
| predicate flowLocal(Node source, Node sink) = forceLocal(Base::flow/2)(source, sink) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| predicate flowLocal(Node source, Node sink) = forceLocal(Base::flow/2)(source, sink) | |
| private predicate flowLocal(Node source, Node sink) = forceLocal(Base::flow/2)(source, sink) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were purposefully exported so they could be used in places like usedAsRegex without needing a new force local. (the force local could have been one predicate back but I was careful).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You made both local versions private - I only meant for the binary version to be private. I think it makes sense to expose the unary flowToLocal, as that has a reasonable local semantics, whereas a the binary predicate really only makes sense as a global predicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I understand.
C++ has (had? it's been a while since I looked at that code) some cases where I think this would potentially be useful as they have Local dataflow done using global dataflow (they put a barrier at all calls, because they select source sink pairs using things like dominace, but use global dataflow because they want field flow). However I think that you are correct in that it is likely a trap and we can easily change it if we do find it useful.
shared/dataflow/codeql/dataflow/internal/DataFlowImplStage1.qll
Outdated
Show resolved
Hide resolved
shared/dataflow/codeql/dataflow/internal/DataFlowImplStage1.qll
Outdated
Show resolved
Hide resolved
shared/dataflow/codeql/dataflow/internal/DataFlowImplStage1.qll
Outdated
Show resolved
Hide resolved
Yeah, looks reasonable for now. I guess the plan is that the duplication can go away once all languages have an overlay-local |
2a77ffa to
0f4474b
Compare
2862860 to
d9de670
Compare
d9de670 to
193cd46
Compare
This adds overlay informed Dataflow.
The idea is that for non-diff-informed configurations we use the diff informed mechanism to filter the sources and sinks to the overlay and then merge those results with the base results. This can lead to FPs/FNs due to flow paths via the overlay but the accuracy seems well within what we are targetting for overlay mode and saves lots of time.
Unfortunately the merging code cannot instantiated with CPP analysis as it uses
EquivalenceRelationinternally. This means we need separate parametrised modules (as we can't just guard with a predicate).The local results are also exposed for cases where the pure local results are desired (e.g. for regexes where we are only going to care about analysing new regexes in the overlay).