Resolving merge conflicts from origin/main #21525
chanel-y wants to merge 1454 commits into github:main
… the binary folder.
…nstruction references. These are needed for CIL translation.
…riables in the IR.
…ad without a prior def.
…temp variables in the IR.
…-for-time-conversion-function C++: Speed up `UncheckedReturnValueForTimeFunctions.ql`
PR #333 synced upstream (github/codeql) to commit 5a65282, which is 216 commits past the codeql-cli/v2.24.3 tag (7d30e3c). This commit reverts the changes from those extra upstream commits so that our fork is synced exactly to the codeql-cli/v2.24.3 tag. All microsoft/codeql fork-specific commits are preserved. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r-linux fix powershell extractor on Linux
Revert upstream commits synced past codeql-cli/v2.24.3
Closing; this was opened on upstream but was meant for the fork.
Pull request overview
Resolves merge conflicts by taking upstream changes, resulting in a refreshed binary CodeQL library/extractor surface (IR + SSA/dataflow + staged instruction translation), updated packaging metadata, and CI/automation updates.
Changes:
- Syncs/introduces binary IR + SSA/dataflow internals and staged instruction translation layers.
- Adds/updates binary extractors (x86, CIL, JVM) and build/packaging scripts/config.
- Updates repository automation (workflows) and dependency versions (Bazel module deps), plus minor docs/meta tweaks.
Reviewed changes
Copilot reviewed 76 out of 5369 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| binary/ql/lib/semmle/code/binary/dataflow/internal/SsaImpl.qll | Adds SSA implementation plumbing for binary IR dataflow integration. |
| binary/ql/lib/semmle/code/binary/dataflow/internal/Node.qll | Introduces internal node wrappers (instruction/operand/SSA) for dataflow. |
| binary/ql/lib/semmle/code/binary/dataflow/internal/DataFlowImpl.qll | Implements binary dataflow InputSig wiring and local/SSA flow steps. |
| binary/ql/lib/semmle/code/binary/dataflow/internal/Content.qll | Adds Content/ContentSet abstractions for binary dataflow. |
| binary/ql/lib/semmle/code/binary/dataflow/Ssa.qll | Exposes public SSA API for binary IR definitions/phis/writes. |
| binary/ql/lib/semmle/code/binary/dataflow/DataFlow.qll | Exposes public binary DataFlow API backed by internal impl. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Tags.qll | Defines local-variable tagging for multiple binary formats. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Opcode.qll | Defines IR opcode taxonomy and string conversions. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/InstructionSig.qll | Defines shared instruction/CFG signature for staged IR. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction2/Instruction2.qll | Adds stage-2 instruction transformation layer and CFG/SSA helpers. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction2/Consistency.ql | Adds consistency checks for Instruction2 stage. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction1/Instruction1.qll | Adds stage-1 instruction transformation layer and CFG/SSA helpers. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction1/Consistency.ql | Adds consistency checks for Instruction1 stage. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/Variable.qll | Defines IR variables for stage-0 translation. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/Type.qll | Defines IR type representation for stage-0 translation. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/TranslatedType.qll | Adds translated type elements for multiple formats. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/TranslatedFunction.qll | Adds translated function elements and ordering semantics. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/TranslatedElement.qll | Adds base translated-element model and translation selection. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/TempVariableTag.qll | Defines temp-variable tags used by staged translations. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/Operand.qll | Defines operand entities for stage-0 translation. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/InstructionTag.qll | Defines instruction/operand tag sets for stage-0 translation. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/Instruction0.qll | Wires Instruction0 module to the shared InstructionSig. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/Instruction.qll | Defines stage-0 IR instruction entities and subclasses. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/Function.qll | Defines stage-0 IR function entities. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/Consistency.ql | Adds consistency queries for stage-0 translation surfaces. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Instruction0/BasicBlock.qll | Adds CFG/basic-block modeling and dominance utilities. |
| binary/ql/lib/semmle/code/binary/ast/ir/internal/Consistency.qll | Adds staged consistency input module used across stages. |
| binary/ql/lib/semmle/code/binary/ast/ir/PrintIR.ql | Adds graph query to visualize emitted IR. |
| binary/ql/lib/semmle/code/binary/ast/ir/IR.qll | Exposes final IR API backed by staged transformations + SSA. |
| binary/ql/lib/semmle/code/binary/ast/instructions.qll | Adds high-level AST instruction/element wrappers. |
| binary/ql/lib/semmle/code/binary/ast/Sections.qll | Adds section modeling + byte access helpers. |
| binary/ql/lib/semmle/code/binary/ast/Location.qll | Adds Location/EmptyLocation modeling for binary elements. |
| binary/ql/lib/semmle/code/binary/ast/Headers.qll | Adds optional header modeling (image base, entrypoint). |
| binary/ql/lib/semmle/code/binary/ast/File.qll | Adds filesystem container/file abstractions for locations. |
| binary/ql/lib/semmle/code/binary/Callable.qll | Adds cross-format callable/callsite abstraction layer. |
| binary/ql/lib/qlpack.yml | Defines the binary CodeQL library pack metadata. |
| binary/ql/lib/codeql-pack.lock.yml | Adds CodeQL pack lockfile for the binary lib pack. |
| binary/ql/lib/binary.qll | Adds umbrella import module for binary library surfaces. |
| binary/extractor/x86/src/main.cpp | Adds/updates x86 extractor implementation emitting TRAP. |
| binary/extractor/x86/codeql-extractor.yml | Adds extractor manifest for x86/binary. |
| binary/extractor/jvm/tools/index-files.sh | Adds JVM extractor index-files helper script. |
| binary/extractor/jvm/tools/autobuild.sh | Adds JVM extractor autobuild helper script. |
| binary/extractor/jvm/semmlecode.binary.dbscheme.stats | Adds JVM dbscheme stats file. |
| binary/extractor/jvm/codeql-extractor.yml | Adds JVM extractor manifest. |
| binary/extractor/jvm/Semmle.Extraction.Java.ByteCode/Trap/TrapWriter.cs | Adds JVM TRAP writer implementation. |
| binary/extractor/jvm/Semmle.Extraction.Java.ByteCode/Semmle.Extraction.Java.ByteCode.csproj | Adds JVM extractor project file + dependency. |
| binary/extractor/jvm/Semmle.Extraction.Java.ByteCode/Program.cs | Adds JVM extractor program entrypoint. |
| binary/extractor/cil/tools/index-files.sh | Adds CIL extractor index-files helper script. |
| binary/extractor/cil/tools/autobuild.sh | Adds CIL extractor autobuild helper script. |
| binary/extractor/cil/extractor.sln | Adds CIL extractor solution file. |
| binary/extractor/cil/codeql-extractor.yml | Adds CIL extractor manifest. |
| binary/extractor/cil/Semmle.Extraction.CSharp.IL/Trap/TrapWriter.cs | Adds CIL TRAP writer implementation. |
| binary/extractor/cil/Semmle.Extraction.CSharp.IL/Semmle.Extraction.CSharp.IL.csproj | Adds CIL extractor project + dependency. |
| binary/extractor/cil/Semmle.Extraction.CSharp.IL/Program.cs | Adds CIL extractor program entrypoint. |
| binary/extractor/cil/Semmle.Extraction.CSharp.IL/ILExtractor.cs | Adds CIL IL extraction logic emitting TRAP tuples. |
| binary/extractor/.gitignore | Adds extractor build dependency/artifact ignore patterns. |
| binary/clean.ps1 | Adds clean script to remove local build artifacts. |
| binary/build-win64.ps1 | Adds Windows build script for x86 and CIL extractors. |
| binary/build-macos.sh | Adds macOS build script (CIL supported; x86 placeholder). |
| binary/.gitignore | Adds ignore patterns for binary repo build/test artifacts. |
| SECURITY.md | Adds/updates repository security policy. |
| README.md | Minor formatting tweak. |
| MODULE.bazel | Updates Bazel module dependencies/versions. |
| .gitmodules | Adds iac submodule reference. |
| .github/workflows/sync-main.yml | Adds automation to sync main with upstream tags/branch. |
| .github/workflows/sync-main-tags.yml | Adds automation to sync tags after sync PR merge. |
| .github/workflows/powershell-pr-check.yml | Adds PowerShell PR CI check workflow. |
| .github/workflows/microsoft-codeql-pack-publish.yml | Adds workflow for publishing Microsoft CodeQL packs. |
| .gitattributes | Forces CRLF for specific upgrade scripts/downgrades. |

    if (ZYAN_FAILED(ZydisDecoderInit(&decoder, ZYDIS_MACHINE_MODE_LONG_64, ZYDIS_STACK_WIDTH_64)))
    {
        throw std::exception("Failed to initialize Zydis decoder");
    }

std::exception doesn’t have a standard constructor that takes a message; this will fail to compile on typical standard libraries. Replace with an exception type that accepts a message (for example std::runtime_error) and include the appropriate header (e.g., <stdexcept>).

    if (ZYAN_FAILED(ZydisFormatterInit(&formatter, ZYDIS_FORMATTER_STYLE_INTEL)))
    {
        throw std::exception("Failed to initialize Zydis formatter");
    }

std::exception doesn’t have a standard constructor that takes a message; this will fail to compile on typical standard libraries. Replace with an exception type that accepts a message (for example std::runtime_error) and include the appropriate header (e.g., <stdexcept>).
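
The replacement suggested in the two comments above can be sketched as follows. This is a minimal illustration, not the extractor's code: `init_component`, `init_or_throw`, and `try_init` are hypothetical names, and `init_component` merely stands in for a C-style initialization call such as `ZydisDecoderInit`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a C-style init call (e.g. ZydisDecoderInit);
// returns false on failure. This is NOT part of the Zydis API.
bool init_component(bool should_succeed) { return should_succeed; }

// Throws std::runtime_error, which has a standard constructor taking a
// message, unlike std::exception.
void init_or_throw(bool should_succeed, const std::string& name) {
    if (!init_component(should_succeed)) {
        throw std::runtime_error("Failed to initialize " + name);
    }
}

// Returns the what() message, or an empty string if initialization succeeded.
std::string try_init(bool should_succeed, const std::string& name) {
    try {
        init_or_throw(should_succeed, name);
        return "";
    } catch (const std::exception& e) {
        // std::runtime_error derives from std::exception, so existing
        // catch sites that take std::exception keep working.
        return e.what();
    }
}
```

Because `std::runtime_error` derives from `std::exception`, switching the thrown type should not require changes to handlers that already catch `const std::exception&`.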

    Archiver() : archive_dir(getenv("CODEQL_EXTRACTOR_BINARY_SOURCE_ARCHIVE_DIR"))
    {
    }

getenv(...) can return nullptr. Constructing a std::filesystem::path from a null C-string is undefined behavior and may crash. Read the env vars into const char*, validate non-null (and optionally non-empty), and produce a clear error/early exit if missing.

        extract_exports(writer, *export_table);
    }

    std::filesystem::path trap_dir(getenv("CODEQL_EXTRACTOR_BINARY_TRAP_DIR"));

getenv(...) can return nullptr. Constructing a std::filesystem::path from a null C-string is undefined behavior and may crash. Read the env vars into const char*, validate non-null (and optionally non-empty), and produce a clear error/early exit if missing.

Suggested change:

    - std::filesystem::path trap_dir(getenv("CODEQL_EXTRACTOR_BINARY_TRAP_DIR"));
    + const char* trap_dir_env = std::getenv("CODEQL_EXTRACTOR_BINARY_TRAP_DIR");
    + if (!trap_dir_env || !*trap_dir_env)
    + {
    +     std::cerr << "Environment variable CODEQL_EXTRACTOR_BINARY_TRAP_DIR is not set or is empty\n";
    +     return;
    + }
    + std::filesystem::path trap_dir(trap_dir_env);

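
Since the same null check applies to both `getenv` call sites flagged above, one option is to factor it into a small helper. The sketch below assumes a hypothetical function named `require_env` (it is not in the PR) and reports a missing variable by throwing rather than returning early; either approach would satisfy the comment.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper (not in the PR): returns the value of an environment
// variable, or throws if the variable is unset or empty. The null check
// happens before any std::string or std::filesystem::path is constructed.
std::string require_env(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') {
        throw std::runtime_error(
            std::string("Environment variable ") + name + " is not set or is empty");
    }
    return value;
}
```

A call site could then read `std::filesystem::path trap_dir(require_env("CODEQL_EXTRACTOR_BINARY_TRAP_DIR"));`, with the exception caught once at the top level to print the message and exit with a non-zero status.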

    for (size_t offset = 0; offset < code.size(); ++offset)
    {
        std::vector<Entry> entries;
        ZydisDecodedInstruction instr;
        ZydisDecodedOperand operands[ZYDIS_MAX_OPERAND_COUNT];
        char buffer[256];
        if (decoder.decode(code, offset, &instr, operands))
        {

The disassembly loop advances offset by 1 even when decoding succeeds, which will attempt decoding at every byte position and can emit a very large number of (mostly unintended) instructions/operands. Typically, on successful decode you should advance by instr.length (and only advance by 1 on decode failure) to perform a linear sweep.
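
The linear sweep described in this comment can be illustrated without Zydis. The sketch below uses a toy decoder, `toy_decode`, invented purely for this example (a byte in the range 0x01 to 0x04 starts an instruction of that many bytes); in the real extractor the length would presumably come from `instr.length`, as the comment says.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy decoder, purely hypothetical (NOT the Zydis API): a byte 0x01..0x04
// starts an instruction of that many bytes; anything else fails to decode.
// Returns the instruction length, or 0 on failure.
std::size_t toy_decode(const std::vector<std::uint8_t>& code, std::size_t offset) {
    const std::uint8_t first = code[offset];
    if (first < 1 || first > 4) return 0;
    if (offset + first > code.size()) return 0;  // truncated instruction
    return first;
}

// Linear sweep: returns the start offsets of all decoded instructions.
std::vector<std::size_t> linear_sweep(const std::vector<std::uint8_t>& code) {
    std::vector<std::size_t> starts;
    std::size_t offset = 0;
    while (offset < code.size()) {
        const std::size_t length = toy_decode(code, offset);
        if (length > 0) {
            starts.push_back(offset);
            offset += length;  // success: skip the whole instruction
        } else {
            offset += 1;       // failure: resynchronize one byte later
        }
    }
    return starts;
}
```

For the bytes `{0x02, 0x01, 0x03, 0xAA, 0xBB, 0xFF, 0x01}` this yields instruction starts at offsets 0, 2, and 6. The byte at offset 1 would itself decode under the toy rules, but the sweep never visits it because it lies inside the first instruction; the original `++offset` loop would have emitted it as a spurious extra instruction.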

    // Extract fields
    foreach (var field in type.Fields) {
        var fieldId = trap.GetId();

fieldId is assigned but never used; ExtractField generates its own ID. Remove the unused local (or pass/use it inside ExtractField) to avoid misleading readers and unnecessary ID consumption.

Suggested change:

    - var fieldId = trap.GetId();


    var outputPath = Path.Combine(
        trapDir, Path.GetFileNameWithoutExtension(dllPath) + ".trap");

Using only the base filename for TRAP output can cause collisions/overwrites when extracting multiple assemblies with the same name from different directories. Consider deriving the output path from a relative path (sanitized) or including a stable hash of the full path.
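
The "stable hash of the full path" option can be sketched as below. The reviewed code is C#; this illustration is written in C++ only to show the idea, and both the naming scheme and the function names (`fnv1a64`, `trap_file_name`) are assumptions, not anything from the PR. FNV-1a is used because it is deterministic across runs and platforms, which `std::hash` does not guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a: a simple, deterministic, non-cryptographic hash.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hypothetical naming scheme:
//   "<base name without extension>-<16 hex digits of the path hash>.trap"
std::string trap_file_name(const std::string& full_path) {
    const std::size_t slash = full_path.find_last_of("/\\");
    std::string base =
        (slash == std::string::npos) ? full_path : full_path.substr(slash + 1);
    const std::size_t dot = base.find_last_of('.');
    if (dot != std::string::npos && dot != 0) base.erase(dot);  // drop extension
    char hex[17];
    std::snprintf(hex, sizeof hex, "%016llx",
                  static_cast<unsigned long long>(fnv1a64(full_path)));
    return base + "-" + hex + ".trap";
}
```

With this scheme, `/a/Foo.dll` and `/b/Foo.dll` map to different TRAP file names while each name stays reproducible between runs. A 64-bit hash still permits collisions in principle, so a wider or cryptographic hash may be preferable if that matters.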

     */
    private predicate shouldTranslateCilParameter(Raw::CilParameter p) { any() }

    private predicate shouldTranslatedCilType(Raw::CilType t) { any() }

Identifier typo: shouldTranslatedCilType reads like a grammatical mistake and is inconsistent with the surrounding naming pattern (shouldTranslateX86Instr, shouldTranslateCilInstr, etc.). Renaming to shouldTranslateCilType (and updating its use) would improve clarity and avoid propagating the typo to other modules.

    TTranslatedCilStoreIndirect(Raw::CilStoreIndirectInstruction stind) {
      shouldTranslateCilInstr(stind)
    } or
    TTranslatedCilType(Raw::CilType type) { shouldTranslatedCilType(type) } or

Identifier typo: shouldTranslatedCilType reads like a grammatical mistake and is inconsistent with the surrounding naming pattern (shouldTranslateX86Instr, shouldTranslateCilInstr, etc.). Renaming to shouldTranslateCilType (and updating its use) would improve clarity and avoid propagating the typo to other modules.

    rm *.obj
    rm *.exe

rm will error if no files match (depending on shell settings), which can make clean steps flaky. Prefer Remove-Item -ErrorAction SilentlyContinue (and consider -Force) to make cleaning idempotent.

Suggested change:

    - rm *.obj
    - rm *.exe
    + Remove-Item *.obj -ErrorAction SilentlyContinue -Force
    + Remove-Item *.exe -ErrorAction SilentlyContinue -Force

Accepted upstream changes for everything
General summary: