ISSUE #3028 - Eliminate Double Iteration on Entity Filtering and Push Regex Filters to API #26709

Open
TeddyCr wants to merge 8 commits into open-metadata:main from TeddyCr:ISSUE-3028

@TeddyCr (Collaborator) commented Mar 24, 2026

Describe your changes:

Summary

  • Push database/schema/table filter patterns from the profiler ingestion client to the backend API as regex query parameters, reducing data transferred over the wire
  • Add new databaseRegex, databaseSchemaRegex, and tableRegex query parameters to the Database, DatabaseSchema, and Table list endpoints with regexMode (include/exclude) and regexFilterByFqn support
  • Implement getFqnRegexCondition in ListFilter for MySQL (REGEXP) and PostgreSQL (~) regex matching on JSON-extracted fields
  • Handle conflicting filter modes (e.g., schema include + table exclude) by pushing the include to the backend and deferring the exclude to client-side filtering
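
As a rough illustration of the first two bullets, translating an include/exclude filter pattern into list-endpoint query parameters might look like the sketch below. The `FilterPattern` and `RegexFilter` dataclasses, the helper names, and the rule that includes take precedence when both lists are set are assumptions made for this example, not the PR's actual code; only the parameter names `databaseRegex`, `regexMode`, and `regexFilterByFqn` come from the summary above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FilterPattern:
    """Hypothetical stand-in for the ingestion filter pattern config."""
    includes: List[str] = field(default_factory=list)
    excludes: List[str] = field(default_factory=list)


@dataclass
class RegexFilter:
    """Hypothetical pairing of a combined regex with its mode."""
    regex: str
    mode: str  # "include" or "exclude"


def combine_patterns(patterns: List[str]) -> str:
    """OR several patterns together, grouping each so alternation stays scoped."""
    return "|".join(f"({p})" for p in patterns)


def build_regex_from_filter(pattern: Optional[FilterPattern]) -> Optional[RegexFilter]:
    """Assumption for this sketch: includes win when both lists are set."""
    if pattern is None:
        return None
    if pattern.includes:
        return RegexFilter(combine_patterns(pattern.includes), "include")
    if pattern.excludes:
        return RegexFilter(combine_patterns(pattern.excludes), "exclude")
    return None


def build_database_params(
    pattern: Optional[FilterPattern], by_fqn: bool = False
) -> Dict[str, str]:
    """Build the query parameters for the Database list endpoint."""
    regex_filter = build_regex_from_filter(pattern)
    if regex_filter is None:
        return {}  # no filter configured: list everything
    params = {"databaseRegex": regex_filter.regex, "regexMode": regex_filter.mode}
    if by_fqn:
        params["regexFilterByFqn"] = "true"
    return params


params = build_database_params(FilterPattern(includes=["^prod_.*", "^analytics$"]))
print(params)
```

With no filter configured the helper returns an empty dict, which corresponds to the "no-filter passthrough" case listed in the tests.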

Changes by file

Backend (Java)
| File | Changes |
|---|---|
| openmetadata-spec/.../type/regexMode.json (new) | New JSON Schema enum defining RegexMode with values include and exclude, used across all regex-filtered list endpoints |
| openmetadata-service/.../jdbi3/ListFilter.java | Extended getDatabaseCondition and getDatabaseSchemaCondition to support regex params alongside existing exact filters. Added getTableNameRegexCondition for table name regex. Added core getFqnRegexCondition method that builds database-specific regex SQL: JSON_UNQUOTE(JSON_EXTRACT(...)) REGEXP for MySQL, json #>> '...' ~ for PostgreSQL. Supports include/exclude via RegexMode and FQN-based matching via regexFilterByFqn |
| openmetadata-service/.../databases/DatabaseResource.java | Added databaseRegex, regexFilterByFqn, and regexMode query parameters to the list endpoint. Maps databaseRegex to ListFilter with field hint databaseRegexField=name |
| openmetadata-service/.../databases/DatabaseSchemaResource.java | Added databaseSchemaRegex, regexFilterByFqn, and regexMode query parameters to the list endpoint. Maps databaseSchemaRegex to ListFilter with field hint databaseSchemaRegexField=name |
| openmetadata-service/.../databases/TableResource.java | Added databaseSchemaRegex, tableRegex, regexFilterByFqn, and regexMode query parameters to the list endpoint. Maps databaseSchemaRegex with field hint databaseSchema.name and tableRegex as tableParamRegex with field hint name |
Ingestion (Python)
| File | Changes |
|---|---|
| ingestion/.../fetcher/fetcher_strategy.py | Added RegexFilter model, _combine_patterns, _build_regex_from_filter, _build_database_params, _build_table_params, _has_conflicting_filter_modes, _filter_deferred_excludes. Refactored _get_database_entities and _get_table_entities from eager list + client-side filtering to generators that delegate filtering to the backend via regex params. Removed _filter_databases, _filter_schemas, _filter_tables, _filter_entities, _filter_column_metrics_computation, all replaced by server-side filtering |
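
The conflicting-mode handling mentioned in this row (schema include plus table exclude, or the reverse) can be pictured roughly as follows. This is a sketch under assumptions: `RegexFilter`, `has_conflicting_modes`, `split_filters`, and `passes_deferred_exclude` are hypothetical names, and using `re.match` with `re.IGNORECASE` for the client-side check is an assumption based on the review discussion, not a copy of the PR's code.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class RegexFilter:
    """Hypothetical pairing of a regex with its include/exclude mode."""
    regex: str
    mode: str  # "include" or "exclude"


def has_conflicting_modes(
    schema: Optional[RegexFilter], table: Optional[RegexFilter]
) -> bool:
    """A single regexMode query param cannot express include and exclude at once."""
    return schema is not None and table is not None and schema.mode != table.mode


def split_filters(
    schema: Optional[RegexFilter], table: Optional[RegexFilter]
) -> Tuple[Dict[str, str], Optional[Tuple[str, RegexFilter]]]:
    """Return (params pushed to the backend, deferred (level, exclude filter))."""
    params: Dict[str, str] = {}
    deferred = None
    conflict = has_conflicting_modes(schema, table)
    for level, key, flt in (
        ("schema", "databaseSchemaRegex", schema),
        ("table", "tableRegex", table),
    ):
        if flt is None:
            continue
        if conflict and flt.mode == "exclude":
            deferred = (level, flt)  # applied client-side after the API call
            continue
        params[key] = flt.regex
        params["regexMode"] = flt.mode
    return params, deferred


def passes_deferred_exclude(
    name: str, deferred: Optional[Tuple[str, RegexFilter]]
) -> bool:
    """Client-side check for the exclude that could not be pushed down."""
    if deferred is None:
        return True
    return re.match(deferred[1].regex, name, re.IGNORECASE) is None


params, deferred = split_filters(
    RegexFilter("^sales$", "include"), RegexFilter("^tmp_.*", "exclude")
)
print(params, deferred)
```

In this example the schema include is sent to the API while the table exclude is kept back and applied to each returned table name.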
Tests
| File | Changes |
|---|---|
| ingestion/tests/.../test_entity_fetcher.py | Added TestGetDatabaseEntities (6 tests): include/exclude regex param building, multiple includes combined with OR, FQN filtering, no-filter passthrough, zero-result error. Added TestGetTableEntities (11 tests): schema+table regex, view filtering, classification include/exclude, combined filters, FQN filtering, conflicting modes. Added TestFetch (3 tests): end-to-end pipeline, client-side filters, per-database error handling |
| ingestion/tests/.../test_workflow.py | Replaced test_filter_entities with test_build_regex_from_filter, test_build_database_params, test_build_table_params. Refactored test_filter_classifications to test filter_classifications directly |
| .../it/tests/DatabaseResourceIT.java | Added 4 tests: regex include, no-match, regex-only without service filter, exclude mode |
| .../it/tests/DatabaseSchemaResourceIT.java | Added 4 tests: regex include, no-match, regex-only without database filter, exclude mode |
| .../it/tests/TableResourceIT.java | Added 8 tests: schema regex, table regex, no-match, combined schema+table regex, regex-only without exact filter, exclude mode for table and schema |

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Improvement

  • I have added tests around the new logic.
  • For connector/ingestion changes: I updated the documentation.

@TeddyCr TeddyCr requested a review from a team as a code owner March 24, 2026 01:50
Copilot AI review requested due to automatic review settings March 24, 2026 01:50
@github-actions github-actions bot added the Ingestion and safe to test labels Mar 24, 2026
@github-actions (Contributor)

⚠️ TypeScript Types Need Update

The generated TypeScript types are out of sync with the JSON schema changes.

Since this is a pull request from a forked repository, the types cannot be automatically committed.
Please generate and commit the types manually:

cd openmetadata-ui/src/main/resources/ui
./json2ts-generate-all.sh -l true
git add src/generated/
git commit -m "Update generated TypeScript types"
git push

After pushing the changes, this check will pass automatically.

Comment on lines +208 to +209
schema_filter = _build_regex_from_filter(self.source_config.schemaFilterPattern)
table_filter = _build_regex_from_filter(self.source_config.tableFilterPattern)

💡 Performance: Redundant _build_regex_from_filter calls in hot path

_build_regex_from_filter is called multiple times with the same filter patterns: once in _build_table_params, once in _has_conflicting_filter_modes, and once in _filter_deferred_excludes (which runs per-table). These could be computed once and cached on the instance during __init__ to avoid rebuilding RegexFilter objects on every table iteration.

Suggested fix:

# In DatabaseFetcherStrategy.__init__, compute once:
def __init__(self, ...):
    super().__init__(...)
    self.source_config = cast(EntityFilterConfigInterface, self.source_config)
    self._schema_filter = _build_regex_from_filter(self.source_config.schemaFilterPattern)
    self._table_filter = _build_regex_from_filter(self.source_config.tableFilterPattern)
    self._conflicting_modes = (
        self._schema_filter is not None
        and self._table_filter is not None
        and self._schema_filter.mode != self._table_filter.mode
    )
# Then reference self._schema_filter, self._table_filter, self._conflicting_modes
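
A self-contained version of the caching idea in this suggestion, for illustration only. `FetcherSketch`, the simplified `build_regex_from_filter` signature, and the `BUILD_CALLS` counter are hypothetical stand-ins; the real class and helper in fetcher_strategy.py are not reproduced here. The sketch also precompiles the regex, which goes slightly beyond what the suggestion proposes.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

BUILD_CALLS = 0  # instrumentation for this sketch only


@dataclass(frozen=True)
class RegexFilter:
    """Hypothetical stand-in for the PR's RegexFilter model."""
    pattern: re.Pattern
    mode: str  # "include" or "exclude"


def build_regex_from_filter(
    includes: List[str], excludes: List[str]
) -> Optional[RegexFilter]:
    """Hypothetical builder; counts calls so the effect of caching is visible."""
    global BUILD_CALLS
    BUILD_CALLS += 1
    if includes:
        combined = "|".join(f"({p})" for p in includes)
        return RegexFilter(re.compile(combined, re.IGNORECASE), "include")
    if excludes:
        combined = "|".join(f"({p})" for p in excludes)
        return RegexFilter(re.compile(combined, re.IGNORECASE), "exclude")
    return None


class FetcherSketch:
    def __init__(self, schema_includes: List[str], table_excludes: List[str]):
        # Built once per instance instead of once per table.
        self._schema_filter = build_regex_from_filter(schema_includes, [])
        self._table_filter = build_regex_from_filter([], table_excludes)
        self._conflicting_modes = (
            self._schema_filter is not None
            and self._table_filter is not None
            and self._schema_filter.mode != self._table_filter.mode
        )

    def keep_table(self, table_name: str) -> bool:
        """Per-table check that reuses the cached, precompiled table filter."""
        if not self._conflicting_modes:
            return True
        # In this sketch the table filter is always the exclude side.
        return self._table_filter.pattern.match(table_name) is None


fetcher = FetcherSketch(schema_includes=["sales"], table_excludes=["tmp_.*"])
kept = [n for n in ("orders", "tmp_orders", "customers") if fetcher.keep_table(n)]
print(kept, BUILD_CALLS)
```

However many tables are checked, the builder runs only twice per fetcher instance, which is the saving the comment is after.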

@gitar-bot gitar-bot bot commented Mar 24, 2026

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Optimizes entity filtering by pushing regex filters to the API and eliminating double iteration in the ingestion pipeline. Consider reducing redundant _build_regex_from_filter calls in the hot path by caching results across _build_table_params, _has_conflicting_filter_modes, and _filter_deferred_excludes.

💡 Performance: Redundant _build_regex_from_filter calls in hot path

📄 ingestion/src/metadata/profiler/source/fetcher/fetcher_strategy.py:208-209 📄 ingestion/src/metadata/profiler/source/fetcher/fetcher_strategy.py:233-235 📄 ingestion/src/metadata/profiler/source/fetcher/fetcher_strategy.py:245-246

Copilot AI (Contributor) left a comment

Pull request overview

This PR optimizes profiler ingestion entity listing by pushing database/schema/table filter patterns down to the backend list APIs as regex query parameters, reducing client-side filtering and network transfer.

Changes:

  • Added databaseRegex, databaseSchemaRegex, and tableRegex query params (plus regexMode and regexFilterByFqn) to Database/DatabaseSchema/Table list endpoints.
  • Implemented backend regex filtering in ListFilter for MySQL and PostgreSQL.
  • Updated ingestion fetcher strategy to translate FilterPattern config into API query params and added integration/unit tests for the new behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| openmetadata-spec/src/main/resources/json/schema/type/regexMode.json | Adds RegexMode schema to support include/exclude regex behavior. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/databases/DatabaseResource.java | Adds databaseRegex, regexMode, regexFilterByFqn to DB list endpoint. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/databases/DatabaseSchemaResource.java | Adds databaseSchemaRegex, regexMode, regexFilterByFqn to schema list endpoint. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/databases/TableResource.java | Adds databaseSchemaRegex, tableRegex, regexMode, regexFilterByFqn to table list endpoint. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ListFilter.java | Adds JSON-field regex condition generation for MySQL/Postgres and wires into list filtering. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DatabaseResourceIT.java | Adds integration tests for databaseRegex include/exclude behavior. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DatabaseSchemaResourceIT.java | Adds integration tests for databaseSchemaRegex include/exclude behavior. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TableResourceIT.java | Adds integration tests for table listing with schema/table regex and exclude mode. |
| ingestion/src/metadata/profiler/source/fetcher/fetcher_strategy.py | Translates ingestion filter patterns into server-side regex params; defers conflicting excludes to client-side. |
| ingestion/tests/unit/observability/profiler/test_workflow.py | Adds unit tests for regex param building helpers in profiler workflow. |
| ingestion/tests/unit/observability/profiler/test_entity_fetcher.py | Adds unit tests validating end-to-end param passing and deferred client-side filters. |

Comment on lines +640 to +670
private String getFqnRegexCondition(String tableName, String regex, String paramName) {
  String fieldPath = queryParams.get(paramName + "RegexField");
  if (nullOrEmpty(fieldPath)) {
    fieldPath = "name";
  }
  if (Boolean.parseBoolean(queryParams.get("regexFilterByFqn"))) {
    int lastDot = fieldPath.lastIndexOf(".name");
    if (lastDot == -1) {
      fieldPath = "fullyQualifiedName";
    } else {
      fieldPath = fieldPath.substring(0, lastDot) + ".fullyQualifiedName";
    }
  }
  boolean exclude = RegexMode.EXCLUDE.value().equalsIgnoreCase(queryParams.get("regexMode"));
  queryParams.put(paramName + "Regex", regex);
  if (Boolean.TRUE.equals(DatasourceConfig.getInstance().isMySQL())) {
    String expr =
        tableName == null
            ? String.format("JSON_UNQUOTE(JSON_EXTRACT(json, '$.%s'))", fieldPath)
            : String.format("JSON_UNQUOTE(JSON_EXTRACT(%s.json, '$.%s'))", tableName, fieldPath);
    String operator = exclude ? "NOT REGEXP" : "REGEXP";
    return String.format("%s %s :%s", expr, operator, paramName + "Regex");
  } else {
    String pgPath = "{" + fieldPath.replace(".", ",") + "}";
    String expr =
        tableName == null
            ? String.format("json #>> '%s'", pgPath)
            : String.format("%s.json #>> '%s'", tableName, pgPath);
    String operator = exclude ? "!~" : "~";
    return String.format("%s %s :%s", expr, operator, paramName + "Regex");
  }

Copilot AI Mar 24, 2026

PostgreSQL regex matching uses ~/!~, which is case-sensitive and also behaves like a search (matches anywhere) rather than Python’s re.match semantics used in ingestion filters (anchored at the start and IGNORECASE). If this is intended to preserve ingestion filter behavior when pushing patterns to the API, consider switching to case-insensitive operators (~*/!~*) and anchoring the pattern (e.g., prefixing with ^ when not already anchored) so server-side results match client-side filtering.
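
To make the mismatch concrete, the snippet below contrasts Python's `re.match` (anchored at the start of the string) with `re.search` (matches anywhere), and sketches a hypothetical `anchor_pattern` helper that could be applied before a pattern is sent to the server. The helper name and the choice to always wrap the pattern in a group are assumptions for this example; the grouping keeps a top-level alternation such as `^a|b` from escaping the anchor.

```python
import re


def anchor_pattern(pattern: str) -> str:
    """Wrap the pattern so a search-style engine behaves like re.match.

    The group is needed because '^' binds tighter than '|': without it,
    '^a|b' would still match a 'b' anywhere in the string.
    """
    return f"^({pattern})"


name = "Sales_Orders"
pattern = "orders"

# Client-side semantics described in the comment: anchored, case-insensitive.
client_side = re.match(pattern, name, re.IGNORECASE)

# Search semantics (what '~' or REGEXP do): unanchored.
unanchored = re.search(pattern, name, re.IGNORECASE)

# Search semantics with the anchored pattern: agrees with re.match again.
anchored = re.search(anchor_pattern(pattern), name, re.IGNORECASE)

print(client_side is None, unanchored is not None, anchored is None)
```

Case sensitivity is a separate concern from anchoring: on PostgreSQL it would be addressed by the `~*`/`!~*` operators mentioned above rather than by rewriting the pattern.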

Copilot uses AI. Check for mistakes.
Comment on lines 250 to 265
  public String getDatabaseCondition(String tableName) {
    String database = queryParams.get("database");
-   return database == null ? "" : getFqnPrefixCondition(tableName, database, "database");
+   String databaseRegex = queryParams.get("databaseRegex");
+   if (nullOrEmpty(database) && nullOrEmpty(databaseRegex)) {
+     return "";
+   }
+   String hashCondition = "True";
+   String regexCondition = "True";
+   if (!nullOrEmpty(database)) {
+     hashCondition = getFqnPrefixCondition(tableName, database, "database");
+   }
+   if (!nullOrEmpty(databaseRegex)) {
+     regexCondition = getFqnRegexCondition(tableName, databaseRegex, "database");
+   }
+   return String.format("(%s AND %s)", hashCondition, regexCondition);
  }

Copilot AI Mar 24, 2026

hashCondition/regexCondition use the literal string "True". Elsewhere this class uses "TRUE" (e.g., WHERE TRUE). For consistency and to avoid any dialect quirks, prefer "TRUE" here as well.

Comment on lines +145 to +146
"Filter schemas by regex pattern. For better performance use in combination with database query filter",
schema = @Schema(type = "string", example = "snowflakeWestCoast.financeDB.*"))

Copilot AI Mar 24, 2026

The databaseSchemaRegex parameter’s example looks like a fully-qualified pattern (snowflakeWestCoast.financeDB.*), but by default this filter is applied to the schema name field unless regexFilterByFqn=true is also set. Consider adjusting the example and/or description to reflect the default behavior (name-only) and when FQN matching applies.

Suggested change
- "Filter schemas by regex pattern. For better performance use in combination with database query filter",
- schema = @Schema(type = "string", example = "snowflakeWestCoast.financeDB.*"))
+ "Filter schemas by regex pattern applied to the schema `name` by default. To apply the regex to the fullyQualifiedName instead, set `regexFilterByFqn=true`. For better performance use in combination with the `database` query filter.",
+ schema = @Schema(type = "string", example = "finance_.*"))
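
The name-versus-FQN default discussed in this comment follows from the field-path resolution in the `getFqnRegexCondition` snippet quoted earlier in the review. A Python transliteration of that resolution step, for illustration only (the function name `resolve_field_path` and the sample names are made up for this sketch):

```python
import re
from typing import Optional


def resolve_field_path(field_hint: Optional[str], filter_by_fqn: bool) -> str:
    """Mirror of the Java logic: default to 'name', swap to the FQN field on request."""
    field_path = field_hint or "name"
    if filter_by_fqn:
        last_dot = field_path.rfind(".name")
        if last_dot == -1:
            field_path = "fullyQualifiedName"
        else:
            field_path = field_path[:last_dot] + ".fullyQualifiedName"
    return field_path


# Which JSON field each combination ends up matching against.
print(resolve_field_path("name", False))
print(resolve_field_path("name", True))
print(resolve_field_path("databaseSchema.name", True))

# Why the FQN-style example is misleading for the default (name-only) mode:
fqn_style = r"snowflakeWestCoast.financeDB.*"
schema_name = "reporting"  # hypothetical schema name
schema_fqn = "snowflakeWestCoast.financeDB.reporting"  # hypothetical FQN
matches_name = re.search(fqn_style, schema_name) is not None
matches_fqn = re.search(fqn_style, schema_fqn) is not None
print(matches_name, matches_fqn)
```

An FQN-style pattern only matches once `regexFilterByFqn=true` redirects the comparison to the fullyQualifiedName field.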

Comment on lines +141 to +142
"Filter database by regex pattern. For better performance use in combination with service query filter",
schema = @Schema(type = "string", example = "snowflakeWestCoast.financeDB.*"))

Copilot AI Mar 24, 2026

The databaseRegex parameter’s example looks like a fully-qualified pattern (snowflakeWestCoast.financeDB.*), but by default this filter is applied to the database name field unless regexFilterByFqn=true is also set. Consider adjusting the example and/or description to reflect the default behavior (name-only) and when FQN matching applies.

Suggested change
- "Filter database by regex pattern. For better performance use in combination with service query filter",
- schema = @Schema(type = "string", example = "snowflakeWestCoast.financeDB.*"))
+ "Filter databases by regex pattern. By default, the pattern is applied to the database name field "
+ + "(for example, 'financeDB.*'). To filter by fullyQualifiedName instead, set 'regexFilterByFqn=true' "
+ + "and use an FQN-style pattern (for example, 'snowflakeWestCoast.financeDB.*'). For better performance, "
+ + "use in combination with the 'service' query filter.",
+ schema = @Schema(type = "string", example = "financeDB.*"))

Comment on lines +204 to +211
"Filter tables by database schema regex pattern. For better performance use in combination with database query filter",
schema = @Schema(type = "string", example = "snowflakeWestCoast.financeDB.*"))
@QueryParam("databaseSchemaRegex")
String databaseSchemaParamRegex,
@Parameter(
description =
"Filter tables by table regex pattern. For better performance use in combination with database and/or databaseSchema query filter",
schema = @Schema(type = "string", example = "snowflakeWestCoast.financeDB.*"))

Copilot AI Mar 24, 2026

Both databaseSchemaRegex and tableRegex examples look like fully-qualified patterns (snowflakeWestCoast.financeDB.*), but by default databaseSchemaRegex matches databaseSchema.name and tableRegex matches name unless regexFilterByFqn=true is also set. Consider adjusting examples/descriptions so they match the default behavior and clarify how to use FQN matching.

Suggested change
- "Filter tables by database schema regex pattern. For better performance use in combination with database query filter",
- schema = @Schema(type = "string", example = "snowflakeWestCoast.financeDB.*"))
- @QueryParam("databaseSchemaRegex")
- String databaseSchemaParamRegex,
- @Parameter(
- description =
- "Filter tables by table regex pattern. For better performance use in combination with database and/or databaseSchema query filter",
- schema = @Schema(type = "string", example = "snowflakeWestCoast.financeDB.*"))
+ "Filter tables by database schema regex pattern applied to databaseSchema.name by default. "
+ + "To apply the regex to the fully qualified name, set regexFilterByFqn=true. "
+ + "For better performance, use this in combination with the database query filter.",
+ schema = @Schema(type = "string", example = "finance_schema_.*"))
+ @QueryParam("databaseSchemaRegex")
+ String databaseSchemaParamRegex,
+ @Parameter(
+ description =
+ "Filter tables by table regex pattern applied to the table name by default. "
+ + "To apply the regex to the table fully qualified name, set regexFilterByFqn=true. "
+ + "For better performance, use this in combination with the database and/or databaseSchema query filters.",
+ schema = @Schema(type = "string", example = "orders_.*"))

Comment on lines +275 to +276
filter.addQueryParam("tableParamRegex", tableParamRegex);
filter.addQueryParam("tableParamRegexField", "name");

Copilot AI Mar 24, 2026

The table regex is received as query param tableRegex, but is stored into the ListFilter under the key tableParamRegex/tableParamRegexField. This is inconsistent with databaseRegex/databaseSchemaRegex naming and makes tracing request params harder. Consider renaming the internal keys to tableRegex/tableRegexField and updating ListFilter.getTableNameRegexCondition() accordingly.

Suggested change
- filter.addQueryParam("tableParamRegex", tableParamRegex);
- filter.addQueryParam("tableParamRegexField", "name");
+ filter.addQueryParam("tableRegex", tableParamRegex);
+ filter.addQueryParam("tableRegexField", "name");

Comment on lines 196 to 199
raise ValueError(
"databaseFilterPattern returned 0 result. At least 1 database must be returned by the filter pattern."
f"\n\t- includes: {self.source_config.databaseFilterPattern.includes if self.source_config.databaseFilterPattern else None}" # pylint: disable=line-too-long
f"\n\t- excludes: {self.source_config.databaseFilterPattern.excludes if self.source_config.databaseFilterPattern else None}" # pylint: disable=line-too-long

Copilot AI Mar 24, 2026

When no databases are returned, this always raises an error saying "databaseFilterPattern returned 0 result" even when databaseFilterPattern is not configured (i.e., the service legitimately has no databases or the service filter is wrong). Consider emitting a different message when databaseFilterPattern is null vs when a configured pattern filtered everything out, so the error is actionable.

Suggested change
- raise ValueError(
-     "databaseFilterPattern returned 0 result. At least 1 database must be returned by the filter pattern."
-     f"\n\t- includes: {self.source_config.databaseFilterPattern.includes if self.source_config.databaseFilterPattern else None}"  # pylint: disable=line-too-long
-     f"\n\t- excludes: {self.source_config.databaseFilterPattern.excludes if self.source_config.databaseFilterPattern else None}"  # pylint: disable=line-too-long
+ db_filter_pattern = self.source_config.databaseFilterPattern
+ if db_filter_pattern:
+     raise ValueError(
+         "databaseFilterPattern returned 0 result. At least 1 database must be returned by the filter pattern."
+         f"\n\t- includes: {db_filter_pattern.includes}"  # pylint: disable=line-too-long
+         f"\n\t- excludes: {db_filter_pattern.excludes}"  # pylint: disable=line-too-long
+     )
+ raise ValueError(
+     "No databases were returned for the configured service. "
+     "Either the service has no databases, or the service configuration is incorrect."
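
The suggested change is a fragment of a larger method, so it cannot be run on its own. A standalone sketch of the same branching, with a hypothetical `FilterPattern` dataclass and a hypothetical `zero_database_error` helper standing in for the surrounding fetcher code, could look like this:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FilterPattern:
    """Hypothetical stand-in for the databaseFilterPattern config object."""
    includes: List[str] = field(default_factory=list)
    excludes: List[str] = field(default_factory=list)


def zero_database_error(pattern: Optional[FilterPattern]) -> ValueError:
    """Pick the error message based on whether a filter pattern was configured."""
    if pattern is not None:
        return ValueError(
            "databaseFilterPattern returned 0 result. "
            "At least 1 database must be returned by the filter pattern."
            f"\n\t- includes: {pattern.includes}"
            f"\n\t- excludes: {pattern.excludes}"
        )
    return ValueError(
        "No databases were returned for the configured service. "
        "Either the service has no databases, or the service configuration is incorrect."
    )


filtered_out = zero_database_error(FilterPattern(includes=["^prod_.*"]))
no_filter = zero_database_error(None)
print(filtered_out)
print(no_filter)
```

The sketch returns the exception instead of raising it so both messages can be inspected side by side; the real code would raise.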

@github-actions (Contributor)

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (38)

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | GHSA-72hv-8253-57qq | 🚨 HIGH | 2.12.7 | 2.18.6, 2.21.1, 3.1.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | GHSA-72hv-8253-57qq | 🚨 HIGH | 2.13.4 | 2.18.6, 2.21.1, 3.1.0 |
| com.fasterxml.jackson.core:jackson-core | GHSA-72hv-8253-57qq | 🚨 HIGH | 2.15.2 | 2.18.6, 2.21.1, 3.1.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.airlift:aircompressor | CVE-2025-67721 | 🚨 HIGH | 0.27 | 2.0.3 |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.spark:spark-core_2.12 | CVE-2025-54920 | 🚨 HIGH | 3.5.6 | 3.5.7 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (13)

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| apache-airflow | CVE-2025-68438 | 🚨 HIGH | 3.1.5 | 3.1.6 |
| apache-airflow | CVE-2025-68675 | 🚨 HIGH | 3.1.5 | 3.1.6, 2.11.1 |
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.5 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.5 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.5 | 3.1.8 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions (Contributor)

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| libpam-modules | CVE-2025-6020 | 🚨 HIGH | 1.5.2-6+deb12u1 | 1.5.2-6+deb12u2 |
| libpam-modules-bin | CVE-2025-6020 | 🚨 HIGH | 1.5.2-6+deb12u1 | 1.5.2-6+deb12u2 |
| libpam-runtime | CVE-2025-6020 | 🚨 HIGH | 1.5.2-6+deb12u1 | 1.5.2-6+deb12u2 |
| libpam0g | CVE-2025-6020 | 🚨 HIGH | 1.5.2-6+deb12u1 | 1.5.2-6+deb12u2 |

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (39)

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | GHSA-72hv-8253-57qq | 🚨 HIGH | 2.12.7 | 2.18.6, 2.21.1, 3.1.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | GHSA-72hv-8253-57qq | 🚨 HIGH | 2.13.4 | 2.18.6, 2.21.1, 3.1.0 |
| com.fasterxml.jackson.core:jackson-core | GHSA-72hv-8253-57qq | 🚨 HIGH | 2.15.2 | 2.18.6, 2.21.1, 3.1.0 |
| com.fasterxml.jackson.core:jackson-core | GHSA-72hv-8253-57qq | 🚨 HIGH | 2.16.1 | 2.18.6, 2.21.1, 3.1.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.airlift:aircompressor | CVE-2025-67721 | 🚨 HIGH | 0.27 | 2.0.3 |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.spark:spark-core_2.12 | CVE-2025-54920 | 🚨 HIGH | 3.5.6 | 3.5.7 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (33)

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
| --- | --- | --- | --- | --- |
| Authlib | CVE-2026-27962 | 🔥 CRITICAL | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28490 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28498 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28802 | 🚨 HIGH | 1.6.6 | 1.6.7 |
| PyJWT | CVE-2026-32597 | 🚨 HIGH | 2.10.1 | 2.12.0 |
| Werkzeug | CVE-2024-34069 | 🚨 HIGH | 2.2.3 | 3.0.3 |
| aiohttp | CVE-2025-69223 | 🚨 HIGH | 3.12.12 | 3.13.3 |
| aiohttp | CVE-2025-69223 | 🚨 HIGH | 3.13.2 | 3.13.3 |
| apache-airflow | CVE-2025-68438 | 🚨 HIGH | 3.1.5 | 3.1.6 |
| apache-airflow | CVE-2025-68675 | 🚨 HIGH | 3.1.5 | 3.1.6, 2.11.1 |
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.5 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.5 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.5 | 3.1.8 |
| apache-airflow-providers-http | CVE-2025-69219 | 🚨 HIGH | 5.6.0 | 6.0.0 |
| azure-core | CVE-2026-21226 | 🚨 HIGH | 1.37.0 | 1.38.0 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| google-cloud-aiplatform | CVE-2026-2472 | 🚨 HIGH | 1.130.0 | 1.131.0 |
| google-cloud-aiplatform | CVE-2026-2473 | 🚨 HIGH | 1.130.0 | 1.133.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 5.3.0 | 6.1.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| protobuf | CVE-2026-0994 | 🚨 HIGH | 4.25.8 | 6.33.5, 5.29.6 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| pyasn1 | CVE-2026-23490 | 🚨 HIGH | 0.6.1 | 0.6.2 |
| pyasn1 | CVE-2026-30922 | 🚨 HIGH | 0.6.1 | 0.6.3 |
| python-multipart | CVE-2026-24486 | 🚨 HIGH | 0.0.20 | 0.0.22 |
| ray | CVE-2025-62593 | 🔥 CRITICAL | 2.47.1 | 2.52.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| tornado | CVE-2026-31958 | 🚨 HIGH | 6.5.3 | 6.5.5 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (4)

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
| --- | --- | --- | --- | --- |
| stdlib | CVE-2025-68121 | 🔥 CRITICAL | v1.25.5 | 1.24.13, 1.25.7, 1.26.0-rc.3 |
| stdlib | CVE-2025-61726 | 🚨 HIGH | v1.25.5 | 1.24.12, 1.25.6 |
| stdlib | CVE-2025-61728 | 🚨 HIGH | v1.25.5 | 1.24.12, 1.25.6 |
| stdlib | CVE-2026-25679 | 🚨 HIGH | v1.25.5 | 1.25.8, 1.26.1 |

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

@sonarqubecloud

@github-actions
Contributor

🟡 Playwright Results — all passed (16 flaky)

✅ 3417 passed · ❌ 0 failed · 🟡 16 flaky · ⏭️ 183 skipped

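| Shard | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| 🟡 Shard 1 | 453 | 0 | 2 | 2 |
| 🟡 Shard 2 | 304 | 0 | 1 | 1 |
| 🟡 Shard 3 | 674 | 0 | 7 | 33 |
| 🟡 Shard 4 | 705 | 0 | 1 | 41 |
| ✅ Shard 5 | 672 | 0 | 0 | 73 |
| 🟡 Shard 6 | 609 | 0 | 5 | 33 |
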
🟡 16 flaky test(s) (passed on retry)
  • Features/DataAssetRulesDisabled.spec.ts › Verify the Database entity item action after rules disabled (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from welcome screen (shard 1, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Directory (shard 2, 1 retry)
  • Features/BulkImport.spec.ts › Database (shard 3, 1 retry)
  • Features/DataQuality/TestCaseIncidentPermissions.spec.ts › User with TEST_CASE.EDIT_ALL can see edit icon on incidents (shard 3, 1 retry)
  • Features/DataQuality/TestCaseResultPermissions.spec.ts › User with only VIEW cannot PATCH results (shard 3, 1 retry)
  • Features/EntitySummaryPanel.spec.ts › should display summary panel for databaseSchema (shard 3, 1 retry)
  • Features/EntitySummaryPanel.spec.ts › should display summary panel for dashboard (shard 3, 1 retry)
  • Features/Glossary/GlossaryP3Tests.spec.ts › should handle special characters in search (shard 3, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel.spec.ts › Should allow Data Consumer to edit tier for mlmodel (shard 6, 1 retry)
  • Pages/Glossary.spec.ts › Glossary & terms creation for reviewer as team (shard 6, 1 retry)
  • Pages/HyperlinkCustomProperty.spec.ts › should display URL when no display text is provided (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Import ODCS with SLA, modify SLA via UI, export and verify SLA changes (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Labels

Ingestion, safe to test

2 participants