
Enhancement Search Index #26669

Open
mohityadav766 wants to merge 34 commits into main from reindex-logger

Conversation


@mohityadav766 mohityadav766 commented Mar 21, 2026

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • App logging system:
    • New AppRunLogAppender and RunLogBuffer classes capture per-run app logs to JSON files with thread-name prefix matching and MDC-based filtering
    • OmAppJobListener integrated to start/stop log capture with cleanup on job veto or completion
  • Search index improvements:
    • SearchIndexApp.uninstall() now purges all related state tables and force-stops active distributed jobs
    • SearchIndexExecutor adds periodic sink stats synchronization and adjusts entity totals when success+failed exceed initial counts
    • DistributedIndexingStrategy and DistributedSearchIndexExecutor add proper shutdown handling and interrupt ordering for lock/heartbeat threads
    • ReindexingOrchestrator persists run record before WebSocket broadcast to ensure UI sees terminal status
  • Database layer:
    • Added deleteAll() methods to SearchIndexJobDAO, SearchIndexPartitionDAO, SearchIndexServerStatsDAO, and AppExtensionTimeSeriesDao

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings March 21, 2026 04:56
@github-actions github-actions bot added the backend and safe to test labels Mar 21, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces infrastructure to capture and persist per-application-run logs (notably for reindex/SearchIndex runs) and improves SearchIndex stats accuracy/refresh behavior, backed by new unit tests.

Changes:

  • Add a Logback Appender + buffered file writer to capture app-run logs into logs/app-runs/{appName}/{timestamp}-{serverId}.log.
  • Wire log-capture lifecycle into Quartz job execution via OmAppJobListener (start/stop capture, MDC setup/cleanup, thread-prefix capture for reindex workers).
  • Improve SearchIndex stats consistency by adjusting totals when processed counts exceed initial totals and periodically syncing sink stats; add tests for these cases.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
openmetadata-service/src/main/java/org/openmetadata/service/apps/scheduler/OmAppJobListener.java Starts/stops app-run log capture around Quartz job execution and manages MDC/thread-prefix capture.
openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/AppRunLogAppender.java New Logback appender that routes matching events to per-run buffers/files and performs retention cleanup.
openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/RunLogBuffer.java New buffered writer with periodic flushing and max-line cap for app-run log persistence.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexExecutor.java Periodic sink stat sync and total-records adjustments to keep stats internally consistent.
openmetadata-service/src/test/java/org/openmetadata/service/apps/logging/AppRunLogAppenderTest.java New tests covering capture behavior, cleanup, listing timestamps, and concurrency safety of app-run logging.
openmetadata-service/src/test/java/org/openmetadata/service/apps/logging/RunLogBufferTest.java New tests for buffer append/flush/close behavior and max-line handling.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexStatsTest.java New tests validating adjusted totals and reader/job totals consistency.

Comment on lines +67 to +72
for (ThreadPrefixBinding binding : threadPrefixBindings) {
  if (threadName.startsWith(binding.prefix)) {
    binding.buffer.append(formatLine(event));
    return;
  }
}

Copilot AI Mar 21, 2026


The thread-prefix capture path returns on the first matching ThreadPrefixBinding. If two captures are active with the same prefix (e.g., overlapping SearchIndex runs both using "reindex-"), all worker-thread logs will be routed to whichever binding was added first, and the other run will miss logs. Consider making prefixes unique per run or changing the logic to disambiguate (e.g., include run/job id in the prefix binding) rather than first-match wins.

Suggested change
for (ThreadPrefixBinding binding : threadPrefixBindings) {
  if (threadName.startsWith(binding.prefix)) {
    binding.buffer.append(formatLine(event));
    return;
  }
}
ThreadPrefixBinding bestMatch = null;
int bestPrefixLength = -1;
for (ThreadPrefixBinding binding : threadPrefixBindings) {
  if (threadName.startsWith(binding.prefix)) {
    int prefixLength = binding.prefix.length();
    if (prefixLength > bestPrefixLength) {
      bestMatch = binding;
      bestPrefixLength = prefixLength;
    }
  }
}
if (bestMatch != null) {
  bestMatch.buffer.append(formatLine(event));
}

Comment on lines +229 to +232
static void resetForTest() {
  registered = false;
  threadPrefixBindings.clear();
}

Copilot AI Mar 21, 2026


resetForTest() only clears registered and threadPrefixBindings but leaves other static state mutated by tests (e.g., logDirectory, maxRunsPerApp, maxLinesPerRun) and does not remove the appender from the root logger if it was registered. This can leak state across test classes within the same JVM; consider resetting static defaults and detaching/stopping the root appender in resetForTest().
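A fuller reset along these lines can be sketched with plain Java statics; the field names below mirror the ones this review mentions, but the class is a hypothetical stand-in, not the actual AppRunLogAppender:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in showing the "reset everything" pattern the review
// asks for: every mutable static default is restored, not just the flag.
class AppenderState {
    static final String DEFAULT_LOG_DIR = "logs/app-runs";
    static final int DEFAULT_MAX_RUNS = 10;

    static boolean registered = false;
    static String logDirectory = DEFAULT_LOG_DIR;
    static int maxRunsPerApp = DEFAULT_MAX_RUNS;
    static final List<String> threadPrefixBindings = new ArrayList<>();

    // Reset everything tests may have mutated, so no state leaks across
    // test classes running in the same JVM. In the real appender this is
    // also where the root-logger appender would be detached and stopped.
    static void resetForTest() {
        registered = false;
        threadPrefixBindings.clear();
        logDirectory = DEFAULT_LOG_DIR;
        maxRunsPerApp = DEFAULT_MAX_RUNS;
    }
}
```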

Comment on lines +115 to +125
String appRunId = String.valueOf(runRecord.getTimestamp());
String serverId = ServerIdentityResolver.getInstance().getServerId();
MDC.put(AppRunLogAppender.MDC_APP_RUN_ID, appRunId);
MDC.put(AppRunLogAppender.MDC_APP_NAME, appName);
MDC.put(AppRunLogAppender.MDC_SERVER_ID, serverId);
MDC.put(AppRunLogAppender.MDC_APP_ID, jobApp.getId().toString());
String[] threadPrefixes = getThreadPrefixesForApp(appName);
AppRunLogAppender.startCapture(
    appRunId, jobApp.getId().toString(), appName, serverId, threadPrefixes);
dataMap.put(APP_RUN_LOG_ID, appRunId);


Copilot AI Mar 21, 2026


appRunLogId is stored in jobExecutionContext.getJobDetail().getJobDataMap(), and later read back from the same JobDetail map in cleanupLogCapture(). If Quartz ever allows overlapping executions of the same scheduled job (same JobDetail), the value can be overwritten by a newer run and the older run will stop the wrong capture (leaking the old buffer and truncating the new one). Store this per-execution (e.g., jobExecutionContext.put(...) / get(...) or jobExecutionContext.getMergedJobDataMap()), not on the shared JobDetail map.

@AfterEach
void tearDown() {
  AppRunLogAppender.getActiveBuffers().clear();
  AppRunLogAppender.resetForTest();

Copilot AI Mar 21, 2026


setUp() mutates static state via AppRunLogAppender.setLogDirectoryForTest(...), but tearDown() does not restore it (and resetForTest() currently doesn’t reset logDirectory). This can leak configuration into other test classes in the same JVM. Restore the defaults in tearDown() (or enhance resetForTest() to reset logDirectory/limits and detach any root-logger appender registered by startCapture).

Suggested change
AppRunLogAppender.resetForTest();
AppRunLogAppender.resetForTest();
AppRunLogAppender.setLogDirectoryForTest(null);

Comment on lines +90 to +93
  String batchText = String.join("\n", batch);
  writeToFile(batchText);
  AppRunLogMetrics.recordFlush(appName, batch.size());
}

Copilot AI Mar 21, 2026


flush() calls AppRunLogMetrics.recordFlush(...), but there is no AppRunLogMetrics class in the codebase (repo-wide search only finds these references). This will fail compilation; either add the missing metrics implementation or remove/replace these calls with an existing metrics facility.
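The missing class could be stubbed with a minimal in-memory counter facade so the module compiles. This is a hypothetical sketch whose method names are taken from the call sites quoted in this PR; a real implementation would more likely delegate to the service's existing Micrometer registry:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the missing AppRunLogMetrics: thread-safe
// in-memory counters keyed by "<appName>.<metric>".
final class AppRunLogMetrics {
    private static final Map<String, LongAdder> COUNTERS = new ConcurrentHashMap<>();

    private AppRunLogMetrics() {}

    private static void add(String key, long delta) {
        COUNTERS.computeIfAbsent(key, k -> new LongAdder()).add(delta);
    }

    static void recordFlush(String appName, int lines) { add(appName + ".flushedLines", lines); }
    static void recordRunStarted(String appName, String serverId) { add(appName + ".runsStarted", 1); }
    static void recordRunCompleted(String appName, String serverId, long durationMs) { add(appName + ".runsCompleted", 1); }
    static void recordLinesCapture(String appName, long lines) { add(appName + ".linesCaptured", lines); }
    static void recordCleanup(String appName, int deletedRuns) { add(appName + ".runsCleaned", deletedRuns); }

    // Read side for tests/inspection; missing keys read as 0.
    static long get(String key) {
        LongAdder a = COUNTERS.get(key);
        return a == null ? 0 : a.sum();
    }
}
```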

Comment on lines +60 to +72
void startFlusher() {
  try {
    Files.createDirectories(logFile.getParent());
    writer =
        Files.newBufferedWriter(
            logFile,
            StandardCharsets.UTF_8,
            StandardOpenOption.CREATE,
            StandardOpenOption.APPEND);
  } catch (IOException e) {
    LOG.error("Failed to open log file {}: {}", logFile, e.getMessage());
    return;
  }

Copilot AI Mar 21, 2026


If startFlusher() fails to open the log file (IOException), it logs and returns, but the buffer remains usable and will keep accumulating pending lines (up to maxLines) without ever flushing. Consider returning a success/failure signal (or throwing) so the caller can stop capture/remove the buffer, or mark the buffer as closed/disabled on failure.

Comment on lines +121 to +172
public static RunLogBuffer startCapture(
    String appRunId, String appId, String appName, String serverId, String... threadPrefixes) {
  ensureRegistered();
  cleanupOldRuns(appName);
  Path logFile = resolveLogFile(appName, Long.parseLong(appRunId), serverId);
  RunLogBuffer buffer =
      new RunLogBuffer(
          appId, appName, serverId, Long.parseLong(appRunId), maxLinesPerRun, logFile);
  activeBuffers.put(bufferKey(appName, appRunId), buffer);

  for (String prefix : threadPrefixes) {
    threadPrefixBindings.add(new ThreadPrefixBinding(prefix, buffer));
  }

  buffer.startFlusher();
  AppRunLogMetrics.recordRunStarted(appName, serverId);
  return buffer;
}

public static void stopCapture(String appName, String appRunId) {
  RunLogBuffer buffer = activeBuffers.remove(bufferKey(appName, appRunId));
  if (buffer != null) {
    threadPrefixBindings.removeIf(b -> b.buffer == buffer);
    long durationMs = System.currentTimeMillis() - buffer.getRunTimestamp();
    AppRunLogMetrics.recordRunCompleted(appName, buffer.getServerId(), durationMs);
    AppRunLogMetrics.recordLinesCapture(appName, buffer.getTotalLineCount());
    buffer.close();
  }
}

public static RunLogBuffer getBuffer(String appName, String runTimestamp) {
  return activeBuffers.get(bufferKey(appName, runTimestamp));
}

static void cleanupOldRuns(String appName) {
  List<Long> timestamps = listRunTimestamps(appName);
  if (timestamps.size() <= maxRunsPerApp) {
    return;
  }
  List<Long> toDelete = timestamps.subList(maxRunsPerApp, timestamps.size());
  Path appDir = resolveAppDir(appName);
  for (long ts : toDelete) {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(appDir, ts + "-*.log")) {
      for (Path entry : stream) {
        Files.deleteIfExists(entry);
      }
    } catch (IOException e) {
      // best-effort cleanup
    }
  }
  AppRunLogMetrics.recordCleanup(appName, toDelete.size());
}

Copilot AI Mar 21, 2026


startCapture/stopCapture/cleanupOldRuns invoke AppRunLogMetrics.*, but there is no AppRunLogMetrics class in the repository (only these references). This makes the module uncompilable; add the missing metrics class or remove/replace these calls.

Copilot AI review requested due to automatic review settings March 23, 2026 04:26
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment on lines +135 to +137
buffer.startFlusher();
AppRunLogMetrics.recordRunStarted(appName, serverId);
return buffer;

Copilot AI Mar 23, 2026


AppRunLogAppender references AppRunLogMetrics (e.g., recordRunStarted) but there is no AppRunLogMetrics class in the codebase, so this will not compile. Either add the missing metrics implementation (and wiring) or remove these calls / replace with the existing metrics facility used elsewhere in the service.

Comment on lines +28 to +29
* <p>Configured via the {@code logging:} section in {@code openmetadata.yaml}. Uses two-tier
* matching: MDC for the scheduler thread, thread name prefixes for worker threads.

Copilot AI Mar 23, 2026


The class-level Javadoc says this appender is configured via openmetadata.yaml, but the implementation self-registers programmatically via ensureRegistered() and does not read Dropwizard logging config. Please update the Javadoc to reflect the actual configuration/registration mechanism (and how operators can disable/enable it).

Suggested change
* <p>Configured via the {@code logging:} section in {@code openmetadata.yaml}. Uses two-tier
* matching: MDC for the scheduler thread, thread name prefixes for worker threads.
* <p>This appender is registered programmatically and is not wired through Dropwizard's
* {@code logging:} section in {@code openmetadata.yaml}. Call {@link #ensureRegistered()} once
* during application startup to attach it to the appropriate {@link LoggerContext}. Operators
* who do not want per-run app logs can avoid invoking {@code ensureRegistered()} (or remove this
* appender from the {@link LoggerContext} at runtime).

Comment on lines +85 to +92
void flush() {
  List<String> batch = drainPending();
  if (batch.isEmpty()) {
    return;
  }
  String batchText = String.join("\n", batch);
  writeToFile(batchText);
}

Copilot AI Mar 23, 2026


flush() drains pending even when the file writer is not available (e.g., if startFlusher() failed to open the log file or was never called). Since writeToFile() returns early when writer == null, this causes buffered log lines to be dropped silently. Consider short-circuiting flush() when writer is null (or lazily opening/retrying the writer) so pending lines are preserved until they can be written.
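One way to avoid the silent drop is to short-circuit flush() while the writer is unavailable, as in this simplified stand-in for RunLogBuffer (illustrative names and structure, not the actual class):

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: pending lines stay queued until a writer is available, so a
// failed startFlusher() no longer causes buffered lines to be discarded.
class SafeBuffer {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    Writer writer; // null until the flusher successfully opens the file

    void append(String line) { pending.add(line); }

    void flush() {
        if (writer == null) {
            return; // short-circuit: keep pending lines instead of draining them
        }
        List<String> batch = new ArrayList<>();
        String line;
        while ((line = pending.poll()) != null) {
            batch.add(line);
        }
        if (batch.isEmpty()) return;
        try {
            writer.write(String.join("\n", batch));
            writer.write("\n");
        } catch (java.io.IOException e) {
            // on failure the batch is lost; a fuller version could re-queue it
        }
    }

    int pendingCount() { return pending.size(); }
}
```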

Comment on lines +1580 to +1581
if (readerStats != null && totalRecords > readerStats.getTotalRecords()) {
  readerStats.setTotalRecords(totalRecords);

Copilot AI Mar 23, 2026


readerStats.getTotalRecords() can be null (e.g., if reader stats were created elsewhere without initializing totalRecords). The comparison totalRecords > readerStats.getTotalRecords() will then throw an NPE. Please null-guard (treat null as 0) before comparing/updating the reader total.

Suggested change
if (readerStats != null && totalRecords > readerStats.getTotalRecords()) {
  readerStats.setTotalRecords(totalRecords);
if (readerStats != null) {
  Integer readerTotalRecords = readerStats.getTotalRecords();
  int safeReaderTotalRecords = readerTotalRecords != null ? readerTotalRecords : 0;
  if (totalRecords > safeReaderTotalRecords) {
    readerStats.setTotalRecords(totalRecords);
  }

Copilot AI review requested due to automatic review settings March 23, 2026 09:39
Copilot AI review requested due to automatic review settings March 24, 2026 06:30
Copy link
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Comment on lines +230 to +233
static void resetForTest() {
  registered = false;
  threadPrefixBindings.clear();
}

Copilot AI Mar 24, 2026


resetForTest() clears flags but doesn't remove the registered APP_RUN_LOG appender from the Logback root logger. Since startCapture() calls ensureRegistered() (using LoggerFactory's global context), unit tests can leave a global appender attached for subsequent tests, causing cross-test interference. Consider having resetForTest() also detach/stop the root appender (if present) and restore any mutated static settings (e.g., logDirectory/maxRunsPerApp).

Comment on lines +85 to +89
void flush() {
  List<String> batch = drainPending();
  if (batch.isEmpty()) {
    return;
  }

Copilot AI Mar 24, 2026


flush() can be invoked by the scheduled flusher thread while close() also calls flush() from the caller thread, but writer access isn't synchronized. This can lead to concurrent writes/flushes on the same BufferedWriter (and races with closeWriter()), potentially corrupting output or throwing IO errors. Consider guarding flush/writeToFile/closeWriter with a lock, or ensuring the final flush/close happens on the flusher thread after cancelling further executions.
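A lock around the writer operations is one way to close this race. The sketch below is a simplified stand-in for RunLogBuffer (illustrative names) that serializes flush() and close() on a single lock:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: the flusher thread and the closing thread both go through ioLock,
// so concurrent write/close on the same Writer cannot interleave.
class LockedBuffer {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final ReentrantLock ioLock = new ReentrantLock();
    private Writer writer;
    private boolean closed;

    LockedBuffer(Writer writer) { this.writer = writer; }

    void append(String line) {
        if (!closed) pending.add(line);
    }

    void flush() {
        ioLock.lock();
        try {
            if (writer == null || closed) return;
            String line;
            while ((line = pending.poll()) != null) {
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException e) {
            // swallowed for the sketch; real code would log the failure
        } finally {
            ioLock.unlock();
        }
    }

    void close() {
        flush(); // final flush under the same lock discipline
        ioLock.lock();
        try {
            closed = true;
            if (writer != null) writer.close();
            writer = null;
        } catch (IOException e) {
            // ignore on close
        } finally {
            ioLock.unlock();
        }
    }
}
```

An alternative is to cancel the scheduled flusher and perform the final flush/close on the flusher thread itself, which removes the need for a lock entirely.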

@sonarqubecloud

Comment on lines +90 to +93
for (String jobIdStr : runningJobIds) {
  LOG.info("Stopping distributed job {} via coordinator fallback", jobIdStr);
  coordinator.requestStop(java.util.UUID.fromString(jobIdStr));
}

⚠️ Edge Case: Exception in loop aborts stopping remaining distributed jobs

In tryStopOutsideQuartz(), the loop over runningJobIds calls UUID.fromString(jobIdStr) and coordinator.requestStop() without per-iteration exception handling. If any single job ID is malformed or requestStop throws, the remaining jobs in the list will not be stopped.

While the outer tryStopViaApp catch block prevents an unhandled exception from propagating further, it means a single failure silently prevents cleanup of all subsequent running jobs.

Suggested fix:

for (String jobIdStr : runningJobIds) {
  try {
    LOG.info("Stopping distributed job {} via coordinator fallback", jobIdStr);
    coordinator.requestStop(java.util.UUID.fromString(jobIdStr));
  } catch (Exception e) {
    LOG.warn("Failed to stop distributed job {}", jobIdStr, e);
  }
}


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Comment on lines +70 to +90
static String formatLine(ILoggingEvent event) {
  return String.format(
      "{\"timestamp\":%d,\"level\":\"%s\",\"thread\":\"%s\",\"logger\":\"%s\",\"message\":\"%s\"}",
      event.getTimeStamp(),
      event.getLevel(),
      escapeJson(event.getThreadName()),
      escapeJson(event.getLoggerName()),
      escapeJson(event.getFormattedMessage()));
}

private static String escapeJson(String value) {
  if (value == null) {
    return "";
  }
  return value
      .replace("\\", "\\\\")
      .replace("\"", "\\\"")
      .replace("\n", "\\n")
      .replace("\r", "\\r")
      .replace("\t", "\\t");
}

Copilot AI Mar 24, 2026


formatLine()/escapeJson() builds JSON manually but only escapes a small subset of characters. This can produce invalid JSON for other control characters (e.g., backspace/formfeed or other 0x00–0x1F chars) and is easy to drift from the server’s canonical JSON log format. Consider using the existing JsonUtils/ObjectMapper (or a Logback JSON encoder/layout) to serialize a structured object/map so escaping and formatting remain correct and consistent.
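The control-character gap can be shown (and closed) in plain Java. This is a hypothetical escape routine, not the PR's code: it \u-escapes every character below 0x20 that has no short escape form, which is what a proper JSON serializer such as Jackson would do automatically:

```java
// Sketch: covers all of 0x00-0x1F, which the hand-rolled escapeJson misses.
class JsonEscape {
    static String escape(String value) {
        if (value == null) return "";
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // any other control char must be \u-escaped for valid JSON
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```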

Comment on lines +107 to +108
AppRunRecord latestRun = appRepository.getLatestAppRuns(app);
if (latestRun != null && latestRun.getStatus() == AppRunRecord.Status.RUNNING) {


💡 Edge Case: Null check on getLatestAppRuns is dead code — method throws

AppRepository.getLatestAppRuns(App) throws an exception (via orElseThrow) when no run record exists, rather than returning null. The latestRun != null guard on line 108 is therefore dead code — the absence case is instead handled by the catch (Exception e) block, which logs a warning.

This works correctly in practice since the catch block is resilient, but the null check is misleading to future readers who may assume the method can return null.

Suggested fix:

Either:
1. Use the Optional-returning variant (`getLatestExtensionByIdOptional`) instead so the null check is meaningful, or
2. Remove the null check and add a comment noting the exception is intentionally caught below.


@gitar-bot

gitar-bot bot commented Mar 24, 2026

Code Review ⚠️ Changes requested 6 resolved / 9 findings

Search index enhancement with 8 commits addressing logging and job management, but 3 important issues remain unresolved: formatLine drops exception stack traces, exception in the job-stopping loop aborts remaining distributed jobs, and dead null check on getLatestAppRuns. Address these blocking issues before merge.

⚠️ Bug: formatLine drops exception stack traces from log events

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/AppRunLogAppender.java:70-78

The formatLine method captures timestamp, level, thread, logger, and message, but never calls event.getThrowableProxy(). When application code logs exceptions (e.g., LOG.error("Reindex failed", exception)), the stack trace—arguably the most important debugging information—is silently discarded from the captured log files.

For a logging system designed to help debug app run failures, losing stack traces significantly reduces its value.

Suggested fix
static String formatLine(ILoggingEvent event) {
  StringBuilder sb = new StringBuilder();
  sb.append(String.format(
      "{\"timestamp\":%d,\"level\":\"%s\",\"thread\":\"%s\",\"logger\":\"%s\",\"message\":\"%s\"",
      event.getTimeStamp(), event.getLevel(),
      escapeJson(event.getThreadName()),
      escapeJson(event.getLoggerName()),
      escapeJson(event.getFormattedMessage())));
  IThrowableProxy tp = event.getThrowableProxy();
  if (tp != null) {
    ThrowableProxyConverter converter = new ThrowableProxyConverter();
    converter.start();
    sb.append(",\"exception\":\"").append(escapeJson(converter.convert(event))).append("\"");
  }
  sb.append("}");
  return sb.toString();
}
⚠️ Edge Case: Exception in loop aborts stopping remaining distributed jobs

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexApp.java:90-93

In tryStopOutsideQuartz(), the loop over runningJobIds calls UUID.fromString(jobIdStr) and coordinator.requestStop() without per-iteration exception handling. If any single job ID is malformed or requestStop throws, the remaining jobs in the list will not be stopped.

While the outer tryStopViaApp catch block prevents an unhandled exception from propagating further, it means a single failure silently prevents cleanup of all subsequent running jobs.

Suggested fix
for (String jobIdStr : runningJobIds) {
  try {
    LOG.info("Stopping distributed job {} via coordinator fallback", jobIdStr);
    coordinator.requestStop(java.util.UUID.fromString(jobIdStr));
  } catch (Exception e) {
    LOG.warn("Failed to stop distributed job {}", jobIdStr, e);
  }
}
💡 Edge Case: Null check on getLatestAppRuns is dead code — method throws

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexApp.java:107-108

AppRepository.getLatestAppRuns(App) throws an exception (via orElseThrow) when no run record exists, rather than returning null. The latestRun != null guard on line 108 is therefore dead code — the absence case is instead handled by the catch (Exception e) block, which logs a warning.

This works correctly in practice since the catch block is resilient, but the null check is misleading to future readers who may assume the method can return null.

Suggested fix
Either:
1. Use the Optional-returning variant (`getLatestExtensionByIdOptional`) instead so the null check is meaningful, or
2. Remove the null check and add a comment noting the exception is intentionally caught below.
✅ 6 resolved
Bug: AppRunLogMetrics class is missing — code will not compile

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/RunLogBuffer.java:92 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/AppRunLogAppender.java:136 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/AppRunLogAppender.java:145-146 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/AppRunLogAppender.java:171
The AppRunLogMetrics class is referenced in both RunLogBuffer.java (line 92) and AppRunLogAppender.java (lines 136, 145, 146, 171) but does not exist anywhere in the codebase. This will cause a compile-time failure. All four static method calls (recordFlush, recordRunStarted, recordRunCompleted, recordLinesCapture, recordCleanup) have no backing implementation.

Bug: periodicSyncSinkStats has check-then-act race on volatile field

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexExecutor.java:135-136 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexExecutor.java:1522-1528
The lastSinkSyncTime field is declared volatile, but periodicSyncSinkStats() performs a non-atomic read-compare-write: it reads the field, checks if enough time has passed, then writes the new timestamp. Multiple consumer threads (up to 40 from consumerExecutor) call this concurrently via processTask(), so several threads may simultaneously observe a stale value and all trigger syncSinkStatsFromBulkSink(). The practical impact is limited to redundant sync calls (no data corruption), but it defeats the throttling intent.
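The throttle can be made race-free with AtomicLong.compareAndSet, so exactly one thread wins each sync interval. A minimal sketch with illustrative names (the clock is passed in for determinism; the real code would use System.currentTimeMillis()):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: CAS closes the check-then-act window on the volatile field.
class SyncThrottle {
    private final AtomicLong lastSyncTime = new AtomicLong(0);
    private final long intervalMs;
    final AtomicInteger syncCount = new AtomicInteger(0);

    SyncThrottle(long intervalMs) { this.intervalMs = intervalMs; }

    void maybeSync(long now) {
        long last = lastSyncTime.get();
        // Only the thread whose CAS succeeds performs the sync this interval;
        // losers see the updated timestamp and skip.
        if (now - last >= intervalMs && lastSyncTime.compareAndSet(last, now)) {
            syncCount.incrementAndGet(); // stand-in for syncSinkStatsFromBulkSink()
        }
    }
}
```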

Edge Case: Thread prefix matching can cross-capture logs between apps

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/AppRunLogAppender.java:65-73 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/AppRunLogAppender.java:131-133
Thread prefix bindings are stored in a flat global list (threadPrefixBindings). If two apps run concurrently and both register the same thread prefix (e.g., "reindex-"), the append() method iterates the list and the first matching binding wins, routing worker-thread logs to the wrong app's buffer. This could also happen if a previous capture's prefixes were not cleaned up before a new one starts.

Edge Case: abbreviateLoggerName crashes on empty package segments

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/AppRunLogAppender.java:96-97
If loggerName contains consecutive dots (e.g., "org..service.Foo"), split("\.") produces an empty string element, and parts[i].charAt(0) throws StringIndexOutOfBoundsException. While malformed logger names are unlikely in practice, a guard would make the helper more robust.
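A guard for empty segments is straightforward. This is a hypothetical version of the helper (the real abbreviateLoggerName may differ in detail); empty segments pass through instead of triggering charAt(0):

```java
// Sketch: abbreviate package segments to their first letter, keeping the
// class name whole, without crashing on consecutive dots.
class LoggerNames {
    static String abbreviate(String loggerName) {
        String[] parts = loggerName.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append('.');
            if (i == parts.length - 1 || parts[i].isEmpty()) {
                sb.append(parts[i]); // keep last segment whole; empty stays empty
            } else {
                sb.append(parts[i].charAt(0)); // first letter of package segment
            }
        }
        return sb.toString();
    }
}
```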

Bug: Developer-local config defaults committed to openmetadata.yaml

📄 conf/openmetadata.yaml:244 📄 conf/openmetadata.yaml:249 📄 conf/openmetadata.yaml:252 📄 conf/openmetadata.yaml:269 📄 conf/openmetadata.yaml:448 📄 conf/openmetadata.yaml:463 📄 conf/openmetadata.yaml:483-484 📄 conf/openmetadata.yaml:602
The conf/openmetadata.yaml template has multiple default values changed that appear to be developer-local settings rather than intentional changes for this PR:

  1. DB driver/scheme/port: MySQL → PostgreSQL (lines 244, 249) — breaks the documented default for MySQL users
  2. searchType: elasticsearch → opensearch (line 448)
  3. semanticSearchEnabled: false → true (line 483)
  4. embeddingProvider: bedrock → djl (line 484)
  5. fernetKey: Changed to a different sample key (line 602)
  6. Connection pool tuning: maxSize 100→150, leakDetectionThreshold 60s→5min, maxConnTotal 30→50

These changes will affect every developer and deployment that relies on the default configuration template. The database driver change alone would cause startup failures for anyone running MySQL (the documented default). If the pool tuning and search type changes are intentional, they should be in a separate commit with justification.

...and 1 more resolved from earlier reviews

🤖 Prompt for agents
Code Review: Search index enhancement with 8 commits addressing logging and job management, but 3 important issues remain unresolved: formatLine drops exception stack traces, exception in the job-stopping loop aborts remaining distributed jobs, and dead null check on getLatestAppRuns. Address these blocking issues before merge.

1. ⚠️ Bug: formatLine drops exception stack traces from log events
   Files: openmetadata-service/src/main/java/org/openmetadata/service/apps/logging/AppRunLogAppender.java:70-78

   The `formatLine` method captures timestamp, level, thread, logger, and message, but never calls `event.getThrowableProxy()`. When application code logs exceptions (e.g., `LOG.error("Reindex failed", exception)`), the stack trace—arguably the most important debugging information—is silently discarded from the captured log files.
   
   For a logging system designed to help debug app run failures, losing stack traces significantly reduces its value.

   Suggested fix:
   static String formatLine(ILoggingEvent event) {
     StringBuilder sb = new StringBuilder();
     sb.append(String.format(
         "{"timestamp":%d,"level":"%s","thread":"%s","logger":"%s","message":"%s"",
         event.getTimeStamp(), event.getLevel(),
         escapeJson(event.getThreadName()),
         escapeJson(event.getLoggerName()),
         escapeJson(event.getFormattedMessage())));
     IThrowableProxy tp = event.getThrowableProxy();
     if (tp != null) {
       ThrowableProxyConverter converter = new ThrowableProxyConverter();
       converter.start();
       sb.append(","exception":"").append(escapeJson(converter.convert(event))).append(""");
     }
     sb.append("}");
     return sb.toString();
   }

2. ⚠️ Edge Case: Exception in loop aborts stopping remaining distributed jobs
   Files: openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexApp.java:90-93

   In `tryStopOutsideQuartz()`, the loop over `runningJobIds` calls `UUID.fromString(jobIdStr)` and `coordinator.requestStop()` without per-iteration exception handling. If any single job ID is malformed or `requestStop` throws, the remaining jobs in the list will not be stopped.
   
   While the outer `tryStopViaApp` catch block prevents an unhandled exception from propagating further, it means a single failure silently prevents cleanup of all subsequent running jobs.

   Suggested fix:
   for (String jobIdStr : runningJobIds) {
     try {
       LOG.info("Stopping distributed job {} via coordinator fallback", jobIdStr);
       coordinator.requestStop(java.util.UUID.fromString(jobIdStr));
     } catch (Exception e) {
       LOG.warn("Failed to stop distributed job {}", jobIdStr, e);
     }
   }

3. 💡 Edge Case: Null check on getLatestAppRuns is dead code — method throws
   Files: openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexApp.java:107-108

   `AppRepository.getLatestAppRuns(App)` throws an exception (via `orElseThrow`) when no run record exists, rather than returning null. The `latestRun != null` guard on line 108 is therefore dead code — the absence case is instead handled by the `catch (Exception e)` block, which logs a warning.
   
   This works correctly in practice since the catch block is resilient, but the null check is misleading to future readers who may assume the method can return null.

   Suggested fix:
   Either:
   1. Use the Optional-returning variant (`getLatestExtensionByIdOptional`) instead so the null check is meaningful, or
   2. Remove the null check and add a comment noting the exception is intentionally caught below.
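
   Option 1 can be sketched as follows. This is illustrative only: the record type, the `latestRunOptional` lookup, and the boolean stand-in for "a run record exists" are all hypothetical, not the actual AppRepository API — the real `getLatestAppRuns` throws instead of returning empty:

   ```java
   import java.util.Optional;

   final class LatestRunLookup {
     // Stand-in for the app-run record; the real type lives in AppRepository.
     record RunRecord(String status) {}

     // Hypothetical Optional-returning variant of the lookup; the boolean
     // parameter stands in for whether a run record exists in the store.
     static Optional<RunRecord> latestRunOptional(boolean exists) {
       return exists ? Optional.of(new RunRecord("completed")) : Optional.empty();
     }

     // With an Optional, the absence case is handled explicitly at the call
     // site, so readers no longer see a null check that can never fire.
     static String describeLatestRun(boolean exists) {
       return latestRunOptional(exists)
           .map(RunRecord::status)
           .orElse("no run recorded");
     }
   }
   ```

   The point of the pattern is that the "no record" branch becomes visible control flow rather than an exception swallowed by a broad `catch` block.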

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         


Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 207 out of 207 changed files in this pull request and generated 2 comments.

Comment on lines +15 to +27
// Mirror of EXCLUDED_FIELDS — can't reference it directly
// because SearchIndex has a static initializer that requires Entity to be bootstrapped.

class SearchIndexTest {

  private static final Set<String> EXCLUDED_FIELDS =
      Set.of(
          "changeDescription",
          "incrementalChangeDescription",
          "upstreamLineage.pipeline.changeDescription",
          "upstreamLineage.pipeline.incrementalChangeDescription",
          "connection",
          "changeSummary");
Copilot AI Mar 24, 2026

This test hardcodes a local copy of the production excluded-field list. That duplication is likely to drift from the real list over time, making the test less reliable. Consider moving the excluded-field set to a shared location that can be referenced safely in tests (e.g., a constant in SearchIndexUtils), or exposing it from SearchIndex without triggering heavy static initialization.

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +26
  private static final Set<String> EXCLUDED_FIELDS =
      Set.of(
          "changeDescription",
          "incrementalChangeDescription",
          "upstreamLineage.pipeline.changeDescription",
          "upstreamLineage.pipeline.incrementalChangeDescription",
          "connection",
Copilot AI Mar 24, 2026

EXCLUDED_FIELDS includes dotted-path entries like upstreamLineage.pipeline.changeDescription, but the current SearchIndexUtils.removeFieldByPath(...) implementation only navigates the first path element and then removes the last key from that same map, so deeper paths are not actually removed. Consider adding a test here that asserts dotted-path removal works and updating removeFieldByPath to traverse the full path before removing the final key (otherwise these exclusions won't take effect).
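
A sketch of the traversal the comment calls for, walking every segment of the dotted path before removing the final key. The method name mirrors the review's description, but the signature and the plain nested-Map model are assumptions; the real `SearchIndexUtils.removeFieldByPath` operates on the search-index document and may differ:

```java
import java.util.Map;

final class FieldRemoval {
  // Descend through every path segment except the last, then remove the
  // final key from the map reached at that depth. A missing or non-map
  // segment leaves the document untouched (the exclusion is a no-op).
  @SuppressWarnings("unchecked")
  static void removeFieldByPath(Map<String, Object> doc, String path) {
    String[] parts = path.split("\\.");
    Map<String, Object> current = doc;
    for (int i = 0; i < parts.length - 1; i++) {
      Object next = current.get(parts[i]);
      if (!(next instanceof Map)) {
        return; // path does not exist at this depth; nothing to remove
      }
      current = (Map<String, Object>) next;
    }
    current.remove(parts[parts.length - 1]);
  }
}
```

For example, `removeFieldByPath(doc, "upstreamLineage.pipeline.changeDescription")` descends into `upstreamLineage`, then `pipeline`, and removes `changeDescription` from that innermost map, rather than removing a key from the top-level map as the current implementation reportedly does.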


Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants