
[#11171] improvement(core): add JSON formatter for audit logs #11343

Open
freesinger wants to merge 1 commit into apache:main from freesinger:json-formatter-audit-dev

freesinger (Contributor) commented Jun 2, 2026

Switch the default audit formatter to structured JSON so SIEM consumers can parse audit events reliably, while preserving compatibility through the existing formatter configuration. Also add coverage for JSON serialization, redaction, and default formatter wiring.

What changes were proposed in this pull request?

This PR adds a structured JSON formatter for audit logs and makes it the default audit formatter.

The main changes are:

  1. Add JsonAuditFormatter to serialize each audit log entry as one JSON object per line.
  2. Include all core audit fields in the JSON output, including structured customInfo.
  3. Format timestamp in ISO 8601 with millisecond precision and an explicit timezone offset.
  4. Redact sensitive values before serialization for the following keys:
    • Authorization
    • Cookie
    • X-Amz-Security-Token
    • s3.access-key-id
    • jdbc-password
  5. Emit resultCount as a top-level JSON field for ListEvent when the count is available.
  6. Change the default audit formatter from SimpleFormatterV2 to JsonAuditFormatter through the existing formatter configuration.
  7. Update the Helm config template and server configuration documentation to reflect the new default formatter.
  8. Add unit tests for JSON serialization, redaction, null identifier handling, list event count output, and default formatter wiring.
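To make items 1, 2, 4, and 5 concrete, here is a minimal sketch of redaction followed by one-object-per-line serialization. It is not the PR's `JsonAuditFormatter`: the class name `AuditJsonSketch`, the field names `user`, `operation`, and `identifier`, the `***` mask, and the case-insensitive key matching are all assumptions made for illustration. Only the list of sensitive keys and the top-level `resultCount` field come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the formatter added by this PR. */
final class AuditJsonSketch {
  // Keys from the PR description, lower-cased so matching is
  // case-insensitive (an assumption; the PR does not say).
  private static final Set<String> SENSITIVE_KEYS = Set.of(
      "authorization", "cookie", "x-amz-security-token",
      "s3.access-key-id", "jdbc-password");

  // Placeholder mask; the real replacement value is not stated in the PR.
  private static final String MASK = "***";

  /** Returns a copy of customInfo with sensitive values masked. */
  static Map<String, String> redact(Map<String, String> customInfo) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : customInfo.entrySet()) {
      boolean sensitive =
          SENSITIVE_KEYS.contains(e.getKey().toLowerCase(Locale.ROOT));
      out.put(e.getKey(), sensitive ? MASK : e.getValue());
    }
    return out;
  }

  /** Quotes a string as a JSON value; null becomes the JSON literal null. */
  static String quote(String s) {
    if (s == null) {
      return "null";
    }
    StringBuilder sb = new StringBuilder("\"");
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.append('"').toString();
  }

  /** Serializes one audit entry as a single-line JSON object. */
  static String toJsonLine(String user, String operation, String identifier,
      Map<String, String> customInfo, Integer resultCount) {
    StringBuilder sb = new StringBuilder("{");
    sb.append("\"user\":").append(quote(user));
    sb.append(",\"operation\":").append(quote(operation));
    sb.append(",\"identifier\":").append(quote(identifier));
    if (resultCount != null) {
      // Emitted only when a count is available, as for list events.
      sb.append(",\"resultCount\":").append(resultCount);
    }
    sb.append(",\"customInfo\":{");
    boolean first = true;
    for (Map.Entry<String, String> e : redact(customInfo).entrySet()) {
      if (!first) {
        sb.append(',');
      }
      sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
      first = false;
    }
    return sb.append("}}").toString();
  }
}
```

Because newlines inside values are escaped, each entry stays on a single line, which is what lets a downstream consumer split the log on line breaks. A production formatter would more likely delegate escaping to a JSON library than hand-roll it as this sketch does.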

Why are the changes needed?

The existing audit formatters mainly emit tab-separated text, which is harder for SIEM systems and other downstream log processors to consume reliably.

This change is needed because:

  1. Structured JSON is easier for SIEM systems to parse and index than TSV output.
  2. AuditLog.customInfo() should be preserved as structured data rather than flattened into a string that is hard to parse downstream.
  3. Audit logs may include HTTP headers or credential-related properties, so sensitive values must be masked before being written.
  4. The default formatter should move to a more production-friendly structured format while still allowing operators to switch back through gravitino.audit.formatter.className if they need legacy behavior.

Fix: #11171

Does this PR introduce any user-facing change?

Yes.

  1. The default audit log output format changes from TSV-style text to structured JSON.
  2. The default value of gravitino.audit.formatter.className is now org.apache.gravitino.audit.JsonAuditFormatter.
  3. Audit logs now expose customInfo as structured JSON content instead of relying on TSV-compatible string formatting.
  4. Sensitive values for specific headers and properties are redacted in the JSON audit output.
  5. Users can still switch back to legacy formatter implementations through the existing formatter configuration.

How was this patch tested?

The patch was tested with targeted unit tests covering the new formatter and the default formatter wiring.

Executed test command:

./gradlew :core:test --tests org.apache.gravitino.audit.TestJsonAuditFormatter --tests org.apache.gravitino.audit.TestAuditManager --tests org.apache.gravitino.audit.TestFileAuditWriter

The tests cover:

  1. Core JSON field serialization.
  2. ISO 8601 timestamp serialization with millisecond precision.
  3. Sensitive field redaction.
  4. ListEvent resultCount serialization.
  5. Null identifier handling.
  6. Default audit formatter wiring through AuditLogManager.
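As an illustration of what the timestamp check in item 2 has to verify, the following sketch formats an epoch-millisecond value as ISO 8601 with exactly three fractional digits and an explicit offset. The class name `AuditTimestampSketch` and the pattern string are assumptions, not code from this PR. The PR does not say whether UTC is written as `Z` or `+00:00`; the `XXX` pattern letter used here prints `Z`.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch; not the timestamp code used by this PR. */
final class AuditTimestampSketch {
  // SSS always prints three fractional digits; XXX prints the offset
  // as +08:00 style, or Z when the offset is zero.
  private static final DateTimeFormatter ISO_MILLIS =
      DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSXXX");

  /** Formats epoch milliseconds in the given zone. */
  static String format(long epochMillis, ZoneId zone) {
    return ISO_MILLIS.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
  }
}
```

For example, `AuditTimestampSketch.format(1500L, ZoneOffset.ofHours(8))` yields `1970-01-01T08:00:01.500+08:00`. A fixed pattern is used instead of `DateTimeFormatter.ISO_OFFSET_DATE_TIME` because the latter omits the fraction when it is zero, which would make the field width vary between entries.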

Comment thread core/src/main/java/org/apache/gravitino/Configs.java Outdated
Comment thread core/src/main/java/org/apache/gravitino/audit/JsonAuditFormatter.java Outdated
@freesinger freesinger force-pushed the json-formatter-audit-dev branch from 7f2b263 to 4fa3164 Compare June 2, 2026 12:34
@freesinger freesinger force-pushed the json-formatter-audit-dev branch from 4fa3164 to 4edfe00 Compare June 2, 2026 12:48
github-actions Bot commented Jun 2, 2026

Code Coverage Report

Overall Project 66.78% +0.15% 🟢
Files changed 72.82% 🟢

Module Coverage
aliyun 1.72% 🔴
api 46.82% 🟢
authorization-common 85.96% 🟢
aws 3.66% 🔴
azure 2.47% 🔴
catalog-common 10.04% 🔴
catalog-fileset 80.33% 🟢
catalog-glue 66.08% 🟢
catalog-hive 79.55% 🟢
catalog-jdbc-clickhouse 80.02% 🟢
catalog-jdbc-common 45.31% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.29% 🟢
catalog-jdbc-starrocks 78.51% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 44.89% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 85.66% 🟢
catalog-lakehouse-paimon 79.29% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.91% 🟢
common 49.99% 🟢
core 82.5% -0.79% 🟢
filesystem-hadoop3 76.97% 🟢
flink 0.0% 🔴
flink-common 41.2% 🟢
flink-runtime 0.0% 🔴
gcp 14.12% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 53.26% 🟢
iceberg-common 56.75% 🟢
iceberg-rest-server 72.26% +0.49% 🟢
idp-basic 85.99% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 20.83% 🔴
lance-rest-server 60.27% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.73% 🟢
server-common 73.66% +2.26% 🟢
spark 32.79% 🔴
spark-common 39.75% 🔴
trino-connector 39.44% 🔴
Files

| Module | File | Coverage |
| --- | --- | --- |
| core | CatalogMetaService.java | 98.88% 🟢 |
| core | Configs.java | 97.89% 🟢 |
| core | JsonAuditFormatter.java | 92.11% 🟢 |
| core | CatalogManager.java | 67.02% 🟢 |
| core | RelationalEntityStore.java | 54.88% 🔴 |
| core | GravitinoEnv.java | 11.83% 🔴 |
| iceberg-rest-server | IcebergCleanupJobBaseSQLProvider.java | 100.0% 🟢 |
| iceberg-rest-server | IcebergCleanupJobSQLProviderFactory.java | 94.44% 🟢 |
| iceberg-rest-server | IcebergCleanupJobStore.java | 93.75% 🟢 |
| iceberg-rest-server | IcebergCleanupJobMapper.java | 0.0% 🔴 |
| server-common | JcasbinAuthorizationLookups.java | 84.62% 🟢 |
| server-common | JcasbinAuthorizer.java | 83.85% 🟢 |
| server-common | JcasbinChangePoller.java | 63.5% 🟢 |

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add JSON formatter for audit logs
