Skip to content

add structured logging for aws#1236

Open
gyoge0 wants to merge 1 commit into
devfrom
structured-logging
Open

add structured logging for aws#1236
gyoge0 wants to merge 1 commit into
devfrom
structured-logging

Conversation

@gyoge0
Copy link
Copy Markdown
Collaborator

@gyoge0 gyoge0 commented Apr 5, 2026

What I did

  • Updated logging config to use a structured JSON based logging format that is easily filterable in cloud watch.

Summary by CodeRabbit

  • New Features
    • Added request-scoped logging with automatic request ID tracking (generated or extracted from headers).
    • Implemented structured JSON logging format for improved log readability and analysis.
    • Request context (path, method, status code) automatically injected into all log records for enhanced traceability.
    • Request IDs now included in response headers for end-to-end request tracking.

@gyoge0 gyoge0 requested review from Jay-Lalwani and Copilot April 5, 2026 23:59
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

This pull request introduces request-scoped logging infrastructure to the application. It adds a middleware that captures request metadata (ID, path, method, user context) into a context variable, a logging filter that injects this context into log records, a JSON formatter for structured logging with custom serialization, and integrates these components into Django settings.

Changes

Cohort / File(s) Summary
Request Context and Logging Middleware
tcf_core/request_logging.py
Introduces ContextVar for per-request context storage, RequestLoggingMiddleware that captures request ID (from header or UUID), path, method, user ID, and status code, and RequestLoggingFilter that injects context into log records.
Structured JSON Logging
tcf_core/settings/log_formatter.py
Adds SafeJSONEncoder with custom serialization for datetime, UUID, Decimal, and bytes, and JsonFormatter that builds structured JSON log output with timestamp, level, logger name, message, exception details, and arbitrary extra fields.
Settings Integration
tcf_core/settings/base.py
Registers RequestLoggingMiddleware in MIDDLEWARE pipeline, configures JSON formatter on console handler, adds request context logging filter, and introduces separate loggers for django.request and tcf_website.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Middleware as RequestLoggingMiddleware
    participant ContextVar as request_context
    participant Handler as Logging Handler
    participant Filter as RequestLoggingFilter
    participant Formatter as JsonFormatter
    participant Response

    Client->>Middleware: HTTP Request
    Middleware->>Middleware: Extract/Generate Request ID
    Middleware->>ContextVar: Store context (request_id, path, method)
    Middleware->>Handler: Django processes request
    Handler->>Filter: Log record created
    Filter->>ContextVar: Read current context
    Filter->>Filter: Inject context into record
    Filter->>Formatter: Pass filtered record
    Formatter->>Formatter: Build structured JSON (timestamp, level, extras)
    Formatter->>Handler: Return formatted JSON
    Handler->>Response: Output log
    Middleware->>ContextVar: Update context (status_code)
    Middleware->>Response: Add X-Request-ID header
    Response->>Client: HTTP Response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop, hop, hooray! The requests now flow
With context and ID in structured JSON glow,
Each log record carries the tale of its way,
From middleware to filter, throughout the day,
Tracing the journey of code, safe and clear! 🌟

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description only includes a brief explanation in 'What I did' section but is missing several required template sections like GitHub Issues, Screenshots, Testing, and Questions/Discussions. Complete the PR description by adding all required sections from the template, especially the Testing section describing how reviewers can validate the changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'add structured logging for aws' is vague about the actual implementation details and doesn't clearly describe the main changes made to the codebase. Consider a more specific title like 'Add JSON structured logging with request context' that better describes the core functionality added.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch structured-logging

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tcf_core/request_logging.py (1)

38-50: Consider resetting the ContextVar after request completion.

The request_context ContextVar is set but never reset. While Python's ContextVar isolates contexts per-task/thread, explicitly resetting prevents potential context leakage in edge cases (e.g., connection pooling, async views).

♻️ Proposed fix using token-based reset
         # Set context for this request
-        request_context.set(context)
+        token = request_context.set(context)

-        # Process request
-        response = self.get_response(request)
+        try:
+            # Process request
+            response = self.get_response(request)

-        # Add status code to context after response
-        context['status_code'] = response.status_code
+            # Add status code to context after response
+            context['status_code'] = response.status_code

-        # Return request ID in response headers for client reference
-        response['X-Request-ID'] = request_id
+            # Return request ID in response headers for client reference
+            response['X-Request-ID'] = request_id

-        return response
+            return response
+        finally:
+            request_context.reset(token)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tcf_core/request_logging.py` around lines 38 - 50, The middleware sets the
ContextVar via request_context.set(context) but never resets it; change the
request handling in request_logging.py to store the token returned by
request_context.set(context) (e.g., token = request_context.set(context)), run
self.get_response(request) inside a try block, then in a finally block add the
status_code to context (or ensure it's done before reset), set the response
header, and call request_context.reset(token) to restore the previous context;
reference request_context.set, request_context.reset, the local context dict and
the middleware's get_response call to locate where to wrap with try/finally and
perform the reset.
tcf_core/settings/log_formatter.py (1)

26-31: Consider using frozenset to address the mutable class attribute warning.

The static analysis tool flags this as a mutable class attribute. Since STANDARD_ATTRS is only used for membership checks and should never be modified, using frozenset clarifies intent and silences the linter.

♻️ Proposed fix
     # Standard logging record attributes to exclude from extra fields
-    STANDARD_ATTRS = {
+    STANDARD_ATTRS = frozenset({
         'name', 'msg', 'args', 'created', 'filename', 'funcName', 'levelname',
         'levelno', 'lineno', 'module', 'msecs', 'message', 'pathname', 'process',
         'processName', 'relativeCreated', 'thread', 'threadName', 'exc_info',
         'exc_text', 'stack_info', 'taskName'
-    }
+    })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tcf_core/settings/log_formatter.py` around lines 26 - 31, Replace the mutable
set literal STANDARD_ATTRS with an immutable frozenset to silence the mutable
class attribute warning and express intent: update the STANDARD_ATTRS definition
in log_formatter.py to use frozenset(...) so all membership checks against
STANDARD_ATTRS continue to work but the attribute is immutable; ensure no other
code attempts to modify STANDARD_ATTRS.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tcf_core/settings/log_formatter.py`:
- Around line 41-45: The exception-handling block that sets
log_record["exception_type"], ["exception_message"], and ["exception_traceback"]
should defensively check the contents of record.exc_info before indexing: verify
record.exc_info is a tuple/sequence and that exc_info[0] and exc_info[1] are not
None (or use getattr to safely read __name__), then set exception_type to
exc_info[0].__name__ if present else None, exception_message to str(exc_info[1])
if present else None, and still call self.formatException(record.exc_info) only
when exc_info is usable; update the code in the same method where log_record is
populated to perform these guarded checks rather than assuming exc_info elements
exist.

---

Nitpick comments:
In `@tcf_core/request_logging.py`:
- Around line 38-50: The middleware sets the ContextVar via
request_context.set(context) but never resets it; change the request handling in
request_logging.py to store the token returned by request_context.set(context)
(e.g., token = request_context.set(context)), run self.get_response(request)
inside a try block, then in a finally block add the status_code to context (or
ensure it's done before reset), set the response header, and call
request_context.reset(token) to restore the previous context; reference
request_context.set, request_context.reset, the local context dict and the
middleware's get_response call to locate where to wrap with try/finally and
perform the reset.

In `@tcf_core/settings/log_formatter.py`:
- Around line 26-31: Replace the mutable set literal STANDARD_ATTRS with an
immutable frozenset to silence the mutable class attribute warning and express
intent: update the STANDARD_ATTRS definition in log_formatter.py to use
frozenset(...) so all membership checks against STANDARD_ATTRS continue to work
but the attribute is immutable; ensure no other code attempts to modify
STANDARD_ATTRS.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 779ea0ba-33af-4799-9d39-973654b67649

📥 Commits

Reviewing files that changed from the base of the PR and between c0e5a04 and 66d7d28.

📒 Files selected for processing (3)
  • tcf_core/request_logging.py
  • tcf_core/settings/base.py
  • tcf_core/settings/log_formatter.py

Comment on lines +41 to +45
# Exception handling (split into structured fields)
if record.exc_info:
log_record["exception_type"] = record.exc_info[0].__name__
log_record["exception_message"] = str(record.exc_info[1])
log_record["exception_traceback"] = self.formatException(record.exc_info)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add defensive check for exc_info tuple contents.

When record.exc_info is truthy but the exception type is None (edge case), accessing exc_info[0].__name__ raises AttributeError.

🛡️ Proposed fix
         # Exception handling (split into structured fields)
-        if record.exc_info:
+        if record.exc_info and record.exc_info[0] is not None:
             log_record["exception_type"] = record.exc_info[0].__name__
             log_record["exception_message"] = str(record.exc_info[1])
             log_record["exception_traceback"] = self.formatException(record.exc_info)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Exception handling (split into structured fields)
if record.exc_info:
log_record["exception_type"] = record.exc_info[0].__name__
log_record["exception_message"] = str(record.exc_info[1])
log_record["exception_traceback"] = self.formatException(record.exc_info)
# Exception handling (split into structured fields)
if record.exc_info and record.exc_info[0] is not None:
log_record["exception_type"] = record.exc_info[0].__name__
log_record["exception_message"] = str(record.exc_info[1])
log_record["exception_traceback"] = self.formatException(record.exc_info)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tcf_core/settings/log_formatter.py` around lines 41 - 45, The
exception-handling block that sets log_record["exception_type"],
["exception_message"], and ["exception_traceback"] should defensively check the
contents of record.exc_info before indexing: verify record.exc_info is a
tuple/sequence and that exc_info[0] and exc_info[1] are not None (or use getattr
to safely read __name__), then set exception_type to exc_info[0].__name__ if
present else None, exception_message to str(exc_info[1]) if present else None,
and still call self.formatException(record.exc_info) only when exc_info is
usable; update the code in the same method where log_record is populated to
perform these guarded checks rather than assuming exc_info elements exist.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Django logging setup to emit structured JSON logs (intended for easier filtering in AWS CloudWatch) and adds per-request context (request ID, path, method, user) into log records.

Changes:

  • Added a JSON log formatter with safe serialization for common non-JSON types.
  • Added request-scoped logging context via middleware + logging filter (request ID propagation, request metadata injection).
  • Updated Django LOGGING config to use the JSON formatter and request-context filter on the console handler.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
tcf_core/settings/log_formatter.py Introduces JsonFormatter and SafeJSONEncoder to emit structured JSON log lines.
tcf_core/settings/base.py Wires JSON formatter + request-context filter into Django logging; adds the request logging middleware.
tcf_core/request_logging.py Adds middleware to manage request IDs and a logging filter to inject request context into LogRecords.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +39 to +50
request_context.set(context)

# Process request
response = self.get_response(request)

# Add status code to context after response
context['status_code'] = response.status_code

# Return request ID in response headers for client reference
response['X-Request-ID'] = request_id

return response
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request_context is set for the request but never reset/cleared. In a threaded WSGI worker (or any code that logs after the response is returned), this can leak the previous request’s context into unrelated log records. Capture the token returned by request_context.set(...) and reset(token) in a finally block (also covering exception paths).

Suggested change
request_context.set(context)
# Process request
response = self.get_response(request)
# Add status code to context after response
context['status_code'] = response.status_code
# Return request ID in response headers for client reference
response['X-Request-ID'] = request_id
return response
token = request_context.set(context)
try:
# Process request
response = self.get_response(request)
# Add status code to context after response
context['status_code'] = response.status_code
# Return request ID in response headers for client reference
response['X-Request-ID'] = request_id
return response
finally:
request_context.reset(token)

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +23
# Generate or extract request ID
request_id = request.headers.get('X-Request-ID')
if not request_id:
request_id = str(uuid.uuid4())

Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request ID taken directly from X-Request-ID is written back into the response header. If a client supplies an invalid header value (e.g., contains newline chars), Django will raise BadHeaderError when setting response['X-Request-ID'], resulting in a 500. Validate/sanitize the incoming header (e.g., strict charset + max length) and fall back to a generated UUID when it’s invalid.

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +6

# Thread-safe context variable for request data
request_context: ContextVar[dict] = ContextVar('request_context', default=None)
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type annotation and default disagree: request_context is annotated as ContextVar[dict] but initialized with default=None. If you run mypy/django-stubs, this will be flagged and it also forces filter() to treat it as Optional[...] anyway. Consider annotating as ContextVar[dict[str, Any] | None] (or similar) and updating related code accordingly.

Suggested change
# Thread-safe context variable for request data
request_context: ContextVar[dict] = ContextVar('request_context', default=None)
from typing import Any
# Thread-safe context variable for request data
request_context: ContextVar[dict[str, Any] | None] = ContextVar('request_context', default=None)

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +39
def format(self, record):
log_record = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"level": record.levelname,
"logger": record.name,
"message": record.getMessage(),
}
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timestamp is generated at formatting time (datetime.now(...)) rather than using the LogRecord’s creation time (record.created). This can produce inaccurate timestamps if formatting is delayed (e.g., buffered/queued handlers). Prefer deriving the timestamp from record.created (UTC) for correctness.

Copilot uses AI. Check for mistakes.
Comment on lines +48 to +49
for key, value in record.__dict__.items():
if key not in self.STANDARD_ATTRS and not key.startswith('_'):
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra fields from record.__dict__ can overwrite reserved keys in log_record (e.g., a caller can pass extra={'level': '...'} or extra={'timestamp': '...'}), which makes logs ambiguous and potentially breaks downstream parsing. Skip keys already present in log_record (or namespace extras under a dedicated sub-object) to avoid collisions.

Suggested change
for key, value in record.__dict__.items():
if key not in self.STANDARD_ATTRS and not key.startswith('_'):
# Skip keys that would overwrite reserved formatter-generated fields.
for key, value in record.__dict__.items():
if (
key not in self.STANDARD_ATTRS
and not key.startswith('_')
and key not in log_record
):

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +13
class RequestLoggingMiddleware:
"""
Middleware that generates/extracts request IDs and stores request context
for automatic inclusion in all logs.
"""
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new middleware/filter/formatter behavior isn’t covered by tests. Adding a small Django test that (1) hits a simple view and asserts X-Request-ID propagation and (2) emits a log record and asserts the JSON output includes request_id/path/method would prevent regressions and verify the filter+formatter wiring.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants