add structured logging for aws#1236
Conversation
📝 WalkthroughWalkthroughThis pull request introduces request-scoped logging infrastructure to the application. It adds a middleware that captures request metadata (ID, path, method, user context) into a context variable, a logging filter that injects this context into log records, a JSON formatter for structured logging with custom serialization, and integrates these components into Django settings. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Middleware as RequestLoggingMiddleware
participant ContextVar as request_context
participant Handler as Logging Handler
participant Filter as RequestLoggingFilter
participant Formatter as JsonFormatter
participant Response
Client->>Middleware: HTTP Request
Middleware->>Middleware: Extract/Generate Request ID
Middleware->>ContextVar: Store context (request_id, path, method)
Middleware->>Handler: Django processes request
Handler->>Filter: Log record created
Filter->>ContextVar: Read current context
Filter->>Filter: Inject context into record
Filter->>Formatter: Pass filtered record
Formatter->>Formatter: Build structured JSON (timestamp, level, extras)
Formatter->>Handler: Return formatted JSON
Handler->>Response: Output log
Middleware->>ContextVar: Update context (status_code)
Middleware->>Response: Add X-Request-ID header
Response->>Client: HTTP Response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tcf_core/request_logging.py (1)
38-50: Consider resetting the ContextVar after request completion.The
request_contextContextVar is set but never reset. While Python'sContextVarisolates contexts per-task/thread, explicitly resetting prevents potential context leakage in edge cases (e.g., connection pooling, async views).♻️ Proposed fix using token-based reset
# Set context for this request - request_context.set(context) + token = request_context.set(context) - # Process request - response = self.get_response(request) + try: + # Process request + response = self.get_response(request) - # Add status code to context after response - context['status_code'] = response.status_code + # Add status code to context after response + context['status_code'] = response.status_code - # Return request ID in response headers for client reference - response['X-Request-ID'] = request_id + # Return request ID in response headers for client reference + response['X-Request-ID'] = request_id - return response + return response + finally: + request_context.reset(token)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tcf_core/request_logging.py` around lines 38 - 50, The middleware sets the ContextVar via request_context.set(context) but never resets it; change the request handling in request_logging.py to store the token returned by request_context.set(context) (e.g., token = request_context.set(context)), run self.get_response(request) inside a try block, then in a finally block add the status_code to context (or ensure it's done before reset), set the response header, and call request_context.reset(token) to restore the previous context; reference request_context.set, request_context.reset, the local context dict and the middleware's get_response call to locate where to wrap with try/finally and perform the reset.tcf_core/settings/log_formatter.py (1)
26-31: Consider usingfrozensetto address the mutable class attribute warning.The static analysis tool flags this as a mutable class attribute. Since
STANDARD_ATTRSis only used for membership checks and should never be modified, usingfrozensetclarifies intent and silences the linter.♻️ Proposed fix
# Standard logging record attributes to exclude from extra fields - STANDARD_ATTRS = { + STANDARD_ATTRS = frozenset({ 'name', 'msg', 'args', 'created', 'filename', 'funcName', 'levelname', 'levelno', 'lineno', 'module', 'msecs', 'message', 'pathname', 'process', 'processName', 'relativeCreated', 'thread', 'threadName', 'exc_info', 'exc_text', 'stack_info', 'taskName' - } + })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tcf_core/settings/log_formatter.py` around lines 26 - 31, Replace the mutable set literal STANDARD_ATTRS with an immutable frozenset to silence the mutable class attribute warning and express intent: update the STANDARD_ATTRS definition in log_formatter.py to use frozenset(...) so all membership checks against STANDARD_ATTRS continue to work but the attribute is immutable; ensure no other code attempts to modify STANDARD_ATTRS.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tcf_core/settings/log_formatter.py`:
- Around line 41-45: The exception-handling block that sets
log_record["exception_type"], ["exception_message"], and ["exception_traceback"]
should defensively check the contents of record.exc_info before indexing: verify
record.exc_info is a tuple/sequence and that exc_info[0] and exc_info[1] are not
None (or use getattr to safely read __name__), then set exception_type to
exc_info[0].__name__ if present else None, exception_message to str(exc_info[1])
if present else None, and still call self.formatException(record.exc_info) only
when exc_info is usable; update the code in the same method where log_record is
populated to perform these guarded checks rather than assuming exc_info elements
exist.
---
Nitpick comments:
In `@tcf_core/request_logging.py`:
- Around line 38-50: The middleware sets the ContextVar via
request_context.set(context) but never resets it; change the request handling in
request_logging.py to store the token returned by request_context.set(context)
(e.g., token = request_context.set(context)), run self.get_response(request)
inside a try block, then in a finally block add the status_code to context (or
ensure it's done before reset), set the response header, and call
request_context.reset(token) to restore the previous context; reference
request_context.set, request_context.reset, the local context dict and the
middleware's get_response call to locate where to wrap with try/finally and
perform the reset.
In `@tcf_core/settings/log_formatter.py`:
- Around line 26-31: Replace the mutable set literal STANDARD_ATTRS with an
immutable frozenset to silence the mutable class attribute warning and express
intent: update the STANDARD_ATTRS definition in log_formatter.py to use
frozenset(...) so all membership checks against STANDARD_ATTRS continue to work
but the attribute is immutable; ensure no other code attempts to modify
STANDARD_ATTRS.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 779ea0ba-33af-4799-9d39-973654b67649
📒 Files selected for processing (3)
tcf_core/request_logging.pytcf_core/settings/base.pytcf_core/settings/log_formatter.py
| # Exception handling (split into structured fields) | ||
| if record.exc_info: | ||
| log_record["exception_type"] = record.exc_info[0].__name__ | ||
| log_record["exception_message"] = str(record.exc_info[1]) | ||
| log_record["exception_traceback"] = self.formatException(record.exc_info) |
There was a problem hiding this comment.
Add defensive check for exc_info tuple contents.
When record.exc_info is truthy but the exception type is None (edge case), accessing exc_info[0].__name__ raises AttributeError.
🛡️ Proposed fix
# Exception handling (split into structured fields)
- if record.exc_info:
+ if record.exc_info and record.exc_info[0] is not None:
log_record["exception_type"] = record.exc_info[0].__name__
log_record["exception_message"] = str(record.exc_info[1])
log_record["exception_traceback"] = self.formatException(record.exc_info)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Exception handling (split into structured fields) | |
| if record.exc_info: | |
| log_record["exception_type"] = record.exc_info[0].__name__ | |
| log_record["exception_message"] = str(record.exc_info[1]) | |
| log_record["exception_traceback"] = self.formatException(record.exc_info) | |
| # Exception handling (split into structured fields) | |
| if record.exc_info and record.exc_info[0] is not None: | |
| log_record["exception_type"] = record.exc_info[0].__name__ | |
| log_record["exception_message"] = str(record.exc_info[1]) | |
| log_record["exception_traceback"] = self.formatException(record.exc_info) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tcf_core/settings/log_formatter.py` around lines 41 - 45, The
exception-handling block that sets log_record["exception_type"],
["exception_message"], and ["exception_traceback"] should defensively check the
contents of record.exc_info before indexing: verify record.exc_info is a
tuple/sequence and that exc_info[0] and exc_info[1] are not None (or use getattr
to safely read __name__), then set exception_type to exc_info[0].__name__ if
present else None, exception_message to str(exc_info[1]) if present else None,
and still call self.formatException(record.exc_info) only when exc_info is
usable; update the code in the same method where log_record is populated to
perform these guarded checks rather than assuming exc_info elements exist.
There was a problem hiding this comment.
Pull request overview
This PR updates the Django logging setup to emit structured JSON logs (intended for easier filtering in AWS CloudWatch) and adds per-request context (request ID, path, method, user) into log records.
Changes:
- Added a JSON log formatter with safe serialization for common non-JSON types.
- Added request-scoped logging context via middleware + logging filter (request ID propagation, request metadata injection).
- Updated Django
LOGGINGconfig to use the JSON formatter and request-context filter on the console handler.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
tcf_core/settings/log_formatter.py |
Introduces JsonFormatter and SafeJSONEncoder to emit structured JSON log lines. |
tcf_core/settings/base.py |
Wires JSON formatter + request-context filter into Django logging; adds the request logging middleware. |
tcf_core/request_logging.py |
Adds middleware to manage request IDs and a logging filter to inject request context into LogRecords. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| request_context.set(context) | ||
|
|
||
| # Process request | ||
| response = self.get_response(request) | ||
|
|
||
| # Add status code to context after response | ||
| context['status_code'] = response.status_code | ||
|
|
||
| # Return request ID in response headers for client reference | ||
| response['X-Request-ID'] = request_id | ||
|
|
||
| return response |
There was a problem hiding this comment.
request_context is set for the request but never reset/cleared. In a threaded WSGI worker (or any code that logs after the response is returned), this can leak the previous request’s context into unrelated log records. Capture the token returned by request_context.set(...) and reset(token) in a finally block (also covering exception paths).
| request_context.set(context) | |
| # Process request | |
| response = self.get_response(request) | |
| # Add status code to context after response | |
| context['status_code'] = response.status_code | |
| # Return request ID in response headers for client reference | |
| response['X-Request-ID'] = request_id | |
| return response | |
| token = request_context.set(context) | |
| try: | |
| # Process request | |
| response = self.get_response(request) | |
| # Add status code to context after response | |
| context['status_code'] = response.status_code | |
| # Return request ID in response headers for client reference | |
| response['X-Request-ID'] = request_id | |
| return response | |
| finally: | |
| request_context.reset(token) |
| # Generate or extract request ID | ||
| request_id = request.headers.get('X-Request-ID') | ||
| if not request_id: | ||
| request_id = str(uuid.uuid4()) | ||
|
|
There was a problem hiding this comment.
The request ID taken directly from X-Request-ID is written back into the response header. If a client supplies an invalid header value (e.g., contains newline chars), Django will raise BadHeaderError when setting response['X-Request-ID'], resulting in a 500. Validate/sanitize the incoming header (e.g., strict charset + max length) and fall back to a generated UUID when it’s invalid.
|
|
||
| # Thread-safe context variable for request data | ||
| request_context: ContextVar[dict] = ContextVar('request_context', default=None) |
There was a problem hiding this comment.
Type annotation and default disagree: request_context is annotated as ContextVar[dict] but initialized with default=None. If you run mypy/django-stubs, this will be flagged and it also forces filter() to treat it as Optional[...] anyway. Consider annotating as ContextVar[dict[str, Any] | None] (or similar) and updating related code accordingly.
| # Thread-safe context variable for request data | |
| request_context: ContextVar[dict] = ContextVar('request_context', default=None) | |
| from typing import Any | |
| # Thread-safe context variable for request data | |
| request_context: ContextVar[dict[str, Any] | None] = ContextVar('request_context', default=None) |
| def format(self, record): | ||
| log_record = { | ||
| "timestamp": datetime.now(timezone.utc).isoformat(), | ||
| "level": record.levelname, | ||
| "logger": record.name, | ||
| "message": record.getMessage(), | ||
| } |
There was a problem hiding this comment.
timestamp is generated at formatting time (datetime.now(...)) rather than using the LogRecord’s creation time (record.created). This can produce inaccurate timestamps if formatting is delayed (e.g., buffered/queued handlers). Prefer deriving the timestamp from record.created (UTC) for correctness.
| for key, value in record.__dict__.items(): | ||
| if key not in self.STANDARD_ATTRS and not key.startswith('_'): |
There was a problem hiding this comment.
Extra fields from record.__dict__ can overwrite reserved keys in log_record (e.g., a caller can pass extra={'level': '...'} or extra={'timestamp': '...'}), which makes logs ambiguous and potentially breaks downstream parsing. Skip keys already present in log_record (or namespace extras under a dedicated sub-object) to avoid collisions.
| for key, value in record.__dict__.items(): | |
| if key not in self.STANDARD_ATTRS and not key.startswith('_'): | |
| # Skip keys that would overwrite reserved formatter-generated fields. | |
| for key, value in record.__dict__.items(): | |
| if ( | |
| key not in self.STANDARD_ATTRS | |
| and not key.startswith('_') | |
| and key not in log_record | |
| ): |
| class RequestLoggingMiddleware: | ||
| """ | ||
| Middleware that generates/extracts request IDs and stores request context | ||
| for automatic inclusion in all logs. | ||
| """ |
There was a problem hiding this comment.
This new middleware/filter/formatter behavior isn’t covered by tests. Adding a small Django test that (1) hits a simple view and asserts X-Request-ID propagation and (2) emits a log record and asserts the JSON output includes request_id/path/method would prevent regressions and verify the filter+formatter wiring.
What I did
Summary by CodeRabbit