Skip to content

Make upload MIME validation robust when python-magic is unavailable#340

Open
arpit-k16 wants to merge 4 commits intoRohanExploit:mainfrom
arpit-k16:upload-mime-fallback-733f1
Open

Make upload MIME validation robust when python-magic is unavailable#340
arpit-k16 wants to merge 4 commits intoRohanExploit:mainfrom
arpit-k16:upload-mime-fallback-733f1

Conversation

@arpit-k16
Copy link

@arpit-k16 arpit-k16 commented Feb 5, 2026

Summary

This PR makes image upload validation more robust in environments where python-magic is missing or misconfigured, while preserving all existing security checks.

Changes

  • Import magic defensively in backend/main.py, falling back to None if the library is not available.
  • When python-magic is available, continue to use it to detect MIME type from the file contents.
  • When python-magic is not available or fails, fall back to:
    • FastAPI’s UploadFile.content_type header, or
    • mimetypes.guess_type based on the filename.
  • If MIME type cannot be determined or is not in the allowed list, the upload is rejected with a 400 error.
  • Keep existing security checks:
    • Maximum file size limit
    • PIL-based image validation and corruption checks
    • Optional resizing of large images for performance

Rationale

Previously, the backend assumed python-magic was always installed. In environments where it was missing or misconfigured, the app could fail at import time or when validating uploads.

This change:

  • Avoids hard failures when python-magic is unavailable.
  • Maintains strict validation of uploads (size, MIME type, and image integrity).
  • Does not change any public API or the normal behavior when python-magic is installed.

Testing

  • Verified diff is limited to backend/main.py.
  • Reasoned through behavior in both cases:
    • With python-magic present: MIME detection continues to use content-based detection.
    • Without python-magic: uploads are still validated via content type / filename plus PIL checks, and invalid files are rejected with HTTP 400.
  • Existing tests that mock magic (e.g. backend/tests/test_detection_bytes.py) remain compatible because the import name and usage are unchanged.

Summary by CodeRabbit

  • New Features

    • Integrated image upload + detection pipeline with server-side validation and per-user/IP upload limits.
  • Improvements

    • Automatic resizing of oversized images for smoother processing.
    • More resilient MIME-type detection with a fallback and startup warning if unavailable.
    • Stronger upload validation, clearer 400 errors, and improved logging for image integrity and processing.

@netlify
Copy link

netlify bot commented Feb 5, 2026

👷 Deploy request for fixmybharat pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 3e90f5e

@github-actions
Copy link

github-actions bot commented Feb 5, 2026

🙏 Thank you for your contribution, @arpit-k16!

PR Details:

Quality Checklist:
Please ensure your PR meets the following criteria:

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Code is commented where necessary
  • Documentation updated (if applicable)
  • No new warnings generated
  • Tests added/updated (if applicable)
  • All tests passing locally
  • No breaking changes to existing functionality

Review Process:

  1. Automated checks will run on your code
  2. A maintainer will review your changes
  3. Address any requested changes promptly
  4. Once approved, your PR will be merged! 🎉

Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.

@github-actions github-actions bot added the size/m label Feb 5, 2026
@coderabbitai
Copy link

coderabbitai bot commented Feb 5, 2026

📝 Walkthrough

Walkthrough

Adds an upload validation and detection pipeline in backend/main.py: MIME detection (python-magic with fallback), size and per-identifier upload limits, synchronous PIL-based image integrity checks with automatic resize for oversized images, async wrappers, detection helper, DB table creation at startup, and a shared HTTP client.

Changes

Cohort / File(s) Summary
File Upload Validation & Detection
backend/main.py
Adds MIME detection (python-magic with fallback), ALLOWED_MIME_TYPES and MAX_FILE_SIZE constants, per-identifier upload limiter and check_upload_limits, _validate_uploaded_file_sync (size, MIME, PIL image checks, optional resize), async validate_uploaded_file, and process_and_detect to run detection workflows; raises HTTPException on failures and logs magic absence.
Initialization / Runtime wiring
backend/main.py
Creates database tables at startup, exposes a shared HTTP client, notes image processing cache, and integrates detection workflow helper into app lifecycle.
Manifest / Tooling
manifest
Manifest entries adjusted to reflect added/changed dependencies and features (lines added/removed).

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant Magic as python-magic
    participant Fallback as Header/Guess
    participant PIL
    participant DB
    participant Storage

    Client->>Server: POST /upload (file)
    Server->>Server: check_upload_limits(identifier)
    Server->>Magic: detect MIME (optional)
    alt magic available
        Magic-->>Server: mime/type
    else
        Server->>Fallback: use Content-Type or mimetypes.guess_type
        Fallback-->>Server: mime/type or None
    end
    Server->>Server: enforce size & ALLOWED_MIME_TYPES
    alt MIME is image
        Server->>PIL: open and validate image bytes
        PIL-->>Server: valid + dimensions
        alt dimensions > MAX_DIMENSION
            Server->>PIL: resize to MAX_DIMENSION
            PIL-->>Server: resized bytes
            Server->>Storage: update file bytes
        else
            Server->>Storage: store original bytes
        end
        Server->>Server: process_and_detect -> detection_func
        Server->>DB: (startup tables created) log/store results
    else
        Server-->>Client: HTTPException (invalid type/size)
    end
    Server-->>Client: response (success/error)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I sniff the bytes and hum with glee,

If magic's missing, I guess what I see,
I trim big pixels down just right,
I count each hop — one upload, one night,
A rabbit's patch of code, all tidy and spry.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: making upload MIME validation robust when python-magic is unavailable, which aligns with the PR's core objective of handling missing/misconfigured python-magic while preserving existing validations.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 Hello @arpit-k16! Thank you for your first Pull Request to VishwaGuru! 🎉

We appreciate your contribution to this open source platform empowering India's youth to engage with democracy.

What happens next?

  • A maintainer will review your PR soon
  • Please ensure all tests pass and the code follows our style guidelines
  • Be ready to address any feedback or requested changes

Before your PR is merged:

  • Ensure your code follows the project's coding standards
  • All tests should pass
  • Update documentation if needed
  • Make sure your commits are clear and descriptive

Resources:

Thank you for contributing! 🚀

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/main.py`:
- Around line 217-224: The code sets file.size after rewinding the buffer so it
becomes 0; fix by measuring the buffer length before rewinding (or by seeking to
the end then using tell) and then assign that value to file.size — e.g., after
img.save(...) and before output.seek(0) capture size = output.tell() or
output.seek(0, io.SEEK_END); size = output.tell(); then set file.file = output,
file.size = size and finally rewind the stream (output.seek(0)); update the
block that manipulates output, img.save, and file to use this order.
🧹 Nitpick comments (2)
backend/main.py (2)

184-186: Document the security trade-off when using fallback MIME detection.

When python-magic is unavailable, the code falls back to file.content_type (client-provided, spoofable) and mimetypes.guess_type (extension-based, also spoofable). While the subsequent PIL validation (lines 201-204) provides a secondary security layer that ensures the file is actually a valid image, the MIME-type check alone becomes bypassable.

Consider adding a comment documenting this security trade-off, or logging a more specific warning when fallback is used for a particular upload (not just at startup).

💡 Optional: Add per-upload warning when using fallback
     if not detected_mime:
         # Fallback: trust FastAPI's content_type header or guess from filename
         detected_mime = file.content_type or mimetypes.guess_type(file.filename or "")[0]
+        if detected_mime:
+            logger.debug(
+                f"MIME type for {file.filename} detected via fallback (content_type/extension): {detected_mime}. "
+                "Content will be validated by PIL."
+            )

226-231: Use logger.exception for automatic traceback inclusion.

logger.exception automatically includes the exception traceback, which is more informative than logger.error with manual formatting. The broad exception catch is acceptable here since PIL can raise various exception types for malformed images.

♻️ Proposed fix
     except Exception as pil_error:
-        logger.error(f"PIL validation failed for {file.filename}: {pil_error}")
+        logger.exception(f"PIL validation failed for {file.filename}: {pil_error}")
         raise HTTPException(
             status_code=400,
             detail="Invalid image file. The file appears to be corrupted or not a valid image."
         )

@github-actions
Copy link

github-actions bot commented Feb 5, 2026

🔍 Quality Reminder

Thanks for the updates! Please ensure:
- Your changes don't break existing functionality
- All tests still pass
- Code quality standards are maintained

*The maintainers will verify that the overall project flow remains intact.*

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/main.py (1)

424-428: ⚠️ Potential issue | 🔴 Critical

Add await to all validate_uploaded_file() calls; security validation is completely skipped without it.

validate_uploaded_file is async. Calling it without await in lines 427, 582, 606, 632, 658, and 684 creates a coroutine object that never executes, allowing unvalidated files through security checks entirely.

🛠️ Proposed fixes

Line 427:

        if image:
-           validate_uploaded_file(image)
+           await validate_uploaded_file(image)

Line 582:

-   validate_uploaded_file(image)
+   await validate_uploaded_file(image)

Line 606:

-   validate_uploaded_file(image)
+   await validate_uploaded_file(image)

Line 632:

-   validate_uploaded_file(image)
+   await validate_uploaded_file(image)

Line 658:

-   validate_uploaded_file(image)
+   await validate_uploaded_file(image)

Line 684:

-   validate_uploaded_file(image)
+   await validate_uploaded_file(image)
🤖 Fix all issues with AI agents
In `@backend/main.py`:
- Around line 181-208: The image validation/resizing block advances the file
pointer (Image.open(file.file) and img.verify()) but only rewinds when a resize
is performed; add a file.file.seek(0) after the PIL validation/resizing logic so
the pointer is always reset before returning (ensure this happens whether or not
the resize branch ran); update the code around Image.open(file.file) /
img.verify() and the replacement of file.file (and after recording resized_size)
so process_and_detect() and any subsequent Image.open(image.file) read from the
start of the stream.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@backend/main.py`:
- Around line 1-11: The file is missing imports used by check_upload_limits and
_validate_uploaded_file_sync and also duplicates the httpx import; add "import
io" and "from datetime import datetime, timedelta" at the top of backend/main.py
so io.BytesIO(), datetime.now(), and timedelta(...) are defined, and remove the
duplicate httpx import (keep a single import). Update any import grouping to
avoid Ruff F811 redefinition errors and ensure check_upload_limits and
_validate_uploaded_file_sync can reference these symbols.
- Around line 104-123: The check_upload_limits function performs a non-atomic
get→filter→check→append→set on user_upload_cache which permits a TOCTOU race;
fix it by making the read-modify-write atomic for the given identifier—either
implement a per-identifier threading.Lock (e.g., a dict of locks keyed by
identifier) and wrap the body that reads current_uploads, filters old
timestamps, checks length, appends now, and calls user_upload_cache.set inside
that lock, or replace the list-based approach with an atomic counter in a remote
store (e.g., Redis INCR with TTL) and update check_upload_limits to use that
atomic increment/decrement TTL semantics instead of get/set on
user_upload_cache. Ensure you reference and update the logic inside
check_upload_limits and use user_upload_cache.get/set or the new atomic store
consistently.
- Around line 252-253: Remove the unguarded module-level call to
Base.metadata.create_all(bind=engine) so table creation does not run at import
time; the same operation with error handling already exists in lifespan(), so
delete the Base.metadata.create_all line (and any direct uses of engine for
schema creation at module scope) to avoid duplicate/ungarded startup DB access
and rely on the existing lifecycle-managed creation inside lifespan().
- Around line 235-242: The call to validate_image_for_processing(pil_image) in
the async block is missing an import; update backend/main.py to import
validate_image_for_processing from backend.pothole_detection (same symbol used
in backend/routers/detection.py) so the
run_in_threadpool(validate_image_for_processing, pil_image) call resolves, or
alternatively remove that call if you decide the existing synchronous PIL checks
in _validate_uploaded_file_sync are sufficient—ensure the chosen approach keeps
run_in_threadpool(Image.open, image.file) and the surrounding HTTPException
handling intact.
- Around line 57-59: Remove the duplicate import lines that re-import
EXCEPTION_HANDLERS, Base, engine, get_db, SessionLocal, and Issue; locate the
redundant imports (the second occurrences of EXCEPTION_HANDLERS and the group
importing Base, engine, get_db, SessionLocal, Issue) and delete them so each
symbol is imported only once (keep the original imports where they first appear
and remove the later duplicate lines).
- Around line 192-196: Before calling img.resize capture the original format
(e.g., original_format = img.format) and after resizing pass that
original_format into img.save instead of using img.format or blindly falling
back to 'JPEG'; if original_format is None, choose a safe fallback that
preserves alpha (use 'PNG' when img.mode indicates an alpha channel like 'RGBA'
or 'LA', otherwise 'JPEG'), and keep the existing quality parameter when saving;
update the block around img.resize, img.format, and img.save (referencing
img.resize, img.format, img.save, new_width/new_height) to implement this.
🧹 Nitpick comments (2)
backend/main.py (2)

206-211: Chain the re-raised HTTPException with from for proper tracebacks (Ruff B904).

Inside except blocks, raising a new exception without from makes the original cause invisible in tracebacks. This applies here and similarly on lines 242 and 250.

♻️ Proposed fix
     except Exception as pil_error:
         logger.error(f"PIL validation failed for {file.filename}: {pil_error}")
         raise HTTPException(
             status_code=400,
             detail="Invalid image file. The file appears to be corrupted or not a valid image."
-        )
+        ) from pil_error

146-176: Security note: MIME fallback path is header/extension-based and trivially spoofable.

When python-magic is absent, both file.content_type (client-supplied header) and mimetypes.guess_type (extension-based) can be faked by an attacker. The downstream PIL verify() check partially compensates, but a file with a valid image header followed by malicious payload could still pass. Consider adding a comment documenting this accepted trade-off, so future maintainers understand the risk surface when python-magic is missing.

Comment on lines +1 to +11
from fastapi import FastAPI, UploadFile, File, Form, HTTPException, Depends, BackgroundTasks, Request, Query
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.gzip import GZipMiddleware
from sqlalchemy.orm import Session
from sqlalchemy import func
from contextlib import asynccontextmanager
from functools import lru_cache
from typing import List, Optional, Any
from PIL import Image
import mimetypes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing import io and from datetime import datetime, timedelta will cause NameError at runtime.

io.BytesIO() is used on line 195, datetime.now() on line 109, and timedelta(hours=1) on line 110, but none of these are imported anywhere in the file. Both check_upload_limits and the image-resize path inside _validate_uploaded_file_sync will crash at runtime.

Also, import httpx on line 40 is a redefinition of the identical import on line 32 (Ruff F811).

🐛 Proposed fix — add missing imports and remove duplicate
 from fastapi import FastAPI, UploadFile, File, Form, HTTPException, Depends, BackgroundTasks, Request, Query
 from fastapi.responses import JSONResponse
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.middleware.gzip import GZipMiddleware
 from sqlalchemy.orm import Session
 from sqlalchemy import func
 from contextlib import asynccontextmanager
 from functools import lru_cache
 from typing import List, Optional, Any
 from PIL import Image
 import mimetypes
+import io
+from datetime import datetime, timedelta
 try:
     import magic
 except ImportError:
     magic = None
-import httpx
 from async_lru import alru_cache

Also applies to: 34-41

🤖 Prompt for AI Agents
In `@backend/main.py` around lines 1 - 11, The file is missing imports used by
check_upload_limits and _validate_uploaded_file_sync and also duplicates the
httpx import; add "import io" and "from datetime import datetime, timedelta" at
the top of backend/main.py so io.BytesIO(), datetime.now(), and timedelta(...)
are defined, and remove the duplicate httpx import (keep a single import).
Update any import grouping to avoid Ruff F811 redefinition errors and ensure
check_upload_limits and _validate_uploaded_file_sync can reference these
symbols.

Comment on lines +57 to +59
from backend.exceptions import EXCEPTION_HANDLERS
from backend.database import Base, engine, get_db, SessionLocal
from backend.models import Issue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Remove duplicate imports (Ruff F811).

Lines 58–59 re-import Base, engine, get_db, SessionLocal, and Issue that are already imported on lines 44–45. Line 57 re-imports EXCEPTION_HANDLERS already imported on line 65. These redefinitions are flagged by Ruff and add confusion.

♻️ Proposed fix — remove the duplicate lines
-from backend.exceptions import EXCEPTION_HANDLERS
-from backend.database import Base, engine, get_db, SessionLocal
-from backend.models import Issue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from backend.exceptions import EXCEPTION_HANDLERS
from backend.database import Base, engine, get_db, SessionLocal
from backend.models import Issue
from backend.exceptions import EXCEPTION_HANDLERS
from backend.database import Base, engine, get_db, SessionLocal
from backend.models import Issue
🧰 Tools
🪛 Ruff (0.14.14)

[error] 58-58: Redefinition of unused Base from line 44: Base redefined here

Remove definition: Base

(F811)


[error] 58-58: Redefinition of unused engine from line 44: engine redefined here

Remove definition: engine

(F811)


[error] 58-58: Redefinition of unused get_db from line 44: get_db redefined here

Remove definition: get_db

(F811)


[error] 58-58: Redefinition of unused SessionLocal from line 44: SessionLocal redefined here

Remove definition: SessionLocal

(F811)


[error] 59-59: Redefinition of unused Issue from line 45: Issue redefined here

Remove definition: Issue

(F811)

🤖 Prompt for AI Agents
In `@backend/main.py` around lines 57 - 59, Remove the duplicate import lines that
re-import EXCEPTION_HANDLERS, Base, engine, get_db, SessionLocal, and Issue;
locate the redundant imports (the second occurrences of EXCEPTION_HANDLERS and
the group importing Base, engine, get_db, SessionLocal, Issue) and delete them
so each symbol is imported only once (keep the original imports where they first
appear and remove the later duplicate lines).

Comment on lines +104 to +123
def check_upload_limits(identifier: str, limit: int) -> None:
"""
Check if the user/IP has exceeded upload limits using thread-safe cache.
"""
current_uploads = user_upload_cache.get(identifier) or []
now = datetime.now()
one_hour_ago = now - timedelta(hours=1)

# Filter out old timestamps (older than 1 hour)
recent_uploads = [ts for ts in current_uploads if ts > one_hour_ago]

if len(recent_uploads) >= limit:
raise HTTPException(
status_code=429,
detail=f"Upload limit exceeded. Maximum {limit} uploads per hour allowed."
)

# Add current timestamp and update cache atomically
recent_uploads.append(now)
user_upload_cache.set(recent_uploads, identifier)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

TOCTOU race in upload-limit enforcement.

The get → filter → check → append → set sequence on lines 108–123 is not atomic. Two concurrent requests for the same identifier can both read the same list, both pass the limit check, and both write back — allowing more uploads than limit. If user_upload_cache does not provide a compare-and-swap or locking primitive, consider wrapping this in a threading.Lock keyed by identifier, or using an atomic counter (e.g., Redis INCR with TTL) for correctness under load.

🧰 Tools
🪛 Ruff (0.14.14)

[error] 109-109: Undefined name datetime

(F821)


[error] 110-110: Undefined name timedelta

(F821)

🤖 Prompt for AI Agents
In `@backend/main.py` around lines 104 - 123, The check_upload_limits function
performs a non-atomic get→filter→check→append→set on user_upload_cache which
permits a TOCTOU race; fix it by making the read-modify-write atomic for the
given identifier—either implement a per-identifier threading.Lock (e.g., a dict
of locks keyed by identifier) and wrap the body that reads current_uploads,
filters old timestamps, checks length, appends now, and calls
user_upload_cache.set inside that lock, or replace the list-based approach with
an atomic counter in a remote store (e.g., Redis INCR with TTL) and update
check_upload_limits to use that atomic increment/decrement TTL semantics instead
of get/set on user_upload_cache. Ensure you reference and update the logic
inside check_upload_limits and use user_upload_cache.get/set or the new atomic
store consistently.

Comment on lines +192 to +196
img = img.resize((new_width, new_height), Image.Resampling.LANCZOS)

# Save resized image back to file
output = io.BytesIO()
img.save(output, format=img.format or 'JPEG', quality=85)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

img.format is None after resize() — fallback to 'JPEG' may corrupt alpha-channel images.

Image.resize() returns a new Image object whose .format attribute is None. The or 'JPEG' fallback will always activate. For PNG/GIF/WebP uploads with an alpha channel, saving as JPEG will either raise an error (cannot write mode RGBA as JPEG) or silently drop transparency.

Capture the original format before resizing and use it in img.save():

🐛 Proposed fix
         img = Image.open(file.file)
         if img.width > 1024 or img.height > 1024:
+            original_format = img.format or 'JPEG'
             # Calculate new size maintaining aspect ratio
             ratio = min(1024 / img.width, 1024 / img.height)
             new_width = int(img.width * ratio)
             new_height = int(img.height * ratio)

             img = img.resize((new_width, new_height), Image.Resampling.LANCZOS)

             # Save resized image back to file
             output = io.BytesIO()
-            img.save(output, format=img.format or 'JPEG', quality=85)
+            img.save(output, format=original_format, quality=85)
🧰 Tools
🪛 Ruff (0.14.14)

[error] 195-195: Undefined name io

(F821)

🤖 Prompt for AI Agents
In `@backend/main.py` around lines 192 - 196, Before calling img.resize capture
the original format (e.g., original_format = img.format) and after resizing pass
that original_format into img.save instead of using img.format or blindly
falling back to 'JPEG'; if original_format is None, choose a safe fallback that
preserves alpha (use 'PNG' when img.mode indicates an alpha channel like 'RGBA'
or 'LA', otherwise 'JPEG'), and keep the existing quality parameter when saving;
update the block around img.resize, img.format, and img.save (referencing
img.resize, img.format, img.save, new_width/new_height) to implement this.

Comment on lines +235 to +242
pil_image = await run_in_threadpool(Image.open, image.file)
# Validate image for processing
await run_in_threadpool(validate_image_for_processing, pil_image)
except HTTPException:
raise # Re-raise HTTP exceptions from validation
except Exception as e:
logger.error(f"Error during background initialization: {e}", exc_info=True)
logger.error(f"Invalid image file during processing: {e}", exc_info=True)
raise HTTPException(status_code=400, detail="Invalid image file")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for definition of validate_image_for_processing
rg -nP --type=py 'def validate_image_for_processing'

Repository: RohanExploit/VishwaGuru

Length of output: 141


🏁 Script executed:

# Search for imports of validate_image_for_processing
rg -nP --type=py 'from .* import.*validate_image_for_processing|import.*validate_image_for_processing'

Repository: RohanExploit/VishwaGuru

Length of output: 380


🏁 Script executed:

# Check the imports and context in backend/main.py around line 237
head -n 250 backend/main.py | tail -n +1 | head -n 50

Repository: RohanExploit/VishwaGuru

Length of output: 1939


🏁 Script executed:

# Get a broader view of imports at the top of backend/main.py
head -n 100 backend/main.py

Repository: RohanExploit/VishwaGuru

Length of output: 3750


🏁 Script executed:

# Check all imports in backend/main.py to see if validate_image_for_processing appears anywhere
rg -n 'from backend|import.*validate_image_for_processing' backend/main.py

Repository: RohanExploit/VishwaGuru

Length of output: 987


🏁 Script executed:

# Verify the actual code around line 237 in backend/main.py
sed -n '230,245p' backend/main.py

Repository: RohanExploit/VishwaGuru

Length of output: 692


Import validate_image_for_processing from backend.pothole_detection.

Line 237 calls validate_image_for_processing(pil_image), but this function is not imported in backend/main.py. The function is defined in backend.pothole_detection and already imported in other files like backend/routers/detection.py. Add it to the imports at the top of the file, or remove the call if the PIL validation in _validate_uploaded_file_sync is sufficient.

🧰 Tools
🪛 Ruff (0.14.14)

[error] 237-237: Undefined name validate_image_for_processing

(F821)


[warning] 242-242: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In `@backend/main.py` around lines 235 - 242, The call to
validate_image_for_processing(pil_image) in the async block is missing an
import; update backend/main.py to import validate_image_for_processing from
backend.pothole_detection (same symbol used in backend/routers/detection.py) so
the run_in_threadpool(validate_image_for_processing, pil_image) call resolves,
or alternatively remove that call if you decide the existing synchronous PIL
checks in _validate_uploaded_file_sync are sufficient—ensure the chosen approach
keeps run_in_threadpool(Image.open, image.file) and the surrounding
HTTPException handling intact.

Comment on lines +252 to +253
# Create tables if they don't exist
Base.metadata.create_all(bind=engine)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Redundant and unguarded Base.metadata.create_all at module level.

This synchronous call runs at import time without error handling, blocking the module load and crashing if the database is unreachable. The same operation is already performed inside lifespan() (line 265) with proper error handling. Remove this line to avoid the duplicate call and the risk of an unhandled startup crash.

🐛 Proposed fix
-# Create tables if they don't exist
-Base.metadata.create_all(bind=engine)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Create tables if they don't exist
Base.metadata.create_all(bind=engine)
🤖 Prompt for AI Agents
In `@backend/main.py` around lines 252 - 253, Remove the unguarded module-level
call to Base.metadata.create_all(bind=engine) so table creation does not run at
import time; the same operation with error handling already exists in
lifespan(), so delete the Base.metadata.create_all line (and any direct uses of
engine for schema creation at module scope) to avoid duplicate/ungarded startup
DB access and rely on the existing lifecycle-managed creation inside lifespan().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant