✨File preview: Add file preview backend service #2470
Stockton11 wants to merge 2 commits into ModelEngine-Group:develop
Pull request overview
This PR adds a backend “file preview” capability: direct streaming for previewable types (PDF/images/text) and Office→PDF conversion via LibreOffice with MinIO-backed caching (intended TTL: 7 days).
Changes:
- Add GET /file/preview/{object_name:path} endpoint that returns inline StreamingResponse.
- Implement Office-to-PDF conversion with concurrency limits + per-file locking + MinIO cache write-back.
- Add MinIO copy/existence helpers and extensive unit tests for preview, conversion, and headers.
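
The combination of a global concurrency limit and per-file locking can be sketched as follows. This is a minimal illustration using asyncio, not the code from this PR: `ConversionGate` and the `is_cached` / `convert` callbacks are hypothetical names, and the real service may use threads or a different locking scheme.

```python
import asyncio
from collections import defaultdict


class ConversionGate:
    """Hypothetical helper: one lock per file plus a global semaphore."""

    def __init__(self, limit: int):
        # Caps how many conversions (LibreOffice processes) run at once.
        self._semaphore = asyncio.Semaphore(limit)
        # One lock per object name, so the same file is converted only once.
        self._locks = defaultdict(asyncio.Lock)

    async def run(self, object_name, is_cached, convert) -> str:
        async with self._locks[object_name]:
            # Re-check the cache after acquiring the lock: another request
            # may have finished converting this file while we were waiting.
            if await is_cached(object_name):
                return "cache-hit"
            async with self._semaphore:
                await convert(object_name)
            return "converted"


async def demo():
    gate = ConversionGate(limit=2)
    cache, calls = set(), []

    async def is_cached(name):
        return name in cache

    async def convert(name):
        calls.append(name)
        await asyncio.sleep(0)  # stand-in for the real conversion work
        cache.add(name)

    # Three concurrent requests for the same file trigger one conversion.
    results = await asyncio.gather(
        *(gate.run("a.docx", is_cached, convert) for _ in range(3))
    )
    return results, calls
```

Note that the lock dictionary in this sketch grows with every distinct object name; a long-running service would need some form of cleanup.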
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| backend/apps/file_management_app.py | Adds preview endpoint and inline Content-Disposition support |
| backend/services/file_management_service.py | Implements preview_file_impl() + conversion/caching workflow + PDF validation |
| backend/utils/file_management_utils.py | Adds convert_office_to_pdf() using LibreOffice |
| backend/database/attachment_db.py | Adds file_exists() and copy_object() helpers; extends MIME map (.md) |
| backend/database/client.py | Adds MinioClient.copy_object() wrapper |
| backend/consts/const.py | Adds office MIME list and conversion concurrency limit |
| docker/docker-compose.yml | Adds MinIO ILM expiry rule for converted/ prefix |
| test/backend/app/test_file_management_app.py | Adds tests for preview endpoint + inline headers |
| test/backend/services/test_file_management_service.py | Adds tests for preview routing, cache hit/miss, and PDF validation |
| test/backend/utils/test_file_management_utils.py | Adds tests for LibreOffice conversion helper |
| test/backend/database/test_client.py | Adds tests for MinioClient.copy_object() |
| test/backend/database/test_attachment_db.py | Adds tests for file_exists() / copy_object() |

    # Limit concurrent Office-to-PDF conversions
    MAX_CONCURRENT_CONVERSIONS = 5

    # Supported Office file MIME types for preview conversion
    OFFICE_MIME_TYPES = [
        'application/msword',  # .doc
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',  # .docx
        'application/vnd.ms-excel',  # .xls
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',  # .xlsx
        'application/vnd.ms-powerpoint',  # .ppt
        'application/vnd.openxmlformats-officedocument.presentationml.presentation'  # .pptx
    ]

MAX_CONCURRENT_CONVERSIONS is hard-coded to 5 here. Since this directly controls how many LibreOffice processes can run concurrently (CPU/memory heavy), it likely needs to be configurable via env var (similar to other settings) so deployments can tune it per machine size.
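
A minimal sketch of reading the limit from the environment, assuming the variable is also named `MAX_CONCURRENT_CONVERSIONS` (the variable name and the `read_max_concurrent_conversions` helper are hypothetical, and the project may already have its own config-loading convention):

```python
import os

DEFAULT_MAX_CONCURRENT_CONVERSIONS = 5


def read_max_concurrent_conversions(env=None) -> int:
    """Read the limit from the environment, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get("MAX_CONCURRENT_CONVERSIONS", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric values fall back to the default.
        return DEFAULT_MAX_CONCURRENT_CONVERSIONS
    # A limit below 1 would block every conversion, so reject it.
    return value if value >= 1 else DEFAULT_MAX_CONCURRENT_CONVERSIONS


# Limit concurrent Office-to-PDF conversions
MAX_CONCURRENT_CONVERSIONS = read_max_concurrent_conversions()
```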

    return StreamingResponse(
        file_stream,
        media_type=content_type,
        headers={
            "Content-Disposition": content_disposition,
            "Cache-Control": "public, max-age=3600",
            "ETag": f'"{object_name}"',
        }
    )

The ETag response header is being set to the literal object_name (e.g. "documents/test.pdf"), which is not a representation of the content. This can cause clients/proxies to incorrectly treat different versions of the same object as identical and serve stale previews. Either omit the ETag header, or populate it with the real object ETag/version from storage metadata (e.g., head_object / stat) and consider also handling If-None-Match for 304 responses.
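
As a rough illustration of the second option, the helpers below build the header value from a storage-provided ETag and decide whether a 304 is appropriate. Both `quote_etag` and `is_not_modified` are hypothetical names, and how the storage ETag is fetched (for example via a stat/head call on the object) is left as an assumption.

```python
from typing import Optional


def quote_etag(storage_etag: str) -> str:
    """Wrap a storage-provided ETag in double quotes, as HTTP expects."""
    return '"' + storage_etag.strip('"') + '"'


def _strip_weak(tag: str) -> str:
    """Drop an optional weak-validator prefix (W/) before comparing."""
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(if_none_match: Optional[str], current_etag: str) -> bool:
    """Return True when the If-None-Match header matches the current ETag."""
    if not if_none_match:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    if "*" in candidates:
        return True
    # If-None-Match uses weak comparison, so W/ prefixes are ignored.
    return _strip_weak(current_etag) in {_strip_weak(c) for c in candidates}
```

In the endpoint, a match would translate into an empty 304 response carrying the same ETag, and a mismatch into the normal StreamingResponse with the quoted storage ETag in place of the object name.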

        return response


    def get_file_size_from_minio(object_name: str, bucket: Optional[str] = None) -> int:

Since the function name was changed here, confirm that every place that uses it is still compatible.
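1. Add the file preview endpoint /preview/{object_name:path}.
2. Add the file preview logic: use LibreOffice to convert Office files to PDF and cache the converted PDF back to MinIO with a 7-day expiry; all other files are returned directly as a file stream.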
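
The routing decision described above could look roughly like this. It is only a sketch: `plan_preview` is a hypothetical name, and the cache key layout (`converted/<object_name>.pdf`) is an assumption based on the converted/ prefix mentioned for the MinIO expiry rule, not necessarily the naming used in the PR.

```python
OFFICE_MIME_TYPES = [
    'application/msword',  # .doc
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',  # .docx
    'application/vnd.ms-excel',  # .xls
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',  # .xlsx
    'application/vnd.ms-powerpoint',  # .ppt
    'application/vnd.openxmlformats-officedocument.presentationml.presentation',  # .pptx
]


def plan_preview(object_name: str, content_type: str):
    """Decide whether a file needs conversion or can be streamed as-is."""
    if content_type in OFFICE_MIME_TYPES:
        # Assumed cache key layout; the converted PDF lives under converted/.
        return ("convert", f"converted/{object_name}.pdf")
    # PDFs, images, and text are streamed directly from their original key.
    return ("stream", object_name)
```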