✨File preview: Add file preview backend service #2470

Open
Stockton11 wants to merge 2 commits into ModelEngine-Group:develop from Stockton11:zwb/file_preview

@Stockton11

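1. Add a file preview endpoint: /preview/{object_name:path}
2. Add the preview logic: Office files are converted to PDF with LibreOffice, and the converted PDF is cached back to MinIO with a 7-day expiry; all other files are returned directly as a file stream.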
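For readers unfamiliar with the LibreOffice side, a minimal sketch of a headless conversion call follows. It is not the PR's convert_office_to_pdf(); the function names (build_soffice_command, expected_pdf_path, run_libreoffice_conversion), the 120 s timeout and the assumption that the binary is on PATH as `soffice` are all hypothetical. The CLI flags `--headless`, `--convert-to pdf` and `--outdir` are standard LibreOffice options.

```python
import subprocess
from pathlib import Path
from typing import List


def build_soffice_command(input_path: str, out_dir: str) -> List[str]:
    """Hypothetical helper: headless LibreOffice command converting one file to PDF."""
    return [
        "soffice",               # assumption: binary is on PATH under this name
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", out_dir,     # directory that receives the PDF
        input_path,
    ]


def expected_pdf_path(input_path: str, out_dir: str) -> Path:
    # LibreOffice names the output after the input's stem: report.docx -> report.pdf
    return Path(out_dir) / (Path(input_path).stem + ".pdf")


def run_libreoffice_conversion(input_path: str, out_dir: str, timeout_s: float = 120.0) -> Path:
    """Hypothetical wrapper: run the conversion and return the path of the produced PDF."""
    cmd = build_soffice_command(input_path, out_dir)
    # check=True raises CalledProcessError on a non-zero exit code;
    # timeout raises TimeoutExpired so a hung conversion cannot block a worker forever.
    subprocess.run(cmd, check=True, capture_output=True, timeout=timeout_s)
    pdf_path = expected_pdf_path(input_path, out_dir)
    if not pdf_path.exists():
        # soffice may exit 0 without writing anything, so verify the output exists.
        raise RuntimeError(f"LibreOffice produced no PDF for {input_path}")
    return pdf_path
```

When several conversions run in parallel, it is common to give each process its own profile directory (LibreOffice's `-env:UserInstallation=file:///...` option), because concurrent instances sharing one profile can interfere with each other.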

Copilot AI review requested due to automatic review settings February 9, 2026 08:01
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a backend “file preview” capability: direct streaming for previewable types (PDF/images/text) and Office→PDF conversion via LibreOffice with MinIO-backed caching (intended TTL: 7 days).

Changes:

  • Add GET /file/preview/{object_name:path} endpoint that returns inline StreamingResponse.
  • Implement Office-to-PDF conversion with concurrency limits + per-file locking + MinIO cache write-back.
  • Add MinIO copy/existence helpers and extensive unit tests for preview, conversion, and headers.
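The "concurrency limits + per-file locking" item combines two mechanisms: a global semaphore that caps how many conversions run at once, and a per-object lock so the same file is never converted twice in parallel. A minimal asyncio sketch of that pattern follows; ConversionGate, is_cached and convert are hypothetical names, and the cache check and conversion are stubbed rather than talking to MinIO or LibreOffice.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class ConversionGate:
    """Hypothetical helper: caps concurrent conversions and serializes work per object key."""

    def __init__(self, max_concurrent: int) -> None:
        # Global cap on how many conversions (e.g. LibreOffice processes) run at once.
        self._slots = asyncio.Semaphore(max_concurrent)
        # One lock per object key, so the same file is never converted twice in parallel.
        self._locks: Dict[str, asyncio.Lock] = {}

    def _lock_for(self, key: str) -> asyncio.Lock:
        # No await between lookup and insert, so this is race-free on a single event loop.
        lock = self._locks.get(key)
        if lock is None:
            lock = self._locks[key] = asyncio.Lock()
        return lock

    async def run(
        self,
        key: str,
        is_cached: Callable[[str], Awaitable[bool]],
        convert: Callable[[str], Awaitable[None]],
    ) -> str:
        async with self._lock_for(key):
            # Re-check the cache after taking the lock: a request that held the
            # lock before us may already have written the converted PDF.
            if await is_cached(key):
                return "cache-hit"
            async with self._slots:
                await convert(key)
            return "converted"


async def simulate() -> Tuple[List[str], List[str], int]:
    """Five requests for the same file plus three other files, with a cap of 2."""
    gate = ConversionGate(max_concurrent=2)
    cache: set = set()
    calls: List[str] = []
    active = 0
    peak = 0

    async def is_cached(key: str) -> bool:
        return key in cache

    async def convert(key: str) -> None:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        calls.append(key)
        await asyncio.sleep(0.01)  # stand-in for the slow conversion
        cache.add(key)
        active -= 1

    keys = ["a.docx"] * 5 + ["b.xlsx", "c.pptx", "d.doc"]
    results = await asyncio.gather(*(gate.run(k, is_cached, convert) for k in keys))
    return list(results), calls, peak
```

Running `asyncio.run(simulate())` converts each distinct file exactly once and never has more than two conversions in flight. One design caveat: the lock dictionary grows with every distinct key, so a long-running service may want to evict idle locks.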

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Summary per file:

  • backend/apps/file_management_app.py: Adds preview endpoint and inline Content-Disposition support
  • backend/services/file_management_service.py: Implements preview_file_impl() + conversion/caching workflow + PDF validation
  • backend/utils/file_management_utils.py: Adds convert_office_to_pdf() using LibreOffice
  • backend/database/attachment_db.py: Adds file_exists() and copy_object() helpers; extends MIME map (.md)
  • backend/database/client.py: Adds MinioClient.copy_object() wrapper
  • backend/consts/const.py: Adds office MIME list and conversion concurrency limit
  • docker/docker-compose.yml: Adds MinIO ILM expiry rule for converted/ prefix
  • test/backend/app/test_file_management_app.py: Adds tests for preview endpoint + inline headers
  • test/backend/services/test_file_management_service.py: Adds tests for preview routing, cache hit/miss, and PDF validation
  • test/backend/utils/test_file_management_utils.py: Adds tests for LibreOffice conversion helper
  • test/backend/database/test_client.py: Adds tests for MinioClient.copy_object()
  • test/backend/database/test_attachment_db.py: Adds tests for file_exists() / copy_object()
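Inline Content-Disposition is easy to get wrong for non-ASCII file names (for example Chinese document titles), since the plain `filename="..."` parameter is meant to be ASCII. A minimal sketch, assuming the RFC 6266 approach of sending an ASCII fallback plus an RFC 5987/8187 `filename*` parameter; inline_content_disposition is a hypothetical name, not the PR's helper.

```python
from urllib.parse import quote


def inline_content_disposition(filename: str) -> str:
    """Hypothetical helper: inline Content-Disposition value that survives non-ASCII names."""
    # ASCII fallback for older clients: printable ASCII kept, quotes, backslashes
    # and everything else replaced with "_".
    fallback = "".join(
        ch if 32 <= ord(ch) < 127 and ch not in '"\\' else "_" for ch in filename
    )
    # Extended parameter: the UTF-8 bytes of the name, percent-encoded.
    encoded = quote(filename, safe="")
    return f"inline; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

Clients that understand `filename*` prefer it over `filename`, so the fallback only matters for very old user agents.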

Comment on lines +38 to +49
# Limit concurrent Office-to-PDF conversions
MAX_CONCURRENT_CONVERSIONS = 5

# Supported Office file MIME types for preview conversion
OFFICE_MIME_TYPES = [
'application/msword', # .doc
'application/vnd.openxmlformats-officedocument.wordprocessingml.document', # .docx
'application/vnd.ms-excel', # .xls
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', # .xlsx
'application/vnd.ms-powerpoint', # .ppt
'application/vnd.openxmlformats-officedocument.presentationml.presentation' # .pptx
]

Copilot AI Feb 9, 2026

MAX_CONCURRENT_CONVERSIONS is hard-coded to 5 here. Because it directly caps how many LibreOffice processes can run at once (each one CPU- and memory-heavy), it should probably be configurable through an environment variable, like other settings, so each deployment can tune it to its machine size.

Copilot uses AI. Check for mistakes.
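One way to follow this suggestion while keeping today's behaviour as the default is sketched below. The environment variable name MAX_CONCURRENT_CONVERSIONS and the helper max_concurrent_conversions are assumptions, not existing settings in the repository; invalid or non-positive values fall back to 5 instead of failing at import time.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_CONCURRENT_CONVERSIONS = 5


def max_concurrent_conversions(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: read the conversion cap from the environment, default 5."""
    source = os.environ if env is None else env
    raw = source.get("MAX_CONCURRENT_CONVERSIONS", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric: keep the default rather than crashing on import.
        return DEFAULT_MAX_CONCURRENT_CONVERSIONS
    # A cap below 1 would block every conversion, so treat it as misconfiguration.
    return value if value >= 1 else DEFAULT_MAX_CONCURRENT_CONVERSIONS


MAX_CONCURRENT_CONVERSIONS = max_concurrent_conversions()
```

The value could then be passed through docker-compose.yml the same way other settings are.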
Comment on lines +602 to +610
return StreamingResponse(
file_stream,
media_type=content_type,
headers={
"Content-Disposition": content_disposition,
"Cache-Control": "public, max-age=3600",
"ETag": f'"{object_name}"',
}
)

Copilot AI Feb 9, 2026

The ETag response header is being set to the literal object_name (e.g. "documents/test.pdf"), which is not a representation of the content. This can cause clients/proxies to incorrectly treat different versions of the same object as identical and serve stale previews. Either omit the ETag header, or populate it with the real object ETag/version from storage metadata (e.g., head_object / stat) and consider also handling If-None-Match for 304 responses.

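A minimal sketch of the second option (real ETag plus If-None-Match handling) follows, written framework-free so the logic is easy to test. It assumes the storage ETag is fetched from object metadata (for example the `etag` field of a MinIO `stat_object` result); the function names are hypothetical, and the Cache-Control value is copied from the snippet above. In FastAPI the 304 branch would return a plain `Response(status_code=304, headers=headers)` instead of a StreamingResponse.

```python
from typing import Dict, Optional, Tuple


def quote_etag(storage_etag: str) -> str:
    # Storage backends may return the ETag with or without quotes; normalize to one quoted form.
    return '"' + storage_etag.strip('"') + '"'


def etag_matches(if_none_match: Optional[str], etag: str) -> bool:
    """Weak comparison as used for If-None-Match: any W/ prefix is ignored."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True
    # Simplification: assumes no commas inside individual entity tags.
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        if candidate.startswith("W/"):
            candidate = candidate[2:]
        if candidate == etag:
            return True
    return False


def preview_status_and_headers(
    storage_etag: str, if_none_match: Optional[str], content_disposition: str
) -> Tuple[int, Dict[str, str]]:
    """Hypothetical helper: (status, headers) for a preview keyed on the object's real ETag."""
    etag = quote_etag(storage_etag)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if etag_matches(if_none_match, etag):
        # 304 Not Modified repeats the validators but sends no body.
        return 304, headers
    headers["Content-Disposition"] = content_disposition
    return 200, headers
```

Because the ETag now changes whenever the stored object changes, a re-uploaded file invalidates cached previews instead of being served stale under the same object_name.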
return response


def get_file_size_from_minio(object_name: str, bucket: Optional[str] = None) -> int:
Contributor

Since this function was renamed, please confirm that every place that uses it is still compatible.

Author

The other call sites have been updated as well.

Labels

None yet

Projects

None yet

2 participants