Add PDF support with web upload interface #11

newton3 · 2025-11-23T07:07:33Z

Summary

- Add PDF processing with TOC-based chapter extraction and a page-based fallback
- Extend the command-line interface to support both EPUB and PDF files
- Add a web upload interface for drag-and-drop book uploads (both EPUB and PDF)

Changes Made

PDF Processing (`reader3.py`)

Added PyMuPDF library for PDF text extraction and metadata parsing
Implemented process_pdf() function with:
- Automatic TOC/bookmark extraction when available
- Page-based chunking fallback (10 pages per chapter) when no TOC exists
- Metadata extraction (title, author, publisher, date)
- Same Book/ChapterContent structure as EPUB for seamless integration
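
The chapter-splitting rule described above (use the TOC when there is one, otherwise cut 10-page chunks) can be sketched in plain Python. This is an illustration only, not the code from this PR: `plan_chapters` is a hypothetical helper, and it assumes the outline arrives as `[level, title, page]` entries with 1-based page numbers, which is the shape PyMuPDF's `Document.get_toc()` returns. It also simplifies by using only top-level entries and ignoring pages before the first one.

```python
from typing import List, Tuple

# (title, first_page, last_page); pages are 1-based and inclusive.
Chapter = Tuple[str, int, int]


def plan_chapters(toc: List[list], page_count: int,
                  pages_per_chunk: int = 10) -> List[Chapter]:
    """Hypothetical helper: map a PDF outline to chapter page ranges."""
    # Keep only top-level entries whose target page lies inside the document.
    starts = [(title, page) for level, title, page in toc
              if level == 1 and 1 <= page <= page_count]
    if starts:
        starts.sort(key=lambda item: item[1])
        chapters = []
        for i, (title, first) in enumerate(starts):
            # A chapter runs up to the page before the next chapter begins.
            last = starts[i + 1][1] - 1 if i + 1 < len(starts) else page_count
            chapters.append((title, first, max(first, last)))
        return chapters
    # Fallback: no usable outline, so cut fixed-size page chunks.
    chunks = []
    for first in range(1, page_count + 1, pages_per_chunk):
        last = min(first + pages_per_chunk - 1, page_count)
        chunks.append((f"Pages {first}-{last}", first, last))
    return chunks
```

For example, `plan_chapters([], 25)` yields three chunks covering pages 1-10, 11-20, and 21-25, while a two-entry outline yields one range per entry.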

Command Line Support

Modified CLI to detect file type (.epub vs .pdf)
Route to appropriate processor based on file extension
Usage: uv run reader3.py <file.epub|file.pdf>
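
A minimal sketch of extension-based routing, assuming the CLI takes a single path argument as the usage line suggests. `detect_format` and `main` are hypothetical names for illustration; the actual dispatch in `reader3.py` may differ.

```python
from pathlib import Path


def detect_format(path: str) -> str:
    """Return "epub" or "pdf" based on the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".epub":
        return "epub"
    if suffix == ".pdf":
        return "pdf"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")


def main(argv: list) -> int:
    """Hypothetical entry point; returns a process exit code."""
    if len(argv) != 2:
        print("Usage: uv run reader3.py <file.epub|file.pdf>")
        return 1
    try:
        kind = detect_format(argv[1])
    except ValueError as exc:
        print(exc)
        return 1
    # The real script would call its EPUB processor or process_pdf() here.
    print(f"Would process {argv[1]} as {kind}")
    return 0
```

Lower-casing the suffix means `Book.PDF` and `book.pdf` are routed the same way.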

Web Upload Interface

New /upload GET route with file upload form (templates/upload.html)
New /upload POST endpoint to handle file processing
Added "Upload Book" button to library page
Server-side validation for EPUB and PDF files
Automatic processing and redirect to library after upload
Added python-multipart library for multipart form data handling
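
The PR does not spell out how server-side validation works. One possible approach, sketched here as an assumption rather than a description of the actual code, is to check both the extension and the leading bytes of the upload: PDF files begin with `%PDF-`, and an EPUB is a ZIP archive, which begins with `PK\x03\x04`. `validate_upload` is a hypothetical helper that also strips any directory components from the client-supplied filename.

```python
from pathlib import PurePosixPath

# Expected leading bytes per accepted extension.
ALLOWED = {".epub": b"PK\x03\x04", ".pdf": b"%PDF-"}


def validate_upload(filename: str, head: bytes) -> str:
    """Return a safe bare filename, or raise ValueError if the upload is rejected.

    `head` is the first few bytes of the uploaded file.
    """
    # Clients may send a full path; keep only the final component.
    safe_name = PurePosixPath(filename.replace("\\", "/")).name
    suffix = PurePosixPath(safe_name).suffix.lower()
    magic = ALLOWED.get(suffix)
    if magic is None:
        raise ValueError("Only .epub and .pdf files are accepted")
    if not head.startswith(magic):
        raise ValueError(f"File content does not look like {suffix}")
    return safe_name
```

In a FastAPI handler, the filename and the first bytes of the uploaded file would be passed in before the file is saved and processed.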

Dependencies

pymupdf>=1.25.1 - PDF parsing and text extraction
python-multipart>=0.0.9 - File upload handling in FastAPI

Test Plan

Verify EPUB processing still works via command line
Test PDF processing via command line with TOC-based PDFs
Test PDF processing with PDFs lacking TOC (page-based chunking)
Test web upload for EPUB files
Test web upload for PDF files
Verify processed books appear in library
Confirm reader interface works for both EPUB and PDF content

🤖 Generated with Claude Code

Provides guidance for Claude Code, including development commands, an architecture overview, and key distinctions between spine and TOC navigation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Add PyMuPDF library for PDF processing with smart TOC extraction
- Implement process_pdf() with automatic TOC detection and page-based fallback
- Extend CLI to handle both .epub and .pdf file formats
- Add web upload interface at /upload for drag-and-drop book uploads
- Support both EPUB and PDF uploads through web interface
- Add python-multipart for file upload handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

newton3 and others added 2 commits November 22, 2025 22:18
