newton3 commented Nov 23, 2025

Summary

  • Add PDF processing: chapters come from the PDF's TOC/bookmarks when present, with page-based chunking as the fallback
  • Extend command-line interface to support both EPUB and PDF files
  • Add web upload interface for drag-and-drop book uploads (both EPUB and PDF)

Changes Made

PDF Processing (reader3.py)

  • Added PyMuPDF library for PDF text extraction and metadata parsing
  • Implemented process_pdf() function with:
    • Automatic TOC/bookmark extraction when available
    • Page-based chunking fallback (10 pages per chapter) when no TOC exists
    • Metadata extraction (title, author, publisher, date)
    • Same Book/ChapterContent structure as the EPUB path, so downstream code can treat both formats alike
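
The TOC-or-fallback behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in reader3.py: `plan_chapters` and `extract_chapter_texts` are hypothetical names, only top-level TOC entries are treated as chapters, and pages before the first TOC entry are simply skipped. The PyMuPDF calls used (`pymupdf.open`, `Document.get_toc`, `Document.page_count`, `Page.get_text`) are real; how `process_pdf()` actually combines them may differ.

```python
from typing import List, Sequence, Tuple

# (title, first_page, last_page) with 0-based, inclusive page indices.
ChapterSpan = Tuple[str, int, int]


def plan_chapters(toc: Sequence[Sequence], page_count: int,
                  pages_per_chapter: int = 10) -> List[ChapterSpan]:
    """Map a PyMuPDF-style TOC to page spans, or chunk by page count."""
    # PyMuPDF TOC rows look like [level, title, page, ...] with 1-based pages.
    # Assumption: only level-1 entries with a valid page become chapters.
    starts = [(row[1], row[2] - 1) for row in toc
              if row[0] == 1 and 1 <= row[2] <= page_count]

    if not starts:
        # Fallback: fixed-size chunks (10 pages per chapter by default).
        spans = []
        for first in range(0, page_count, pages_per_chapter):
            end = min(first + pages_per_chapter, page_count)  # exclusive
            spans.append((f"Pages {first + 1}-{end}", first, end - 1))
        return spans

    starts.sort(key=lambda item: item[1])
    spans = []
    for i, (title, first) in enumerate(starts):
        # A chapter runs until the page before the next chapter starts.
        next_first = starts[i + 1][1] if i + 1 < len(starts) else page_count
        spans.append((title, first, max(first, next_first - 1)))
    return spans


def extract_chapter_texts(pdf_path: str) -> List[Tuple[str, str]]:
    """Read a PDF and return (title, text) pairs, one per planned chapter."""
    import pymupdf  # imported lazily so plan_chapters has no dependencies

    with pymupdf.open(pdf_path) as doc:
        spans = plan_chapters(doc.get_toc(), doc.page_count)
        return [
            (title,
             "\n".join(doc[p].get_text() for p in range(first, last + 1)))
            for title, first, last in spans
        ]
```

Keeping the span planning separate from the PDF I/O means the fallback rule can be checked without a sample file, for example `plan_chapters([], 25)` yields three chunks covering pages 1-10, 11-20, and 21-25.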

Command Line Support

  • Modified the CLI to detect the file type (.epub vs .pdf)
  • Routes each file to the matching processor based on its extension
  • Usage: uv run reader3.py <file.epub|file.pdf>
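
A minimal sketch of extension-based routing, assuming the CLI takes a single path argument as in the usage line above. `detect_book_type` and `main` are hypothetical names, and the real script would call its EPUB or PDF processor where this sketch only prints the decision.

```python
from pathlib import Path
from typing import List

# Supported extensions and the processor each one maps to.
SUPPORTED = {".epub": "epub", ".pdf": "pdf"}


def detect_book_type(path: str) -> str:
    """Return "epub" or "pdf" based on the (case-insensitive) extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(
            f"Unsupported file type {suffix!r}; expected .epub or .pdf")
    return SUPPORTED[suffix]


def main(argv: List[str]) -> int:
    """Parse argv and report which processor would handle the file."""
    if len(argv) != 2:
        print("Usage: uv run reader3.py <file.epub|file.pdf>")
        return 1
    try:
        kind = detect_book_type(argv[1])
    except ValueError as exc:
        print(exc)
        return 1
    # Placeholder: the real CLI would invoke the matching processor here.
    print(f"Routing {argv[1]} to the {kind} processor")
    return 0
```

Lower-casing the suffix lets `Book.PDF` and `book.pdf` take the same path, and returning an exit code instead of raising keeps the failure mode friendly for shell use.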

Web Upload Interface

  • New /upload GET route with file upload form (templates/upload.html)
  • New /upload POST endpoint to handle file processing
  • Added "Upload Book" button to library page
  • Server-side validation for EPUB and PDF files
  • Automatic processing and redirect to library after upload
  • Added python-multipart library for multipart form data handling
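
One way the server-side validation and the POST handler could fit together is sketched below. Everything here is an assumption made for illustration: the helper names (`validate_upload`, `safe_name`, `build_app`), the `uploads` directory, the magic-byte check, and the library page living at `/` are not taken from the PR. The FastAPI pieces (`UploadFile`, `HTTPException`, `RedirectResponse`) are real, and `UploadFile` parameters require python-multipart to be installed.

```python
from pathlib import Path, PurePosixPath

# Leading bytes expected for each accepted extension. An EPUB is a ZIP
# container, so it starts with the ZIP local-file-header signature.
# Simplifying assumption: the PDF header sits at byte 0.
_MAGIC = {".pdf": b"%PDF-", ".epub": b"PK\x03\x04"}


def validate_upload(filename: str, head: bytes) -> str:
    """Check extension and leading bytes; return the normalised suffix."""
    suffix = PurePosixPath(filename).suffix.lower()
    magic = _MAGIC.get(suffix)
    if magic is None:
        raise ValueError("Only .epub and .pdf files are accepted")
    if not head.startswith(magic):
        raise ValueError(f"File content does not look like a {suffix} file")
    return suffix


def safe_name(filename: str) -> str:
    """Drop any directory components a client may have sent."""
    return PurePosixPath(filename.replace("\\", "/")).name


def build_app(upload_dir: str = "uploads"):
    """Build a FastAPI app exposing POST /upload (hypothetical wiring)."""
    from fastapi import FastAPI, HTTPException, UploadFile
    from fastapi.responses import RedirectResponse

    app = FastAPI()

    @app.post("/upload")
    async def upload_book(file: UploadFile):
        data = await file.read()
        name = safe_name(file.filename or "")
        try:
            validate_upload(name, data[:8])
        except ValueError as exc:
            raise HTTPException(status_code=400, detail=str(exc))
        target = Path(upload_dir) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        # The EPUB/PDF processing step would run here before redirecting.
        return RedirectResponse(url="/", status_code=303)

    return app
```

Checking the leading bytes as well as the extension rejects a renamed file early, and the 303 redirect makes the browser follow up with a GET on the library page rather than re-posting the form.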

Dependencies

  • pymupdf>=1.25.1 - PDF parsing and text extraction
  • python-multipart>=0.0.9 - File upload handling in FastAPI

Test Plan

  • Verify EPUB processing still works via command line
  • Test PDF processing via command line with TOC-based PDFs
  • Test PDF processing with PDFs lacking TOC (page-based chunking)
  • Test web upload for EPUB files
  • Test web upload for PDF files
  • Verify processed books appear in library
  • Confirm reader interface works for both EPUB and PDF content

🤖 Generated with Claude Code

newton3 and others added 2 commits November 22, 2025 22:18

Provides guidance for Claude Code including development commands, architecture overview, and key distinctions between spine vs TOC navigation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Add PyMuPDF library for PDF processing with smart TOC extraction
- Implement process_pdf() with automatic TOC detection and page-based fallback
- Extend CLI to handle both .epub and .pdf file formats
- Add web upload interface at /upload for drag-and-drop book uploads
- Support both EPUB and PDF uploads through web interface
- Add python-multipart for file upload handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>