Skip to content

Vision-Script is build to solve the problem text digitalization , it use OCR to convert your hard copy to soft copy , its offine so once you set it up you can use it offline without any problem ....

License

Notifications You must be signed in to change notification settings

imadnan4/Vision-Script

Repository files navigation

VisionScript

A cutting-edge text extraction and summarization solution built with React, TypeScript, and Python, featuring a modern tech stack and intelligent document processing capabilities. Transform your physical documents into digital format with ease and precision.

Recent Updates

  • AI Text Summarization - Added intelligent text summarization with OpenRouter Mistral integration
  • PDF Support - Upload and process PDF documents for text extraction and summarization
  • Smart Response Cleaning - AI responses automatically formatted for human readability
  • Multilanguage Support - Added support for 13+ languages with seamless language switching
  • UI Improvements - Enhanced user interface for better user experience
  • Backend Optimization - Improved text detection accuracy and processing speed

Key Features

Smart Document Capture

  • Drag-and-drop file upload support
  • Real-time camera-based document scanning
  • Intelligent frame capture and processing
  • Support for multiple image formats (JPG, PNG, BMP)
  • Advanced image preprocessing for optimal results

Powerful OCR Processing

  • High-accuracy text extraction using Tesseract OCR
  • Smart document layout analysis
  • Automatic document deskewing and orientation correction
  • Noise reduction and image enhancement
  • Confidence-based text filtering
  • Multi-language support for text recognition in 13+ languages

πŸ“„ Advanced Export Options

  • Multiple export formats:
    • Text files (TXT) with UTF-8 encoding
    • Word documents (DOCX) with formatting preservation
    • Excel spreadsheets (XLSX) for tabular data
  • Smart paragraph and layout detection
  • Header/footer identification
  • Table structure recognition

🎯 Real-Time Processing

  • Live camera feed processing
  • Instant text extraction feedback
  • Progress indicators and status updates
  • Performance optimization for smooth operation
  • Frame rate and resolution control

πŸ›‘οΈ Robust Error Handling

  • Comprehensive input validation
  • Detailed error messaging
  • Automatic retry mechanisms
  • Fallback strategies
  • Secure file handling

πŸ€– AI-Powered Text Summarization

  • Smart Summarization Options:
    • Local Smart Summary - Advanced algorithms for intelligent text condensation
    • AI Summary (OpenRouter Mistral) - High-quality AI-powered summarization
  • Multiple Summary Formats:
    • Paragraph summaries for natural reading
    • Bullet point summaries for quick scanning
    • Key phrase extraction for topic identification
  • Customizable Length Control:
    • Short (25%) - Quick overview in 2-3 sentences
    • Medium (50%) - Balanced summary in 3-4 sentences
    • Long (75%) - Detailed summary in 5-7 sentences
  • PDF Support - Upload and summarize PDF documents directly
  • Rate Limit Management - Real-time API usage monitoring with cooldown display
  • Clean Output - AI responses automatically cleaned for human readability

🌐 Multi-Language Support

  • Support for 13+ languages including:
    • English, French, Spanish, German
    • Italian, Portuguese, Dutch
    • Chinese, Japanese, Korean
    • Russian, Arabic, Hindi
  • Language-specific text recognition optimization
  • Easy language selection via dropdown interface
  • Real-time language switching

πŸ“ Project Structure

VisionScript/
β”œβ”€β”€ src/             # React + TypeScript frontend
β”‚   β”œβ”€β”€ components/  # Reusable UI components
β”‚   β”‚   β”œβ”€β”€ ImageUpload.tsx       # Image upload component
β”‚   β”‚   β”œβ”€β”€ RealTimeDetection.tsx # Camera-based detection
β”‚   β”‚   β”œβ”€β”€ LanguageSelector.tsx  # Language selection component
β”‚   β”‚   └── TextSummarizer.tsx    # Text summarization component
β”‚   └── styles/      # Tailwind CSS styling
β”œβ”€β”€ backend/         # Python Flask backend
β”‚   β”œβ”€β”€ main_test.py # Main application server
β”‚   └── .env         # Environment configuration
β”œβ”€β”€ docs/           # Documentation
└── node_modules/   # Dependencies

πŸš€ Getting Started

  1. Clone the repository:

    git clone git@github.com:imadnan4/VisionScript.git
  2. Navigate to the project directory:

    cd VisionScript
  3. Install dependencies:

    npm install
    cd backend
    pip install -r requirements.txt
  4. Start the development server:

    npm run dev
  5. (Optional) Set up AI Text Summarization:

    • Get a free API key from OpenRouter
    • Add to backend/.env file:
      OPENROUTER_API_KEY=your_api_key_here
      
    • Restart the backend server to enable AI summarization
  6. Using the multilanguage feature:

    • Look for the globe icon in the interface
    • Click to open the language dropdown
    • Select your preferred language for OCR processing
    • The system will automatically optimize text recognition for the selected language
  7. Using the text summarizer:

    • Upload images/PDFs or use camera to extract text
    • Click "Summarize Text" button on extracted text
    • Or go directly to "Text Summarizer" from the home screen
    • Choose between Local Smart Summary or AI Summary (OpenRouter)
    • Select format (paragraph, bullets, key phrases) and length
    • Export summaries in TXT, DOCX, or Excel format

πŸ› οΈ Technical Stack

Frontend

  • React 18+ with TypeScript
  • Vite for fast development
  • Tailwind CSS for modern styling
  • Framer Motion for smooth animations
  • React Context API for state management

Backend

  • Python Flask for API endpoints
  • Tesseract OCR for text extraction
  • EasyOCR for multi-language support
  • OpenCV for image processing
  • python-docx for Word document generation
  • PyPDF2 for PDF text extraction
  • Text Summarization Stack:
    • SUMY library for local extractive summarization
    • NLTK for natural language processing
    • OpenRouter API integration for AI-powered summarization
    • Mistral AI model for high-quality text generation
    • Custom response cleaning algorithms

πŸ”œ Upcoming Features

  • Text Translation - Integrate Google Translate API for multi-language text translation
  • Voice-to-Text - Add speech recognition for audio file transcription
  • Smart Text Comparison - Compare multiple documents and highlight differences
  • Text Analytics Dashboard - Word count, reading time, sentiment analysis, and readability scores
  • Custom Summary Templates - User-defined summary formats (executive summary, meeting notes, etc.)
  • Batch Processing - Process multiple images/PDFs in one operation
  • Text Search & Highlight - Search within extracted text and highlight matches
  • Export to More Formats - Add support for Markdown, HTML, and plain text exports
  • OCR Confidence Scoring - Display confidence levels for extracted text with editing suggestions
  • Dark/Light Theme Toggle - Customizable UI themes for better user experience

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ‘€ Author

🀝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

About

Vision-Script is build to solve the problem text digitalization , it use OCR to convert your hard copy to soft copy , its offine so once you set it up you can use it offline without any problem ....

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors