@HARSHDIPSAHA

Fixes issue #16
Currently, the KnowledgeSpace AI agent relies only on text-based queries, while a large amount of
neuroscience metadata remains locked inside non-editable formats such as figures, tables, and
presentation screenshots. This PR introduces a multimodal pipeline that allows users to upload
images and automatically convert them into refined, high-signal search queries.
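The upload-to-query flow described above can be sketched as follows. This is an illustrative outline, not the PR's implementation. `image_to_query` and its parameters are hypothetical names. The OCR and entity-extraction steps are passed in as plain functions, so the flow can be exercised without Tesseract or a Gemini API key. In a real backend, `ocr` might wrap `pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))`.

```python
from typing import Callable


def image_to_query(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    extract_entities: Callable[[str], str],
) -> str:
    """Convert an uploaded image into a search query (hypothetical sketch).

    `ocr` stands in for the Pytesseract step and `extract_entities` for the
    LLM-based NER step. Both are injected so the flow can be tested offline.
    """
    # OCR: recover whatever text is embedded in the figure, table, or slide.
    text = ocr(image_bytes).strip()
    if not text:
        # Nothing readable in the image, so skip the (paid) LLM call entirely.
        return ""
    # NER: reduce noisy OCR output to the scientific terms worth searching.
    return extract_entities(text).strip()


if __name__ == "__main__":
    # Stub functions standing in for Pytesseract and Gemini.
    fake_ocr = lambda _: "Figure 2: CA1 pyramidal neurons, mouse hippocampus "
    fake_ner = lambda text: "CA1, pyramidal neuron, mouse, hippocampus"
    print(image_to_query(b"not-a-real-image", fake_ocr, fake_ner))
```

Injecting the two steps keeps the endpoint logic independent of the specific OCR engine and model, which also makes it easy to swap either one later.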

The feature includes:

  • A React-based upload interface with an auto-expanding search box for long scientific queries.
  • A FastAPI backend endpoint that performs OCR using Pytesseract.
  • An entity-extraction layer that uses Gemini 2.0 Flash-Lite for zero-shot Named Entity Recognition (NER) on the OCR output.
  • A refined prompt that makes the model return only a clean, comma-separated list of scientific
    entities, so conversational filler does not pollute the search query.
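One way to enforce the comma-separated output format is to pair the prompt with a small cleanup step on the model's reply. The prompt wording, `build_prompt`, and `clean_entities` below are hypothetical and are not taken from the PR. The cleanup assumes the model may still add bullets, quotes, line breaks, or repeated terms despite the instructions.

```python
# Hypothetical prompt wording; the PR's actual prompt may differ.
ENTITY_PROMPT = (
    "Extract the neuroscience entities (brain regions, cell types, species, "
    "techniques, genes) from the text below. Respond with ONLY a "
    "comma-separated list of entities and nothing else.\n\nText:\n{text}"
)


def build_prompt(ocr_text: str) -> str:
    """Fill the hypothetical extraction prompt with raw OCR output."""
    return ENTITY_PROMPT.format(text=ocr_text)


def clean_entities(response: str) -> str:
    """Normalise a model reply into 'a, b, c' form, dropping duplicates."""
    seen: set[str] = set()
    entities: list[str] = []
    # Treat newlines like commas in case the model returns one term per line.
    for part in response.replace("\n", ",").split(","):
        # Strip whitespace, list bullets, quotes, and trailing full stops.
        term = part.strip().strip("-*•\"'").strip().rstrip(".")
        key = term.lower()
        # Keep the first spelling of each term, compared case-insensitively.
        if term and key not in seen:
            seen.add(key)
            entities.append(term)
    return ", ".join(entities)


if __name__ == "__main__":
    print(clean_entities("CA1, hippocampus,\n- mouse\nhippocampus."))
```

Deduplicating case-insensitively while keeping the first spelling preserves the order in which the model listed the entities, and the joined string can be passed directly to the existing text search.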

Impact:

  • Unlocks neuroscience metadata trapped in images.
  • Aims to improve the precision and recall of dataset discovery.
  • Enables a fully multimodal search workflow for KnowledgeSpace.

@HARSHDIPSAHA

@QuantumByte-01 please review.

Removed comment about adding method to assistant.