Multimodal KS #27

HARSHDIPSAHA · 2026-01-11T06:51:18Z

Fixes issue #16
Currently, the KnowledgeSpace AI agent relies only on text-based queries, while a large amount of
neuroscience metadata remains locked in formats that cannot be searched as text, such as figures,
tables, and presentation screenshots. This PR introduces a multimodal pipeline that lets users
upload images and automatically converts them into refined, high-signal search queries.
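Below is a minimal sketch of how the image-to-query flow could be wired together. It assumes the OCR and NER stages are passed in as plain callables, so the orchestration can be tested without Tesseract or a Gemini API key. `normalize_ocr_text` and `image_to_query` are hypothetical names, not functions from this PR. In a real deployment, `ocr` might wrap `pytesseract.image_to_string` applied to the decoded upload, and `ner` would wrap the Gemini call.

```python
import re
from typing import Callable

# Hypothetical stage signatures. In the PR, OCR uses Pytesseract and NER uses
# Gemini 2.0 Flash-Lite. Here both are injected so the flow runs without them.
OcrFn = Callable[[bytes], str]   # raw image bytes -> extracted text
NerFn = Callable[[str], str]     # extracted text -> comma-separated entities


def normalize_ocr_text(raw: str) -> str:
    """Rejoin words hyphenated across line breaks and collapse whitespace."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    return re.sub(r"\s+", " ", text).strip()


def image_to_query(image_bytes: bytes, ocr: OcrFn, ner: NerFn) -> str:
    """Image -> OCR text -> entity list -> search query string."""
    text = normalize_ocr_text(ocr(image_bytes))
    if not text:
        # Nothing readable in the image, so skip the (paid) model call.
        return ""
    return ner(text)
```

Injecting the two stages keeps the FastAPI handler thin and lets each stage be mocked or swapped independently. The hyphen-joining rule is a heuristic. It also merges genuinely hyphenated terms that happen to be split across lines (for example `5-`/`HT`).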

The feature includes:

A React-based upload interface with an auto-expanding search box for long scientific queries.
A FastAPI backend endpoint that performs OCR using Pytesseract.
An intelligence layer using Gemini 2.0 Flash-Lite for zero-shot Named Entity Recognition (NER).
A refined prompt that outputs only a clean, comma-separated list of scientific entities,
preventing conversational noise from polluting the search engine.
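Even with a strict prompt, a model can still wrap the list in a preamble or repeat an entity. The backend may therefore want to sanitize the reply before it reaches the search engine. The sketch below is an assumption, not the PR's code. `ENTITY_PROMPT`, `parse_entity_list` and `build_query` are hypothetical names. Picking the line with the most commas is also a simple heuristic for locating the list.

```python
# Hypothetical prompt wording; the PR's actual prompt text is not shown here.
ENTITY_PROMPT = (
    "Extract the neuroscience entities (brain regions, cell types, species, "
    "techniques, genes) from the text below. Reply with a single line containing "
    "only a comma-separated list of entities and nothing else.\n\nText:\n{text}"
)

# Punctuation a model tends to leave around list items (bullets, quotes, periods).
_EDGE_CHARS = ".;:\"'`*-"


def parse_entity_list(llm_output: str, max_entities: int = 20) -> list[str]:
    """Extract a clean, de-duplicated entity list from a model reply."""
    lines = [ln.strip() for ln in llm_output.splitlines() if ln.strip()]
    if not lines:
        return []
    # Heuristic: treat the line with the most commas as the entity list
    # (max() keeps the earliest line on ties).
    list_line = max(lines, key=lambda ln: ln.count(","))
    seen: set[str] = set()
    entities: list[str] = []
    for part in list_line.split(","):
        entity = part.strip().strip(_EDGE_CHARS).strip()
        key = entity.lower()
        if entity and key not in seen:  # case-insensitive de-duplication
            seen.add(key)
            entities.append(entity)
    return entities[:max_entities]


def build_query(entities: list[str]) -> str:
    """Join entities back into the comma-separated query string."""
    return ", ".join(entities)
```

With `ENTITY_PROMPT.format(text=ocr_text)` as the model input, the parsed list can be passed straight to the existing text search. Capping the list at `max_entities` (an arbitrary default here) keeps an unusually noisy OCR result from producing an oversized query.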

Impact:

Unlocks neuroscience metadata trapped in images.
Aims to improve the precision and recall of dataset discovery.
Enables a fully multimodal search workflow for KnowledgeSpace.

HARSHDIPSAHA · 2026-01-13T04:39:13Z

@QuantumByte-01 please review.

Removed comment about adding method to assistant.

It has been a week.

42ccf83

k

cc4c9a0

