@arybhatt4533 arybhatt4533 commented Jan 8, 2026

Summary of Changes
I have implemented several fixes to improve the backend's stability and search result ranking.

Key Updates:

Robust JSON Parsing: Added a clean_and_parse_json utility to handle cases where the LLM returns markdown formatting or extra text. This prevents the system from crashing during keyword and intent extraction.
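
For illustration, a utility of this kind could look roughly like the sketch below. Only the name clean_and_parse_json comes from this PR; the signature, the fallback order, and the choice to return None on failure are assumptions, and the actual implementation may differ.

```python
import json
import re
from typing import Any, Optional

# Build the fence marker programmatically so this snippet can itself sit inside a fence.
_FENCE = "`" * 3
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL | re.IGNORECASE)


def clean_and_parse_json(raw: str) -> Optional[Any]:
    """Best-effort JSON extraction from LLM output; returns None if nothing parses."""
    text = raw.strip()

    # 1. If the model wrapped its answer in a markdown code fence, keep only the fenced part.
    match = _FENCE_RE.search(text)
    if match:
        text = match.group(1).strip()

    # 2. Try to parse the (possibly unwrapped) text directly.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 3. Fall back to the first JSON object or array embedded in surrounding prose.
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue

    # Nothing parseable: let the caller decide on a default instead of raising.
    return None
```

Returning None rather than raising lets the keyword and intent extraction steps fall back to a default value instead of failing the whole request.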

Search Ranking (RRF): Integrated Reciprocal Rank Fusion (RRF) logic to merge results from Knowledge Space and Vector searches. This ensures that the most relevant datasets from both sources are prioritized in the final response.
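
As a reference for reviewers, here is a minimal sketch of standard Reciprocal Rank Fusion, where each item scores the sum of 1 / (k + rank) over every list it appears in. The function name, the use of plain dataset IDs, and k = 60 (a commonly used default) are assumptions; the constant and data shapes used in this PR are not stated here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result lists from the two sources.
knowledge_space_hits = ["ds_a", "ds_b", "ds_c"]
vector_hits = ["ds_b", "ds_d", "ds_a"]

fused = reciprocal_rank_fusion([knowledge_space_hits, vector_hits])
print(fused)  # ['ds_b', 'ds_a', 'ds_d', 'ds_c']
```

Because only ranks are used, the two sources do not need comparable scores, and an empty vector result list simply leaves the Knowledge Space order unchanged.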

Lightweight Testing (Torch Bypass): Added an is_enabled flag and commented out the Retriever import in VectorSearchAgent. This allows for local testing and API validation without needing to download heavy torch dependencies.
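
The sketch below shows one way such a flag can gate the heavy dependency. It uses a lazy import instead of a commented-out one, which is an alternative and not what this PR does. The class and flag names come from the description above; the retrieval module path, the Retriever constructor, and its search method are hypothetical.

```python
from typing import Any, Dict, List, Optional


class VectorSearchAgent:
    """Vector search wrapper that can be switched off for lightweight local testing."""

    def __init__(self, is_enabled: bool = False) -> None:
        self.is_enabled = is_enabled
        self._retriever: Optional[Any] = None

    def _load_retriever(self) -> Any:
        # Imported lazily so torch is only required when vector search is enabled.
        # "retrieval" and "Retriever" are placeholder names for illustration.
        from retrieval import Retriever  # hypothetical module path

        return Retriever()

    def search(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        # With the bypass active, return no hits so downstream fusion still runs.
        if not self.is_enabled:
            return []
        if self._retriever is None:
            self._retriever = self._load_retriever()
        # Assumed interface: Retriever.search(query, top_k) returns a list of dicts.
        return self._retriever.search(query, top_k)
```

With is_enabled left at False, calling search never touches the import, so the API can be exercised without torch installed.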

Intent Handling: Refined the logic to better distinguish between general greetings and actual data discovery queries, preventing unnecessary search triggers.
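
The description does not say how the distinction is made. Purely as an illustration, a cheap rule-based pre-check like the one below could short-circuit obvious greetings before any search is triggered; the function name, the labels, and the greeting list are all hypothetical.

```python
import re

# Matches messages that consist only of a greeting or thanks plus optional punctuation.
_GREETING_RE = re.compile(
    r"^\s*(hi|hello|hey|good (morning|afternoon|evening)|thanks|thank you)\b[\s!.,?]*$",
    re.IGNORECASE,
)


def classify_intent(message: str) -> str:
    """Return "greeting" for pure small talk, otherwise "data_discovery"."""
    if _GREETING_RE.match(message):
        return "greeting"
    return "data_discovery"


print(classify_intent("Hello!"))  # greeting
print(classify_intent("hi, find EEG datasets for mouse hippocampus"))  # data_discovery
```

A message that starts with a greeting but continues with a request is still routed to search, because the pattern must match the entire message.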

Verification: The backend has been verified using the FastAPI Swagger UI. The API returns a 200 OK status, and the fusion logic correctly processes search results even with the vector bypass active.
Closes #8

@Areeba-Tahir-18

Hi arybhatt! Do you know who is mentoring this INCF repo (the Knowledge Space agent)? Who is the mentor, and how can they be reached?

@arybhatt4533
Author

Hi @Areeba-Tahir-18
I’m not entirely sure about a single assigned mentor for this repository. From what I understand, this repo is maintained under INCF, and mentoring usually happens through the official INCF / Knowledge Space Agent channels rather than one fixed mentor.
If you’re exploring this for GSoC 2026 (like me), I’d suggest checking the repository README and the INCF website, or opening an issue/discussion in the repo to ask about mentors and points of contact. The maintainers listed there would be the right people to reach out to.

Areeba-Tahir-18 commented Jan 11, 2026

Thank you! Yes, I have been trying to reach out for a month. I have explored almost all of the websites and tried to contact them by email, but there has been no response so far. Let's see.

@arybhatt4533
Author

Totally understand — reaching out can be slow sometimes. INCF and similar orgs often take time to respond, especially outside official GSoC timelines.
I think you’re doing the right things already (exploring the website, emails). One thing that sometimes helps is opening a GitHub issue or discussion in the repo with a clear GSoC-related question — maintainers are usually more active there.
Hopefully we’ll get some clarity as things move closer to GSoC 2026. Let’s see 🤞 @Areeba-Tahir-18

@Areeba-Tahir-18

Yes! I am exploring and doing all this for GSoC 2026, like you. I will open my pull request soon. Thanks @arybhatt4533

Development

Successfully merging this pull request may close these issues.

Proposal: Improving chunk quality and metadata grounding in ingestion pipeline