feat(supabase): add SupabaseGroongaDocumentStore and SupabaseGroongaRetriever#3266
feat(supabase): add SupabaseGroongaDocumentStore and SupabaseGroongaRetriever#3266ShubhamGond105 wants to merge 20 commits into
Conversation
…m write behaviour
|
Hi @davidsbatista, count argument type error — need to use CountMethod.exact instead of "exact" string |
|
Can you confirm that the client doesn't support async operations? You did not add any do the |
| ) | ||
|
|
||
| @component.output_types(documents=list[Document]) | ||
| def run( |
There was a problem hiding this comment.
Can you also define an async def run_async(...) ? Does the client supports it?
| # ───────────────────────────────────────────── | ||
| # DOCUMENT STORE TESTS | ||
| # ───────────────────────────────────────────── | ||
|
|
There was a problem hiding this comment.
Please, aggregate the tests into a TestDocumentStore class.
There was a problem hiding this comment.
This is still unresolved
| assert written == 0 | ||
|
|
||
|
|
||
| def test_write_documents_overwrite(groonga_store, mock_supabase_client): |
There was a problem hiding this comment.
Add a test verifying that with DuplicatePolicy.FAIL, it raises DuplicateDocumentError.
There was a problem hiding this comment.
this is still unresolved
|
Hi @davidsbatista, Thank you for the thorough review — the feedback is very helpful. I will address all the points raised. Thank you again for your time! |
|
There are no integration tests at all. Please also note that Integration tests should inherit from Please, read through this article https://docs.haystack.deepset.ai/docs/creating-custom-document-stores to understand in more detail what's missing in your PR |
|
Hi @ShubhamGond105, thanks for your contribution. I left some comments that need to be addressed; also, the type-checking issues need to be fixed. In the future, please make sure the CI is green before asking for a review, unless the reason why is not green is out of your control and needs our action. Thanks once again! |
Coverage report (supabase)Click to see where and how coverage changed
This report was generated by python-coverage-comment-action |
||||||||||||||||||||||||||||||||||||
|
"Hi @davidsbatista, I've addressed all the feedback points: Moved client initialization to warm_up() All tests passing locally. Please take a look when you get a chance!" |
|
Tests are failing on the CI; we need to have them fixed |
|
There are no integration tests:
Please make sure you go through this, we need to have integration tests. |
| self, | ||
| *, | ||
| supabase_url: str, | ||
| supabase_key: Secret = Secret.from_env_var("SUPABASE_SERVICE_KEY"), |
There was a problem hiding this comment.
| supabase_key: Secret = Secret.from_env_var("SUPABASE_SERVICE_KEY"), | |
| supabase_key: Secret = Secret.from_env_var("SUPABASE_SERVICE_KEY", strict=False) |
| """ | ||
| return self.run(query=query, filters=filters, top_k=top_k) | ||
|
|
||
| def _merge_filters(self, runtime_filters: dict[str, Any] | None) -> dict[str, Any]: |
There was a problem hiding this comment.
This is a shallow dict merge, we should change it and use the AND operator.
Concrete example:
# Init-time filters (set when retriever is constructed)
self.filters = {
"operator": "AND",
"conditions": [
{"field": "meta.language", "operator": "==", "value": "en"}
]
}
# Runtime filters (passed to run())
runtime_filters = {
"operator": "AND",
"conditions": [
{"field": "meta.topic", "operator": "==", "value": "python"}
]
}
What the current code produces:
{
"operator": "AND",
"conditions": [{"field": "meta.topic", "operator": "==", "value": "python"}]
}
The "language == en" condition is dropped. What MERGE should produce:
{
"operator": "AND",
"conditions": [
{"field": "meta.language", "operator": "==", "value": "en"},
{"field": "meta.topic", "operator": "==", "value": "python"}
]
}
Check the apply_filter_policy in haystack.document_stores.types.filter_policy, which handles all combinations of comparison and logical filters correctly. Every other retriever in this repo uses it.
The fix is to drop _merge_filters entirely and replace it with a single call:
from haystack.document_stores.types.filter_policy import apply_filter_policy
In run() / run_async(), replace:
merged_filters = self._merge_filters(filters)
with:
merged_filters = apply_filter_policy(self.filter_policy, self.filters, filters)
Also make sure this is tested, see the Mixin tests in haystack.testing.document_stores
|
|
||
| # Create table if not exists | ||
| create_table_sql = f""" | ||
| CREATE TABLE IF NOT EXISTS {self.table_name} ( |
There was a problem hiding this comment.
self.table_name is interpolated directly into raw SQL strings passed to exec_sql.
We should validate table_name against a strict pattern (e.g., re.fullmatch(r'[a-zA-Z_][a-zA-Z0-9_]*', self.table_name)) in init.
|
There are still a few unresolved issues, and no integration tests - please go through them and let me know if you need any help. When in doubt, look at how other DocumentStores work (e.g., retrievers logic, filters, etc.) and replicate it - ElasticSearch and OpenSearch document stores have good patterns to follow |
Related Issues
SupabaseGroongaDocumentStoreandSupabaseGroongaRetriever#3083Proposed Changes:
Adds
SupabaseGroongaDocumentStoreandSupabaseGroongaRetrieverfor full-text search powered by PGroonga, a PostgreSQL extension for fast multilingual full-text search exposed by Supabase.SupabaseGroongaDocumentStore: stores and searches documents using PGroonga full-text search — no embeddings neededSupabaseGroongaRetriever: retrieves documents using plain text queriesSupabasePgvectorEmbeddingRetrieverin hybrid search pipelinesHow test
19 unit tests covering:
Notes for the reviewer
Checklist
feat: