Skip to content

feat(supabase): add SupabaseGroongaDocumentStore and SupabaseGroongaRetriever#3266

Open
ShubhamGond105 wants to merge 20 commits into
deepset-ai:mainfrom
ShubhamGond105:feat/supabase-groonga
Open

feat(supabase): add SupabaseGroongaDocumentStore and SupabaseGroongaRetriever#3266
ShubhamGond105 wants to merge 20 commits into
deepset-ai:mainfrom
ShubhamGond105:feat/supabase-groonga

Conversation

@ShubhamGond105
Copy link
Copy Markdown
Contributor

Related Issues

Proposed Changes:

Adds SupabaseGroongaDocumentStore and SupabaseGroongaRetriever for full-text search powered by PGroonga, a PostgreSQL extension for fast multilingual full-text search exposed by Supabase.

How test

19 unit tests covering:

  • Default and custom init parameters
  • write_documents, delete_documents, filter_documents, count_documents
  • Serialization round-trip (to_dict / from_dict)
  • Retriever init, run, and serialization
  • Empty query handling
  • DuplicatePolicy (OVERWRITE, SKIP, FAIL) handling

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: feat:

@ShubhamGond105 ShubhamGond105 requested a review from a team as a code owner May 2, 2026 17:49
@ShubhamGond105 ShubhamGond105 requested review from sjrl and removed request for a team May 2, 2026 17:49
@github-actions github-actions Bot added integration:chroma integration:supabase type:documentation Improvements or additions to documentation labels May 2, 2026
@sjrl sjrl requested review from davidsbatista and removed request for sjrl May 8, 2026 14:14
@ShubhamGond105
Copy link
Copy Markdown
Contributor Author

Hi @davidsbatista,
I'm actively working on fixing the remaining CI issues:

count argument type error — need to use CountMethod.exact instead of "exact" string
result.data union-attr error in _groonga_retrieval — need stronger type guard
Will push fixes shortly.
Happy to get any other feedback in the meantime!

@davidsbatista
Copy link
Copy Markdown
Contributor

Can you confirm that the client doesn't support async operations? You did not add any do the SupabaseGroongaDocumentStore

)

@component.output_types(documents=list[Document])
def run(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also define an async def run_async(...) ? Does the client supports it?

# ─────────────────────────────────────────────
# DOCUMENT STORE TESTS
# ─────────────────────────────────────────────

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, aggregate the tests into a TestDocumentStore class.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still unresolved

assert written == 0


def test_write_documents_overwrite(groonga_store, mock_supabase_client):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test verifying that with DuplicatePolicy.FAIL, it raises DuplicateDocumentError.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still unresolved

@ShubhamGond105
Copy link
Copy Markdown
Contributor Author

Hi @davidsbatista,

Thank you for the thorough review — the feedback is very helpful.

I will address all the points raised.
I will push the fixes shortly.

Thank you again for your time!

@davidsbatista
Copy link
Copy Markdown
Contributor

There are no integration tests at all. Please also note that Integration tests should inherit from DocumentStoreBaseTests (and optionally DocumentStoreBaseExtendedTests) to cover all standard document store behaviours. This is completely absent.

Please, read through this article https://docs.haystack.deepset.ai/docs/creating-custom-document-stores to understand in more detail what's missing in your PR

@davidsbatista
Copy link
Copy Markdown
Contributor

Hi @ShubhamGond105, thanks for your contribution. I left some comments that need to be addressed; also, the type-checking issues need to be fixed.

In the future, please make sure the CI is green before asking for a review, unless the reason why is not green is out of your control and needs our action.

Thanks once again!

@github-actions
Copy link
Copy Markdown
Contributor

Coverage report (supabase)

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  integrations/supabase/src/haystack_integrations/components/retrievers/supabase
  groonga_retriever.py 133-135
  integrations/supabase/src/haystack_integrations/document_stores/supabase
  groonga_document_store.py 93-94, 97, 125-126, 141-142, 172-179, 196-197, 216-217, 223-227, 238-239, 260-261, 273, 285-314
Project Total  

This report was generated by python-coverage-comment-action

@ShubhamGond105
Copy link
Copy Markdown
Contributor Author

"Hi @davidsbatista, I've addressed all the feedback points:

Moved client initialization to warm_up()
Added DocumentStore base class
Fixed default DuplicatePolicy to FAIL
Implemented filter support
Added run_async() to retriever
Fixed all mypy and lint errors

All tests passing locally. Please take a look when you get a chance!"

@davidsbatista
Copy link
Copy Markdown
Contributor

Tests are failing on the CI; we need to have them fixed

@davidsbatista
Copy link
Copy Markdown
Contributor

There are no integration tests:
- Zero integration tests exist; the checklist requires at minimum inheriting DocumentStoreBaseTests
- Add a Docker Compose service with PGroonga, or add @pytest.mark.integration tests that skip when the service is unavailable

There are no integration tests at all. Please also note that Integration tests should inherit from DocumentStoreBaseTests (and optionally DocumentStoreBaseExtendedTests) to cover all standard document store behaviours. This is completely absent.

Please, read through this article https://docs.haystack.deepset.ai/docs/creating-custom-document-stores to understand in more detail what's missing in your PR

Please make sure you go through this, we need to have integration tests.

self,
*,
supabase_url: str,
supabase_key: Secret = Secret.from_env_var("SUPABASE_SERVICE_KEY"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
supabase_key: Secret = Secret.from_env_var("SUPABASE_SERVICE_KEY"),
supabase_key: Secret = Secret.from_env_var("SUPABASE_SERVICE_KEY", strict=False)

"""
return self.run(query=query, filters=filters, top_k=top_k)

def _merge_filters(self, runtime_filters: dict[str, Any] | None) -> dict[str, Any]:
Copy link
Copy Markdown
Contributor

@davidsbatista davidsbatista May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a shallow dict merge, we should change it and use the AND operator.

Concrete example:

  # Init-time filters (set when retriever is constructed)
  self.filters = {
      "operator": "AND",
      "conditions": [
          {"field": "meta.language", "operator": "==", "value": "en"}
      ]
  }

  # Runtime filters (passed to run())
  runtime_filters = {
      "operator": "AND",
      "conditions": [
          {"field": "meta.topic", "operator": "==", "value": "python"}
      ]
  }

What the current code produces:

{
     "operator": "AND",
     "conditions": [{"field": "meta.topic", "operator": "==", "value": "python"}]
}

The "language == en" condition is dropped. What MERGE should produce:

  {
      "operator": "AND",
      "conditions": [
          {"field": "meta.language", "operator": "==", "value": "en"},
          {"field": "meta.topic", "operator": "==", "value": "python"}
      ]
  }

Check the apply_filter_policy in haystack.document_stores.types.filter_policy, which handles all combinations of comparison and logical filters correctly. Every other retriever in this repo uses it.

The fix is to drop _merge_filters entirely and replace it with a single call:

from haystack.document_stores.types.filter_policy import apply_filter_policy

In run() / run_async(), replace:
merged_filters = self._merge_filters(filters)

with:
merged_filters = apply_filter_policy(self.filter_policy, self.filters, filters)

Also make sure this is tested, see the Mixin tests in haystack.testing.document_stores


# Create table if not exists
create_table_sql = f"""
CREATE TABLE IF NOT EXISTS {self.table_name} (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.table_name is interpolated directly into raw SQL strings passed to exec_sql.

We should validate table_name against a strict pattern (e.g., re.fullmatch(r'[a-zA-Z_][a-zA-Z0-9_]*', self.table_name)) in init.

@davidsbatista
Copy link
Copy Markdown
Contributor

There are still a few unresolved issues, and no integration tests - please go through them and let me know if you need any help.

When in doubt, look at how other DocumentStores work (e.g., retrievers logic, filters, etc.) and replicate it - ElasticSearch and OpenSearch document stores have good patterns to follow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SupabaseGroongaDocumentStore and SupabaseGroongaRetriever

2 participants