Skip to content

Conversation

@zabarn
Copy link

@zabarn zabarn commented Jul 22, 2025

What this PR does / why we need it:

  1. Adds Search RPCs to RegistryServer.proto
  2. Adds RPC implementations in registry_server.py
  3. Adds support for registry_server.py registry calls to sql.py
  4. Adds search.py to create Python objects for related RPCs
  5. Adds tests for search.py in test_search.py

Which issue(s) this PR fixes:

EAPC-19174

Misc

Original PR: #135

Copy link

@vineet-belur vineet-belur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to see if there are resources other than this for understanding the remote registry further.

Is the plan to have a registry 1:1 with a feature server? Would deployment happen concurrently with feature-server or as a separate step?

self,
search_text: str = "",
updated_at: Optional[datetime] = None,
page_size: int = 10,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Do you think it makes sense to default to unsized if page_size is unspecified?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they don't pass in a page_size, I think 10 is a better default.

We don't want to query for more results than necessary, and on AI Workbench, all of the searches default to 10 unless the user changes it.

count_stmt = select(func.count(projects.c.project_id.distinct()))
if search_text:
count_stmt = count_stmt.where(
projects.c.project_id.like(f"%{search_text}%")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cwe-89

Not sure if this repo gets flagged for vulnerabilities.

Same comment below.

if team and fv.tags.get("team") != team:
add_to_results = False

if created_at:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can un-nest these if statements, and short-circuits in python.

)

projects_and_related_feature_views = []
with ThreadPoolExecutor() as executor:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Are we bypassing the GIL with this? I'm not too familiar with concurrent.futures.threadpoolexecutor.

If we're not then this is a cpu-bound workload so it may be advisable to, at minimum, add a max_workers parameter to threadpoolexecutor(defaults to min(32, (os.process_cpu_count() or 1) + 4)) or just do the work sequentially.

Copy link
Author

@zabarn zabarn Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you're right we should be using ProcessPoolExecutor rather than ThreadPoolExecutor if anything.

We weren't able to do this request sequentially in the past because of the fact that there were no project objects, so this request with 100 projects requested could take >10 seconds to execute.

Now that we have project protos, we might be able to do it sequentially. I'll switch to ProcessPoolExecutor and test it with or Test data, then remove it if it's unnecessary.

@zabarn
Copy link
Author

zabarn commented Jul 30, 2025

Wanted to see if there are resources other than this for understanding the remote registry further.

That's pretty much the only thing available. I think we have an old document for the AI Workbench requirements that I'll try to find.

Is the plan to have a registry 1:1 with a feature server? Would deployment happen concurrently with feature-server or as a separate step?

No this is separate from a feature server. This is meant to replace https://github.expedia.biz/eg-ai-platform/eg-feature-store-registry in the future

Copy link
Collaborator

@piket piket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zabarn zabarn changed the base branch from feature/range-query-improvements to master August 25, 2025 15:17
EXPEbdodla and others added 26 commits August 25, 2025 10:22
…ts (#260)

* fix: Update Docker images to use official sources and improve workflow triggers

* docs: Update README to include instructions for running integration tests

---------

Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com>
#261)

* fix: Only add sort key filter models to list if they meet the conditions. Split read range int tests into its own file.

* add pauses in int test to allow for apply/materialize to compelete

* set int tests to not run in parallel
* adding snake case to include metadata check

* fixing formatting

* added trim space to string function
#263)

* fix: Pack and unpack repeated list values into and out of arrow arrays. Restructure integration tests to properly separate concerns

* Throw error when requested feature service includes both view types

* Re-use CopyProtoValuesToArrowArray instead of duplicating switch logic.
* Adding update changes

* fixing tests

* fixing linting

* addressing PR comments

* fixing unit test

* Added in sort order check

* using feast object

* removing return type from validation

* Adding another exception type

* Addressing PR comments to add an entity check

* fixing tests

* Addressing PR comments

* fixing test assertion

* fixing formatting
* fix: Validate requested features exist in view.

* add test cases for invalid features in feature service

* reduce time complexity and duplicate checks for feature validation

* use if-else blocks
* feat: seperate entities into their own field in range query response

* feat: seperate entities from features in http range query response

* change range query response name back to results instead of features

* fix http server unit test

* add debugging logs

* modify getOnlineFeaturesRange logic

* fix grpc range query code and integration tests

* fix: Javadoc errors (#270)

* update the function description to fix javdoc errors.

* fix: Formatting

---------

Co-authored-by: vbhagwat <vbhagwat@expediagroup.com>

* fix: dont add entities to feature vectors

* fix: Formatting javadocs (#271)

* update the function description to fix javdoc errors.

* fix: Formatting

* formatting

* fix: formatting

* fix linting errors

* fix params

---------

Co-authored-by: vbhagwat <vbhagwat@expediagroup.com>

* fix: Formatting (#272)

* update the function description to fix javdoc errors.

* fix: Formatting

* formatting

* fix: formatting

* fix linting errors

* fix params

* fix javadoc error

---------

Co-authored-by: vbhagwat <vbhagwat@expediagroup.com>

* fix: Formatting (#273)

* update the function description to fix javdoc errors.

* fix: Formatting

* formatting

* fix: formatting

* fix linting errors

* fix params

* fix javadoc error

* Update FeastClient.java

---------

Co-authored-by: vbhagwat <vbhagwat@expediagroup.com>

* feat: Add getOnlineFeature and getOnlineFeaturesRange overloaded methods wi… (#275)

* feat: add getOnlineFeature and getOnlineFeaturesRange overloaded methods without project

Co-authored-by: vbhagwat <vbhagwat@expediagroup.com>

* update java client for separate entities in response

* Update tests to reflect changes

* fix processFeatureVector tests

* throw exception when more rows than entity values

* change based on pr feedback

* pr feedback

---------

Co-authored-by: omiranda <omiranda@expediagroup.com>
Co-authored-by: vanitabhagwat <92561664+vanitabhagwat@users.noreply.github.com>
Co-authored-by: vbhagwat <vbhagwat@expediagroup.com>
* fix: Http range timestamp values should return in rfc3339 format to match how get features timestamps are returned

* use the exact timestamp format arrow marshalling uses
* added another exception type

* raising exceptions in the http methods instead

* calling handle_exception method instead of a raise

* Added a TODO comment
…ound null values. (#279)

* fix: Timestamps should be seconds percision. Return null status for found null values.

* convert UnixTimestamp sort filters as time.Time

* use consistent time for type conversion test

* fix materializing timestamps as seconds and java converting time values into epoch seconds

* fix linting

* fix test

* fix test
…s. (#281)

* fix: Go server errors should be status errors with proper status codes. FeastClient handles status exceptions as differentiated FeastExceptions.

* fix and expand tests

* add more test cases
* fix: Validation order for multiple missing end keys should pass

* fix int test and create test for only an equals filter

* add conversions of int32 to higher values for http
* fix: Handle null values in Arrow list conversion to Proto values

* fix: Fixed the nil representation in http

* fix: Correct variable name for null check in ArrowListToProtoList function

* fix: Null value handling for types.Values

* fix: Fixed the nil representation in http

* fix: Fixed the issue with NOT_FOUND values representation and repeated null values

* fix: Fixed the nil representation

* fix: Fixed the nil representation in http

* fix: Removed duplicate test

* fix: Added status handling for feature values and ensured consistency in value-status mapping

* fix: Fixed issue with timestamps in HTTP

* fix: Fixed failing tests

* fix: Simplified null value handling in type conversion

---------

Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com>
#286)

Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com>
#288)

* feat: Add versioning information to the server and expose it via a new endpoint

1. Version information is exposed via endpoints
2. It published information to datadog as well
3. Version information is printed on pod startup

* fix: fixed test failures

* fix: fixed test failures

* fix: added comment

* fix: update GetVersionInfo method and related proto definitions

* fix: set server type in version information for HTTP, gRPC, and hybrid servers

* fix: refactor Datadog version info publishing to a separate function

* fix: improve error handling for statsd client flush and close operations

---------

Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com>
@zabarn zabarn force-pushed the feature/remote-registry-support branch from 78ed008 to 64ea0e8 Compare August 25, 2025 15:34
@zabarn zabarn closed this Aug 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants