-
Notifications
You must be signed in to change notification settings - Fork 3
feat: Search RPCs for Remote Registry #290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vineet-belur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted to see if there are resources other than this for understanding the remote registry further.
Is the plan to have a registry 1:1 with a feature server? Would deployment happen concurrently with feature-server or as a separate step?
| self, | ||
| search_text: str = "", | ||
| updated_at: Optional[datetime] = None, | ||
| page_size: int = 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Do you think it makes sense to default to unsized if page_size is unspecified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they don't pass in a page_size, I think 10 is a better default.
We don't want to query for more results than necessary, and on AI Workbench, all of the searches default to 10 unless the user changes it.
| count_stmt = select(func.count(projects.c.project_id.distinct())) | ||
| if search_text: | ||
| count_stmt = count_stmt.where( | ||
| projects.c.project_id.like(f"%{search_text}%") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if team and fv.tags.get("team") != team: | ||
| add_to_results = False | ||
|
|
||
| if created_at: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can un-nest these if statements, and short-circuits in python.
| ) | ||
|
|
||
| projects_and_related_feature_views = [] | ||
| with ThreadPoolExecutor() as executor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Are we bypassing the GIL with this? I'm not too familiar with concurrent.futures.threadpoolexecutor.
If we're not then this is a cpu-bound workload so it may be advisable to, at minimum, add a max_workers parameter to threadpoolexecutor(defaults to min(32, (os.process_cpu_count() or 1) + 4)) or just do the work sequentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, you're right we should be using ProcessPoolExecutor rather than ThreadPoolExecutor if anything.
We weren't able to do this request sequentially in the past because of the fact that there were no project objects, so this request with 100 projects requested could take >10 seconds to execute.
Now that we have project protos, we might be able to do it sequentially. I'll switch to ProcessPoolExecutor and test it with or Test data, then remove it if it's unnecessary.
That's pretty much the only thing available. I think we have an old document for the AI Workbench requirements that I'll try to find.
No this is separate from a feature server. This is meant to replace https://github.expedia.biz/eg-ai-platform/eg-feature-store-registry in the future |
piket
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ts (#260) * fix: Update Docker images to use official sources and improve workflow triggers * docs: Update README to include instructions for running integration tests --------- Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com>
#261) * fix: Only add sort key filter models to list if they meet the conditions. Split read range int tests into its own file. * add pauses in int test to allow for apply/materialize to compelete * set int tests to not run in parallel
* adding snake case to include metadata check * fixing formatting * added trim space to string function
#263) * fix: Pack and unpack repeated list values into and out of arrow arrays. Restructure integration tests to properly separate concerns * Throw error when requested feature service includes both view types * Re-use CopyProtoValuesToArrowArray instead of duplicating switch logic.
* Adding update changes * fixing tests * fixing linting * addressing PR comments * fixing unit test * Added in sort order check * using feast object * removing return type from validation * Adding another exception type * Addressing PR comments to add an entity check * fixing tests * Addressing PR comments * fixing test assertion * fixing formatting
* fix: Validate requested features exist in view. * add test cases for invalid features in feature service * reduce time complexity and duplicate checks for feature validation * use if-else blocks
* feat: seperate entities into their own field in range query response * feat: seperate entities from features in http range query response * change range query response name back to results instead of features * fix http server unit test * add debugging logs * modify getOnlineFeaturesRange logic * fix grpc range query code and integration tests * fix: Javadoc errors (#270) * update the function description to fix javdoc errors. * fix: Formatting --------- Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * fix: dont add entities to feature vectors * fix: Formatting javadocs (#271) * update the function description to fix javdoc errors. * fix: Formatting * formatting * fix: formatting * fix linting errors * fix params --------- Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * fix: Formatting (#272) * update the function description to fix javdoc errors. * fix: Formatting * formatting * fix: formatting * fix linting errors * fix params * fix javadoc error --------- Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * fix: Formatting (#273) * update the function description to fix javdoc errors. * fix: Formatting * formatting * fix: formatting * fix linting errors * fix params * fix javadoc error * Update FeastClient.java --------- Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * feat: Add getOnlineFeature and getOnlineFeaturesRange overloaded methods wi… (#275) * feat: add getOnlineFeature and getOnlineFeaturesRange overloaded methods without project Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * update java client for separate entities in response * Update tests to reflect changes * fix processFeatureVector tests * throw exception when more rows than entity values * change based on pr feedback * pr feedback --------- Co-authored-by: omiranda <omiranda@expediagroup.com> Co-authored-by: vanitabhagwat <92561664+vanitabhagwat@users.noreply.github.com> Co-authored-by: vbhagwat <vbhagwat@expediagroup.com>
* fix: Http range timestamp values should return in rfc3339 format to match how get features timestamps are returned * use the exact timestamp format arrow marshalling uses
* added another exception type * raising exceptions in the http methods instead * calling handle_exception method instead of a raise * Added a TODO comment
…ound null values. (#279) * fix: Timestamps should be seconds percision. Return null status for found null values. * convert UnixTimestamp sort filters as time.Time * use consistent time for type conversion test * fix materializing timestamps as seconds and java converting time values into epoch seconds * fix linting * fix test * fix test
…s. (#281) * fix: Go server errors should be status errors with proper status codes. FeastClient handles status exceptions as differentiated FeastExceptions. * fix and expand tests * add more test cases
* fix: Validation order for multiple missing end keys should pass * fix int test and create test for only an equals filter * add conversions of int32 to higher values for http
* fix: Handle null values in Arrow list conversion to Proto values * fix: Fixed the nil representation in http * fix: Correct variable name for null check in ArrowListToProtoList function * fix: Null value handling for types.Values * fix: Fixed the nil representation in http * fix: Fixed the issue with NOT_FOUND values representation and repeated null values * fix: Fixed the nil representation * fix: Fixed the nil representation in http * fix: Removed duplicate test * fix: Added status handling for feature values and ensured consistency in value-status mapping * fix: Fixed issue with timestamps in HTTP * fix: Fixed failing tests * fix: Simplified null value handling in type conversion --------- Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com>
#286) Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com>
#288) * feat: Add versioning information to the server and expose it via a new endpoint 1. Version information is exposed via endpoints 2. It published information to datadog as well 3. Version information is printed on pod startup * fix: fixed test failures * fix: fixed test failures * fix: added comment * fix: update GetVersionInfo method and related proto definitions * fix: set server type in version information for HTTP, gRPC, and hybrid servers * fix: refactor Datadog version info publishing to a separate function * fix: improve error handling for statsd client flush and close operations --------- Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com>
…ent sql injection
78ed008 to
64ea0e8
Compare
What this PR does / why we need it:
RegistryServer.protoregistry_server.pyregistry_server.pyregistry calls tosql.pysearch.pyto create Python objects for related RPCssearch.pyintest_search.pyWhich issue(s) this PR fixes:
EAPC-19174
Misc
Original PR: #135