feat(dataset-versioning): support running versioned experiments #1517
+88
−3
Important
This PR adds support for running experiments on versioned datasets by propagating a version timestamp through `get_dataset()` into `DatasetClient` and using it during experiment execution, verified by a new integration test.

- Adds a `version` timestamp to `get_dataset()` in `client.py`, propagating it to `DatasetClient`.
- Uses `datasetVersion` in `dataset_run_items.create(...)` during experiment execution in `client.py`.
- Updates `DatasetClient` in `datasets.py` to store and use the `version` timestamp.
- Updates `run_experiment()` in `client.py` to accept `_dataset_version` and pass it to `_run_experiment_async()`.
- Adds `test_run_experiment_with_versioned_dataset()` in `test_datasets.py` to verify versioned dataset behavior.
- Also touches `test_datasets.py`.

This description was generated for commit 7de4f28.
Disclaimer: Experimental PR review
Greptile Overview
Greptile Summary
This PR threads an optional dataset "version" timestamp through `get_dataset()` into `DatasetClient`, and then forwards that timestamp into experiment execution so that `dataset_run_items.create(...)` is called with `datasetVersion` when running experiments.

The change mainly touches the client-facing dataset wrapper (`langfuse/_client/datasets.py`) and the experiment runner in the main client (`langfuse/_client/client.py`), plus adds an integration test to verify that a versioned dataset only runs its historical items.

Confidence Score: 4/5
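The propagation described in this summary can be sketched with stand-in classes. This is a minimal illustration, not the Langfuse implementation: `FakeRunItemsApi`, `SketchClient`, and `SketchDataset` are hypothetical names, and only the parameter names `version`, `_dataset_version`, and `datasetVersion` are taken from the PR description.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class FakeRunItemsApi:
    """Hypothetical stand-in for the run-items endpoint; records every create call."""

    calls: List[dict] = field(default_factory=list)

    def create(self, *, run_name: str, dataset_item_id: str,
               dataset_version: Optional[datetime] = None) -> dict:
        call = {
            "runName": run_name,
            "datasetItemId": dataset_item_id,
            "datasetVersion": dataset_version,
        }
        self.calls.append(call)
        return call


class SketchClient:
    """Hypothetical client that forwards the version into each run-item creation."""

    def __init__(self, api: FakeRunItemsApi) -> None:
        self.api = api

    def get_dataset(self, name: str, item_ids: List[str],
                    version: Optional[datetime] = None) -> "SketchDataset":
        # The wrapper remembers which version it was fetched with.
        return SketchDataset(name, item_ids, self, version)

    def run_experiment(self, *, run_name: str, item_ids: List[str],
                       task: Callable[[str], str],
                       _dataset_version: Optional[datetime] = None) -> List[str]:
        outputs = []
        for item_id in item_ids:
            outputs.append(task(item_id))
            # Each run item is linked to the dataset version it was produced from.
            self.api.create(run_name=run_name, dataset_item_id=item_id,
                            dataset_version=_dataset_version)
        return outputs


@dataclass
class SketchDataset:
    """Hypothetical dataset wrapper that stores the version timestamp."""

    name: str
    item_ids: List[str]
    client: SketchClient
    version: Optional[datetime] = None

    def run_experiment(self, *, run_name: str, task: Callable[[str], str]) -> List[str]:
        # Forward the stored version so callers do not have to repeat it.
        return self.client.run_experiment(
            run_name=run_name, item_ids=self.item_ids, task=task,
            _dataset_version=self.version,
        )


api = FakeRunItemsApi()
ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
dataset = SketchClient(api).get_dataset("demo", ["a", "b"], version=ts)
outputs = dataset.run_experiment(run_name="run-1", task=str.upper)
```

Storing the timestamp on the wrapper means the caller states the version once, at fetch time, and every run item created afterwards carries the same value.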
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant U as User code
    participant DC as DatasetClient
    participant LC as Langfuse client
    participant API as Langfuse API
    U->>LC: get_dataset(name, version=ts)
    LC->>API: datasets.get(dataset_name)
    loop paginate items
        LC->>API: dataset_items.list(dataset_name, page, limit, version=ts)
        API-->>LC: page of items
    end
    LC-->>U: DatasetClient(items, version=ts)
    U->>DC: run_experiment(...)
    DC->>LC: run_experiment(..., _dataset_version=DC.version)
    LC->>LC: _run_experiment_async(..., dataset_version=_dataset_version)
    loop each dataset item
        LC->>API: dataset_run_items.create(runName, datasetItemId, traceId, observationId, datasetVersion=ts)
        API-->>LC: dataset_run_item (dataset_run_id)
    end
    LC-->>U: ExperimentResult(item_results, dataset_run_id/url)
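The "paginate items" loop in the diagram can be sketched as follows. This is an illustration under stated assumptions: `Item`, `list_items`, and `fetch_all_items` are hypothetical names, the in-memory list stands in for the server, and the version filter is modeled as "items created at or before the timestamp", which may not match the server's actual versioning semantics.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional


class Item(NamedTuple):
    id: str
    created_at: datetime


def list_items(store: List[Item], *, page: int, limit: int,
               version: Optional[datetime] = None) -> List[Item]:
    """Hypothetical paged endpoint: 1-based page of the items visible at `version`."""
    # Assumption: an item is visible if it was created at or before `version`.
    visible = [i for i in store if version is None or i.created_at <= version]
    start = (page - 1) * limit
    return visible[start:start + limit]


def fetch_all_items(store: List[Item], *, limit: int = 2,
                    version: Optional[datetime] = None) -> List[Item]:
    """Collect every page, passing the same version on each request."""
    items: List[Item] = []
    page = 1
    while True:
        batch = list_items(store, page=page, limit=limit, version=version)
        items.extend(batch)
        if len(batch) < limit:  # a short (or empty) page means no more data
            break
        page += 1
    return items


store = [
    Item("a", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    Item("b", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    Item("c", datetime(2024, 1, 3, tzinfo=timezone.utc)),
]
as_of = datetime(2024, 1, 2, tzinfo=timezone.utc)
historical = fetch_all_items(store, version=as_of)
latest = fetch_all_items(store)
```

Passing the same `version` on every page request keeps the result consistent even if items are added while pagination is in progress.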
Context used:
- dashboard: Move imports to the top of the module instead of placing them within functions or methods.