Incorrect estimation procedure ids #1152

@LaurensKrudde

Description

The estimation_procedure_id returned by openml.tasks.get_task does not always match the estimation procedure shown by openml.tasks.list_tasks. I came across this while recreating tasks from existing datasets on new datasets.

Steps/Code to Reproduce

For example, take some tasks from the dataset 'credit-approval' (id 29):

import openml

task_df = openml.tasks.list_tasks(data_id=29, output_format='dataframe').iloc[:5]
print(task_df[['tid', 'estimation_procedure']])

print(openml.tasks.get_task(29).estimation_procedure_id)
print(openml.tasks.get_task(259).estimation_procedure_id)
print(openml.tasks.get_task(1793).estimation_procedure_id)
print(openml.tasks.get_task(88).estimation_procedure_id)
print(openml.tasks.get_task(1728).estimation_procedure_id)

gives:

       tid             estimation_procedure
29      29          10-fold Crossvalidation
88      88  10 times 10-fold Learning Curve
259    259                  33% Holdout set
1728  1728           10-fold Learning Curve
1793  1793   5 times 2-fold Crossvalidation

1
1
1
13
13

Expected Results

The first three should have estimation_procedure_id 1, 6 and 2.
The last two should have estimation_procedure_id 3 and 13.

Actual Results

The first three all have id 1, while the last two both have id 13.
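
To make the discrepancy explicit, here is a minimal offline sketch that compares the expected and observed ids per task. The values are copied by hand from the output and the expected results above, so nothing is fetched from the server, and `find_mismatches` is a hypothetical helper name, not part of the openml package.

```python
# Values copied from this report; nothing here queries the OpenML server.
# Task order matches the print statements: 29, 259, 1793, 88, 1728.
expected_ids = {29: 1, 259: 6, 1793: 2, 88: 3, 1728: 13}
observed_ids = {29: 1, 259: 1, 1793: 1, 88: 13, 1728: 13}


def find_mismatches(expected, observed):
    """Return {task_id: (expected_id, observed_id)} for tasks whose ids differ."""
    return {
        tid: (expected[tid], observed[tid])
        for tid in expected
        if expected[tid] != observed[tid]
    }


mismatches = find_mismatches(expected_ids, observed_ids)
for tid, (want, got) in sorted(mismatches.items()):
    print(f"task {tid}: expected estimation_procedure_id {want}, got {got}")
```

With these numbers, tasks 88, 259 and 1793 are flagged, while tasks 29 and 1728 are consistent.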

Versions

Windows-10-10.0.19043-SP0
Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)]
NumPy 1.22.0
SciPy 1.8.0
Scikit-Learn 1.0.2
OpenML 0.12.2

Metadata

Labels

bug, serverside (These issues are present in the REST API and not fixable by the Python package.)
