Enhance estimator validation and implement "fit-if-needed" logic #2
RektPunk wants to merge 2 commits into xRiskLab:main from
Conversation
Reviewer's Guide

Adds standardized estimator validation and conditional fitting logic to the Pearsonify wrapper, ensuring the wrapped estimator is a scikit-learn classifier with probability estimates and only fitting it when necessary.

Sequence diagram for Pearsonify fit-if-needed estimator validation

```mermaid
sequenceDiagram
    participant Pearsonify
    participant Estimator
    participant SklearnUtils
    Pearsonify->>SklearnUtils: check_is_fitted(estimator)
    alt estimator is fitted
        SklearnUtils-->>Pearsonify: return
        Pearsonify->>Estimator: hasattr(predict_proba)
        alt has predict_proba
            Pearsonify-->>Estimator: proceed without fitting
        else missing predict_proba
            Pearsonify-->>Pearsonify: raise TypeError(Estimator validation failed)
        end
    else estimator not fitted
        SklearnUtils-->>Pearsonify: raise NotFittedError
        Pearsonify->>Estimator: fit(X_train, y_train)
    end
    Pearsonify->>Estimator: predict_proba(X_cal)
    Estimator-->>Pearsonify: y_cal_pred_proba
```
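In code, the flow in the diagram corresponds roughly to the following sketch; the function name and signature are illustrative, not the PR's actual API:

```python
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


def fit_if_needed(estimator, X_train, y_train, X_cal):
    """Fit the estimator only when necessary, then return calibration probabilities."""
    try:
        # Raises NotFittedError if the estimator has never been fitted.
        check_is_fitted(estimator)
        if not hasattr(estimator, "predict_proba"):
            raise TypeError("The estimator must have 'predict_proba' method.")
    except NotFittedError:
        # Not fitted yet: train it on the provided training split.
        estimator.fit(X_train, y_train)
    return estimator.predict_proba(X_cal)
```

A pre-fitted estimator skips the `fit` call entirely, which is the redundant-training case the PR is addressing.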
Class diagram for Pearsonify wrapper with estimator validation

```mermaid
classDiagram
    class BaseEstimator
    class Estimator {
        fit(X_train, y_train)
        predict_proba(X)
    }
    BaseEstimator <|-- Estimator
    class Pearsonify {
        - estimator: BaseEstimator
        - alpha: float
        + __init__(estimator, alpha)
        + fit(X_train, y_train, X_cal, y_cal)
    }
    Pearsonify --> BaseEstimator: wraps
    class SklearnUtils {
        + check_is_fitted(estimator)
        + NotFittedError
    }
    Pearsonify ..> SklearnUtils: uses for validation
    Pearsonify ..> Estimator: calls fit, predict_proba
```
File-Level Changes
Hey - I've found 2 issues, and left some high level feedback:

- The `predict_proba` interface check is only performed inside the `check_is_fitted` try block, so if the estimator is not fitted you skip this check entirely; consider validating the presence of `predict_proba` regardless of fitted status to avoid runtime errors later.
- Catching `TypeError` around the whole validation block and then re-raising a new `TypeError` can obscure the original source of the error; you might narrow the `try` scope or preserve the original exception type/message directly instead of wrapping it generically.
- If you truly want to enforce that the estimator is a scikit-learn estimator, consider explicitly checking `isinstance(estimator, BaseEstimator)` or a similar check early in `__init__` rather than only relying on `check_is_fitted` in `fit`.
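A minimal sketch of that early `__init__` check; the constructor body and the default `alpha` here are illustrative, not Pearsonify's actual code:

```python
from sklearn.base import BaseEstimator


class Pearsonify:
    def __init__(self, estimator, alpha=0.05):
        # Fail fast: reject non-sklearn estimators and estimators without
        # probability estimates before fit() is ever called.
        if not isinstance(estimator, BaseEstimator):
            raise TypeError("estimator must be a scikit-learn estimator.")
        if not callable(getattr(estimator, "predict_proba", None)):
            raise TypeError("estimator must implement a callable 'predict_proba'.")
        self.estimator = estimator
        self.alpha = alpha
```

Moving the check to construction time means a misconfigured wrapper fails immediately, rather than midway through `fit`.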
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `predict_proba` interface check is only performed inside the `check_is_fitted` try block, so if the estimator is not fitted you skip this check entirely; consider validating the presence of `predict_proba` regardless of fitted status to avoid runtime errors later.
- Catching `TypeError` around the whole validation block and then re‑raising a new `TypeError` can obscure the original source of the error; you might narrow the `try` scope or preserve the original exception type/message directly instead of wrapping it generically.
- If you truly want to enforce that the estimator is a scikit‑learn estimator, consider explicitly checking `isinstance(estimator, BaseEstimator)` or a similar check early in `__init__` rather than only relying on `check_is_fitted` in `fit`.
## Individual Comments
### Comment 1
<location> `pearsonify/wrapper.py:30-32` </location>
<code_context>
- self.estimator.fit(X_train, y_train)
+ try:
+ check_is_fitted(self.estimator)
+ if not hasattr(self.estimator, "predict_proba"):
+ raise TypeError("The estimator must have 'predict_proba' method.")
+ except TypeError as e:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Tighten the `predict_proba` check to ensure it is callable, not just present.
`hasattr` only checks for the presence of the attribute, which might be `None` or non-callable. To avoid runtime errors when invoking it, use `callable(getattr(self.estimator, "predict_proba", None))` instead.
```suggestion
check_is_fitted(self.estimator)
if not callable(getattr(self.estimator, "predict_proba", None)):
raise TypeError("The estimator must have a callable 'predict_proba' method.")
```
</issue_to_address>
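Why the distinction matters: an attribute can be present yet not callable, so `hasattr` alone gives a false pass. A small illustration with a hypothetical broken class:

```python
class BrokenClassifier:
    predict_proba = None  # attribute exists, but calling it would fail


est = BrokenClassifier()
print(hasattr(est, "predict_proba"))                  # True: hasattr passes
print(callable(getattr(est, "predict_proba", None)))  # False: callable check catches it
```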
### Comment 2
<location> `pearsonify/wrapper.py:33-34` </location>
<code_context>
+ check_is_fitted(self.estimator)
+ if not hasattr(self.estimator, "predict_proba"):
+ raise TypeError("The estimator must have 'predict_proba' method.")
+ except TypeError as e:
+ raise TypeError(f"Estimator validation failed: {e}") from e
+ except NotFittedError:
+ # Attempt to fit the estimator if not already fitted
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Narrow or simplify the `TypeError` wrapping to avoid redundant exception handling.
This `except TypeError` will also catch the `TypeError` you raise for a missing `predict_proba`, only to re-wrap it with a slightly changed message, and it may also hide unrelated `TypeError`s from inside `check_is_fitted` or the estimator. Consider either validating via explicit checks and not catching `TypeError` at all, or using/narrowing to a custom exception type you control for your own validation failure.
</issue_to_address>
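One way to act on that suggestion is a dedicated exception type, so only validation failures you raised yourself can be caught and wrapped; the class and function names here are hypothetical:

```python
class EstimatorValidationError(TypeError):
    """Raised only for Pearsonify's own estimator validation failures."""


def validate_interface(estimator):
    # Raising our own subclass means `except EstimatorValidationError`
    # can never accidentally swallow an unrelated TypeError coming from
    # check_is_fitted or from the estimator itself.
    if not callable(getattr(estimator, "predict_proba", None)):
        raise EstimatorValidationError(
            "The estimator must have a callable 'predict_proba' method."
        )
```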
```python
except TypeError as e:
    raise TypeError(f"Estimator validation failed: {e}") from e
```
Hi @xRiskLab, I just wanted to follow up on this PR. I know you're likely busy, but I'd appreciate it if you could take a look when you have a moment. Let me know if there's anything I should clarify or update. Thanks!
Description
This PR improves the Pearsonify class by implementing a standardized validation workflow. It ensures the input estimator is a valid Scikit-learn instance and a classifier capable of probability estimation.
Key Changes
- Used `check_is_fitted` and `hasattr` to verify both the state (fit) and the required interface (`predict_proba`) of the estimator.

Summary by Sourcery
Validate estimators before use in Pearsonify and add conditional fitting behavior to avoid redundant training.
Bug Fixes:
Enhancements: