Families/ adding base version of exponential family#67
Conversation
779438f to
12516ff
Compare
f67f3fd to
2d470c9
Compare
2d470c9 to
189afea
Compare
1605771 to
1d58077
Compare
83f5584 to
2500678
Compare
| if hasattr(dot, "__len__"): | ||
| dot = dot[0] | ||
|
|
||
| result = np.log(self._normalization(x)) + dot + self._log_partition(theta) |
There was a problem hiding this comment.
The formula does not correspond to the docstring and standard notation I guess.
wiki:
your code:
There was a problem hiding this comment.
Yes, but I decided that + log-partition function is easier to write and check formulas. This is described in docstring of class
ec832d5 to
d5a2be1
Compare
d5a2be1 to
804cfa4
Compare
| def posterior_density(parametrization: Parametrization, x: NumericArray) -> Number: | ||
| parametrization = cast(ExponentialConjugateHyperparameters, parametrization) | ||
| return cast( | ||
| np.float32, | ||
| self._normalization(x) | ||
| * conjugate_log_partition(parametrization) | ||
| / conjugate_log_partition( | ||
| self.posterior_hyperparameters(parametrizaiton=parametrization, sample=[x]) | ||
| ), | ||
| ) | ||
|
|
||
| family = ParametricFamily( | ||
| name=f"PosteriorPredictive{self.name}", | ||
| distr_type=UnivariateContinuous, | ||
| distr_characteristics={CharacteristicName.PDF: posterior_density}, | ||
| distr_parametrizations=["posterior"], | ||
| support_by_parametrization=lambda _: ContinuousSupport(), | ||
| ) | ||
| parametrization(family=family, name="posterior")(ExponentialConjugateHyperparameters) | ||
| return family |
There was a problem hiding this comment.
AI finding
posterior_predictive does not preserve the original observation support.
The predictive density is defined for a new observation x, so it should be zero outside the original family support. Currently posterior_density does not check self._support, and the created family uses ContinuousSupport() as an unconditional real-line support.
For example, for the exponential-family test case with support x > 0, pdf(-0.5) currently returns a positive value, but it should be outside support.
The posterior predictive family should reuse the original support (or a proper support resolver), and posterior_density should return 0 outside it.
| def transform( | ||
| self, | ||
| transform_function: Callable[[NumericArray], NumericArray], | ||
| ) -> ContinuousExponentialClassFamily: | ||
| """ | ||
| Transform the random variable by a monotonic, differentiable function. | ||
|
|
||
| The new density is obtained via the change‑of‑variable formula. | ||
| The sufficient statistic becomes T(transform(x)) and the base measure | ||
| gains the Jacobian factor. | ||
|
|
||
| Args: | ||
| transform_function: Invertible, differentiable function g(y) such that | ||
| x = g(y). Must be defined on the original support. | ||
|
|
||
| Returns: | ||
| ContinuousExponentialClassFamily: A new family for the transformed variable. | ||
| """ | ||
|
|
||
| def calculate_jacobian(x: NumericArray) -> NumericArray: | ||
| if not isinstance(x, Iterable): | ||
| x = np.array([x], dtype=float) | ||
| else: | ||
| x = np.atleast_1d(np.asarray(x, dtype=float)) | ||
|
|
||
| return np.abs(det(jacobian(transform_function, x).df)) | ||
|
|
||
| def new_support(x: NumericArray) -> bool: | ||
| return bool(self._support.contains(transform_function(x))) | ||
|
|
||
| def new_sufficient(x: NumericArray) -> NumericArray: | ||
| return self._sufficient(transform_function(x)) | ||
|
|
||
| def new_normalization(x: NumericArray) -> Number: | ||
| return cast(np.float64, self._normalization(x) * calculate_jacobian(x)) | ||
|
|
||
| return ContinuousExponentialClassFamily( | ||
| log_partition=self._log_partition, | ||
| sufficient_statistics=new_sufficient, | ||
| normalization_constant=new_normalization, | ||
| support=PredicateSupport(predicate=new_support), | ||
| parameter_space=self._parameter_space, | ||
| sufficient_statistics_values=self._sufficient_statistics_values, | ||
| name=f"Transformed{self._name}", | ||
| distr_type=self._distr_type, | ||
| distr_parametrizations=self.parametrization_names, | ||
| support_by_parametrization=self.support_resolver, | ||
| ) |
There was a problem hiding this comment.
I might be wrong but the transformed base measure is still evaluated at the wrong point.
The docstring now treats transform_function as the inverse map:
For a change of variables, the transformed density should use:
but new_normalization currently computes:
because it calls self._normalization(x) instead of self._normalization(transform_function(x)).
This is hidden by current tests because they use h(x) = 1. Add a test with a non-constant base measure.
| family_characteristics: CharacteristicsMap = { | ||
| CharacteristicName.PDF: self.density, | ||
| CharacteristicName.MEAN: self._mean, | ||
| CharacteristicName.VAR: self._var, | ||
| } | ||
| merged_characteristics = dict(distr_characteristics or {}) | ||
| merged_characteristics.update(family_characteristics) |
There was a problem hiding this comment.
The merge order gives default characteristics higher priority than user-provided ones.
distr_characteristics is copied first, then family_characteristics is applied with update(), so user-provided PDF, MEAN, or VAR are silently overwritten by the generic implementations.
This makes it impossible to replace expensive numerical MEAN / VAR with analytical implementations for a concrete exponential family. Use labels so the generic implementation does not overwrite the analytical one.
| @parametrization(family=self, name="theta") | ||
| class ThetaParametrization(ExponentialFamilyParametrization): | ||
| @constraint(description="theta belongs to parameter_space") | ||
| def check_theta_in_parameter_space(self) -> bool: | ||
| theta = np.atleast_1d(np.asarray(self.theta, dtype=float)) | ||
| return bool(self.__family__._parameter_space.contains(theta)) # type: ignore[attr-defined] |
There was a problem hiding this comment.
The theta validation is coupled to a private family attribute and assumes scalar membership.
check_theta_in_parameter_space reaches into self.__family__._parameter_space and then casts the result to bool. This is fragile for two reasons:
- it depends on a private attribute and requires
type: ignore; Support.contains()may return aBoolArray, andbool(array)fails for arrays with more than one element.
Route this through a small helper that explicitly handles scalar-vs-batch membership, or make the parameter-space check return a guaranteed scalar boolean for a single theta.
There was a problem hiding this comment.
The new Support protocol is array-only, but tests still exercise scalar calls directly.
The protocol now says:
def contains(self, x: NumericArray) -> bool | BoolArray: ...but several tests still call contains(point) with plain Python scalars. Runtime works because implementations call np.asarray, but the tests no longer reflect the declared API contract.
Update tests and internal call sites to pass np.asarray(...) / arrays consistently, or change the protocol if scalar inputs are still intended to be supported.
| """ | ||
| Initialize a continuous exponential family distribution. | ||
|
|
||
| Args: | ||
| log_partition: Function A(θ) – the log‑partition function. | ||
| sufficient_statistics: Function T(x) – the sufficient statistic vector. | ||
| normalization_constant: Function h(x) – the base measure. | ||
| support: Predicate defining the support of the distribution. | ||
| parameter_space: Predicate defining the natural parameter space. | ||
| sufficient_statistics_values: Predicate defining the range of T(x). | ||
| name: Name of the family. | ||
| distr_type: Type of distribution or a callable returning it. | ||
| distr_parametrizations: List of parametrization names this family supports. | ||
| distr_characteristics: Additional analytical characteristics to register. | ||
| support_by_parametrization: Callable that returns the support given a parametrization. | ||
| base_score: Optional base score function. | ||
| """ |
There was a problem hiding this comment.
Docstrings in this module should be aligned with the project style and the actual formulas. We require NumPy-style docstrings, but this constructor uses Args:. There are also inconsistencies around the meaning of log_partition, and a typo in parametrizaiton.
Please rewrite the public method docstrings in NumPy style and make the mathematical notation consistent with the implementation.
| """ | ||
| Update the conjugate prior hyperparameters given observed data. | ||
|
|
||
| For a conjugate prior with hyperparameters (ν₀, n₀), the posterior | ||
| hyperparameters become: | ||
| ν = ν₀ + Σ_{i} T(x_i) | ||
| n = n₀ + N | ||
|
|
||
| Args: | ||
| parametrizaiton: Current conjugate hyperparameters. | ||
| sample: List of observations (each can be scalar or array). | ||
|
|
||
| Returns: | ||
| ExponentialConjugateHyperparameters: | ||
| Updated hyperparameters after incorporating the sample. | ||
| """ |
There was a problem hiding this comment.
Same docstring issues here
No description provided.