Skip to content

Families/ adding base version of exponential family#67

Open
domosedy wants to merge 9 commits into
PySATL:mainfrom
domosedy:exponential-family
Open

Families/ adding base version of exponential family#67
domosedy wants to merge 9 commits into
PySATL:mainfrom
domosedy:exponential-family

Conversation

@domosedy
Copy link
Copy Markdown

@domosedy domosedy commented Feb 3, 2026

No description provided.

Comment thread src/pysatl_core/families/exponential_family.py
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
@Desiment Desiment self-requested a review February 20, 2026 19:02
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py
@domosedy domosedy force-pushed the exponential-family branch 2 times, most recently from 779438f to 12516ff Compare March 29, 2026 18:56
@LeonidElkin LeonidElkin added core.families Enhancement New feature or request labels May 1, 2026
@LeonidElkin LeonidElkin added this to the Prototype alpha-version milestone May 1, 2026
Comment thread src/pysatl_core/types.py Outdated
Comment thread src/pysatl_core/distributions/support.py Outdated
Comment thread src/pysatl_core/distributions/support.py Outdated
Comment thread src/pysatl_core/families/__init__.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py
Comment thread src/pysatl_core/families/exponential_family.py
Comment thread src/pysatl_core/families/exponential_family.py
Comment thread tests/unit/families/test_exponential_family.py Outdated
@domosedy domosedy force-pushed the exponential-family branch 2 times, most recently from f67f3fd to 2d470c9 Compare May 3, 2026 21:20
@domosedy domosedy force-pushed the exponential-family branch from 2d470c9 to 189afea Compare May 11, 2026 12:36
@domosedy domosedy force-pushed the exponential-family branch 3 times, most recently from 1605771 to 1d58077 Compare May 11, 2026 13:09
@domosedy domosedy force-pushed the exponential-family branch from 83f5584 to 2500678 Compare May 17, 2026 12:15
Comment thread src/pysatl_core/distributions/support.py Outdated
if hasattr(dot, "__len__"):
dot = dot[0]

result = np.log(self._normalization(x)) + dot + self._log_partition(theta)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formula does not correspond to the docstring and standard notation I guess.

wiki:

$$f(x \mid \theta) = h(x)\exp\left(\theta^\top T(x) - A(\theta)\right)$$

your code:

$$\log f(x \mid \theta) = \log h(x) + \theta^\top T(x) + A(\theta)$$

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I decided that + log-partition function is easier to write and check formulas. This is described in docstring of class

Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py
Comment thread src/pysatl_core/families/exponential_family.py
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/distributions/support.py Outdated
Comment thread src/pysatl_core/distributions/support.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
Comment thread src/pysatl_core/families/exponential_family.py Outdated
@domosedy domosedy force-pushed the exponential-family branch from ec832d5 to d5a2be1 Compare May 31, 2026 09:33
@domosedy domosedy force-pushed the exponential-family branch from d5a2be1 to 804cfa4 Compare May 31, 2026 10:50
Comment on lines +464 to +483
def posterior_density(parametrization: Parametrization, x: NumericArray) -> Number:
parametrization = cast(ExponentialConjugateHyperparameters, parametrization)
return cast(
np.float32,
self._normalization(x)
* conjugate_log_partition(parametrization)
/ conjugate_log_partition(
self.posterior_hyperparameters(parametrizaiton=parametrization, sample=[x])
),
)

family = ParametricFamily(
name=f"PosteriorPredictive{self.name}",
distr_type=UnivariateContinuous,
distr_characteristics={CharacteristicName.PDF: posterior_density},
distr_parametrizations=["posterior"],
support_by_parametrization=lambda _: ContinuousSupport(),
)
parametrization(family=family, name="posterior")(ExponentialConjugateHyperparameters)
return family
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI finding

posterior_predictive does not preserve the original observation support.

The predictive density is defined for a new observation x, so it should be zero outside the original family support. Currently posterior_density does not check self._support, and the created family uses ContinuousSupport() as an unconditional real-line support.

For example, for the exponential-family test case with support x > 0, pdf(-0.5) currently returns a positive value, but it should be outside support.

The posterior predictive family should reuse the original support (or a proper support resolver), and posterior_density should return 0 outside it.

Comment on lines +305 to +352
def transform(
self,
transform_function: Callable[[NumericArray], NumericArray],
) -> ContinuousExponentialClassFamily:
"""
Transform the random variable by a monotonic, differentiable function.

The new density is obtained via the change‑of‑variable formula.
The sufficient statistic becomes T(transform(x)) and the base measure
gains the Jacobian factor.

Args:
transform_function: Invertible, differentiable function g(y) such that
x = g(y). Must be defined on the original support.

Returns:
ContinuousExponentialClassFamily: A new family for the transformed variable.
"""

def calculate_jacobian(x: NumericArray) -> NumericArray:
if not isinstance(x, Iterable):
x = np.array([x], dtype=float)
else:
x = np.atleast_1d(np.asarray(x, dtype=float))

return np.abs(det(jacobian(transform_function, x).df))

def new_support(x: NumericArray) -> bool:
return bool(self._support.contains(transform_function(x)))

def new_sufficient(x: NumericArray) -> NumericArray:
return self._sufficient(transform_function(x))

def new_normalization(x: NumericArray) -> Number:
return cast(np.float64, self._normalization(x) * calculate_jacobian(x))

return ContinuousExponentialClassFamily(
log_partition=self._log_partition,
sufficient_statistics=new_sufficient,
normalization_constant=new_normalization,
support=PredicateSupport(predicate=new_support),
parameter_space=self._parameter_space,
sufficient_statistics_values=self._sufficient_statistics_values,
name=f"Transformed{self._name}",
distr_type=self._distr_type,
distr_parametrizations=self.parametrization_names,
support_by_parametrization=self.support_resolver,
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong but the transformed base measure is still evaluated at the wrong point.

The docstring now treats transform_function as the inverse map:

$$ x = g(y) $$

For a change of variables, the transformed density should use:

$$ p_Y(y) = h(g(y)) \exp(\theta^\top T(g(y)) + A(\theta)) \left|J_g(y)\right| $$

but new_normalization currently computes:

$$ h(y) \left|J_g(y)\right| $$

because it calls self._normalization(x) instead of self._normalization(transform_function(x)).

This is hidden by current tests because they use h(x) = 1. Add a test with a non-constant base measure.

Comment on lines +164 to +170
family_characteristics: CharacteristicsMap = {
CharacteristicName.PDF: self.density,
CharacteristicName.MEAN: self._mean,
CharacteristicName.VAR: self._var,
}
merged_characteristics = dict(distr_characteristics or {})
merged_characteristics.update(family_characteristics)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The merge order gives default characteristics higher priority than user-provided ones.

distr_characteristics is copied first, then family_characteristics is applied with update(), so user-provided PDF, MEAN, or VAR are silently overwritten by the generic implementations.

This makes it impossible to replace expensive numerical MEAN / VAR with analytical implementations for a concrete exponential family. Use labels so the generic implementation does not overwrite the analytical one.

Comment on lines +182 to +187
@parametrization(family=self, name="theta")
class ThetaParametrization(ExponentialFamilyParametrization):
@constraint(description="theta belongs to parameter_space")
def check_theta_in_parameter_space(self) -> bool:
theta = np.atleast_1d(np.asarray(self.theta, dtype=float))
return bool(self.__family__._parameter_space.contains(theta)) # type: ignore[attr-defined]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The theta validation is coupled to a private family attribute and assumes scalar membership.

check_theta_in_parameter_space reaches into self.__family__._parameter_space and then casts the result to bool. This is fragile for two reasons:

  1. it depends on a private attribute and requires type: ignore;
  2. Support.contains() may return a BoolArray, and bool(array) fails for arrays with more than one element.

Route this through a small helper that explicitly handles scalar-vs-batch membership, or make the parameter-space check return a guaranteed scalar boolean for a single theta.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new Support protocol is array-only, but tests still exercise scalar calls directly.

The protocol now says:

def contains(self, x: NumericArray) -> bool | BoolArray: ...

but several tests still call contains(point) with plain Python scalars. Runtime works because implementations call np.asarray, but the tests no longer reflect the declared API contract.

Update tests and internal call sites to pass np.asarray(...) / arrays consistently, or change the protocol if scalar inputs are still intended to be supported.

Comment on lines +139 to +155
"""
Initialize a continuous exponential family distribution.

Args:
log_partition: Function A(θ) – the log‑partition function.
sufficient_statistics: Function T(x) – the sufficient statistic vector.
normalization_constant: Function h(x) – the base measure.
support: Predicate defining the support of the distribution.
parameter_space: Predicate defining the natural parameter space.
sufficient_statistics_values: Predicate defining the range of T(x).
name: Name of the family.
distr_type: Type of distribution or a callable returning it.
distr_parametrizations: List of parametrization names this family supports.
distr_characteristics: Additional analytical characteristics to register.
support_by_parametrization: Callable that returns the support given a parametrization.
base_score: Optional base score function.
"""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings in this module should be aligned with the project style and the actual formulas. We require NumPy-style docstrings, but this constructor uses Args:. There are also inconsistencies around the meaning of log_partition, and a typo in parametrizaiton.

Please rewrite the public method docstrings in NumPy style and make the mathematical notation consistent with the implementation.

Comment on lines +405 to +420
"""
Update the conjugate prior hyperparameters given observed data.

For a conjugate prior with hyperparameters (ν₀, n₀), the posterior
hyperparameters become:
ν = ν₀ + Σ_{i} T(x_i)
n = n₀ + N

Args:
parametrizaiton: Current conjugate hyperparameters.
sample: List of observations (each can be scalar or array).

Returns:
ExponentialConjugateHyperparameters:
Updated hyperparameters after incorporating the sample.
"""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same docstring issues here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core.families Enhancement New feature or request

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

ExponentialClassFamily: conjugate prior support and posterior hyperparameters

3 participants