Families/ adding base version of exponential family by domosedy · Pull Request #67 · PySATL/pysatl-core

domosedy · 2026-02-03T12:00:22Z

No description provided.

…onjugate prior

LeonidElkin · 2026-05-25T15:20:14Z

+            if hasattr(dot, "__len__"):
+                dot = dot[0]
+
+            result = np.log(self._normalization(x)) + dot + self._log_partition(theta)


The formula does not correspond to the docstring and standard notation I guess.

wiki:

$$f(x \mid \theta) = h(x)\exp\left(\theta^\top T(x) - A(\theta)\right)$$

your code:

$$\log f(x \mid \theta) = \log h(x) + \theta^\top T(x) + A(\theta)$$

Yes, but I decided that + log-partition function is easier to write and check formulas. This is described in docstring of class

…blems

LeonidElkin · 2026-05-31T19:15:31Z

+        def posterior_density(parametrization: Parametrization, x: NumericArray) -> Number:
+            parametrization = cast(ExponentialConjugateHyperparameters, parametrization)
+            return cast(
+                np.float32,
+                self._normalization(x)
+                * conjugate_log_partition(parametrization)
+                / conjugate_log_partition(
+                    self.posterior_hyperparameters(parametrizaiton=parametrization, sample=[x])
+                ),
+            )
+
+        family = ParametricFamily(
+            name=f"PosteriorPredictive{self.name}",
+            distr_type=UnivariateContinuous,
+            distr_characteristics={CharacteristicName.PDF: posterior_density},
+            distr_parametrizations=["posterior"],
+            support_by_parametrization=lambda _: ContinuousSupport(),
+        )
+        parametrization(family=family, name="posterior")(ExponentialConjugateHyperparameters)
+        return family


AI finding

posterior_predictive does not preserve the original observation support.

The predictive density is defined for a new observation x, so it should be zero outside the original family support. Currently posterior_density does not check self._support, and the created family uses ContinuousSupport() as an unconditional real-line support.

For example, for the exponential-family test case with support x > 0, pdf(-0.5) currently returns a positive value, but it should be outside support.

The posterior predictive family should reuse the original support (or a proper support resolver), and posterior_density should return 0 outside it.

LeonidElkin · 2026-05-31T19:21:02Z

+    def transform(
+        self,
+        transform_function: Callable[[NumericArray], NumericArray],
+    ) -> ContinuousExponentialClassFamily:
+        """
+        Transform the random variable by a monotonic, differentiable function.
+
+        The new density is obtained via the change‑of‑variable formula.
+        The sufficient statistic becomes T(transform(x)) and the base measure
+        gains the Jacobian factor.
+
+        Args:
+            transform_function: Invertible, differentiable function g(y) such that
+                x = g(y). Must be defined on the original support.
+
+        Returns:
+            ContinuousExponentialClassFamily: A new family for the transformed variable.
+        """
+
+        def calculate_jacobian(x: NumericArray) -> NumericArray:
+            if not isinstance(x, Iterable):
+                x = np.array([x], dtype=float)
+            else:
+                x = np.atleast_1d(np.asarray(x, dtype=float))
+
+            return np.abs(det(jacobian(transform_function, x).df))
+
+        def new_support(x: NumericArray) -> bool:
+            return bool(self._support.contains(transform_function(x)))
+
+        def new_sufficient(x: NumericArray) -> NumericArray:
+            return self._sufficient(transform_function(x))
+
+        def new_normalization(x: NumericArray) -> Number:
+            return cast(np.float64, self._normalization(x) * calculate_jacobian(x))
+
+        return ContinuousExponentialClassFamily(
+            log_partition=self._log_partition,
+            sufficient_statistics=new_sufficient,
+            normalization_constant=new_normalization,
+            support=PredicateSupport(predicate=new_support),
+            parameter_space=self._parameter_space,
+            sufficient_statistics_values=self._sufficient_statistics_values,
+            name=f"Transformed{self._name}",
+            distr_type=self._distr_type,
+            distr_parametrizations=self.parametrization_names,
+            support_by_parametrization=self.support_resolver,
+        )


I might be wrong but the transformed base measure is still evaluated at the wrong point.

The docstring now treats transform_function as the inverse map:

$$ x = g(y) $$

For a change of variables, the transformed density should use:

$$ p_Y(y) = h(g(y)) \exp(\theta^\top T(g(y)) + A(\theta)) \left|J_g(y)\right| $$

but new_normalization currently computes:

$$ h(y) \left|J_g(y)\right| $$

because it calls self._normalization(x) instead of self._normalization(transform_function(x)).

This is hidden by current tests because they use h(x) = 1. Add a test with a non-constant base measure.

LeonidElkin · 2026-05-31T19:21:54Z

+        family_characteristics: CharacteristicsMap = {
+            CharacteristicName.PDF: self.density,
+            CharacteristicName.MEAN: self._mean,
+            CharacteristicName.VAR: self._var,
+        }
+        merged_characteristics = dict(distr_characteristics or {})
+        merged_characteristics.update(family_characteristics)


The merge order gives default characteristics higher priority than user-provided ones.

distr_characteristics is copied first, then family_characteristics is applied with update(), so user-provided PDF, MEAN, or VAR are silently overwritten by the generic implementations.

This makes it impossible to replace expensive numerical MEAN / VAR with analytical implementations for a concrete exponential family. Use labels so the generic implementation does not overwrite the analytical one.

LeonidElkin · 2026-05-31T19:32:15Z

+        @parametrization(family=self, name="theta")
+        class ThetaParametrization(ExponentialFamilyParametrization):
+            @constraint(description="theta belongs to parameter_space")
+            def check_theta_in_parameter_space(self) -> bool:
+                theta = np.atleast_1d(np.asarray(self.theta, dtype=float))
+                return bool(self.__family__._parameter_space.contains(theta))  # type: ignore[attr-defined]


The theta validation is coupled to a private family attribute and assumes scalar membership.

check_theta_in_parameter_space reaches into self.__family__._parameter_space and then casts the result to bool. This is fragile for two reasons:

it depends on a private attribute and requires type: ignore;

Support.contains() may return a BoolArray, and bool(array) fails for arrays with more than one element.

Route this through a small helper that explicitly handles scalar-vs-batch membership, or make the parameter-space check return a guaranteed scalar boolean for a single theta.

LeonidElkin · 2026-05-31T19:33:07Z

The new Support protocol is array-only, but tests still exercise scalar calls directly.

The protocol now says:

def contains(self, x: NumericArray) -> bool | BoolArray: ...

but several tests still call contains(point) with plain Python scalars. Runtime works because implementations call np.asarray, but the tests no longer reflect the declared API contract.

Update tests and internal call sites to pass np.asarray(...) / arrays consistently, or change the protocol if scalar inputs are still intended to be supported.

LeonidElkin · 2026-05-31T19:33:57Z

+        """
+        Initialize a continuous exponential family distribution.
+
+        Args:
+            log_partition: Function A(θ) – the log‑partition function.
+            sufficient_statistics: Function T(x) – the sufficient statistic vector.
+            normalization_constant: Function h(x) – the base measure.
+            support: Predicate defining the support of the distribution.
+            parameter_space: Predicate defining the natural parameter space.
+            sufficient_statistics_values: Predicate defining the range of T(x).
+            name: Name of the family.
+            distr_type: Type of distribution or a callable returning it.
+            distr_parametrizations: List of parametrization names this family supports.
+            distr_characteristics: Additional analytical characteristics to register.
+            support_by_parametrization: Callable that returns the support given a parametrization.
+            base_score: Optional base score function.
+        """


Docstrings in this module should be aligned with the project style and the actual formulas. We require NumPy-style docstrings, but this constructor uses Args:. There are also inconsistencies around the meaning of log_partition, and a typo in parametrizaiton.

Please rewrite the public method docstrings in NumPy style and make the mathematical notation consistent with the implementation.

LeonidElkin · 2026-05-31T19:34:15Z

+        """
+        Update the conjugate prior hyperparameters given observed data.
+
+        For a conjugate prior with hyperparameters (ν₀, n₀), the posterior
+        hyperparameters become:
+            ν = ν₀ + Σ_{i} T(x_i)
+            n = n₀ + N
+
+        Args:
+            parametrizaiton: Current conjugate hyperparameters.
+            sample: List of observations (each can be scalar or array).
+
+        Returns:
+            ExponentialConjugateHyperparameters:
+                Updated hyperparameters after incorporating the sample.
+        """


Same docstring issues here

domosedy linked an issue Feb 3, 2026 that may be closed by this pull request

ExponentialClassFamily: conjugate prior support and posterior hyperparameters #50

Open

LeonidElkin reviewed Feb 20, 2026

View reviewed changes

Desiment self-requested a review February 20, 2026 19:02

Desiment requested changes Feb 21, 2026

View reviewed changes

domosedy force-pushed the exponential-family branch 2 times, most recently from 779438f to 12516ff Compare March 29, 2026 18:56

LeonidElkin assigned domosedy May 1, 2026

LeonidElkin added core.families Enhancement New feature or request labels May 1, 2026

LeonidElkin added this to PySATL Core Developing May 1, 2026

LeonidElkin added this to the Prototype alpha-version milestone May 1, 2026

LeonidElkin requested changes May 1, 2026

View reviewed changes

domosedy force-pushed the exponential-family branch 2 times, most recently from f67f3fd to 2d470c9 Compare May 3, 2026 21:20

domosedy force-pushed the exponential-family branch from 2d470c9 to 189afea Compare May 11, 2026 12:36

domosedy added 4 commits May 11, 2026 15:42

feat(exponential): added base version of executable ExponentialFamily

23c5607

feat(exponential): added new class structure and manual testing for c…

82e959a

…onjugate prior

feat(exponential): added transform method to ExponentialFamily

2ef19cf

refactor(exponential): refactoring for current version of master

8cfa0b8

domosedy force-pushed the exponential-family branch 3 times, most recently from 1605771 to 1d58077 Compare May 11, 2026 13:09

domosedy added 4 commits May 17, 2026 15:15

feat(exponential): add posterior predictive function

a1684e4

refactor(exponential): fix pr issues

2e6fe1c

docs(exponential): add docstrings

f3b6575

feat(exponential): remove NumberParameter logic

2500678

domosedy force-pushed the exponential-family branch from 83f5584 to 2500678 Compare May 17, 2026 12:15

LeonidElkin reviewed May 25, 2026

View reviewed changes

domosedy force-pushed the exponential-family branch from ec832d5 to d5a2be1 Compare May 31, 2026 09:33

refactor(exponential): update support protocol and fix some minor pro…

804cfa4

…blems

domosedy force-pushed the exponential-family branch from d5a2be1 to 804cfa4 Compare May 31, 2026 10:50

LeonidElkin reviewed May 31, 2026

View reviewed changes

Conversation

domosedy commented Feb 3, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants