Mean functions: better defaults and a formula interface #16

@bwengals

Most GP libraries treat the mean function as an afterthought. The default is usually zero, and most examples never change it. But for practitioners, the mean function matters:

  • A zero mean function means the GP reverts to zero away from data, which is rarely what you want in practice
  • A constant mean is a better default (the GP reverts to the estimated constant, typically close to the data mean), but it still ignores any trend in the data
  • A linear mean function captures obvious trends and lets the kernel focus on residual structure, which is the semi-parametric GP pattern that works well in practice
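To make the three cases concrete, here is a minimal NumPy sketch. It is not ptgp code: the `rbf` and `posterior_mean` helpers, the toy data, and the fixed hyperparameters are all assumptions made for illustration. It evaluates the GP posterior mean at a point far from the training data under a zero, constant, and linear mean function.

```python
import numpy as np


def rbf(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def posterior_mean(x_train, y_train, x_test, mean_fn, noise=1e-2):
    """GP posterior mean: m(x*) + K(x*, X) [K(X, X) + noise * I]^-1 (y - m(X))."""
    k_xx = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_sx = rbf(x_test, x_train)
    resid = y_train - mean_fn(x_train)
    return mean_fn(x_test) + k_sx @ np.linalg.solve(k_xx, resid)


# Toy data with an obvious trend: y = 5 + 2x plus a small wiggle.
x_train = np.linspace(0.0, 4.0, 9)
y_train = 5.0 + 2.0 * x_train + 0.1 * np.sin(3.0 * x_train)

# Least-squares line used as the (hypothetical) linear mean.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def zero_mean(x):
    return np.zeros_like(x)


def const_mean(x):
    return np.full_like(x, y_train.mean())


def linear_mean(x):
    return intercept + slope * x


# A test point many lengthscales away from the data.
x_far = np.array([20.0])
pred_zero = posterior_mean(x_train, y_train, x_far, zero_mean)[0]
pred_const = posterior_mean(x_train, y_train, x_far, const_mean)[0]
pred_linear = posterior_mean(x_train, y_train, x_far, linear_mean)[0]

print(f"zero mean:     {pred_zero:.3f}")
print(f"constant mean: {pred_const:.3f}")
print(f"linear mean:   {pred_linear:.3f}")
```

Far from the data the kernel term vanishes, so each prediction collapses to whatever the mean function says at that point: zero, the data mean, or the extrapolated line.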

ptgp currently has a Zero mean function. Some things to consider:

Better default

Should the default mean function be a constant (estimated from data) rather than zero? This is a small change that would make out-of-the-box behavior much more reasonable for practitioners who don't think about mean functions.

Formula interface

A random idea that might be interesting: a Wilkinson-style formula language (like the formulas accepted by R's lm) for specifying mean functions:

gp = pg.VFE(kernel=k, mean=pg.mean.Formula("1 + x1 + C(x2)"), ...)
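As a rough sketch of what such a formula could expand to, the snippet below turns the string above into a design matrix H, so that the mean is linear in its coefficients, m(X) = H @ beta. The `design_matrix` helper is hypothetical and hand-rolled, handling only `1`, bare column names, and `C(name)`; it is not part of ptgp. A real implementation would more likely delegate parsing to an existing formula library such as patsy or formulaic.

```python
import numpy as np


def design_matrix(formula, data):
    """Expand a tiny additive formula into (column_names, design_matrix).

    Supports only three term types joined by "+": "1" (intercept),
    a bare column name (numeric), and C(name) (categorical).
    """
    names, columns = [], []
    n_rows = len(next(iter(data.values())))
    for term in (t.strip() for t in formula.split("+")):
        if term == "1":
            names.append("Intercept")
            columns.append(np.ones(n_rows))
        elif term.startswith("C(") and term.endswith(")"):
            var = term[2:-1].strip()
            values = np.asarray(data[var])
            levels = sorted(set(values.tolist()))
            # Simplification: always drop the first level (treatment coding)
            # so the indicator columns are not collinear with the intercept.
            for level in levels[1:]:
                names.append(f"{var}[{level}]")
                columns.append((values == level).astype(float))
        else:
            names.append(term)
            columns.append(np.asarray(data[term], dtype=float))
    return names, np.column_stack(columns)


# Illustrative data: one numeric and one categorical input.
data = {
    "x1": [0.5, 1.0, 1.5, 2.0],
    "x2": ["a", "b", "a", "c"],
}
names, H = design_matrix("1 + x1 + C(x2)", data)

# Arbitrary coefficients, one per column of H.
beta = np.array([1.0, 2.0, 0.5, -0.5])
mean_values = H @ beta

print(names)
print(mean_values)
```

The coefficients in `beta` would then be learned jointly with the kernel hyperparameters, which is where the formula interface meets the semi-parametric pattern described earlier.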

Open questions

  • What should the default mean function be? Constant or linear?
  • Do mean functions need their own class, or can they just be a Python callable?
  • What do people actually use for mean functions in practice?
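On the class-versus-callable question, one possible middle ground is to accept any callable and treat a `parameters()` method as optional. The sketch below is only an illustration of that trade-off: `Constant`, `linear_trend`, and `trainable_parameters` are hypothetical names, not existing ptgp API.

```python
import numpy as np


class Constant:
    """Class-based mean function that exposes its trainable parameter."""

    def __init__(self, value=0.0):
        self.value = float(value)

    def parameters(self):
        return {"value": self.value}

    def __call__(self, X):
        return np.full(len(X), self.value)


def linear_trend(X):
    """Plain-callable mean function with fixed, hard-coded coefficients."""
    return 1.0 + 2.0 * np.asarray(X, dtype=float)[:, 0]


def trainable_parameters(mean):
    """Return the mean's parameters, or an empty dict for a bare callable."""
    get_params = getattr(mean, "parameters", None)
    return get_params() if callable(get_params) else {}


X = np.array([[0.0], [1.0], [2.0]])
for mean in (Constant(3.0), linear_trend):
    values = mean(X)  # both kinds are evaluated the same way
    params = trainable_parameters(mean)
    print(values, params)
```

A bare callable is enough when the mean is fixed, but anything the optimizer should fit needs some way to expose its parameters, which is the main argument for a class.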
