Is this the expected behavior of patsy when building a design matrix of a two-level categorical variable without an intercept? #80

@bsmith89

Cloned from Stack Overflow: http://stackoverflow.com/questions/35944841/is-this-the-expected-behavior-of-patsy-when-building-a-design-matrix-of-a-two-le

(patsy v0.4.1, Python 3.5.0)

I would like to use patsy (ideally through statsmodels) to build a design matrix for regression.

The patsy-style formula that I would like to fit is

response ~ 0 + category

where category is a two-level categorical variable. The 0 + ... is supposed to indicate that I do not want the implicit intercept term.

The design matrix that I expect has a single column of zeros and ones indicating whether category is at the base level (0) or the other level (1).
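
For concreteness, the matrix I have in mind can be written out by hand. This is only a sketch of the target shape using pandas and numpy directly (not patsy); the name `expected` is just for illustration, and 'A' is assumed to be the base level.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'category': ['A', 'B'] * 3})

# One indicator column: 0 for the assumed base level 'A', 1 for the other level 'B'.
expected = (df['category'] == 'B').to_numpy(dtype=float).reshape(-1, 1)
print(expected.shape)
```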

The following code:

import pandas as pd
import patsy

df = pd.DataFrame({'category': ['A', 'B'] * 3})

patsy.dmatrix('0 + category', data=df)

Outputs:

DesignMatrix with shape (6, 2)
  category[A]  category[B]
            1            0
            0            1
            1            0
            0            1
            1            0
            0            1
  Terms:
    'category' (columns 0:2)

which is singular and not what I want.

When I instead run

import pandas as pd
import patsy

df = pd.DataFrame({'category': ['A', 'B'] * 3})

patsy.dmatrix('category', data=df)

the output is

DesignMatrix with shape (6, 2)
  Intercept  category[T.B]
          1              0
          1              1
          1              0
          1              1
          1              0
          1              1
  Terms:
    'Intercept' (column 0)
    'category' (column 1)

which is correct for a model that includes an intercept, but still not what I want.

Is the output without an intercept the intended behavior? If so, why?
Am I just confused about how this design matrix is supposed to work with standard coding?

I know that I can edit the design matrix to make my regression work the way I intend, but if this is a bug I'd like to see it fixed in patsy.
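
The manual edit I am referring to would look roughly like this: build the matrix with the intercept, then drop the `Intercept` column afterwards. This is a sketch only; it relies on patsy's `return_type='dataframe'` option to get a pandas DataFrame back, and the names `full` and `X` are just for illustration.

```python
import pandas as pd
import patsy

df = pd.DataFrame({'category': ['A', 'B'] * 3})

# Build the design matrix with the implicit intercept, returned as a DataFrame.
full = patsy.dmatrix('category', data=df, return_type='dataframe')

# Drop the intercept column by hand, leaving only the treatment-coded column.
X = full.drop(columns='Intercept')
print(X.columns.tolist())
```

This gives the single column I expected, but it sidesteps the formula rather than expressing the model in it.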
