Cannot use BigQuery CDC, fields beginning with underscore are not serialized #513

@jmesterh

Environment details

  • Programming language: Python
  • OS: Debian 11.11
  • Language runtime version: 3.11.11
  • Package version: 1.25.0

Steps to reproduce

I am attempting to implement BigQuery CDC using the instructions here. This requires adding a pseudocolumn named "_change_type" to the protobuf message:

import proto

class FooExample(proto.Message):
    foo = proto.Field(proto.STRING, number=1)
    _change_type = proto.Field(proto.STRING, number=2)

When the message is serialized with Message.serialize() the contents of _change_type are not included in the serialized output.

The omission happens here, where __setattr__ delegates to super() if the field name begins with _ (presumably so that _pb functions correctly).
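To make the mechanism concrete, here is a minimal sketch of the pattern as I understand it. SketchMessage, _payload, and this serialize() are hypothetical stand-ins written for illustration, not the library's actual code; the only thing taken from the library is the idea of routing names that start with an underscore to the default __setattr__.

```python
class SketchMessage:
    """Hypothetical stand-in for a message wrapper; not the real library class."""

    def __init__(self, **fields):
        # Create the internal payload via the default __setattr__ so the
        # custom hook below is not involved in setting it up.
        super().__setattr__("_payload", {})
        for name, value in fields.items():
            setattr(self, name, value)

    def __setattr__(self, key, value):
        if key[0] == "_":
            # Underscore-prefixed names are treated as internal state and
            # stored as plain Python attributes, never in the payload.
            return super().__setattr__(key, value)
        self._payload[key] = value

    def serialize(self):
        # Only values that reached the payload are included in the output.
        return dict(self._payload)


msg = SketchMessage(foo="bar", _change_type="UPSERT")
serialized = msg.serialize()
```

In this sketch the value is still readable as msg._change_type, but it never reaches the serialized output, which matches the symptom described above.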

I also tried setting json_name=, hoping this would allow an alternative field name:

class FooExample(proto.Message):
    foo = proto.Field(proto.STRING, number=1)
    change_type = proto.Field(proto.STRING, json_name="_change_type", number=2)

However, the BigQuery Storage Write API returns an error saying that the column change_type does not exist.

I was hoping to use this library in lieu of .proto files, but cannot until this is fixed. Thanks!

Metadata

    Labels

    priority: p2 (Moderately-important priority. Fix may not be included in next release.)
    type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
