Skip to content

Allow arbitrary JSON in google.protobuf.Struct fields during validation#2119

Open
SkyrimL wants to merge 12 commits intomainfrom
zhipingliu/struct-parsing-fix
Open

Allow arbitrary JSON in google.protobuf.Struct fields during validation#2119
SkyrimL wants to merge 12 commits intomainfrom
zhipingliu/struct-parsing-fix

Conversation

@SkyrimL
Copy link
Copy Markdown
Member

@SkyrimL SkyrimL commented Mar 19, 2026

The current verifyObjectMatchesProto helper is very strict. When users provide arbitrary JSON metadata in a Struct field (like extra_properties), the validator throws an "unexpected property" error because those custom keys are not explicitly defined in the protobuf message.

This PR introduces a manual conversion from plain JavaScript objects to the standard google.protobuf.Struct format during the validation phase.

@SkyrimL SkyrimL requested a review from a team as a code owner March 19, 2026 22:32
@SkyrimL SkyrimL requested review from andrzej-grudzien, fernst, kolina and rahulmadanahalli and removed request for a team March 19, 2026 22:32
Copy link
Copy Markdown
Contributor

@kolina kolina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add compilation test cases

@SkyrimL SkyrimL requested a review from ikholopov-omni March 20, 2026 21:03
Copy link
Copy Markdown
Contributor

@kolina kolina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests seem to be failing?

return;
}
const config = this.verifyConfig(unverifiedConfig);
if (this.unverifiedConfig.metadata?.extraProperties) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users of JavaScript API may theoretically pass a protobuf object directly: see here.

I think, there should be a check that if extraProperties is already an instance of protobuf struct object, that this conversion is skipped.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this case?

export function jsonToValue(value: any): any {
if (value === null || typeof value === "undefined") {
return { nullValue: 0 };
} else if (typeof value === "number") {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Else statement are redundant in these chains.

return { fields };
}

export function jsonToValue(value: any): any {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't omit the returned type

return buf;
}

export function jsonToStruct(obj: any): any {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't omit the return type

core/session.ts Outdated
}

this.jitContextData.fields[key] = unknownToValue(data);
this.jitContextData.fields[key] = google.protobuf.Value.create(jsonToValue(data));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't require JiT data to be a JSON object - it can be any JS object. Please consider using a better name for a conversion function.

return buf;
}

export interface IValue {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this interface? google.protobuf.IValue should already cover this

const CONFIGS_PROTO_DOCUMENTATION_URL =
"https://dataform-co.github.io/dataform/docs/configs-reference";
const REPORT_ISSUE_URL = "https://github.com/dataform-co/dataform/issues";
const STRUCT_FIELD_NAMES = new Set(["extraProperties", "contexts", "glossary_terms"]);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't want to parametrize this set on all the possible keys that we can store inside extraProperties - we'd need to update this open source project every time we want to support another key. Let's generalize it so we don't need to store glossary_terms or contacts here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have other fields except extraProperties here?

glossary_terms: [
{
column_name: "trip_id",
project: "bq-dataworkeragent-test",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not use the name of our internal project here - rename this to project_identifier or something else

}
],
generic: {
system: "yo",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's replace these with better values (ie. "my custom system value"

const CONFIGS_PROTO_DOCUMENTATION_URL =
"https://dataform-co.github.io/dataform/docs/configs-reference";
const REPORT_ISSUE_URL = "https://github.com/dataform-co/dataform/issues";
const STRUCT_FIELD_NAMES = new Set(["extraProperties", "contexts", "glossary_terms"]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have other fields except extraProperties here?

return {
structValue: {
fields: Object.fromEntries(
Object.entries(raw as object).map(([key, value]) => [key, unknownToValue(value)])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is cast raw as object needed here?

Comment on lines +59 to +61
if (structFieldNames.has(presentKey)) {
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this check needed here?

Comment on lines +116 to +125
if (Array.isArray(value)) {
result[key] = {
listValue: {
values: value.map(item => unknownToValue(item))
}
};
} else if (typeof value === "object" && !value.fields) {
const converted = unknownToValue(value);
result[key] = converted.structValue;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is a bit unclear to me, I think you only need a check for top-level field that is always a struct at the moment?

return;
}
const config = this.verifyConfig(unverifiedConfig);
if (this.unverifiedConfig.metadata?.extraProperties) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants