@Myracle Myracle commented Feb 11, 2026

What is the purpose of the change

This pull request exposes five additional Jackson CsvParser.Feature options as Flink SQL CSV format configuration options, so that users can fine-tune CSV deserialization. The CSV format currently exposes only a limited set of parser options (such as csv.allow-comments and csv.ignore-parse-errors), which leaves several useful Jackson CSV parser features inaccessible. This change adds the following options:

  • csv.trim-spaces — Trims leading/trailing whitespace from unquoted field values (CsvParser.Feature.TRIM_SPACES)
  • csv.ignore-trailing-unmappable — Ignores extra trailing columns that don't map to the schema (CsvParser.Feature.IGNORE_TRAILING_UNMAPPABLE)
  • csv.allow-trailing-comma — Allows a trailing comma after the last field value (CsvParser.Feature.ALLOW_TRAILING_COMMA)
  • csv.fail-on-missing-columns — Fails when a row has fewer columns than expected by the schema (CsvParser.Feature.FAIL_ON_MISSING_COLUMNS)
  • csv.empty-string-as-null — Treats empty string values as null (CsvParser.Feature.EMPTY_STRING_AS_NULL)

These options only affect deserialization (source side).
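
The option-to-feature mapping listed above can be sketched in plain Java. This is an illustration only, not the code in this pull request: the `Feature` enum is a local stand-in for Jackson's `CsvParser.Feature`, the `CsvParserOptionSketch` class and its `resolve` method are hypothetical names, and leaving unset options out of the result (so the parser's own defaults would apply) is an assumption.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvParserOptionSketch {

    /** Local stand-in for Jackson's CsvParser.Feature (only the five constants named above). */
    enum Feature {
        TRIM_SPACES,
        IGNORE_TRAILING_UNMAPPABLE,
        ALLOW_TRAILING_COMMA,
        FAIL_ON_MISSING_COLUMNS,
        EMPTY_STRING_AS_NULL
    }

    /** Option key to feature, exactly as listed in the description. */
    static final Map<String, Feature> OPTION_TO_FEATURE = new LinkedHashMap<>();

    static {
        OPTION_TO_FEATURE.put("csv.trim-spaces", Feature.TRIM_SPACES);
        OPTION_TO_FEATURE.put("csv.ignore-trailing-unmappable", Feature.IGNORE_TRAILING_UNMAPPABLE);
        OPTION_TO_FEATURE.put("csv.allow-trailing-comma", Feature.ALLOW_TRAILING_COMMA);
        OPTION_TO_FEATURE.put("csv.fail-on-missing-columns", Feature.FAIL_ON_MISSING_COLUMNS);
        OPTION_TO_FEATURE.put("csv.empty-string-as-null", Feature.EMPTY_STRING_AS_NULL);
    }

    /**
     * Returns only the features that were set explicitly in the table options.
     * Assumption: options that are not set are left out, so the parser's own
     * defaults would stay in effect.
     */
    static Map<Feature, Boolean> resolve(Map<String, String> tableOptions) {
        Map<Feature, Boolean> result = new EnumMap<>(Feature.class);
        for (Map.Entry<String, Feature> entry : OPTION_TO_FEATURE.entrySet()) {
            String raw = tableOptions.get(entry.getKey());
            if (raw != null) {
                result.put(entry.getValue(), Boolean.parseBoolean(raw.trim()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Options as they might appear in the WITH clause of a table definition.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("format", "csv");
        options.put("csv.trim-spaces", "true");
        options.put("csv.fail-on-missing-columns", "false");
        System.out.println(resolve(options));
    }
}
```

Keys unrelated to the parser features (such as `format`) are ignored by the sketch, and an option explicitly set to `false` is kept as a disabled feature rather than dropped.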

Brief change log

  • Added 5 new ConfigOption<Boolean> definitions in CsvFormatOptions with descriptions indicating they only affect deserialization
  • Registered new options in CsvCommons as both optional and forwarded options
  • Extended CsvRowDataDeserializationSchema.Builder with setter methods for each new feature, and configured enabled/disabled features on the CsvMapper during open()
  • Updated CsvFormatFactory.configureDeserializationSchema() to read and pass the new options to the schema builder
  • Updated CsvFileFormatFactory to support the new features in the Bulk Format / File Source path via createCsvMapperFactory()
  • Fixed a pre-existing bug in CsvFileFormatFactory where ignoreParseErrors was determined by isPresent() instead of reading the actual config value
  • Updated both English and Chinese documentation with descriptions for the 5 new options
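
The isPresent() fix in the change log can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual CsvFileFormatFactory code: the `IgnoreParseErrorsSketch` class, the `getOptional` helper, and the plain `Map` standing in for the format options are all hypothetical, and the fallback of `false` for an unset option is an assumption.

```java
import java.util.Map;
import java.util.Optional;

public class IgnoreParseErrorsSketch {

    static final String KEY = "csv.ignore-parse-errors";

    /** Hypothetical lookup: returns an empty Optional when the key is not set. */
    static Optional<Boolean> getOptional(Map<String, String> options, String key) {
        return Optional.ofNullable(options.get(key)).map(Boolean::parseBoolean);
    }

    /** Buggy pattern: true whenever the key is set at all, even when it is set to "false". */
    static boolean ignoreParseErrorsBuggy(Map<String, String> options) {
        return getOptional(options, KEY).isPresent();
    }

    /** Fixed pattern: reads the configured value; an unset option falls back to false (assumed). */
    static boolean ignoreParseErrorsFixed(Map<String, String> options) {
        return getOptional(options, KEY).orElse(false);
    }

    public static void main(String[] args) {
        Map<String, String> explicitFalse = Map.of(KEY, "false");
        System.out.println("buggy: " + ignoreParseErrorsBuggy(explicitFalse));
        System.out.println("fixed: " + ignoreParseErrorsFixed(explicitFalse));
    }
}
```

The two patterns agree when the option is unset or set to `true`; they differ only when a user sets it explicitly to `false`, which is the case the presence check gets wrong.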

Verifying this change

This change added tests and can be verified as follows:

  • Added testTrimSpaces() test in CsvFormatFactoryTest to verify the csv.trim-spaces option trims whitespace from unquoted field values
  • Added testIgnoreTrailingUnmappable() test to verify extra trailing columns are silently ignored
  • Added testAllowTrailingComma() test to verify a trailing comma after the last field value is accepted
  • Added testFailOnMissingColumns() test to verify deserialization fails when a row has fewer columns than expected
  • Added testEmptyStringAsNull() test to verify empty strings are treated as null values
  • Added testAllCsvParserFeaturesTogether() test to verify all 5 new features work correctly when enabled simultaneously
  • Updated testSeDeSchema() test to include csv.trim-spaces and csv.empty-string-as-null options, verifying the complete option-to-schema configuration chain
  • Added testBulkFormatWithParserFeatures() test to verify the Bulk Format / File Source path correctly applies the new CsvParser.Feature options via CsvBulkDecodingFormat

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes (CsvFormatOptions is annotated with @PublicEvolving, 5 new ConfigOption fields are added)
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no (the features are configured once during open(), not per-record)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs (both English and Chinese documentation updated in docs/content/docs/connectors/table/formats/csv.md and docs/content.zh/docs/connectors/table/formats/csv.md)

…rt additional CsvParser.Feature options for CSV format deserialization

flinkbot commented Feb 11, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build
