
[Repo Assist] Add CSV property-based tests and CSV benchmarks#1692

Merged
dsyme merged 2 commits into main from
repo-assist/improve-csv-tests-and-benchmarks-20260309-89ae3e835c77fb70
Mar 12, 2026

github-actions bot commented on Mar 9, 2026

🤖 This is an automated pull request from Repo Assist, an AI assistant for this repository.

Addresses two gaps identified in the test/benchmark infrastructure:

Task 9 — Testing Improvements: CSV Property-Based Tests

Adds tests/FSharp.Data.Core.Tests/CsvParserProperties.fs with 10 new tests:

  • FsCheck property test (500 random test cases): generates arbitrary rows of arbitrary strings, encodes them to RFC 4180 CSV, parses with readCsvFile, and verifies a complete roundtrip. Covers embedded commas, quotes, newlines, CRLF, and empty strings automatically.
  • 9 targeted tests for specific edge cases: fields containing separators, embedded quotes, LF newlines, CRLF newlines, tab separators, custom quote characters, multi-row data, empty string fields, and single-column empty-field rows (a case that was previously untested).
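The roundtrip property can be sketched outside F# as well. The Python below is only an illustration, not the code in CsvParserProperties.fs: it assumes the standard-library `csv` module as a stand-in for readCsvFile, a seeded `random` generator in place of FsCheck, and hypothetical helper names (`encode_csv_field`, `encode_csv`, `parse_csv`, `check_roundtrip`).

```python
import csv
import io
import random


def encode_csv_field(field: str) -> str:
    """Quote a field when it is empty or holds a separator, quote, or newline."""
    if field == "" or any(ch in field for ch in ',"\r\n'):
        return '"' + field.replace('"', '""') + '"'
    return field


def encode_csv(rows: list[list[str]]) -> str:
    """Join fields with commas and terminate every record with CRLF."""
    return "".join(
        ",".join(encode_csv_field(f) for f in row) + "\r\n" for row in rows
    )


def parse_csv(text: str) -> list[list[str]]:
    # newline="" leaves CR/LF inside quoted fields untranslated.
    return list(csv.reader(io.StringIO(text, newline="")))


# Building blocks chosen to hit the awkward cases: separators, quotes,
# LF, CRLF, spaces, and the empty string.
TOKENS = ["a", "b", " ", ",", '"', "\n", "\r\n", ""]


def random_field(rng: random.Random) -> str:
    return "".join(rng.choice(TOKENS) for _ in range(rng.randint(0, 6)))


def random_rows(rng: random.Random) -> list[list[str]]:
    width = rng.randint(1, 4)
    return [
        [random_field(rng) for _ in range(width)]
        for _ in range(rng.randint(1, 5))
    ]


def check_roundtrip(cases: int = 500, seed: int = 0) -> bool:
    """Return True when every generated table survives encode -> parse."""
    rng = random.Random(seed)
    for _ in range(cases):
        rows = random_rows(rng)
        if parse_csv(encode_csv(rows)) != rows:
            return False
    return True
```

Calling `check_roundtrip()` runs 500 generated tables through the encoder and parser. A real property-testing library would additionally shrink a failing table to a minimal counterexample, which this sketch does not attempt.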

The key insight driving the design: the CSV parser skips blank lines at the top level, so a single-column row containing only an empty string must be quoted ("") rather than left as an empty line. The new encodeCsvField helper always quotes empty strings to ensure correct roundtripping. This pattern is modelled directly on the existing JsonParserProperties.fs.
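The single-column case can be illustrated with a small sketch. It again uses Python's standard `csv` module as a stand-in parser, and `encode_single_column` is a hypothetical helper rather than the PR's encodeCsvField. The stand-in differs from the parser described above in one detail: it returns a row with no fields for a blank line instead of skipping it. In both cases the empty string is lost unless it is quoted.

```python
import csv
import io


def encode_single_column(values: list[str]) -> str:
    """Hypothetical encoder: one field per line, empty strings written as ""."""
    lines = []
    for value in values:
        needs_quotes = value == "" or any(ch in value for ch in ',"\r\n')
        if needs_quotes:
            lines.append('"' + value.replace('"', '""') + '"')
        else:
            lines.append(value)
    return "".join(line + "\r\n" for line in lines)


def parse(text: str) -> list[list[str]]:
    return list(csv.reader(io.StringIO(text, newline="")))


# Empty field left as a blank line: the middle record no longer holds "".
naive = "a\r\n\r\nb\r\n"

# Empty field written as "": the middle record is a row with one empty string.
quoted = encode_single_column(["a", "", "b"])
```

Here `parse(naive)` does not give back `[["a"], [""], ["b"]]`, while `parse(quoted)` does, which is the behaviour the always-quote rule is meant to guarantee.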

Task 8 — Performance: CSV Benchmarks

Adds tests/FSharp.Data.Benchmarks/CsvBenchmarks.fs with 8 benchmarks:

| Benchmark | File size |
| --- | --- |
| ParseAirQualityCsv / IterateAirQualityCsv | 3 KB |
| ParseBanklistCsv / IterateBanklistCsv | 40 KB |
| ParseTitanicCsv / IterateTitanicCsv | 60 KB |
| ParseMSFTCsv / IterateMSFTCsv | 328 KB |

The parse-vs-iterate split is important because CsvFile.Parse is lazy — the parse benchmark measures header parsing and reader setup, while the iterate benchmark measures full row-by-row iteration.
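A rough sketch of why the two measurements differ, assuming a lazy reader similar in spirit to the one described above. This is Python with the standard `csv` module, not the code in CsvBenchmarks.fs; `parse_lazy`, `make_sample`, and `time_it` are hypothetical names, and the generated data only stands in for the real sample files.

```python
import csv
import io
import time


def make_sample(n_rows: int) -> str:
    """Build an in-memory CSV with a header and n_rows data rows."""
    lines = ["Date,Open,Close"]
    lines += [f"2020-01-01,{i},{i + 1}" for i in range(n_rows)]
    return "\r\n".join(lines) + "\r\n"


def parse_lazy(text: str):
    """Read only the header now; data rows are produced on demand."""
    reader = csv.reader(io.StringIO(text, newline=""))
    headers = next(reader)
    return headers, reader


def time_it(fn):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


text = make_sample(10_000)

# "Parse" benchmark: header parsing and reader setup only.
_, parse_seconds = time_it(lambda: parse_lazy(text))

# "Iterate" benchmark: setup plus a full pass over every row.
row_count, iterate_seconds = time_it(
    lambda: sum(1 for _ in parse_lazy(text)[1])
)

print(f"parse: {parse_seconds:.6f}s, iterate {row_count} rows: {iterate_seconds:.6f}s")
```

A single timed call is noisy, so a proper benchmark harness with warm-up and repeated runs remains the right tool for real numbers. The sketch only shows which work falls into which measurement.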

Also fixes Program.fs: HtmlBenchmarks was defined but never wired into the benchmark runner entry point. This PR adds html and csv as command-line arguments and includes both in the default "run all" path.
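The wiring pattern can be sketched as a small dispatcher. This is an illustrative Python sketch under assumptions, not the contents of Program.fs: only the `html` and `csv` argument names come from this PR, while the `json` suite, the `SUITES` table, and `select_suites` are hypothetical.

```python
import sys


def run_json() -> str:  # hypothetical pre-existing suite
    return "json benchmarks"


def run_html() -> str:  # defined earlier but previously never registered
    return "html benchmarks"


def run_csv() -> str:  # the new suite
    return "csv benchmarks"


# A suite that is missing from this table can never run, which is the
# kind of gap the fix addresses.
SUITES = {"json": run_json, "html": run_html, "csv": run_csv}


def select_suites(args: list[str]) -> list[str]:
    """No arguments means run everything; otherwise run the named suites."""
    if not args:
        return list(SUITES)
    names = [arg.lower() for arg in args]
    unknown = [name for name in names if name not in SUITES]
    if unknown:
        raise ValueError(f"unknown benchmark suite(s): {', '.join(unknown)}")
    return names


if __name__ == "__main__":
    for name in select_suites(sys.argv[1:]):
        print(SUITES[name]())
```

Keeping the default path derived from the same table as the named arguments means a newly registered suite is picked up by "run all" without a second edit.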

Test Status

  • ✅ Build: 0 errors, 25 pre-existing warnings
  • ✅ New CSV property tests: 10/10 passed
  • ✅ Benchmarks project builds cleanly (0 errors)
  • ✅ Fantomas formatting: clean (format task reports Ok)

Generated by Repo Assist

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@346204513ecfa08b81566450d7d599556807389f

Task 9 (Testing Improvements): Add CsvParserProperties.fs with 10 tests:
- FsCheck property-based roundtrip test (500 random test cases) verifying
  that arbitrary string values survive encode → parse roundtrip with comma
  separator and double-quote quoting
- Targeted tests for fields containing separators, quotes, newlines (LF),
  CRLF, tab separators, custom quote chars, multiple rows, empty fields,
  and single-column empty-field rows (which previously had no coverage)

Task 8 (Performance): Add CsvBenchmarks.fs with 8 benchmarks covering:
- Parse-only and full row-iteration for AirQuality (3 KB), banklist (40 KB),
  Titanic (60 KB), and MSFT (328 KB) CSV files
- Also fix Program.fs to actually run HtmlBenchmarks and CsvBenchmarks
  (HtmlBenchmarks was defined but never wired into the program entry point)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dsyme marked this pull request as ready for review March 12, 2026 02:25
dsyme merged commit c25289a into main Mar 12, 2026
2 checks passed
dsyme deleted the repo-assist/improve-csv-tests-and-benchmarks-20260309-89ae3e835c77fb70 branch March 12, 2026 02:25