[Repo Assist] Add Excel docs page; optimize Frame.stack and Frame.unstack #700

Merged
dsyme merged 2 commits into master from repo-assist/docs-excel-perf-stack-unstack-20260405-8c9b4030b991bd5a
Apr 6, 2026

@github-actions github-actions Bot commented Apr 5, 2026

🤖 This is an automated pull request from Repo Assist, an AI assistant for this repository.


Summary

Two improvements in one PR (Tasks 10 and 8 from this run):


Task 10 — docs/excel.fsx: Excel integration documentation page

Adds a comprehensive documentation page covering both Excel packages:

| Package | Coverage |
| --- | --- |
| Deedle.Excel.Reader | Cross-platform reading of .xlsx/.xls files via ExcelDataReader |
| Deedle.Excel | Live read/write against a running Excel instance (Windows only) |

Sections

  • Package setup — NuGet install instructions and open statements
  • Reading the first worksheet (readExcel)
  • Reading by sheet name (readExcelSheet)
  • Reading by sheet index (readExcelSheetByIndex)
  • Listing worksheets (sheetNames)
  • Working with the resulting frame — typed column access, indexing, merging sheets
  • Type coercion and missing values — how obj columns are coerced; empty cells as missing
  • Error handling — exceptions thrown for unknown sheets; try/with pattern
  • Writing to Excel (Windows-only) — brief overview of Deedle.Excel live-write API

Two small sample data files are added to docs/data/ (sales.xlsx and quarterly.xlsx) to support live evaluation in fsdocs. A link to excel.html is added to docs/index.fsx.


Task 8 — Performance: Frame.stack and Frame.unstack

Frame.stack

Before: the nested for rowKey in rows do for colKey in cols do loop called frame.ColumnIndex.Locate(colKey) and frame.RowIndex.Locate(rowKey) on every cell — O(rows × cols) total dictionary lookups for each axis.

After: precompute per-column data vectors (once per column) and per-row addresses (once per row) before entering the nested loop. The inner loop now only calls vec.GetObject(rowAddr) — no index lookups.

| Operation | Before | After |
| --- | --- | --- |
| Column address lookups | rows × cols | cols |
| Row address lookups | rows × cols | rows |
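
The lookup counts above can be illustrated with a small self-contained sketch. It does not use Deedle: `MiniFrame`, `locate`, `stackNaive`, `stackPrecomputed`, and `indexOf` are hypothetical names made up for this example, and plain `Dictionary` indexes stand in for the frame's row and column indexes. It only shows the general technique of hoisting lookups out of the nested loop, not the actual code in this PR.

```fsharp
open System.Collections.Generic

// Hypothetical stand-in for a frame: key-to-address indexes plus column-major data.
// These names are illustrative and are not Deedle types.
type MiniFrame =
    { RowIndex : Dictionary<string, int>
      ColIndex : Dictionary<string, int>
      Columns  : obj[][] }              // Columns.[colAddr].[rowAddr]

// Counter so the two strategies can be compared by number of index lookups.
let lookups = ref 0

let locate (index : Dictionary<string, int>) (key : string) =
    lookups.Value <- lookups.Value + 1
    index.[key]

// Before: both addresses are looked up again for every cell.
let stackNaive (f : MiniFrame) (rows : string[]) (cols : string[]) =
    [| for r in rows do
           for c in cols do
               let colAddr = locate f.ColIndex c
               let rowAddr = locate f.RowIndex r
               yield (r, c, f.Columns.[colAddr].[rowAddr]) |]

// After: resolve each column vector and each row address once, up front.
let stackPrecomputed (f : MiniFrame) (rows : string[]) (cols : string[]) =
    let colVecs  = cols |> Array.map (fun c -> c, f.Columns.[locate f.ColIndex c])
    let rowAddrs = rows |> Array.map (fun r -> r, locate f.RowIndex r)
    [| for (r, rowAddr) in rowAddrs do
           for (c, vec) in colVecs do
               yield (r, c, vec.[rowAddr]) |]

// Builds a key-to-position index for the sketch.
let indexOf (keys : string[]) =
    let d = Dictionary<string, int>()
    keys |> Array.iteri (fun i k -> d.[k] <- i)
    d

let rows = [| "r1"; "r2"; "r3" |]
let cols = [| "a"; "b" |]
let frame =
    { RowIndex = indexOf rows
      ColIndex = indexOf cols
      Columns  = [| [| box 1; box 2; box 3 |]; [| box 10; box 20; box 30 |] |] }

lookups.Value <- 0
let naive = stackNaive frame rows cols
let naiveLookups = lookups.Value          // two lookups per cell

lookups.Value <- 0
let fast = stackPrecomputed frame rows cols
let fastLookups = lookups.Value           // one per column plus one per row

printfn "same output: %b, lookups: %d vs %d" (naive = fast) naiveLookups fastLookups
```

For this 3 × 2 example the naive version performs 12 lookups (rows × cols on each axis) and the precomputed version performs 5 (cols + rows), while both produce the same cells in the same order.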

Frame.unstack

Before: iterated frame.RowIndex.Keys twice — once to collect unique fst keys and once for snd keys (Seq.map fst |> Seq.distinct |> Array.ofSeq).

After: single pass using HashSet<'R1> and HashSet<'R2> with corresponding ResizeArrays, maintaining first-occurrence order. Keys are iterated once.
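
A minimal sketch of the single-pass idea, assuming the row keys are available as a sequence of pairs. The function names `distinctTwoPass` and `distinctSinglePass` are hypothetical and exist only for this illustration; the real change lives inside Frame.unstack and may differ in detail.

```fsharp
open System.Collections.Generic

// Before: two traversals of the keys, one per tuple component.
let distinctTwoPass (keys : seq<'R1 * 'R2>) =
    let r1 = keys |> Seq.map fst |> Seq.distinct |> Array.ofSeq
    let r2 = keys |> Seq.map snd |> Seq.distinct |> Array.ofSeq
    r1, r2

// After: one traversal. HashSet.Add returns true only the first time a value
// is seen, so each ResizeArray records keys in first-occurrence order.
let distinctSinglePass (keys : seq<'R1 * 'R2>) =
    let seen1, seen2 = HashSet<'R1>(), HashSet<'R2>()
    let out1, out2 = ResizeArray<'R1>(), ResizeArray<'R2>()
    for (k1, k2) in keys do
        if seen1.Add(k1) then out1.Add(k1)
        if seen2.Add(k2) then out2.Add(k2)
    out1.ToArray(), out2.ToArray()

// Example row keys of a two-level index (made-up data).
let keys = [ ("a", 1); ("b", 1); ("a", 2); ("c", 2) ]

let onePass = distinctSinglePass keys
let twoPass = distinctTwoPass keys
printfn "%A" onePass
printfn "same result: %b" (onePass = twoPass)
```

Both versions return the first components as `a, b, c` and the second components as `1, 2`; the single-pass version simply avoids enumerating the keys a second time.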


Test Status

Passed!  - Failed: 0, Passed: 703, Skipped: 0, Total: 703

All 703 tests pass. Semantics are unchanged — same outputs, fewer allocations and dictionary lookups.

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@7ee2b60744abf71b985bead4599640f165edcd93

Task 10: Add docs/excel.fsx — comprehensive documentation page for
Deedle.Excel.Reader (cross-platform xlsx/xls reading) and Deedle.Excel
(Windows live-Excel writer). Includes two sample data files under
docs/data/ (sales.xlsx, quarterly.xlsx). Links excel.html from
docs/index.fsx.

Task 8: Performance improvements to Frame.stack and Frame.unstack:
- Frame.stack: precompute per-column data vectors and per-row addresses
  before the nested loop, eliminating O(rows×cols) redundant index
  lookups (dictionary lookups saved: (rows-1)×cols for col addresses
  and (cols-1)×rows for row addresses).
- Frame.unstack: single-pass extraction of unique r1/r2 keys using
  HashSet-backed ResizeArray, replacing two Seq.map|>Seq.distinct
  traversals of the row index.

Test status: all 703 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dsyme dsyme marked this pull request as ready for review April 6, 2026 21:29
@dsyme dsyme merged commit 09884ad into master Apr 6, 2026
2 checks passed
@dsyme dsyme deleted the repo-assist/docs-excel-perf-stack-unstack-20260405-8c9b4030b991bd5a branch April 6, 2026 21:29