Adds the Legacy Bench announcement as a new blog post covering the first benchmark for evaluating AI agents on legacy software engineering tasks (COBOL, Fortran, Java 7, etc.). Includes benchmark results charts and a new Blog tab in the docs navigation.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
## What is Legacy Bench

Legacy Bench consists of 100 tasks spanning six legacy language families and real enterprise domains. The full benchmark is used for evaluation, with ten representative tasks publicly available as open samples.
[P1] Task total is inconsistent across the post and charts
The post states "Legacy Bench consists of 100 tasks" (line 16), but the per-language chart shows "OVERALL (99)" while the per-language counts sum to 100. Please reconcile the benchmark total across the narrative and charts (and adjust any derived percentages) so readers aren’t left unsure whether results are for 99 or 100 tasks.
| Language | % of Benchmark | Domains |
| --- | --- | --- |
| **COBOL** | 46% | Financial settlement, payroll processing, insurance claims, telecom billing, VSAM file handling |
| **Java 7** | 32% | Enterprise middleware, CDR processing, warehouse logistics, binary parsing, EJB patterns |
[P2] Java 7 share in the table doesn’t match the task-count chart
The language table lists Java 7 as 32% (line 21), but the per-language chart shows "Java 7 (33)" tasks. If the benchmark is 100 tasks, that row should read 33%; if the benchmark is 99 tasks, the percentage needs to be recalculated. Please update the table (or the chart) so the share and the counts agree.
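As a quick check of the arithmetic behind this comment, here is a minimal sketch that computes the Java 7 share under both candidate totals. The helper name `share_percent` is hypothetical; the only figures used are the ones quoted in the review (33 Java 7 tasks, and totals of 100 or 99).

```python
def share_percent(task_count: int, total: int) -> float:
    """Return a language's share of the benchmark as a percentage."""
    return 100 * task_count / total


# Task count taken from the chart label "Java 7 (33)" quoted in the review.
java7_tasks = 33

# Candidate totals: 100 (narrative text) and 99 (chart label "OVERALL (99)").
for total in (100, 99):
    share = share_percent(java7_tasks, total)
    print(f"total={total}: Java 7 share = {share:.1f}%")
```

Under either total the share rounds to 33%, so the 32% in the table does not follow from a count of 33 tasks.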
Summary
Adds the Legacy Bench announcement as a new blog post on the docs site. Legacy Bench is the first benchmark designed to measure frontier AI agent capabilities on legacy software engineering tasks spanning COBOL, Fortran, Java 7, BASIC, C89, and Assembly.
Changes
- docs/blog/legacy-bench.mdx -- full blog post converted from the Notion draft
- docs/images/docs.json with a "Research" group

Content highlights
Notes
- [PLACEHOLDER: Agent comparison chart]) was intentionally omitted -- the text description stands on its own
- https://github.com/factory-ai/legacy-bench
- https://factory.ai/contact