[layout detection] Task definitions #2009

@felixdittrich92

🚀 The feature

  • Datasets: DocLayNet, PubLayNet, DocBank, M6Doc, RanLayNet, PRImA, ? (offline work: merge into a single unified dataset)
  • Model: https://github.com/Zeba-Xie/RTMDet-R2 (two stages: backbone, neck + head + losses)
  • Metrics: mmAP (75, 50), rotated (ref.: https://github.com/open-mmlab/mmrotate/blob/main/mmrotate/core/evaluation/eval_map.py)
  • Implement train / eval / latency scripts; reuse DetectionDataset for KIE annotations?
  • Integrate into the pipeline: standalone predictor, extend the DocumentBuilder components
  • If layout information is available, improve sorting to preserve reading order (heuristic? graph-based ordering with networkx?)
  • Needs a separate ticket: allow different output formats (markdown, ...)
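
To make the reading-order item more concrete, here is a minimal sketch of a graph-based ordering. It is only an illustration under assumptions, not a proposed implementation: boxes are assumed to be axis-aligned `(xmin, ymin, xmax, ymax)` tuples, the names `precedes` and `reading_order` are hypothetical, and the standard library `heapq` is used for a prioritized topological sort instead of networkx so that the snippet stays dependency-free. A box precedes another if it lies above it with horizontal overlap, or to its left with vertical overlap; ties are broken column-first.

```python
import heapq
from typing import List, Tuple

# Assumed box format (not taken from the issue): (xmin, ymin, xmax, ymax)
Box = Tuple[float, float, float, float]


def _overlap(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if the 1D intervals [a0, a1] and [b0, b1] share a non-empty span."""
    return min(a1, b1) - max(a0, b0) > 0


def precedes(a: Box, b: Box) -> bool:
    """Heuristic: a is read before b if it is above b (with horizontal overlap)
    or to the left of b (with vertical overlap)."""
    above = a[3] <= b[1] and _overlap(a[0], a[2], b[0], b[2])
    left = a[2] <= b[0] and _overlap(a[1], a[3], b[1], b[3])
    return above or left


def reading_order(boxes: List[Box]) -> List[int]:
    """Return box indices in an estimated reading order."""
    n = len(boxes)
    successors: List[List[int]] = [[] for _ in range(n)]
    in_degree = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and precedes(boxes[i], boxes[j]):
                successors[i].append(j)
                in_degree[j] += 1

    def key(i: int) -> Tuple[float, float, int]:
        # Tie-break among ready boxes: leftmost first, then topmost (column-first).
        return (boxes[i][0], boxes[i][1], i)

    # Kahn's algorithm with a heap so the result is deterministic.
    heap = [key(i) for i in range(n) if in_degree[i] == 0]
    heapq.heapify(heap)
    order: List[int] = []
    while heap:
        _, _, i = heapq.heappop(heap)
        order.append(i)
        for j in successors[i]:
            in_degree[j] -= 1
            if in_degree[j] == 0:
                heapq.heappush(heap, key(j))

    # Fallback: if the heuristic produced a cycle, append the rest by position.
    if len(order) < n:
        placed = set(order)
        order += sorted((i for i in range(n) if i not in placed), key=key)
    return order


# Two-column page with a full-width title, given in shuffled order.
example_boxes = [
    (52, 42, 100, 100),  # 0: right column, second block
    (0, 12, 48, 50),     # 1: left column, first block
    (0, 0, 100, 10),     # 2: title
    (52, 12, 100, 40),   # 3: right column, first block
    (0, 52, 48, 100),    # 4: left column, second block
]
print(reading_order(example_boxes))
```

With networkx, the same precedence edges could be fed into a `DiGraph` and sorted topologically; the fallback for cyclic layouts would still be needed. Rotated boxes are not handled here.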

Motivation, pitch

TODO: Split this into separate issues and add better descriptions.

Alternatives

No response

Additional context

No response
