It would make manual validation of the benchmark results easier if they were provided in a table that is convenient to visualise.
Perhaps two tables would be the best solution, where:
- the first table holds the query, the response, the expected response (incl. not answer ..., yes and no) and foreign keys to
- a second table that contains the context(s).
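A minimal sketch of this two-table layout follows, assuming a SQLite store and one context per query (several contexts per query would need something extra, e.g. a link table). The table and column names (`results`, `contexts`, `context_id`, ...) and the helpers `create_db` and `review_rows` are hypothetical, not an existing format of the benchmark. The join at the end produces the side-by-side view intended for manual review.

```python
import sqlite3

# Hypothetical schema sketching the two-table idea; all table and
# column names are illustrative assumptions, not an existing format.
SCHEMA = """
CREATE TABLE contexts (
    context_id INTEGER PRIMARY KEY,
    text       TEXT NOT NULL
);
CREATE TABLE results (
    result_id         INTEGER PRIMARY KEY,
    query             TEXT NOT NULL,
    response          TEXT,
    expected_response TEXT,  -- free text, 'yes', 'no' or a no-answer label
    context_id        INTEGER NOT NULL REFERENCES contexts(context_id)
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database containing the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    return conn


def review_rows(conn: sqlite3.Connection) -> list[tuple]:
    """Join each result with its context for side-by-side manual review."""
    return conn.execute(
        """
        SELECT r.query, r.response, r.expected_response, c.text
        FROM results AS r
        JOIN contexts AS c ON c.context_id = r.context_id
        ORDER BY r.result_id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = create_db()
    conn.execute("INSERT INTO contexts VALUES (1, 'The sky is blue.')")
    conn.execute("INSERT INTO results VALUES (1, 'Is the sky blue?', 'yes', 'yes', 1)")
    for row in review_rows(conn):
        print(row)
```

Keeping contexts in their own table avoids repeating long context passages on every row, while the join still gives one flat, easy-to-scan row per query.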