Add support for manual inspection of the responses and expected answers

It would make manual validation of the benchmark results easier if a table that is convenient to visualise were provided.
Perhaps two tables would be the best solution, where
1. the first table has the query, response, expected response (incl. not answer ..., yes and no) and foreign keys to the second table
2. the second table contains the context(s)
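
As a rough illustration of the two-table layout, here is a minimal sketch using Python's built-in `sqlite3` module. All table, column, and function names (`results`, `contexts`, `build_inspection_tables`, `add_result`, `inspect_rows`) are hypothetical and not part of the existing code base. It also assumes one context per result; if a query can have several contexts, a separate link table would probably be needed instead of a single foreign key column.

```python
import sqlite3


def build_inspection_tables(conn):
    """Create the two hypothetical tables: contexts and results."""
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE contexts (
            context_id INTEGER PRIMARY KEY,
            text       TEXT NOT NULL
        );
        CREATE TABLE results (
            result_id         INTEGER PRIMARY KEY,
            query             TEXT NOT NULL,
            response          TEXT NOT NULL,
            expected_response TEXT NOT NULL,
            -- Assumption: exactly one context per result.
            context_id        INTEGER NOT NULL
                              REFERENCES contexts(context_id)
        );
        """
    )


def add_result(conn, query, response, expected_response, context):
    """Store one benchmark result together with its context."""
    cur = conn.execute("INSERT INTO contexts (text) VALUES (?)", (context,))
    context_id = cur.lastrowid
    cur = conn.execute(
        "INSERT INTO results (query, response, expected_response, context_id) "
        "VALUES (?, ?, ?, ?)",
        (query, response, expected_response, context_id),
    )
    conn.commit()
    return cur.lastrowid


def inspect_rows(conn):
    """Return (query, response, expected_response, context) tuples."""
    return conn.execute(
        "SELECT r.query, r.response, r.expected_response, c.text "
        "FROM results AS r "
        "JOIN contexts AS c ON c.context_id = r.context_id "
        "ORDER BY r.result_id"
    ).fetchall()
```

With something like this, a reviewer could open the database in any SQLite viewer, or call `inspect_rows(conn)` to see each response next to its expected response and context.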