-
Notifications
You must be signed in to change notification settings - Fork 1
Add evaluation script for knowledgeqa #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@fcogidi, i have to refactor my judges code after your factory PR gets merged. |
| ground_truth: str, | ||
| answer_type: str = "Single Answer", | ||
| ) -> JudgeResult: | ||
| """Async version of evaluate.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstrings here.
| elif "```" in response_text: | ||
| start = response_text.find("```") + 3 | ||
| end = response_text.find("```", start) | ||
| response_text = response_text[start:end].strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this block of code, now it's the 3 strike rule.
| config=types.GenerateContentConfig( | ||
| temperature=self._temperature, | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a function called _call_llm_async with this exact block of code. It also appears a couple of times again in other files. Can it be put into a function that does that instead of repeating it?
| elif "```" in response_text: | ||
| start = response_text.find("```") + 3 | ||
| end = response_text.find("```", start) | ||
| response_text = response_text[start:end].strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeated code again, with the block above repeated as well.
| global _judge # noqa: PLW0603 | ||
| if _judge is not None: | ||
| _judge.close() | ||
| _judge = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be a better implementation if we wrap those functions and the global var in a class do the singleton approach the same way we do with AsyncClientManager.
…val-agents into ak/add_evaluate_script
…val-agents into ak/add_evaluate_script
Summary
Clickup Ticket(s): Link(s) if applicable.
Type of Change
Changes Made
Testing
uv run pytest tests/)uv run mypy <src_dir>)uv run ruff check src_dir/)Manual testing details:
Screenshots/Recordings
Related Issues
Deployment Notes
Checklist