Labels
- area/prompts
- area/quality: Tracks quality issues
- kind/enhancement: New feature or request
- priority/p0: Critical and urgent, e.g., critical security vulnerability, major breakage
Description
Problem
Currently, we have no systematic way to:
- Measure how well our example workflows perform their intended tasks
- Test prompt improvements before deploying them
- Compare different prompt variations or model configurations
- Validate that changes don't regress quality
- Provide quality benchmarks for the community
Solution
Use the Gemini CLI evaluation framework to systematically test and improve the effectiveness of prompts and configurations used in our example workflows. This will enable data-driven optimization of our provided workflows and give the community tools to evaluate their own Gemini CLI automations.
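To make the goals above concrete, here is a minimal sketch of comparing a baseline prompt against a candidate and flagging a regression. It does not use the Gemini CLI evaluation framework's actual API, which this issue does not describe. `EvalCase`, `pass_rate`, `is_regression`, and `fake_runner` are hypothetical names introduced for illustration, the `kind/bug` label is an assumed example, and the substring check stands in for whatever scoring the framework ends up providing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One test input plus a substring a good response should contain."""
    input_text: str
    expected_substring: str


def pass_rate(
    run_prompt: Callable[[str, str], str],
    prompt: str,
    cases: list[EvalCase],
) -> float:
    """Return the fraction of cases whose output contains the expected substring."""
    if not cases:
        raise ValueError("cases must not be empty")
    passed = sum(
        1
        for case in cases
        if case.expected_substring.lower()
        in run_prompt(prompt, case.input_text).lower()
    )
    return passed / len(cases)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """True when the candidate scores below the baseline by more than the tolerance."""
    return candidate < baseline - tolerance


def fake_runner(prompt: str, input_text: str) -> str:
    """Hypothetical stand-in for a real model call, used only to exercise the harness."""
    if "label" in prompt.lower() and "crash" in input_text.lower():
        return "kind/bug"
    return "no suggestion"


if __name__ == "__main__":
    cases = [
        EvalCase("App crashes on startup", "kind/bug"),
        EvalCase("Crash when saving a file", "kind/bug"),
    ]
    baseline = pass_rate(fake_runner, "Summarize this issue.", cases)
    candidate = pass_rate(fake_runner, "Suggest a label for this issue.", cases)
    print(f"baseline={baseline:.2f} candidate={candidate:.2f}")
    print("regression" if is_regression(baseline, candidate) else "no regression")
```

In practice `fake_runner` would be replaced by a call into the real workflow, and the same cases could be rerun against each prompt variation or model configuration before a change is deployed.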
Dependencies
- Gemini CLI evaluation framework: google-gemini/gemini-cli#6757 (Gemini CLI Evaluation Framework for Specific Use Cases)
- Prompt reusability: #76 (Reusability of Prompts)