
For rebuttal, add UIE eval results #103

Open
Ki-Seki wants to merge 1 commit into rebuttal/uie-model from rebuttal/uie-model-eval-results

Conversation

Ki-Seki (Member) commented Apr 10, 2026

No description provided.

Copilot AI review requested due to automatic review settings April 10, 2026 18:24
gitguardian (Bot) commented Apr 10, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 30182866 | Triggered | OpenRouter API Key | e03502f | results/260411-kdd-rebuttal-cv-uie-model/eval.sh |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace the secret and store it safely, following secret-management best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Copilot AI (Contributor) left a comment

Pull request overview

Adds CV-parse evaluation artifacts for the “PP-UIE-7B” UIE model run, intended to support KDD rebuttal reporting.

Changes:

  • Added a full JSON dump of CV parsing evaluation outputs and summary metrics.
  • Added a helper eval.sh script to reproduce the run against OpenRouter.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| results/260411-kdd-rebuttal-cv-uie-model/PP-UIE-7B_Sculpt-AI_GIMBench-cv-parse_260411-020045.json | Stores per-item extraction results and aggregate accuracy for the UIE CV-parse evaluation. |
| results/260411-kdd-rebuttal-cv-uie-model/eval.sh | Script to run the same evaluation configuration. |


```
@@ -0,0 +1,6 @@
export API_KEY=sk-or-v1-<redacted>
```

Copilot AI Apr 10, 2026

The script commits a live OpenRouter API key (sk-or-v1-…) into the repository. Remove the key from the script, rotate/revoke it immediately, and read it from an environment variable or secret manager at runtime (with a clear error if unset).

Copilot uses AI. Check for mistakes.
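A minimal sketch of the fix suggested above, assuming eval.sh runs under bash: the key is read from the environment at runtime, and the script stops with a clear error when it is missing. The helper name `require_env` is hypothetical and not part of gimbench.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fail fast when a required environment variable is
# unset or empty, instead of hardcoding the secret in the script.
require_env() {
  local name="$1"
  # ${!name:-} expands the variable whose name is stored in $name
  # (bash indirect expansion); it yields "" if that variable is unset.
  if [[ -z "${!name:-}" ]]; then
    echo "error: environment variable ${name} is not set" >&2
    return 1
  fi
}
```

In eval.sh, the hardcoded `export API_KEY=...` line would then give way to `require_env API_KEY` and `require_env API_BASE` ahead of the `python -m gimbench.cv.cv_parse --use_uie --model_name "PP-UIE-7B" --api_key "$API_KEY" --base_url "$API_BASE"` call, with the key supplied by the caller's environment or a secret manager. For a single variable, bash's built-in `: "${API_KEY:?API_KEY must be set}"` expansion does the same check in one line.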
Comment on lines +4 to +6
```shell
python -m gimbench.cv.cv_parse --use_uie --model_name "PP-UIE-7B" --api_key $API_KEY --base_url $API_BASE

shutdown -h +3
```

Copilot AI Apr 10, 2026

Including shutdown -h +3 in a repo script is risky because running this file will power off the host machine unexpectedly (including in CI or by other developers). Consider removing it, or guarding it behind an explicit opt-in flag/environment variable so evaluation can be run safely by default.

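One way to apply the opt-in guard described above, sketched for bash: the shutdown runs only when a hypothetical `SHUTDOWN_AFTER_EVAL` variable is set to `1`, so running the script as-is (locally or in CI) never powers off the host.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical opt-in guard: power off only when SHUTDOWN_AFTER_EVAL=1.
maybe_shutdown() {
  if [[ "${SHUTDOWN_AFTER_EVAL:-0}" != "1" ]]; then
    echo "skipping shutdown (set SHUTDOWN_AFTER_EVAL=1 to enable)"
    return 0
  fi
  # Same command as the original script: halt the machine in 3 minutes.
  shutdown -h +3
}
```

The final `shutdown -h +3` line of eval.sh would become `maybe_shutdown`, and a run that should still power off afterwards would be started as `SHUTDOWN_AFTER_EVAL=1 bash eval.sh`. Defaulting to "off" means the destructive behavior has to be requested explicitly.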
Comment on lines +85 to +97
```json
    "phone_number": {
      "prediction": "+ 4 9$ (0) 621 181 2098",
      "expected": "+49 (0) 621 181 2098",
      "verbatim_correct": false,
      "judge_model_correct": true,
      "correct": true
    },
    "email": {
      "prediction": "b n.zhang@uni-mannheim.de",
      "expected": "n.zhang@uni-mannheim.de",
      "verbatim_correct": false,
      "judge_model_correct": false,
      "correct": false
```

Copilot AI Apr 10, 2026

This results JSON appears to contain personal data from CVs (e.g., emails and phone numbers under extraction_details). If this repository is shared publicly, committing per-example PII is a privacy/security risk; consider redacting these fields or only committing aggregated metrics (e.g., totals/accuracy) rather than raw predictions/expected values.
