ADFA-3910 | Fix single placeholder dropdown array generation in strings file by jatezzz · Pull Request #1274 · appdevforall/CodeOnTheGo

jatezzz · 2026-05-05T18:15:59Z

Description

Fixed an issue where single placeholder labels (e.g., "dropdown", "select", "choose") sketched inside a spinner/dropdown widget incorrectly generated a string-array in strings.xml. Additionally, OCR tag parsing logic has been refactored and centralized into a new WidgetTagParser object.

Details

Added isSinglePlaceholderEntry() check in SpinnerWidget to bypass string array creation when the only item is a known placeholder.
Extracted and centralized OCR normalization and tag extraction logic from MarginAnnotationParser and WidgetAnnotationMatcher into WidgetTagParser.
Improved regex tokenization for splitting entry candidates in AttributeValidator.

document_5174968672900351910.mp4

Ticket

ADFA-3910

Observation

Centralizing the tag parsing logic into WidgetTagParser significantly cleans up the margin and annotation matching domains, ensuring that OCR text normalization (e.g., replacing 'l' with '1', 'O' with '0') is applied consistently across the entire pipeline.

coderabbitai · 2026-05-05T18:16:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b46c43a-22ca-450b-ab8c-3a4f5a31e547

📥 Commits

Reviewing files that changed from the base of the PR and between 19d42ff and bba5231.

📒 Files selected for processing (6)

cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetAnnotationMatcher.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt

📝 Walkthrough

Release Notes

Features & Improvements

Fixed placeholder dropdown handling: Spinner/dropdown widgets containing only a single placeholder label (e.g., "dropdown", "select", "choose") no longer generate unnecessary string-array entries in strings.xml
Centralized OCR text parsing: Extracted and consolidated OCR normalization and tag extraction logic from MarginAnnotationParser and WidgetAnnotationMatcher into new WidgetTagParser utility for consistent text processing across the pipeline
Enhanced OCR artifact handling: Improved handling of common OCR misreadings including character normalization (e.g., 'l'→'1', 'O'→'0', '8'→'B', '8W'/'S8'→'SW')
Improved entry tokenization: Refined regex-based splitting of dropdown/spinner entry candidates in AttributeValidator and new OcrExtensions utility

Technical Changes

New WidgetTagParser internal utility (93 lines) provides standardized tag parsing with OCR normalization
New OcrExtensions.extractOcrEntries() function for intelligent OCR-aware string tokenization (punctuation-first, with whitespace fallback)
AndroidWidget.resolveWidgetId() visibility changed from private to protected open to support widget subclass customization
Added resource name sanitization utilities (sanitizeResourceName(), regex patterns) to AndroidWidget
Refactored tag extraction in MarginAnnotationParser (±46 lines) and WidgetAnnotationMatcher (±30 lines) to delegate to centralized parser
Removed internal extractOrdinal() helper and tag regex patterns from multiple files in favor of WidgetTagParser delegation

Risks & Best Practice Concerns

⚠️ Multi-file refactoring risk: Changes touch 5+ core files (MarginAnnotationParser, WidgetAnnotationMatcher, AttributeValidator, AndroidWidget, plus 2 new files). High potential for regression in OCR parsing and widget annotation matching workflows. Comprehensive testing across all annotation types is critical.
⚠️ Visibility modifier change: resolveWidgetId() changed from private to protected open, increasing API surface area. This enables subclass customization but could allow unintended overrides if not properly documented.
⚠️ Entry parsing behavior changes: Spinner entry parsing now uses OCR-aware extraction with placeholder filtering. May alter results for edge cases with malformed OCR text or unusual delimiter patterns compared to previous comma-split logic.
⚠️ Implicit block filtering change: Removed isCanvasMetadata check from implicit blocks filtering in MarginAnnotationParser. Verify this doesn't inadvertently include unwanted canvas metadata in certain annotation scenarios.
⚠️ OCR normalization scope: Centralization improves consistency but requires validation that all OCR misread mappings (e.g., character → digit conversions) are correct and complete for your OCR engine's typical outputs.
Missing test coverage evidence: No test files referenced in change summary. Ensure adequate unit tests for new WidgetTagParser and OcrExtensions utilities, particularly OCR normalization edge cases and placeholder detection logic.

Migration Notes

Existing spinner/dropdown widgets with single-item placeholders will no longer generate string resources—this is the intended fix but may affect resource file generation patterns
OCR text normalization is now centralized; if custom OCR handling existed elsewhere, verify consistency with WidgetTagParser logic

Walkthrough

This PR centralizes widget tag parsing into a new WidgetTagParser, updates margin/annotation parsing to use it, adds OCR-aware entry extraction utilities, and adjusts Android widget ID resolution and spinner entry handling. Changes refactor tag/ordinal extraction, metadata filtering, and spinner entry parsing.

Changes

Tag Parsing Centralization

Layer / File(s)	Summary
New Tag Parser Utility `cv-image-to-xml/src/main/java/.../WidgetTagParser.kt`	Adds `WidgetTagParser` with regex-based tag validation, normalization (including OCR misread fixes like 8→B, 8W/S8→SW), extraction (tag + optional trailing text), and ordinal extraction.
Annotation Matcher Refactoring `cv-image-to-xml/src/main/java/.../WidgetAnnotationMatcher.kt`	Removes local tag regex/helpers and delegates normalization, tag detection, and ordinal extraction to `WidgetTagParser`; updates distinctBy/dedup and matching logic to use `WidgetTagParser` utilities.
Margin Annotation Refactoring `cv-image-to-xml/src/main/java/.../MarginAnnotationParser.kt`	Adds `GAP_MULTIPLIER` and `HEIGHT_FRACTION` constants; replaces internal tag/ordinal helpers with `WidgetTagParser.extractTag`/`extractOrdinal`; tightens explicit-block requirements (non-blank annotationText + tag) and adjusts metadata filtering for implicit blocks; unresolved-tag sorting uses `WidgetTagParser`.

OCR-Aware Entry Extraction & Widget Enhancement

Layer / File(s)	Summary
New OCR Extension Utility `cv-image-to-xml/src/main/java/.../OcrExtensions.kt`	Adds `String.extractOcrEntries()` that splits OCR text by punctuation/newlines first, falls back to whitespace tokenization, and validates numeric-like OCR tokens.
Attribute Validator Update `cv-image-to-xml/src/main/java/.../grammar/AttributeValidator.kt`	Switches entry parsing in `EntriesValidator.validate` from comma-splitting to `extractOcrEntries()`; removes `isEnclosedInBrackets` helper.
Android Widget Enhancement `cv-image-to-xml/src/main/java/.../xml/AndroidWidget.kt`	Makes `resolveWidgetId` `protected open`; adds `sanitizeResourceName` and regex helpers; enhances `SpinnerWidget` to use `extractOcrEntries()`, guard against placeholder-only entries, normalize dropdown labels, and prefer explicit/requested IDs when resolving spinner IDs.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

appdevforall/CodeOnTheGo#1234: Overlaps OCR-driven spinner entry parsing and related validator changes.
appdevforall/CodeOnTheGo#1028: Related edits to margin/tag parsing and MarginAnnotationParser.
appdevforall/CodeOnTheGo#1241: Overlapping updates to metadata detection and tag/attribute extraction in annotation parsing.

Suggested reviewers

avestaadfa
Daniel-ADFA

Poem

🐰 I hop through tags and tidy text,

OCR whispers I parse next.
Widgets named and gaps refined,
IDs neat, spinner lists aligned.
A happy rabbit leaves no mess behind.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 21.62% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly addresses the main objective: preventing single placeholder dropdown labels from incorrectly generating string arrays in strings.xml.
Description check	✅ Passed	The description is closely related to the changeset, explaining the primary fix and mentioning the secondary refactoring effort to centralize OCR tag parsing logic.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/ADFA-3910-dropdown-placeholder-string-xml-experimental

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt (1)
12-29: 💤 Low value

Solid OCR-aware splitting; one optional simplification.

The branching (punctuation-first, then digit-only whitespace fallback) sensibly avoids over-splitting non-numeric phrases while still handling space-separated numeric OCR like "1 2 3". The empty-input case (size != 1) gracefully short-circuits to an empty list.

Minor nit: the this.trim() on line 22 is redundant given that WHITESPACE_DELIMITERS = "\\s+" plus filter { it.isNotBlank() } already discards leading/trailing whitespace tokens. Removing it would reduce one allocation per call.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt`
around lines 12 - 29, In extractOcrEntries(), remove the redundant this.trim()
call when creating whitespaceTokens since WHITESPACE_DELIMITERS ("\\s+") plus
filter { it.isNotBlank() } already handles leading/trailing spaces; update the
whitespaceTokens assignment in the function (referencing WHITESPACE_DELIMITERS
and whitespaceTokens) to split directly on WHITESPACE_DELIMITERS and filter
blanks, leaving the rest of the function logic (punctuatedTokens,
isSpaceSeparatedOcrNumbers, and the return) unchanged.
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt (1)
123-178: 💤 Low value

Single-placeholder bypass implemented correctly; consider unifying placeholder knowledge.

The core fix — isSinglePlaceholderEntry() gating string-array creation in processAttributes (line 158) — is precise and matches the PR objective. Nicely scoped: ID derivation/resolution still proceeds normally, only the string-array write is suppressed.

Optional consistency nit: isMeaningfulDropdownText (line 202) only excludes "dropdown", while placeholderEntries covers year/month/day/select/choose/dropdown. As a result, a box.text of "select" will still pass isMeaningfulDropdownText and feed into rawEntries, only to be filtered out later by isSinglePlaceholderEntry. Routing both checks through the same set would make the intent uniform and avoid future drift if the placeholder list grows.
♻️ Suggested alignment
     private fun String.isMeaningfulDropdownText(): Boolean {
-        val cleaned = normalizedDropdownLabel()
-        return cleaned.isNotBlank() && !cleaned.equals("dropdown", ignoreCase = true)
+        val cleaned = normalizedDropdownLabel()
+        return cleaned.isNotBlank() && cleaned.lowercase() !in placeholderEntries
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`
around lines 123 - 178, The placeholder list is duplicated: companion object
placeholderEntries is used by isSinglePlaceholderEntry but
isMeaningfulDropdownText uses a different hardcoded check; unify them by making
isMeaningfulDropdownText consult the same placeholderEntries set (or move
placeholderEntries to a shared util/constant) and ensure both use the same
normalization/lowercasing (call normalizedDropdownLabel().lowercase() or
equivalent) so processAttributes, isSinglePlaceholderEntry, and
isMeaningfulDropdownText all reference the same canonical placeholder set.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`:
- Around line 60-83: normalizeTagToken calls normalizeOcrDigits with an
UPPERCASED token but normalizeOcrDigits currently targets lowercase letters and
only maps O/I/! → digits, causing inconsistency with isNumericLikeOcrChar and
EntriesValidator.cleanNumberArtifacts; fix by making normalizeOcrDigits operate
on the uppercase input and include the same OCR-to-digit mappings (e.g., map 'I'
and '!' and 'L' → '1', 'O' → '0', and add 'Z' → '2', 'S' → '5', 'B' → '6') so
normalizeOcrDigits, isNumericLikeOcrChar, and extractOrdinal behavior are
consistent (update the normalizeOcrDigits implementation referenced by
normalizeTagToken and keep isNumericLikeOcrChar as-is).

---

Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`:
- Around line 123-178: The placeholder list is duplicated: companion object
placeholderEntries is used by isSinglePlaceholderEntry but
isMeaningfulDropdownText uses a different hardcoded check; unify them by making
isMeaningfulDropdownText consult the same placeholderEntries set (or move
placeholderEntries to a shared util/constant) and ensure both use the same
normalization/lowercasing (call normalizedDropdownLabel().lowercase() or
equivalent) so processAttributes, isSinglePlaceholderEntry, and
isMeaningfulDropdownText all reference the same canonical placeholder set.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt`:
- Around line 12-29: In extractOcrEntries(), remove the redundant this.trim()
call when creating whitespaceTokens since WHITESPACE_DELIMITERS ("\\s+") plus
filter { it.isNotBlank() } already handles leading/trailing spaces; update the
whitespaceTokens assignment in the function (referencing WHITESPACE_DELIMITERS
and whitespaceTokens) to split directly on WHITESPACE_DELIMITERS and filter
blanks, leaving the rest of the function logic (punctuatedTokens,
isSpaceSeparatedOcrNumbers, and the return) unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c853016-4505-4a8b-827a-8ca2c2356296

📥 Commits

Reviewing files that changed from the base of the PR and between d087123 and 19d42ff.

📒 Files selected for processing (6)

cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetAnnotationMatcher.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt

Extract WidgetTagParser and ignore isolated placeholder entries like `select` or `dropdown` from strings.xml

jatezzz requested review from a team and Daniel-ADFA May 5, 2026 18:16

jatezzz requested a review from avestaadfa May 6, 2026 13:05

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Comment thread ...e-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt

jatezzz added 2 commits May 6, 2026 10:29

fix: prevent string-array generation for single placeholder dropdowns

c4a5ecd

Extract WidgetTagParser and ignore isolated placeholder entries like `select` or `dropdown` from strings.xml

fix: Inconsistent OCR digit normalization

fda1b16

jatezzz force-pushed the fix/ADFA-3910-dropdown-placeholder-string-xml-experimental branch from 9f74584 to fda1b16 Compare May 6, 2026 15:29

Daniel-ADFA reviewed May 6, 2026

View reviewed changes

Comment thread ...e-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt Outdated

refactor: use regex instead of hardcoded text

bba5231

jatezzz mentioned this pull request May 6, 2026

ADFA-3912 | Fix fuzzy attribute parsing bounds and UI element tag filtering #1275

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ADFA-3910 | Fix single placeholder dropdown array generation in strings file#1274

ADFA-3910 | Fix single placeholder dropdown array generation in strings file#1274
jatezzz wants to merge 3 commits intostagefrom
fix/ADFA-3910-dropdown-placeholder-string-xml-experimental

jatezzz commented May 5, 2026 •

edited by atlassian Bot

Loading

Uh oh!

coderabbitai Bot commented May 5, 2026 •

edited

Loading

Release Notes

Features & Improvements

Technical Changes

Risks & Best Practice Concerns

Migration Notes

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jatezzz commented May 5, 2026 • edited by atlassian Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Details

Ticket

Observation

Uh oh!

coderabbitai Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Release Notes

Features & Improvements

Technical Changes

Risks & Best Practice Concerns

Migration Notes

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jatezzz commented May 5, 2026 •

edited by atlassian Bot

Loading

coderabbitai Bot commented May 5, 2026 •

edited

Loading