Skip to content

ADFA-3910 | Fix single placeholder dropdown array generation in strings file#1274

Open
jatezzz wants to merge 3 commits intostagefrom
fix/ADFA-3910-dropdown-placeholder-string-xml-experimental
Open

ADFA-3910 | Fix single placeholder dropdown array generation in strings file#1274
jatezzz wants to merge 3 commits intostagefrom
fix/ADFA-3910-dropdown-placeholder-string-xml-experimental

Conversation

@jatezzz
Copy link
Copy Markdown
Collaborator

@jatezzz jatezzz commented May 5, 2026

Description

Fixed an issue where single placeholder labels (e.g., "dropdown", "select", "choose") sketched inside a spinner/dropdown widget incorrectly generated a string-array in strings.xml. Additionally, OCR tag parsing logic has been refactored and centralized into a new WidgetTagParser object.

Details

  • Added isSinglePlaceholderEntry() check in SpinnerWidget to bypass string array creation when the only item is a known placeholder.
  • Extracted and centralized OCR normalization and tag extraction logic from MarginAnnotationParser and WidgetAnnotationMatcher into WidgetTagParser.
  • Improved regex tokenization for splitting entry candidates in AttributeValidator.
document_5174968672900351910.mp4

Ticket

ADFA-3910

Observation

Centralizing the tag parsing logic into WidgetTagParser significantly cleans up the margin and annotation matching domains, ensuring that OCR text normalization (e.g., replacing 'l' with '1', 'O' with '0') is applied consistently across the entire pipeline.

@jatezzz jatezzz requested review from a team and Daniel-ADFA May 5, 2026 18:16
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b46c43a-22ca-450b-ab8c-3a4f5a31e547

📥 Commits

Reviewing files that changed from the base of the PR and between 19d42ff and bba5231.

📒 Files selected for processing (6)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetAnnotationMatcher.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt

📝 Walkthrough

Release Notes

Features & Improvements

  • Fixed placeholder dropdown handling: Spinner/dropdown widgets containing only a single placeholder label (e.g., "dropdown", "select", "choose") no longer generate unnecessary string-array entries in strings.xml
  • Centralized OCR text parsing: Extracted and consolidated OCR normalization and tag extraction logic from MarginAnnotationParser and WidgetAnnotationMatcher into new WidgetTagParser utility for consistent text processing across the pipeline
  • Enhanced OCR artifact handling: Improved handling of common OCR misreadings including character normalization (e.g., 'l'→'1', 'O'→'0', '8'→'B', '8W'/'S8'→'SW')
  • Improved entry tokenization: Refined regex-based splitting of dropdown/spinner entry candidates in AttributeValidator and new OcrExtensions utility

Technical Changes

  • New WidgetTagParser internal utility (93 lines) provides standardized tag parsing with OCR normalization
  • New OcrExtensions.extractOcrEntries() function for intelligent OCR-aware string tokenization (punctuation-first, with whitespace fallback)
  • AndroidWidget.resolveWidgetId() visibility changed from private to protected open to support widget subclass customization
  • Added resource name sanitization utilities (sanitizeResourceName(), regex patterns) to AndroidWidget
  • Refactored tag extraction in MarginAnnotationParser (±46 lines) and WidgetAnnotationMatcher (±30 lines) to delegate to centralized parser
  • Removed internal extractOrdinal() helper and tag regex patterns from multiple files in favor of WidgetTagParser delegation

Risks & Best Practice Concerns

  • ⚠️ Multi-file refactoring risk: Changes touch 5+ core files (MarginAnnotationParser, WidgetAnnotationMatcher, AttributeValidator, AndroidWidget, plus 2 new files). High potential for regression in OCR parsing and widget annotation matching workflows. Comprehensive testing across all annotation types is critical.
  • ⚠️ Visibility modifier change: resolveWidgetId() changed from private to protected open, increasing API surface area. This enables subclass customization but could allow unintended overrides if not properly documented.
  • ⚠️ Entry parsing behavior changes: Spinner entry parsing now uses OCR-aware extraction with placeholder filtering. May alter results for edge cases with malformed OCR text or unusual delimiter patterns compared to previous comma-split logic.
  • ⚠️ Implicit block filtering change: Removed isCanvasMetadata check from implicit blocks filtering in MarginAnnotationParser. Verify this doesn't inadvertently include unwanted canvas metadata in certain annotation scenarios.
  • ⚠️ OCR normalization scope: Centralization improves consistency but requires validation that all OCR misread mappings (e.g., character → digit conversions) are correct and complete for your OCR engine's typical outputs.
  • Missing test coverage evidence: No test files referenced in change summary. Ensure adequate unit tests for new WidgetTagParser and OcrExtensions utilities, particularly OCR normalization edge cases and placeholder detection logic.

Migration Notes

  • Existing spinner/dropdown widgets with single-item placeholders will no longer generate string resources—this is the intended fix but may affect resource file generation patterns
  • OCR text normalization is now centralized; if custom OCR handling existed elsewhere, verify consistency with WidgetTagParser logic

Walkthrough

This PR centralizes widget tag parsing into a new WidgetTagParser, updates margin/annotation parsing to use it, adds OCR-aware entry extraction utilities, and adjusts Android widget ID resolution and spinner entry handling. Changes refactor tag/ordinal extraction, metadata filtering, and spinner entry parsing.

Changes

Tag Parsing Centralization

Layer / File(s) Summary
New Tag Parser Utility
cv-image-to-xml/src/main/java/.../WidgetTagParser.kt
Adds WidgetTagParser with regex-based tag validation, normalization (including OCR misread fixes like 8→B, 8W/S8→SW), extraction (tag + optional trailing text), and ordinal extraction.
Annotation Matcher Refactoring
cv-image-to-xml/src/main/java/.../WidgetAnnotationMatcher.kt
Removes local tag regex/helpers and delegates normalization, tag detection, and ordinal extraction to WidgetTagParser; updates distinctBy/dedup and matching logic to use WidgetTagParser utilities.
Margin Annotation Refactoring
cv-image-to-xml/src/main/java/.../MarginAnnotationParser.kt
Adds GAP_MULTIPLIER and HEIGHT_FRACTION constants; replaces internal tag/ordinal helpers with WidgetTagParser.extractTag/extractOrdinal; tightens explicit-block requirements (non-blank annotationText + tag) and adjusts metadata filtering for implicit blocks; unresolved-tag sorting uses WidgetTagParser.

OCR-Aware Entry Extraction & Widget Enhancement

Layer / File(s) Summary
New OCR Extension Utility
cv-image-to-xml/src/main/java/.../OcrExtensions.kt
Adds String.extractOcrEntries() that splits OCR text by punctuation/newlines first, falls back to whitespace tokenization, and validates numeric-like OCR tokens.
Attribute Validator Update
cv-image-to-xml/src/main/java/.../grammar/AttributeValidator.kt
Switches entry parsing in EntriesValidator.validate from comma-splitting to extractOcrEntries(); removes isEnclosedInBrackets helper.
Android Widget Enhancement
cv-image-to-xml/src/main/java/.../xml/AndroidWidget.kt
Makes resolveWidgetId protected open; adds sanitizeResourceName and regex helpers; enhances SpinnerWidget to use extractOcrEntries(), guard against placeholder-only entries, normalize dropdown labels, and prefer explicit/requested IDs when resolving spinner IDs.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • avestaadfa
  • Daniel-ADFA

Poem

🐰 I hop through tags and tidy text,

OCR whispers I parse next.
Widgets named and gaps refined,
IDs neat, spinner lists aligned.
A happy rabbit leaves no mess behind.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.62% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly addresses the main objective: preventing single placeholder dropdown labels from incorrectly generating string arrays in strings.xml.
Description check ✅ Passed The description is closely related to the changeset, explaining the primary fix and mentioning the secondary refactoring effort to centralize OCR tag parsing logic.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ADFA-3910-dropdown-placeholder-string-xml-experimental

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jatezzz jatezzz requested a review from avestaadfa May 6, 2026 13:05
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt (1)

12-29: 💤 Low value

Solid OCR-aware splitting; one optional simplification.

The branching (punctuation-first, then digit-only whitespace fallback) sensibly avoids over-splitting non-numeric phrases while still handling space-separated numeric OCR like "1 2 3". The empty-input case (size != 1) gracefully short-circuits to an empty list.

Minor nit: the this.trim() on line 22 is redundant given that WHITESPACE_DELIMITERS = "\\s+" plus filter { it.isNotBlank() } already discards leading/trailing whitespace tokens. Removing it would reduce one allocation per call.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt`
around lines 12 - 29, In extractOcrEntries(), remove the redundant this.trim()
call when creating whitespaceTokens since WHITESPACE_DELIMITERS ("\\s+") plus
filter { it.isNotBlank() } already handles leading/trailing spaces; update the
whitespaceTokens assignment in the function (referencing WHITESPACE_DELIMITERS
and whitespaceTokens) to split directly on WHITESPACE_DELIMITERS and filter
blanks, leaving the rest of the function logic (punctuatedTokens,
isSpaceSeparatedOcrNumbers, and the return) unchanged.
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt (1)

123-178: 💤 Low value

Single-placeholder bypass implemented correctly; consider unifying placeholder knowledge.

The core fix — isSinglePlaceholderEntry() gating string-array creation in processAttributes (line 158) — is precise and matches the PR objective. Nicely scoped: ID derivation/resolution still proceeds normally, only the string-array write is suppressed.

Optional consistency nit: isMeaningfulDropdownText (line 202) only excludes "dropdown", while placeholderEntries covers year/month/day/select/choose/dropdown. As a result, a box.text of "select" will still pass isMeaningfulDropdownText and feed into rawEntries, only to be filtered out later by isSinglePlaceholderEntry. Routing both checks through the same set would make the intent uniform and avoid future drift if the placeholder list grows.

♻️ Suggested alignment
     private fun String.isMeaningfulDropdownText(): Boolean {
-        val cleaned = normalizedDropdownLabel()
-        return cleaned.isNotBlank() && !cleaned.equals("dropdown", ignoreCase = true)
+        val cleaned = normalizedDropdownLabel()
+        return cleaned.isNotBlank() && cleaned.lowercase() !in placeholderEntries
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`
around lines 123 - 178, The placeholder list is duplicated: companion object
placeholderEntries is used by isSinglePlaceholderEntry but
isMeaningfulDropdownText uses a different hardcoded check; unify them by making
isMeaningfulDropdownText consult the same placeholderEntries set (or move
placeholderEntries to a shared util/constant) and ensure both use the same
normalization/lowercasing (call normalizedDropdownLabel().lowercase() or
equivalent) so processAttributes, isSinglePlaceholderEntry, and
isMeaningfulDropdownText all reference the same canonical placeholder set.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`:
- Around line 60-83: normalizeTagToken calls normalizeOcrDigits with an
UPPERCASED token but normalizeOcrDigits currently targets lowercase letters and
only maps O/I/! → digits, causing inconsistency with isNumericLikeOcrChar and
EntriesValidator.cleanNumberArtifacts; fix by making normalizeOcrDigits operate
on the uppercase input and include the same OCR-to-digit mappings (e.g., map 'I'
and '!' and 'L' → '1', 'O' → '0', and add 'Z' → '2', 'S' → '5', 'B' → '6') so
normalizeOcrDigits, isNumericLikeOcrChar, and extractOrdinal behavior are
consistent (update the normalizeOcrDigits implementation referenced by
normalizeTagToken and keep isNumericLikeOcrChar as-is).

---

Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt`:
- Around line 123-178: The placeholder list is duplicated: companion object
placeholderEntries is used by isSinglePlaceholderEntry but
isMeaningfulDropdownText uses a different hardcoded check; unify them by making
isMeaningfulDropdownText consult the same placeholderEntries set (or move
placeholderEntries to a shared util/constant) and ensure both use the same
normalization/lowercasing (call normalizedDropdownLabel().lowercase() or
equivalent) so processAttributes, isSinglePlaceholderEntry, and
isMeaningfulDropdownText all reference the same canonical placeholder set.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt`:
- Around line 12-29: In extractOcrEntries(), remove the redundant this.trim()
call when creating whitespaceTokens since WHITESPACE_DELIMITERS ("\\s+") plus
filter { it.isNotBlank() } already handles leading/trailing spaces; update the
whitespaceTokens assignment in the function (referencing WHITESPACE_DELIMITERS
and whitespaceTokens) to split directly on WHITESPACE_DELIMITERS and filter
blanks, leaving the rest of the function logic (punctuatedTokens,
isSpaceSeparatedOcrNumbers, and the return) unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3c853016-4505-4a8b-827a-8ca2c2356296

📥 Commits

Reviewing files that changed from the base of the PR and between d087123 and 19d42ff.

📒 Files selected for processing (6)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetAnnotationMatcher.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/grammar/AttributeValidator.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/xml/AndroidWidget.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/utils/OcrExtensions.kt

jatezzz added 2 commits May 6, 2026 10:29
Extract WidgetTagParser and ignore isolated placeholder entries like `select` or `dropdown` from strings.xml
@jatezzz jatezzz force-pushed the fix/ADFA-3910-dropdown-placeholder-string-xml-experimental branch from 9f74584 to fda1b16 Compare May 6, 2026 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants