Fix/linebreaks conversion quotes by Elimpizza · Pull Request #7511 · hypothesis/client

Elimpizza · 2026-02-26T14:19:17Z

Summary

Normalize quote selectors so we store and match what users see. foo<br>bar becomes "foo bar", not "foobar". Prefix/suffix are normalized too.

Details

HTML (TextQuoteAnchor)

DOM walk produces rendered text (spaces at <br> and block boundaries, whitespace collapsed) plus forward/reverse offset maps between raw textContent and rendered text.

fromRange uses the maps to slice exact / prefix / suffix from rendered text.
toPositionAnchor matches against rendered text and maps offsets back to raw DOM coordinates for TextPositionSelector.
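
A minimal sketch of the idea, not the code in this PR: a flat list of segments stands in for the DOM walk, and the names Segment, RenderedText and buildRenderedText are hypothetical. It records both offset maps while building the rendered string, so a match found in rendered text can be mapped back to raw textContent offsets.

```typescript
// Hypothetical stand-in for a DOM walk: text nodes and <br>/block boundaries
// flattened into a list. All names here are illustrative, not the PR's API.
type Segment = { kind: 'text'; value: string } | { kind: 'break' };

interface RenderedText {
  text: string;
  // rawToRendered[i]: rendered offset for raw offset i (length raw.length + 1)
  rawToRendered: number[];
  // renderedToRaw[j]: raw offset for rendered offset j (length text.length + 1)
  renderedToRaw: number[];
}

function buildRenderedText(segments: Segment[]): RenderedText {
  let text = '';
  let rawPos = 0;
  const rawToRendered: number[] = [];
  const renderedToRaw: number[] = [];

  // Append one rendered character, remembering the raw offset it came from.
  const emit = (ch: string) => {
    renderedToRaw.push(rawPos);
    text += ch;
  };
  const endsWithSpace = () => text.charAt(text.length - 1) === ' ';

  for (const seg of segments) {
    if (seg.kind === 'break') {
      // Synthesized space: consumes no raw character, so rawPos is unchanged.
      if (!endsWithSpace()) {
        emit(' ');
      }
      continue;
    }
    for (let i = 0; i < seg.value.length; i++) {
      const ch = seg.value.charAt(i);
      rawToRendered.push(text.length);
      if (/\s/.test(ch)) {
        // Collapse runs of whitespace into a single space.
        if (!endsWithSpace()) {
          emit(' ');
        }
      } else {
        emit(ch);
      }
      rawPos++;
    }
  }
  // Sentinel entries so end-of-string offsets can be mapped too.
  rawToRendered.push(text.length);
  renderedToRaw.push(rawPos);
  return { text, rawToRendered, renderedToRaw };
}

// textContent of the markup foo<br>bar  baz is "foobar  baz".
const rawText = 'foobar  baz';
const rendered = buildRenderedText([
  { kind: 'text', value: 'foo' },
  { kind: 'break' },
  { kind: 'text', value: 'bar  baz' },
]);
const quote = 'bar baz';
const start = rendered.text.indexOf(quote);
const rawStart = rendered.renderedToRaw[start];
const rawEnd = rendered.renderedToRaw[start + quote.length];
console.log(rendered.text); // "foo bar baz"
console.log(rawText.slice(rawStart, rawEnd)); // "bar  baz"
```

The synthesized space gets an entry in renderedToRaw but none in rawToRendered, which is the bookkeeping the maps exist for.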

Why use explicit maps instead of reusing translateOffsets

translateOffsets aligns by counting non-whitespace chars, which works for PDF where both strings have the same characters with different spacing. Our rendered text contains synthesized characters (the space at a <br> has no source in textContent); counting can't align those, and using translateOffsets shifted real anchors by one character. The maps record the correspondence during the walk, so synthesized characters are tracked correctly.
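
For illustration, counting-based alignment can be sketched roughly as follows. This is a hypothetical helper (translateRangeByCounting), not the client's translateOffsets, and the local isNotSpace is a stand-in for the shared one. It assumes both strings contain the same non-whitespace characters in the same order, which is the PDF case described above.

```typescript
// Local stand-in for a shared "is not whitespace" predicate (assumption).
const isNotSpace = (ch: string): boolean => !/\s/.test(ch);

// Number of non-whitespace characters in s before `offset`.
function countBefore(s: string, offset: number): number {
  let count = 0;
  for (let i = 0; i < offset && i < s.length; i++) {
    if (isNotSpace(s.charAt(i))) {
      count++;
    }
  }
  return count;
}

// Smallest offset in s whose prefix contains n non-whitespace characters.
function offsetAfter(s: string, n: number): number {
  let j = 0;
  let seen = 0;
  while (j < s.length && seen < n) {
    if (isNotSpace(s.charAt(j))) {
      seen++;
    }
    j++;
  }
  return j;
}

// Hypothetical helper: translate a [start, end) range in `from` to `to`
// purely by counting non-whitespace characters.
function translateRangeByCounting(
  from: string,
  to: string,
  start: number,
  end: number,
): [number, number] {
  let newStart = offsetAfter(to, countBefore(from, start));
  // Move the start past whitespace so it lands on the first matched character.
  while (newStart < to.length && !isNotSpace(to.charAt(newStart))) {
    newStart++;
  }
  const newEnd = offsetAfter(to, countBefore(from, end));
  return [newStart, newEnd];
}

const normalizedPage = 'foo bar baz';
const rawPage = 'foo  bar\nbaz';
const [pageStart, pageEnd] = translateRangeByCounting(normalizedPage, rawPage, 4, 11);
console.log(JSON.stringify(rawPage.slice(pageStart, pageEnd))); // "bar\nbaz"
```

Counting only sees non-whitespace characters, so it has no way to represent a space that exists in one string and has no counterpart in the other.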

PDF
Selectors and page text normalized via normalizePDFText; match offsets translated back via translateOffsets (no synthesized characters here, so it's the right tool). isSpace/isNotSpace lifted into util/normalize.ts to share with HTML. Redundant [\r\n]+ in normalizePDFText dropped (\s+ covers it).
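
A small check of the redundancy claim, as a sketch: collapseWhitespace and twoPassCollapse are hypothetical helpers, not normalizePDFText itself, and the two-pass version assumes the dropped step replaced [\r\n]+ with a single space.

```typescript
// Single pass: \s already matches \r and \n.
function collapseWhitespace(text: string): string {
  return text.replace(/\s+/g, ' ');
}

// Assumed shape of the older two-pass version (the replacement string for
// the first pass is an assumption, not taken from the PR).
function twoPassCollapse(text: string): string {
  return text.replace(/[\r\n]+/g, ' ').replace(/\s+/g, ' ');
}

const sample = 'foo\r\nbar \n baz\tqux';
console.log(collapseWhitespace(sample)); // "foo bar baz qux"
console.log(collapseWhitespace(sample) === twoPassCollapse(sample)); // true
```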

codecov · 2026-02-26T14:29:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.61%. Comparing base (b07c1b8) to head (00cd6a6).
⚠️ Report is 31 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #7511   +/-   ##
=======================================
  Coverage   99.61%   99.61%           
=======================================
  Files         283      285    +2     
  Lines       11877    11947   +70     
  Branches     2898     2914   +16     
=======================================
+ Hits        11831    11901   +70     
  Misses         46       46


Copilot

Pull request overview

This pull request normalizes quote selector creation and anchoring for both HTML and PDF documents so that stored selectors match what users see in rendered text. The key change is that line breaks (from <br> tags and block elements in HTML, or newlines in PDF) are now converted to spaces, and consecutive whitespace is collapsed to single spaces. Previously, text like foo<br>bar was stored as "foobar"; it is now stored as "foo bar", matching the visual rendering.

Changes:

Introduced rendered-text.ts module that builds normalized text from HTML DOM with offset mappings between raw and normalized positions
Updated TextQuoteAnchor to use normalized text when creating and matching selectors
Applied consistent PDF text normalization in selector creation and anchoring
Normalized quote display in the UI component to match the stored format
Updated test fixtures and baselines to reflect normalized selector output

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

File	Description
src/annotator/anchoring/rendered-text.ts	New module providing HTML text normalization with offset mapping for converting between raw and normalized positions
src/annotator/anchoring/types.ts	Updated TextQuoteAnchor to use normalized text for selector creation and matching
src/annotator/anchoring/pdf.ts	Applied consistent PDF text normalization in describe() and anchor() paths
src/sidebar/components/Annotation/AnnotationQuote.tsx	Normalized quote display in UI to match stored format
src/annotator/anchoring/test/rendered-text-test.js	New tests for the rendered-text normalization module
src/annotator/anchoring/test/types-test.js	Updated test expectations to match normalized selector format and relaxed some assertions
src/annotator/anchoring/test/pdf-test.js	Updated test expectations and relaxed some assertions to accommodate normalization
src/annotator/anchoring/test/html-test.js	Added normalization helpers and updated tests to compare normalized selectors
src/annotator/anchoring/test/html-baselines/wikipedia-regression-testing.json	Updated baseline expectations with normalized prefix/suffix values
src/annotator/anchoring/test/html-baselines/minimal.json	Updated baseline expectations with normalized prefix/suffix values


Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.


Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.


Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

karenrasmussen · 2026-05-08T16:47:27Z

+   * Rendered/normalized text of the root: collapsed whitespace, with a single
+   * space inserted at each `<br>` and block-tag boundary.


Don't collapse whitespace or block-level tags. Preserve the same behavior as root.textContent, but add a space for <br> elements.

karenrasmussen · 2026-05-08T17:24:24Z

+    // whitespace differences when matching, and keeping them raw preserves
+    // backward compatibility with selectors stored before the rendered-text
+    // normalization landed.
+    const prefix = rawText.slice(Math.max(0, rawStart - contextLen), rawStart);


We should normalize this

Elimpizza added 4 commits February 25, 2026 14:28

fix(quote selectors): preserve quote spacing after linebreaks

cd3c94d

fix(anchoring): normalize quote anchoring for html and pdf

379a9b7

fix(anchoring): normalize quote anchoring and baselines

47f73d9

fix(lint): curl brackets in ifs, removed unused normalized variables

b9aeb5a

Elimpizza added 3 commits February 26, 2026 12:29

fix(coverage): added anchoring normalization coverage

1e2f4a2

fix(coverage): missing 2 lines

6c63734

fix(coverage): added 1 testcase for 1 missing uncovered line

e8b750c

Elimpizza marked this pull request as ready for review February 26, 2026 16:14

Elimpizza requested review from Copilot and karenrasmussen February 26, 2026 16:14

Copilot started reviewing on behalf of Elimpizza February 26, 2026 16:30

Copilot AI reviewed Feb 26, 2026

Elimpizza and others added 2 commits February 26, 2026 14:05

Update src/annotator/anchoring/rendered-text.ts

635a18c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix(test): tightened anhoring matchQuote assertions and checks

d42bef5

Elimpizza requested review from Copilot and gmorador-tribu and removed request for karenrasmussen February 26, 2026 17:55

Copilot started reviewing on behalf of Elimpizza February 26, 2026 17:55

Copilot AI reviewed Feb 26, 2026

Comment thread src/annotator/anchoring/pdf.ts Outdated

Comment thread src/annotator/anchoring/pdf.ts Outdated

Comment thread src/annotator/anchoring/pdf.ts

Comment thread src/annotator/anchoring/rendered-text.ts Outdated

Comment thread src/annotator/anchoring/rendered-text.ts Outdated

Elimpizza added 3 commits February 27, 2026 09:32

fix(pdf): added trim after normalizaton for consistency

b954e6d

fix(space considering): went back to isNotSpace from char => char !==…

aeebeba

… ' ' for consistency

fix(rawToNorm): prevented type error

1e3dac0

Elimpizza requested a review from Copilot March 23, 2026 13:03

Copilot started reviewing on behalf of Elimpizza March 23, 2026 13:03

Copilot AI reviewed Mar 23, 2026

Comment thread src/annotator/anchoring/types.ts Outdated

fix(codebase and testcase) trim normalized exact in textquote selectors

03f24ab

Elimpizza requested a review from Copilot March 23, 2026 14:17

Copilot started reviewing on behalf of Elimpizza March 23, 2026 14:18

Copilot AI reviewed Mar 23, 2026

karenrasmussen changed the title ~~Fix/linebreaks conversion quotes~~ WIP Fix/linebreaks conversion quotes Apr 14, 2026

karenrasmussen changed the title ~~WIP Fix/linebreaks conversion quotes~~ Fix/linebreaks conversion quotes Apr 14, 2026

Elimpizza added 2 commits May 6, 2026 10:10

refactor(anchoring): flatten rendered-text + share isNotSpace, tighte…

6440c17

…n tests

prettier format

b1156fc

Elimpizza self-assigned this May 6, 2026

Elimpizza requested a review from karenrasmussen May 6, 2026 13:32

karenrasmussen requested a review from Copilot May 6, 2026 20:55

Copilot started reviewing on behalf of karenrasmussen May 6, 2026 20:55

Copilot AI reviewed May 6, 2026

Comment thread src/annotator/util/normalize.ts Outdated

Comment thread src/annotator/anchoring/rendered-text.ts Outdated

Comment thread src/annotator/anchoring/pdf.ts Outdated

Comment thread src/sidebar/components/Annotation/AnnotationQuote.tsx Outdated

karenrasmussen reviewed May 7, 2026

Comment thread src/annotator/anchoring/types.ts Outdated

renames + mini bug fix

accd8c1

karenrasmussen reviewed May 7, 2026

Comment thread src/annotator/anchoring/pdf.ts Outdated

fixes and revert

2f03bbd

karenrasmussen reviewed May 7, 2026

Comment thread src/annotator/anchoring/test/html-test.js Outdated

fix N/A testcase and remove helper

1e83174

karenrasmussen reviewed May 7, 2026

Comment thread src/annotator/anchoring/test/html-test.js Outdated

fix(anchoring): keep prefix/suffix raw to preserve baseline compatibi…

fc8eb1b

…lity

karenrasmussen reviewed May 8, 2026

simpler reimplementation

00cd6a6

3 participants
