Fix URL.host decoding for nested IDNA labels #979

Draft
Abdulmumin1 wants to merge 3 commits into pydantic:main from Abdulmumin1:fix-url-host-nested-idna

Abdulmumin1 commented May 22, 2026

Fixes #850.

Summary

  • decode IDNA labels that appear after ASCII labels in URL.host
  • keep malformed punycode labels from leaking idna exceptions from the property
  • add regression coverage for www.égalité-femmes-hommes.gouv.fr
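
As a rough illustration of the first bullet (a sketch, not the code in this PR): a prefix test on the whole host misses an encoded label that follows a plain ASCII label such as `www`, while a per-label test catches it. Both helper names are hypothetical, and the idea that the earlier check was a whole-host prefix test is an assumption inferred from the summary, not taken from the diff. The encoded label is built with Python's standard-library `punycode` codec rather than hard-coded.

```python
def starts_with_ace_prefix(host: str) -> bool:
    # Hypothetical "before" check: only inspects the start of the whole host.
    return host.startswith("xn--")


def has_ace_label(host: str) -> bool:
    # Per-label check: true if any dot-separated label carries the xn-- prefix.
    return any(label.startswith("xn--") for label in host.split("."))


# Build the encoded form of the regression host with the stdlib punycode codec.
encoded_label = "xn--" + "égalité-femmes-hommes".encode("punycode").decode("ascii")
host = f"www.{encoded_label}.gouv.fr"

print(host)
print("whole-host prefix check:", starts_with_ace_prefix(host))
print("per-label check:", has_ace_label(host))
```

The host starts with `www`, so only the per-label check reports that something needs decoding.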

codspeed-hq Bot commented May 22, 2026

Merging this PR will not alter performance

✅ 7 untouched benchmarks


Comparing Abdulmumin1:fix-url-host-nested-idna (45fdb79) with main (72fce81)

Abdulmumin1 force-pushed the fix-url-host-nested-idna branch from 9960ca4 to 9bac192 May 22, 2026 16:31
Comment thread src/httpx2/httpx2/_urls.py Outdated
host = idna.decode(host)
if any(label.startswith("xn--") for label in host.split(".")):
    try:
        host = idna.decode(host)
Contributor

You decode the complete host here. You should probably decode each label separately and then join them back together.

See the httpxyz fix at https://codeberg.org/httpxyz/httpxyz/src/commit/dc2bdf61f9d2061d89040dcd4babc1686c5de5bc/httpxyz/_urls.py#L194-L205
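
One way to read this suggestion, as a minimal sketch only: split the host on dots, decode each `xn--` label on its own, and fall back to the original label when decoding fails. The function name `decode_host_labels` is hypothetical. To stay self-contained the sketch uses Python's standard-library `punycode` codec as a stand-in for `idna.decode`, so it skips the validation the `idna` package performs, and it is not a copy of the httpxyz code linked above.

```python
def decode_host_labels(host: str) -> str:
    """Decode each xn-- label independently; leave every other label untouched."""
    decoded = []
    for label in host.split("."):
        if label.lower().startswith("xn--"):
            try:
                # Stand-in for idna.decode: raw punycode decode of the part after xn--.
                label = label[4:].encode("ascii").decode("punycode")
            except UnicodeError:
                # Malformed label: keep the original text instead of raising.
                pass
        decoded.append(label)
    return ".".join(decoded)


# Encoded form of the regression host, built with the stdlib codec.
encoded_label = "xn--" + "égalité-femmes-hommes".encode("punycode").decode("ascii")
print(decode_host_labels(f"www.{encoded_label}.gouv.fr"))
print(decode_host_labels("www.xn--!!!.example"))
```

Decoding per label means one bad label cannot prevent the others from being decoded, and the property never raises on the malformed case.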

Author

Thanks! I've updated accordingly.

Kludex commented May 24, 2026

I've opened kjd/idna#248 on idna's side. I don't think we should be handling it on our side.

I'll wait for their reply to take a decision here.

Kludex marked this pull request as draft May 24, 2026 10:20

Development

Successfully merging this pull request may close these issues.

URL.host returns Punycode instead of Unicode for some URLS

3 participants