Description
Bug report
Bug description:
Since Python 3.13, \uXXXX escapes that create "surrogates" (values from \uD800 to \uDFFF, which cannot be encoded into UTF-8) are not allowed in docstrings when compiling source code. I believe this is due to a change in #106411, after which docstrings are first converted to UTF-8 and then dedented:
$ ./python
Python 3.15.0a2+ (heads/main:3db7bf2d18, Dec 8 2025, 09:38:46) [GCC 15.2.1 20251111 (Red Hat 15.2.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def is_ex_parrot(parrot: Parrot) -> bool:
...     """Checks if the parrot is \udead"""
...
UnicodeEncodeError: 'utf-8' codec can't encode character '\udead' in position 24: surrogates not allowed
>>>
$ ./python -OO
Python 3.15.0a2+ (heads/main:3db7bf2d18, Dec 8 2025, 09:38:46) [GCC 15.2.1 20251111 (Red Hat 15.2.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def is_ex_parrot(parrot: Parrot) -> bool:
...     """Checks if the parrot is \udead"""
...
>>> # no error because -OO turns off docstrings
I admit that this is extremely fringe, but it did break something seemingly unrelated to docstrings in IPython: ipython/ipython#15098
The compile function is documented to only raise SyntaxError when the syntax is invalid and ValueError when the source contains \x00. Perhaps this is expected behaviour, and it should just be additionally documented? (UnicodeEncodeError is already a subclass of ValueError, so maybe IPython should just catch ValueError when calling compile? Or maybe there is simply a better solution for what IPython is doing here.)
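For illustration, here is a minimal sketch of the "catch ValueError around compile" idea. The helper name `try_compile` and the `"<input>"` filename are hypothetical and are not taken from IPython's code. Depending on the Python version, the call either compiles fine or raises the UnicodeEncodeError shown above, so the sketch does not assume either outcome:

```python
import types

# A lone surrogate cannot be encoded as UTF-8; this always raises.
try:
    "\udead".encode("utf-8")
except UnicodeEncodeError as exc:
    surrogate_error = exc

# Source text containing the literal escape sequence \udead in a docstring.
SOURCE = 'def is_ex_parrot(parrot):\n    """Checks if the parrot is \\udead"""\n'


def try_compile(source, filename="<input>"):
    """Hypothetical helper: return a code object, or the exception compile() raised."""
    try:
        return compile(source, filename, "exec")
    except (SyntaxError, ValueError) as exc:
        # ValueError also covers UnicodeEncodeError (via UnicodeError),
        # which is what affected versions raise for the docstring above.
        return exc


result = try_compile(SOURCE)
print(type(result).__name__)
```

On an affected interpreter I would expect this to print UnicodeEncodeError, and code otherwise, but either way the caller no longer sees an uncaught exception.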
CPython versions tested on:
3.13, CPython main branch
Operating systems tested on:
Linux