
Musical Eighth Notes in SCC Captions Causing Error #6

@MightyMait


While attempting to convert SCC captions to SAMI using pycaption-cli, I'm getting the following error:

```
C:\Users\XXXXXX\AppData\Local\Programs\Python\Python311\Scripts\pycaption.exe : Traceback (most recent call last):
At H:\_Projects\Automated_Closed_Captioning\convert_SCC_Captions_to_SAMI_0.9.ps1:22 char:20
+ ... $RawSAMI = C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python311\S ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

  File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python311\Scripts\pycaption-script.py", line 33, in <module>
    sys.exit(load_entry_point('pycaption-cli==0.2', 'console_scripts', 'pycaption')())
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python311\Lib\site-packages\pycaption_cli-0.2-py3.11.egg\pycapcli\caption_converter.py", line 63, in main
  File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python311\Lib\site-packages\pycaption_cli-0.2-py3.11.egg\pycapcli\caption_converter.py", line 92, in write_captions
  File "C:\Users\XXXXXXX\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u266a' in position 43209: character maps to <undefined>
```
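The last traceback frame shows the failure happens while Python encodes the output as cp1252, the default text encoding on a Western-locale Windows system (used both for redirected stdout and for files opened without an explicit `encoding=`). cp1252 simply has no eighth note. The sketch below reproduces the error in plain Python and shows one possible workaround, assuming the output is SAMI: the standard `xmlcharrefreplace` error handler emits `&#9834;`, a numeric character reference that HTML-style formats can display as ♪. `encode_for_cp1252` is a hypothetical helper for illustration, not part of pycaption or pycaption-cli.

```python
def encode_for_cp1252(text: str) -> bytes:
    """Hypothetical helper: encode text for a cp1252 target without failing.

    Characters cp1252 cannot represent (such as U+266A) become numeric
    character references like "&#9834;", which SAMI/HTML can display.
    """
    return text.encode("cp1252", errors="xmlcharrefreplace")


caption = "\u266a With all of our"

# Strict encoding reproduces the UnicodeEncodeError from the traceback.
try:
    caption.encode("cp1252")
except UnicodeEncodeError as exc:
    print(f"strict cp1252 failed: {exc}")

print(encode_for_cp1252(caption))  # b'&#9834; With all of our'
```

Characters that cp1252 does cover (for example `é`) are encoded normally; only the unmappable ones are rewritten.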

U+266A (♪) is the eighth note symbol. From an Adobe forum post, I was able to determine that SCC caption files represent this character as the code 9137. Sample from the caption file:
```
00:16:52:24 9420 9420 94ae 94ae 9452 9452 9137 2057 e9f4 6820 61ec ec20 efe6 20ef 75f2 94f2 94f2 f175 e573 f4e9 ef6e e96e 6720 f468 61f4 a773 20ef e620 9137

00:16:53:25 942f 942f

00:16:55:08 9420 9420 94ae 94ae 9470 9470 9137 2057 6861 f4a7 7320 6875 f2f4 e96e 6720 61ec ec20 6875 6d61 6e6b e96e 6420 9137

00:16:56:03 942f 942f

00:16:57:09 9420 9420 94ae 94ae 94d0 94d0 91b9 91b9 9137 2052 e561 ec20 70e5 ef70 ece5 2c20 73e5 e56b e96e 6720 68ef f780 9470 9470 91b9 91b9 61ec ec20 ef75 f220 68e5 61f2 f473 20e3 616e 20f2 e561 ece9 676e 2080 9137

00:16:58:22 942f 942f

00:17:01:08 9420 9420 94ae 94ae 9452 9452 91b9 91b9 91b9 9137 2057 e9f4 6820 6d75 73e9 e320 6461 6ee3 e580 94f2 94f2 91b9 91b9 91b9 616e 6420 67ef ef64 20f2 6879 6de5 7320 9137
```
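To show where the ♪ in these lines comes from, here is a minimal decoding sketch for one SCC line (the hex words after the timecode). It assumes the usual SCC/CEA-608 conventions: each four-digit word is two bytes whose high bit is an odd-parity bit, a first byte of `0x11` (channel 1) or `0x19` (channel 2) with a second byte in `0x30`-`0x3F` selects a special character, control pairs are normally sent twice, and preamble address codes (PACs) start a new row. With the parity bit removed, `9137` is `0x11 0x37`, the music note. `decode_scc_words` is a hypothetical name, not pycaption's SCC reader; it treats the basic character range as ASCII (a few CEA-608 code points differ, e.g. `0x2A` is `á`) and drops all other control codes.

```python
# CEA-608 special characters, second byte 0x30-0x3F (first byte 0x11 on
# channel 1, 0x19 on channel 2). Index 9 is the "transparent space",
# rendered here as an ordinary space; index 7 is the eighth note U+266A.
SPECIAL_CHARS = "®°½¿™¢£\u266aà èâêîôû"


def decode_scc_words(words: str) -> str:
    """Hypothetical minimal decoder for one SCC line's hex words (no timecode)."""
    out: list[str] = []
    prev_control = None
    for word in words.split():
        # Drop the odd-parity bit from each byte.
        b1 = int(word[:2], 16) & 0x7F
        b2 = int(word[2:], 16) & 0x7F
        if 0x10 <= b1 <= 0x1F:
            # Control pairs are normally sent twice; skip the repeat.
            if (b1, b2) == prev_control:
                prev_control = None
                continue
            prev_control = (b1, b2)
            if b1 in (0x11, 0x19) and 0x30 <= b2 <= 0x3F:
                out.append(SPECIAL_CHARS[b2 - 0x30])
            elif b2 >= 0x40 and out:
                out.append("\n")  # PAC: text continues on a new row
            # RCL, ENM, EOC, mid-row codes, etc. produce no text here.
            continue
        prev_control = None
        # Basic characters, approximated as ASCII; 0x00 is padding.
        out.extend(chr(b) for b in (b1, b2) if b >= 0x20)
    return "".join(out)


line = (
    "9420 9420 94ae 94ae 9452 9452 9137 2057 e9f4 6820 61ec ec20 efe6 "
    "20ef 75f2 94f2 94f2 f175 e573 f4e9 ef6e e96e 6720 f468 61f4 a773 "
    "20ef e620 9137"
)
print(decode_scc_words(line))
```

On the first sample line this should print the two rows `♪ With all of our` and `questioning that's of ♪`, which is exactly the text that later reaches the cp1252 encoder.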

I could replace the characters before processing with PyCaption-CLI, but it would be nice to be able to specify which character the symbol should be mapped to. Thanks!
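One way such a mapping could work, sketched here under assumptions: Python's `codecs.register_error` lets a program install a named error handler that receives each `UnicodeEncodeError` and returns a substitute string plus the position to resume from. The handler name `caption_replace`, the `REPLACEMENTS` table, and the choice of `#` for ♪ are all hypothetical; this is not an existing pycaption-cli option.

```python
import codecs

# Hypothetical user-chosen substitutions for characters the target encoding
# cannot represent. Each replacement must itself be encodable by that codec.
REPLACEMENTS = {"\u266a": "#"}


def caption_replace(exc: UnicodeError) -> tuple[str, int]:
    """Codec error handler: use the mapped substitute, or '?' if unmapped."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(REPLACEMENTS.get(ch, "?") for ch in bad)
    # Resume encoding right after the offending characters.
    return replacement, exc.end


codecs.register_error("caption_replace", caption_replace)

print("\u266a With all of our \u266a".encode("cp1252", errors="caption_replace"))
```

This prints `b'# With all of our #'`. If every ♪ should be swapped regardless of the output encoding, calling `text.translate({0x266A: "#"})` before writing does the same job more simply.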
