Slavic characters are replaced by ■ #792

@Netvia

Description

Describe the Bug

Slavic characters are replaced by ■. The standard Arial font contains these letters, so where is the error coming from?

The PDF is created with unreadable characters. I also tried the updated HTML below (with the file "arial.ttf" in the same directory), but the output is the same:

<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8">
            <title>My HTML Document</title>
            <style>
                @font-face {
                    font-family: Arial;
                    src: url('/arial.ttf');
                }
                *{
                    font-family: Arial;
                    color: red;
                    }
            </style>
        </head>
    <body>
        <h1>+ěščřžýáíéí</h1>
    </body>
</html>
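
One thing that may be worth ruling out: `url('/arial.ttf')` is a root-absolute path, not a path relative to the directory containing the HTML file. The sketch below builds the same document with the font path resolved to an absolute location and registered under a separate family name. The name `ReportFont` and the helpers `build_html` and `convert` are hypothetical, and whether xhtml2pdf then picks up the font on this setup is an untested assumption; the `pisa.CreatePDF` call is the same one used in the reproduction script.

```python
from pathlib import Path

# Hypothetical family name, chosen so it cannot be confused with a built-in "Arial" alias.
FONT_FAMILY = "ReportFont"


def build_html(font_path: Path, text: str) -> str:
    """Return the test document with @font-face pointing at an absolute font path."""
    # resolve() makes the path absolute; as_posix() gives forward slashes on Windows too.
    font_url = font_path.resolve().as_posix()
    return f"""<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8">
        <title>My HTML Document</title>
        <style>
            @font-face {{
                font-family: {FONT_FAMILY};
                src: url('{font_url}');
            }}
            * {{
                font-family: {FONT_FAMILY};
                color: red;
            }}
        </style>
    </head>
    <body>
        <h1>{text}</h1>
    </body>
</html>
"""


def convert(html: str, out_path: str) -> bool:
    """Write the PDF and return True if xhtml2pdf reported no errors."""
    # Imported lazily so build_html() can be used without xhtml2pdf installed.
    from xhtml2pdf import pisa

    with open(out_path, "wb") as pdf_file:
        status = pisa.CreatePDF(html, dest=pdf_file, encoding="utf-8")
    return not status.err
```

Usage would be along the lines of `convert(build_html(Path("arial.ttf"), "+ěščřžýáíéí"), "output.pdf")`, run from the directory that contains the font file.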

Minimal Example to Reproduce

File input.html:

<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8">
            <title>My HTML Document</title>
            <style>
                *{
                    font-family: Arial;
                    color: red;
                    }
            </style>
        </head>
    <body>
        <h1>+ěščřžýáíéí</h1>
    </body>
</html>

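Python code (the file is also saved as UTF-8):

  from xhtml2pdf import pisa

  OUT = "output.pdf"
  IN = "input.html"

  # Read the HTML source as UTF-8.
  with open(IN, 'r', encoding='utf-8') as html_file:
      html_content = html_file.read()

  # Convert to PDF; the destination must be opened in binary mode.
  with open(OUT, 'wb') as pdf_file:
      pisa_status = pisa.CreatePDF(html_content, dest=pdf_file, encoding='utf-8')

  # Stop with a non-zero exit status if the conversion reported errors.
  if pisa_status.err:
      raise SystemExit("PDF conversion reported errors")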

Expected Behavior

+ěščřžýáíéí

Actual Behavior

+■š■■žýáíéí
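
The set of replaced characters looks like what one would get if rendering were limited to the Windows-1252 (WinAnsi) character set: š and ž exist there, while ě, č and ř do not. That would be consistent with the text being drawn with a built-in PDF font and not the TrueType Arial, though this is an observation and not a confirmed diagnosis. A small stdlib-only check, with the illustrative helper name `simulate_cp1252`:

```python
def simulate_cp1252(text: str, placeholder: str = "■") -> str:
    """Replace every character that Windows-1252 cannot encode with a placeholder."""
    result = []
    for ch in text:
        try:
            ch.encode("cp1252")  # raises if the character is outside Windows-1252
            result.append(ch)
        except UnicodeEncodeError:
            result.append(placeholder)
    return "".join(result)


print(simulate_cp1252("+ěščřžýáíéí"))  # → +■š■■žýáíéí
```

The output matches the actual behavior reported above character for character, which suggests checking which font ends up embedded in the generated PDF.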

System Information

OS version: Windows 11 24H2
Python version: 3.13.0, also on 3.10.7
XHTML2PDF version: 0.2.16

Labels: bug