
PDF Chinese character encoding issue #69

@Dhongli


Problem description

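On Windows 11, when the minimax-pdf skill is used in the Claude tool to generate a PDF containing Chinese text, ■■ boxes (missing-glyph placeholders) appear in the places listed below. The cause is that ReportLab's default fonts have no glyphs for Chinese characters: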

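| Location | Cause | Default font |
|---|---|---|
| Header (title, date) | canv.drawString() draws with a built-in PDF standard font | Helvetica / Times-Bold |
| Footer (author, page number) | canv.drawString() draws with a built-in PDF standard font | Helvetica / Times-Bold |
| Code-block comments | ParagraphStyle uses the Courier font | Courier (no CJK glyphs) |
| Body text | ParagraphStyle uses the default font | Helvetica |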
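A quick way to see why exactly these spots turn into boxes: ReportLab's built-in standard fonts (Helvetica, Times, Courier) are single-byte fonts that by default use WinAnsiEncoding, so they have no way to represent Chinese characters. The sketch below approximates that encoding with Python's `cp1252` codec (an assumption: the two are close but not identical) and lists which characters of a string a standard font cannot draw. `unencodable_chars` is a hypothetical diagnostic helper, not part of minimax-pdf or ReportLab.

```python
# Hypothetical diagnostic helper (not part of minimax-pdf or ReportLab).
# Assumption: Python's cp1252 codec approximates ReportLab's default WinAnsiEncoding.
def unencodable_chars(text: str, encoding: str = "cp1252") -> list[str]:
    """Return the characters in `text` that the single-byte encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Header text mixing Latin and Chinese: only the Chinese characters are reported
    print(unencodable_chars("Report 页眉 2024"))  # prints ['页', '眉']
```

Any character this reports is one that would need a registered CJK TrueType font to render.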

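Approach

  1. Auto-detect Windows Chinese fonts - add a scan of the Windows font directory to register_fonts()
  2. Register the Chinese fonts with ReportLab - via pdfmetrics.registerFont(TTFont(...))
  3. Update the style factory - make make_styles() prefer the Chinese font
  4. Update header/footer rendering - use the Chinese font in _decorate()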

Modified file

File: (user home directory)\.claude\skills\minimax-pdf\scripts\render_body.py

Change 1: font registration function (lines 66-103)

import os

from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont


# ── Font registration ──────────────────────────────────────────────────────────
def register_fonts(tokens: dict):
    """Register TTF fonts from token font_paths, then auto-detect Chinese fonts."""
    for name, fpath in tokens.get("font_paths", {}).items():
        if os.path.exists(fpath):
            try:
                pdfmetrics.registerFont(TTFont(name, fpath))
            except Exception:
                pass

    # Candidate font directories: the Windows font folder, plus common Linux locations
    font_dirs = []
    if os.name == 'nt':  # Windows
        font_dirs.append(os.path.join(os.environ.get('WINDIR', 'C:\\Windows'), 'Fonts'))
    font_dirs.append('/usr/share/fonts')
    font_dirs.append('/usr/local/share/fonts')

    # ReportLab font name -> candidate file names, checked in order.
    # .ttc files are TrueType Collections; TTFont loads the first face (subfontIndex=0) by default.
    chinese_font_map = {
        'SimHei': ['simhei.ttf', 'SimHei.ttf'],
        # Microsoft YaHei ships as msyh.ttc; micross.ttf is Microsoft Sans Serif (no CJK glyphs)
        'Microsoft YaHei': ['msyh.ttc', 'msyh.ttf'],
        # STSONG.TTF (STSong) is a fallback registered under the name SimSun
        'SimSun': ['simsun.ttc', 'STSONG.TTF'],
    }

    font_paths = tokens.setdefault("font_paths", {})
    for font_name, font_files in chinese_font_map.items():
        candidates = [os.path.join(d, f) for d in font_dirs for f in font_files]
        for font_path in candidates:
            if not os.path.exists(font_path):
                continue
            try:
                pdfmetrics.registerFont(TTFont(font_name, font_path))
            except Exception:
                continue  # unreadable or unsupported file: try the next candidate
            # Record the font so make_styles() and _decorate() can detect it
            font_paths[font_name] = font_path
            break  # first usable file wins; stop scanning the remaining directories
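The directory scan above only checks files directly inside each directory. That matches C:\Windows\Fonts, but on Linux fonts usually sit in subdirectories of /usr/share/fonts (for example under truetype/), so a flat os.path.join check will typically miss them. Below is a minimal standard-library sketch of a recursive, case-insensitive lookup; `find_font_file` is a hypothetical helper name, and the tie-breaking rule (earlier file names win, then the first path seen by `os.walk`) is an assumption about the priority you want.

```python
import os
from typing import Iterable, Optional


def find_font_file(font_dirs: Iterable[str], file_names: Iterable[str]) -> Optional[str]:
    """Search font_dirs recursively for the first of file_names (case-insensitive).

    Hypothetical helper: candidates earlier in file_names take priority;
    for duplicate file names, the first path seen during os.walk wins.
    """
    wanted = [name.lower() for name in file_names]
    found = {}  # lowercase file name -> first full path seen
    for root_dir in font_dirs:
        if not os.path.isdir(root_dir):
            continue  # skip directories that do not exist on this system
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for fname in filenames:
                key = fname.lower()
                if key in wanted and key not in found:
                    found[key] = os.path.join(dirpath, fname)
    # Return the highest-priority candidate that was found anywhere
    for name in wanted:
        if name in found:
            return found[name]
    return None
```

The returned path (or None) could then be passed to pdfmetrics.registerFont(TTFont(font_name, path)) exactly as in register_fonts().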

Change 2: Chinese font override in the style factory function (lines 231-241)

def make_styles(t: dict) -> dict:
    hf  = t["font_display_rl"]
    bf  = t["font_body_rl"]
    bfb = t["font_body_b_rl"]
    dk  = t["body_text"]
    d   = t["dark"]
    mu  = t["muted"]

    # Override with Chinese fonts if available (for CJK character support)
    if "SimHei" in t.get("font_paths", {}) or "Microsoft YaHei" in t.get("font_paths", {}):
        chinese_font = "SimHei" if "SimHei" in t.get("font_paths", {}) else "Microsoft YaHei"
        hf = chinese_font
        bf = chinese_font
        bfb = chinese_font

    # Use Chinese font for code blocks if available (Courier doesn't support CJK)
    code_font = "Courier"
    if "SimHei" in t.get("font_paths", {}) or "Microsoft YaHei" in t.get("font_paths", {}):
        code_font = "SimHei" if "SimHei" in t.get("font_paths", {}) else "Microsoft YaHei"
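The same "SimHei if registered, else Microsoft YaHei" test appears three times across make_styles() and _decorate(). One way to keep that priority rule in a single place is a small helper; `pick_cjk_font`, its `priority` parameter and `DEFAULT_CJK_PRIORITY` are hypothetical names, not part of minimax-pdf.

```python
from typing import Mapping, Sequence

# Hypothetical constant/helper (not part of minimax-pdf): the CJK font priority in one place.
DEFAULT_CJK_PRIORITY = ("SimHei", "Microsoft YaHei")


def pick_cjk_font(
    font_paths: Mapping[str, str],
    default: str,
    priority: Sequence[str] = DEFAULT_CJK_PRIORITY,
) -> str:
    """Return the first registered CJK font name in priority order, else `default`."""
    for name in priority:
        if name in font_paths:
            return name
    return default
```

With it, make_styles() could compute code_font = pick_cjk_font(t.get("font_paths", {}), "Courier"), and _decorate() could compute header_footer_font = pick_cjk_font(t.get("font_paths", {}), t["font_body_rl"]), so changing the priority only means editing one tuple.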

Change 3: header/footer rendering function (lines 179-216)

def _decorate(self, canv, doc):
    t   = self._t
    lm  = doc.leftMargin
    rm  = doc.rightMargin
    pw  = doc.pagesize[0]
    ph  = doc.pagesize[1]
    top = ph - doc.topMargin

    canv.saveState()

    # Determine font for header/footer (prefer Chinese font if available)
    header_footer_font = t["font_body_rl"]
    if "SimHei" in t.get("font_paths", {}) or "Microsoft YaHei" in t.get("font_paths", {}):
        header_footer_font = "SimHei" if "SimHei" in t.get("font_paths", {}) else "Microsoft YaHei"

    # Header accent rule
    canv.setStrokeColor(HexColor(t["accent"]))
    canv.setLineWidth(1.5)
    canv.line(lm, top + 12, pw - rm, top + 12)

    # Header: title (left) + date (right)
    canv.setFillColor(HexColor(t["muted"]))
    canv.setFont(header_footer_font, t["size_meta"])  # use the Chinese font
    canv.drawString(lm, top + 16, t["title"].upper())
    canv.drawRightString(pw - rm, top + 16, t.get("date", ""))

    # Footer rule
    canv.setStrokeColor(HexColor("#DDDDDD"))
    canv.setLineWidth(0.5)
    canv.line(lm, doc.bottomMargin - 12, pw - rm, doc.bottomMargin - 12)

    # Footer: author (left) + page number (right)
    canv.setFillColor(HexColor(t["muted"]))
    canv.setFont(header_footer_font, t["size_meta"])  # use the Chinese font
    canv.drawString(lm, doc.bottomMargin - 22, t.get("author", ""))
    canv.drawRightString(pw - rm, doc.bottomMargin - 22, str(doc.page))

    canv.restoreState()

Results

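| Metric | Before fix | After fix |
|---|---|---|
| PDF size | 104 KB | 142 KB |
| Chinese characters | ■■ boxes | Render correctly |
| Code comments | ■■ boxes | Render correctly |
| Header/footer | ■■ boxes | Render correctly |

The larger file is expected: ReportLab embeds a subset of the Chinese TrueType font in the PDF.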

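Key points

  1. TTFont registration - ReportLab requires TTF font files to be registered explicitly (pdfmetrics.registerFont(TTFont(name, path))) before they can be used
  2. Windows font path - the C:\Windows\Fonts directory
  3. Font priority - SimHei > Microsoft YaHei > the original default fonts (e.g. Courier for code blocks)
  4. Where the changes go - the font registration function register_fonts(), the style factory make_styles(), and the header/footer decorator _decorate()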

Before/after comparison

[Two screenshots: output before and after the fix]
