
Locale-dependent date parsing causes crashes on non-English systems #34

@dgunning

Environment

  • httpxthrottlecache version: 0.1.6+
  • Python version: 3.12.11 (also affects earlier versions)
  • Operating System: Windows 11 (affects all platforms with non-English locales)
  • System Locale: Chinese (zh_CN), but affects any non-English locale

Description

The library uses time.strptime() to parse HTTP Last-Modified headers, and time.strptime() is locale-dependent. When the system locale is set to a non-English language (Chinese, German, French, etc.), date parsing fails with a ValueError because the day and month names in the date string are not in the language the active locale expects.

Root Cause

In httpxthrottlecache/filecache/transport.py, the code uses:

time.strptime(last_modified, "%a, %d %b %Y %H:%M:%S GMT")

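The %a (weekday) and %b (month) directives of time.strptime() are matched against names from the current LC_TIME locale. strptime() does not translate its input, so the error below shows the string it actually received: '周五, 10 10月 2025 11:57:10 GMT' ('周五' is Friday, '10月' is October) instead of 'Fri, 10 Oct 2025 11:57:10 GMT'. The value was already localized before it reached strptime(), and any mismatch between the language of the date string and what the active locale expects makes the parse fail.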

Impact

This is a critical bug: it makes httpxthrottlecache (and any library depending on it) completely unusable for users with non-English system locales. The error message is also misleading: it reports a format mismatch without mentioning the locale, which makes the real cause hard to diagnose.

Error Message

ValueError: time data '周五, 10 10月 2025 11:57:10 GMT' does not match format '%a, %d %b %Y %H:%M:%S GMT'

Reproduction

Minimal Test Case

import locale
import time

# Simulate Chinese locale (or any non-English locale)
try:
    locale.setlocale(locale.LC_TIME, 'zh_CN.UTF-8')  # Linux/macOS
    # locale.setlocale(locale.LC_TIME, 'Chinese_China.936')  # Windows
except locale.Error:
    print("Chinese locale not available, but the same issue affects any non-English locale")

# This will fail with Chinese locale
date_string = "Fri, 10 Oct 2025 11:57:10 GMT"
try:
    parsed = time.strptime(date_string, "%a, %d %b %Y %H:%M:%S GMT")
    print(f"SUCCESS: {parsed}")
except ValueError as e:
    print(f"FAILED: {e}")
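To see how widespread the failure is, the sketch below repeats the same parse under several LC_TIME locales and compares time.strptime() with email.utils.parsedate_to_datetime(). The locale names are example POSIX-style names (Windows uses names such as 'Chinese_China.936'), and a locale that is not installed is reported as None (skipped) rather than as a failure.

```python
import locale
import time
from email.utils import parsedate_to_datetime

HTTP_DATE = "Fri, 10 Oct 2025 11:57:10 GMT"
FORMAT = "%a, %d %b %Y %H:%M:%S GMT"

# Example locale names; which ones exist depends on the system.
CANDIDATE_LOCALES = ("C", "de_DE.UTF-8", "fr_FR.UTF-8", "ja_JP.UTF-8", "zh_CN.UTF-8")


def compare_parsers(locales=CANDIDATE_LOCALES, value=HTTP_DATE):
    """Return {locale_name: (strptime_ok, parsedate_ok)}, or None for a missing locale."""
    results = {}
    saved = locale.setlocale(locale.LC_TIME)  # query the current LC_TIME setting
    try:
        for name in locales:
            try:
                locale.setlocale(locale.LC_TIME, name)
            except locale.Error:
                results[name] = None  # locale not installed on this machine
                continue
            try:
                time.strptime(value, FORMAT)
                strptime_ok = True
            except ValueError:
                strptime_ok = False
            try:
                parsedate_to_datetime(value)
                parsedate_ok = True
            except (TypeError, ValueError):
                parsedate_ok = False
            results[name] = (strptime_ok, parsedate_ok)
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # restore the original setting
    return results


if __name__ == "__main__":
    for name, outcome in compare_parsers().items():
        print(f"{name:>12}: {'skipped' if outcome is None else outcome}")
```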

Real-World Impact

This issue was reported by a user of EdgarTools (which depends on httpxthrottlecache) who couldn't use the library at all on their Chinese Windows system. See: dgunning/edgartools#457

Proposed Solutions

Option 1: Use email.utils.parsedate_to_datetime() (Recommended)

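email.utils.parsedate_to_datetime() is the standard-library function for parsing RFC 2822 dates, the format HTTP date headers are based on. It is locale-independent because it matches English day and month names against its own built-in tables: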

from email.utils import parsedate_to_datetime

last_modified = "Fri, 10 Oct 2025 11:57:10 GMT"  # example header value

# Instead of:
# parsed_time = time.strptime(last_modified, "%a, %d %b %Y %H:%M:%S GMT")

# Use:
parsed_datetime = parsedate_to_datetime(last_modified)

Advantages:

  • Locale-independent
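  • Parses RFC 2822 dates, of which the HTTP date format (IMF-fixdate) is a subset
  • Returns a timezone-aware datetime object directly (UTC for a GMT header)
  • Accepts common variants, such as a missing weekday name or a numeric zone offset like +0000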
  • Part of Python standard library
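Because parsedate_to_datetime() returns a datetime rather than a time.struct_time, the calling code may need a small conversion. The transport's downstream use of the parsed value is not shown in this issue, so the sketch below assumes it wants either epoch seconds or a UTC struct_time, and checks both against what time.strptime() produced under the C locale.

```python
import calendar
import locale
import time
from email.utils import parsedate_to_datetime

last_modified = "Fri, 10 Oct 2025 11:57:10 GMT"  # example header value

dt = parsedate_to_datetime(last_modified)
print(dt.isoformat())  # 2025-10-10T11:57:10+00:00

# Assumed conversions; pick whichever the calling code actually needs.
epoch_seconds = dt.timestamp()   # seconds since the epoch, as a float
struct_utc = dt.utctimetuple()   # UTC time.struct_time, like the old strptime() result

# Old approach for comparison, run under the C locale so it succeeds anywhere.
saved = locale.setlocale(locale.LC_TIME)
try:
    locale.setlocale(locale.LC_TIME, "C")
    old = time.strptime(last_modified, "%a, %d %b %Y %H:%M:%S GMT")
finally:
    locale.setlocale(locale.LC_TIME, saved)

assert calendar.timegm(old) == epoch_seconds
assert tuple(struct_utc)[:6] == tuple(old)[:6]  # year, month, day, hour, minute, second
```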

Option 2: Force C Locale Temporarily

import locale
import time
from contextlib import contextmanager

@contextmanager
def c_locale():
    """Temporarily set LC_TIME to C for locale-independent parsing"""
    old_locale = locale.setlocale(locale.LC_TIME)  # no second argument: query the current setting
    try:
        locale.setlocale(locale.LC_TIME, 'C')
        yield
    finally:
        locale.setlocale(locale.LC_TIME, old_locale)

# Usage:
last_modified = "Fri, 10 Oct 2025 11:57:10 GMT"  # example header value
with c_locale():
    parsed_time = time.strptime(last_modified, "%a, %d %b %Y %H:%M:%S GMT")

Advantages:

  • Minimal code change
  • Keeps existing logic

Disadvantages:

  • More complex than Option 1
  • Not thread-safe: the locale is process-wide, so other threads can observe or change LC_TIME while the block runs

Option 3: Use datetime.strptime() with Locale Context

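Similar to Option 2, but with datetime.strptime() instead of time.strptime(). datetime.strptime() uses the same locale-aware parser, so it still needs the C-locale context manager; the only difference is that it returns a datetime instead of a time.struct_time.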
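A self-contained sketch of Option 3 follows. It repeats Option 2's context manager so it can run on its own, and it assumes the literal "GMT" in the format should be read as UTC: strptime() matches "GMT" as plain text and returns a naive datetime, so the sketch attaches timezone.utc explicitly. The function name parse_with_datetime is hypothetical.

```python
import locale
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def c_locale():
    """Temporarily set LC_TIME to "C" (same thread-safety caveat as Option 2)."""
    old_locale = locale.setlocale(locale.LC_TIME)
    try:
        locale.setlocale(locale.LC_TIME, "C")
        yield
    finally:
        locale.setlocale(locale.LC_TIME, old_locale)


def parse_with_datetime(value):
    with c_locale():
        naive = datetime.strptime(value, "%a, %d %b %Y %H:%M:%S GMT")
    # "GMT" is matched as literal text, so the result is naive; mark it as UTC.
    return naive.replace(tzinfo=timezone.utc)


print(parse_with_datetime("Fri, 10 Oct 2025 11:57:10 GMT").isoformat())
# 2025-10-10T11:57:10+00:00
```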

Recommended Fix

Option 1 (email.utils.parsedate_to_datetime()) is the best solution because:

  1. It's designed specifically for parsing HTTP date headers
  2. It's locale-independent by design
  3. It's simpler and more maintainable
  4. It's part of the standard library
  5. It handles edge cases better
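A minimal sketch of how Option 1 might be wrapped for the transport code. The helper name parse_last_modified and the choice to return epoch seconds (or None for a missing or malformed header) are hypothetical; the real call site in httpxthrottlecache/filecache/transport.py may need a different return type. Catching both TypeError and ValueError covers older Python versions, where invalid input raised TypeError instead of ValueError.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_last_modified(value):
    """Hypothetical helper: Last-Modified value -> epoch seconds, or None if unusable."""
    if not value:
        return None
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # ValueError on Python 3.10+, TypeError on earlier versions.
        return None
    if dt.tzinfo is None:
        # A "-0000" zone (or no zone at all) yields a naive datetime; treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


print(parse_last_modified("Fri, 10 Oct 2025 11:57:10 GMT"))  # 1760097430.0
print(parse_last_modified("周五, 10 10月 2025 11:57:10 GMT"))  # None
```

Returning None instead of raising means a malformed or localized header no longer crashes the request; whether that fallback is appropriate depends on how the transport uses the value.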

Additional Context

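HTTP date headers use the IMF-fixdate format defined by the HTTP specification (RFC 9110), a fixed-length subset of the RFC 5322 date format (which replaced RFC 2822). It always uses English day and month abbreviations regardless of locale, so parsing HTTP headers with locale-dependent functions is incorrect and causes failures on international systems.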

This affects any user with a non-English system locale, including:

  • Chinese (zh_CN, zh_TW)
  • Japanese (ja_JP)
  • Korean (ko_KR)
  • German (de_DE)
  • French (fr_FR)
  • Spanish (es_ES)
  • And many others
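The same rule applies when producing HTTP dates. The localized string in the error message suggests that some code path formats the value with the system locale, although this issue does not confirm where it comes from. If that is the case, email.utils.formatdate() with usegmt=True emits the English IMF-fixdate form regardless of LC_TIME, and parsedate_to_datetime() reads it back. The timestamp below is just an example value.

```python
from email.utils import formatdate, parsedate_to_datetime

timestamp = 1760097430  # example: 2025-10-10 11:57:10 UTC

# formatdate() builds the string from fixed English day/month names,
# so LC_TIME does not affect it. usegmt=True writes "GMT" instead of "-0000".
header_value = formatdate(timestamp, usegmt=True)
print(header_value)  # Fri, 10 Oct 2025 11:57:10 GMT

# Round trip: parsing the generated value recovers the same instant.
assert parsedate_to_datetime(header_value).timestamp() == timestamp
```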
