Encoding helpers #52

@JocelynDelalande


I'm struggling a bit with EDIFACT encoding, but here is what I came up with:

data:

# https://blog.sandro-pereira.com/2009/08/15/edifact-encoding-edi-character-set-support/
# https://www.truugo.com/edifact/d09a/cl0001/
# A bit unsure of how 10646-1 maps exactly to utf-8
EDIFACT_ENCODINGS = {
    "UNOA": "ascii",  # iso-"646",
    "UNOB": "ascii",  # iso-"646",
    "UNOC": "iso-8859-1",
    "UNOD": "iso-8859-2",
    "UNOE": "iso-8859-5",
    "UNOF": "iso-8859-7",
    "UNOG": "iso-8859-3",
    "UNOH": "iso-8859-4",
    "UNOI": "iso-8859-6",
    "UNOJ": "iso-8859-8",
    "UNOK": "iso-8859-9",
    "UNOW": "utf-8",  # "10646-1",
    "UNOX": "iso-2022-jp",  # "2022 2375",
    "UNOY": "utf-8",  # "10646-1",
}
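
As a quick sanity check on the table, here is a minimal sketch that asks Python's codec registry whether each codec name resolves, and shows that the same text produces different bytes depending on the syntax identifier. `SAMPLE_ENCODINGS` and `unknown_codecs` are hypothetical names used only for this illustration; the subset is repeated so the snippet runs on its own.

```python
import codecs

# Subset of the mapping above, repeated so the snippet is self-contained.
SAMPLE_ENCODINGS = {
    "UNOA": "ascii",
    "UNOC": "iso-8859-1",
    "UNOF": "iso-8859-7",
    "UNOX": "iso-2022-jp",
    "UNOY": "utf-8",
}


def unknown_codecs(mapping):
    """Return the syntax identifiers whose codec name Python cannot resolve."""
    missing = []
    for identifier, codec_name in mapping.items():
        try:
            codecs.lookup(codec_name)
        except LookupError:
            missing.append(identifier)
    return missing


print(unknown_codecs(SAMPLE_ENCODINGS))      # []
print("é".encode(SAMPLE_ENCODINGS["UNOC"]))  # b'\xe9'
print("é".encode(SAMPLE_ENCODINGS["UNOY"]))  # b'\xc3\xa9'
```

This only confirms that the codec names exist in Python; it says nothing about whether each one is the right match for the EDIFACT code list entry.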

Deserialization helper:

def guess_edifact_encoding(stream):
    # Read the binary stream line by line until a UNB segment or EOF is reached.
    unb_line = b"\n"
    eof_marker = b""
    while not unb_line.startswith(b"UNB") and unb_line != eof_marker:
        unb_line = stream.readline()

    if not unb_line.startswith(b"UNB"):
        raise ParseError("Missing UNB segment")

    # Must be ASCII-only, so decode strictly as ASCII rather than the default UTF-8
    unb_line_s = unb_line.decode("ascii")
    parser = Parser()
    unb_segment = list(parser.parse(unb_line_s))[0]
    # Ignore version, always v1…
    encoding_element = unb_segment.elements[0][0]
    try:
        return EDIFACT_ENCODINGS[encoding_element]
    except KeyError:
        raise ParseError(f"Wrong encoding spec: {encoding_element}")
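
One limitation of the line-based approach: EDIFACT does not require line breaks between segments, so when an interchange starts with a UNA segment and has no newlines, `readline()` never returns a line starting with `UNB`. Below is a rough sketch of an alternative that inspects the first bytes directly and does not depend on the pydifact parser. `sniff_syntax_identifier` and `max_bytes` are hypothetical names, and the sketch assumes a seekable binary stream whose content starts with an optional UNA segment followed by UNB.

```python
import io


def sniff_syntax_identifier(stream, max_bytes=64):
    """Return the syntax identifier (e.g. "UNOC") found in the UNB segment.

    Hypothetical helper: assumes a seekable binary stream that starts with
    an optional UNA segment followed by the UNB segment.
    """
    start = stream.tell()
    head = stream.read(max_bytes).lstrip()
    stream.seek(start)  # leave the stream where we found it

    component_sep, element_sep = b":", b"+"  # EDIFACT default separators
    if head.startswith(b"UNA"):
        if len(head) < 9:
            raise ValueError("Truncated UNA segment")
        # UNA is "UNA" plus six service characters; the first two are the
        # component separator and the data element separator.
        component_sep, element_sep = head[3:4], head[4:5]
        head = head[9:].lstrip()

    prefix = b"UNB" + element_sep
    if not head.startswith(prefix):
        raise ValueError("Missing UNB segment")

    # First data element is "identifier<component_sep>version".
    first_element = head[len(prefix):].split(element_sep, 1)[0]
    identifier = first_element.split(component_sep, 1)[0]
    return identifier.decode("ascii")


sample = io.BytesIO(b"UNA:+.? 'UNB+UNOC:3+SENDER+RECEIVER+200101:1200+1'")
print(sniff_syntax_identifier(sample))  # UNOC
```

The returned identifier can then be looked up in `EDIFACT_ENCODINGS` as before.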

I wonder what pydifact could include in its scope:

  • helper data (the syntax identifier to codec mapping)
  • a serialization helper (for example an Interchange.serialize_to_bytes() method that selects the encoding automatically from the syntax identifier)
  • deserialization from bytes, with decoding handled by a guesser like the one I wrote
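
To make the serialization and deserialization ideas concrete, here is a minimal sketch of what the byte-level part could look like. `encode_interchange`, `decode_interchange` and `ENCODINGS` are hypothetical names, not existing pydifact API; the functions work on an already serialized string and take the syntax identifier as an argument instead of reading it from an Interchange object.

```python
# Hypothetical helpers, not part of pydifact.
# Subset of the identifier-to-codec mapping, repeated to keep this runnable.
ENCODINGS = {"UNOA": "ascii", "UNOC": "iso-8859-1", "UNOY": "utf-8"}


def _codec_for(syntax_identifier):
    """Map a syntax identifier to a Python codec name or raise ValueError."""
    try:
        return ENCODINGS[syntax_identifier]
    except KeyError:
        raise ValueError(
            f"Unsupported syntax identifier: {syntax_identifier}"
        ) from None


def encode_interchange(text, syntax_identifier):
    """Encode serialized EDIFACT text with the codec for its identifier.

    Raises UnicodeEncodeError when the text contains characters that the
    declared character set cannot represent.
    """
    return text.encode(_codec_for(syntax_identifier))


def decode_interchange(data, syntax_identifier):
    """Decode raw interchange bytes with the codec for the given identifier."""
    return data.decode(_codec_for(syntax_identifier))


raw = encode_interchange("UNB+UNOC:3+é'", "UNOC")
print(raw)                              # b"UNB+UNOC:3+\xe9'"
print(decode_interchange(raw, "UNOC"))  # UNB+UNOC:3+é'
```

Failing loudly on characters outside the declared repertoire seems preferable to silently replacing them, since the receiver decodes according to the syntax identifier announced in UNB.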

Any thoughts appreciated :-).
