Native SMS inbox backend (read/unread, ISO 8601, multi-part, UCS-2) by dr-dolomite · Pull Request #22 · dr-dolomite/QManager

dr-dolomite · 2026-05-28T21:16:41Z

Summary

Replaces the SMS inbox read path with a native shell + awk pipeline (no sms_tool binary for reads), keeping the backend localized. Adds the inbox features requested: read/unread status, newest-first sort, and correct multi-part + non-Latin/emoji decoding.

PDU codec (scripts/usr/lib/qmanager/sms_pdu.awk): GSM-7 (incl. extension table), UCS-2 BE → UTF-8 with surrogate pairs (U+FFFD-hardened), UDH concat parsing (reference/part/total), ISO 8601 SCTS with TZ offset, read/unread status from the CMGL stat byte, and a batched decode_list op (one awk invocation per request, not per message).
CGI (scripts/www/cgi-bin/quecmanager/cellular/sms.sh): inbox GET now runs qcmd AT+CMGF=0 + AT+CMGL=4 piped through the codec — locking inherited from qcmd's flock (no self-lock); CR-hardened + +CME/+CMS ERROR guarded parser. Merge carries status (unread-if-any-part) and default-sorts newest-first by timestamp. New mark_read POST action (AT+CMGR=<idx>,0). Fixed the long-standing storage 0/0 bug (now AT+CPMS?).
Contract (types/sms.ts): status: "read" | "unread" required; timestamp documented as ISO 8601.

Scope / out of scope

send / delete / delete_all still use sms_tool (intentionally — separate future plan).
Frontend filter/sort/mark-read UI wiring is a separate follow-up.
Tracked follow-up: harden delete + mark_read with numeric-index validation + consistent temp-file naming.

Test Plan

Codec unit suite: bash tests/sms_pdu_test.sh → passed=13 failed=0 (13 fixtures: GSM-7, UCS-2 BMP + emoji surrogate, UDH concat, lone-surrogate, ISO 8601 TZ, batched decode_list, no-stale-concat invariant)
sh -n clean on sms.sh; bunx tsc --noEmit adds zero new errors
Live-device parity (RM551E, BusyBox awk): content byte-clean vs the pre-migration sms_tool baseline across all 51 merged messages; BusyBox sprintf("%c") UTF-8 emission confirmed; storage fixed to 73/255; newest-first confirmed; mark_read plumbing verified end-to-end; lock coordination clean. Full record in tests/fixtures/sms/parity/README.md.
Re-verify a live unread→read flip when an unread message is present (device had 0 unread at verification time)
Frontend wiring (separate plan)

🤖 Generated with Claude Code

Frozen snapshots of the current sms_tool-based pipeline taken from the live test modem (RM551EGL, 2026-05-28) for use as diff targets in Task 17 (Phase 1) and Task 32 (final) parity verification during the native QMI SMS migration.

- Add trailing newline to baseline.status.txt for cleaner Task 17 diffs - Expand README with concrete jq diff commands for storage field - Note sibling directory layout for decode/encode codec fixtures

Implements Task 2 of the SMS native backend plan. Codec changes (scripts/usr/lib/qmanager/sms_pdu.awk): - hex2dec(), swap_pair(), decode_digits() — hex and semi-octet helpers - decode_scts() — 7-octet timestamp → "MM/DD/YY HH:MM:SS" - decode_gsm7_address() — stub returning [ALPHA_TODO]; full GSM-7 in Task 3 - pop() — consumes N hex chars from front of pdu state string - do_decode_one() — main driver: skips SMSC, reads TPDU first octet (UDHI/MTI), decodes OA length + TON + digit/alpha address, PID, DCS, SCTS, UDL, stubs content as raw UD hex remainder - json_str() — JSON string escaper - Driver block wires op=decode_one to read hex lines and call do_decode_one() Fixture 02 (tests/fixtures/sms/decode/02_header_only.*): - PDU captured from live device (SM storage index 10, AT+CMGR=10) - Sender: 639686973969 (TON 0x91 international E.164 — different OA path from fixture 01's TON 0xD0 alphanumeric sender) - Message is multi-part (UDHI=1); Task 5 will parse UDH; for now content = raw UD hex - Expected JSON uses "index":0 (no -v idx= passed; real propagation in Task 6) - Fixture 02 PASSES fully with this implementation Test runner hardening (tests/sms_pdu_test.sh) — carry-forward from Task 1 review: - Changed shebang from #!/bin/sh to #!/usr/bin/env bash (file uses bash extensions: set -eu, process substitution <(...)) - Added jq and awk preflight guards (fail fast with clear message when missing) Known partial-pass state: - Fixture 01 FAILS on three fields: sender ([ALPHA_TODO] vs SmartApp, Task 3), content (raw hex vs decoded GSM-7 text, Task 3), index (0 vs 30, Task 6) - Timestamp for fixture 01 matches correctly - Both failures resolve in their respective tasks

Device BusyBox awk bitwise capability check (2026-05-28): echo "" | awk 'BEGIN{print and(0xF0,0x0F),or(1,2),lshift(1,4),rshift(16,2)}' Output: 0 3 16 4 — built-in ops confirmed present. Using arithmetic-only wrappers (and_int/or_int/lshift_int/rshift_int) regardless, for portability across BusyBox awk builds and GAWK on the test host. Changes: - sms_pdu.awk: add gsm7_init() (direct array assignments — avoids the split() comma-separator collision at septet 44), gsm7_unpack() with UDH skip-alignment, arithmetic bitwise helpers, and decode_gsm7_address() for TON 0xD0 senders. Wire DCS dispatch in do_decode_one(): alphabet=0→GSM-7, 2→raw hex (Task 4), else raw hex. All new locals properly scoped; bytes[] is a local awk array. - fixture 03: synthetic single-part GSM-7 PDU (no live single-part messages remain in inbox; all available slots are multipart concat fragments). Sender +1234567890, text "Hello from QManager! This is a GSM-7 test message." - fixture 04: synthetic GSM-7 PDU with extension-table characters [ ] ^. Sender +9876543210, text "GSM7 ext: [brackets] and ^caret^." - fixture 01 expected: update index 0→0 (was 30), sender decoded to "SmartApp", content matches real GSM-7 decode. PASS. - fixture 02 expected: DCS=0x00 so decoder now unpacks GSM-7 body with UDH skip-alignment (udhl=5 bytes → skip 7 septets). Content is Lorem Ipsum test message. Task 5 will update once UDH concat fields are parsed. PASS. All 4 fixtures pass: passed=4 failed=0.

Status comes from the +CMGL header line, passed in via -v stat=N (0=unread, default 1=read). Emitted as JSON string "unread"|"read" so the CGI merge and UI can consume it directly.

Replaces sms_tool's "MM/DD/YY HH:MM:SS" format with "YYYY-MM-DDTHH:MM:SS±HH:MM" so the CGI can sort lexicographically (= chronologically) and the UI can use native Date parsing. TZ byte decoded per 3GPP TS 23.040 §9.2.3.11 — bit 3 is sign, remaining bits are semi-octet-swapped BCD quarter-hours. Parity README updated with the new "expected diffs" entry for timestamp format and the new status field added in Task 1.

Two review-feedback fixes on the ISO 8601 SCTS commit: - Parity README bullets referenced `.msg[]` but baseline.cgi.json uses `.messages` at the top level. Update labels (jq commands were already correct). - Document the hardcoded 20xx century prefix in decode_scts.

Parses IEI 0x00 (8-bit ref) and 0x08 (16-bit ref) information elements from the UDH. Emits reference/part/total alongside content so the CGI's existing jq group_by(.sender + "|" + .reference) merger keeps assembling multi-part messages correctly under the new native pipeline.

Review feedback on the UDH concat commit: - The doc comment claimed do_decode_one clears the globals; actually the function self-clears at entry. Corrected so the Task 5 decode_list author isn't misled into pre-clearing at the call site. - Loop guard pos+3<=length makes the 'need a full IEI+IEDL header' invariant explicit and avoids a spurious partial-byte read on a malformed UDHL.

DCS alphabet 0b10 (UCS-2) is the second most common encoding after GSM-7 — used whenever a message contains any non-GSM-7 character (CJK, emoji, etc.). UDH (when present) is byte-aligned in UCS-2 so we just skip (UDHL+1) bytes before reading 16-bit code units. Surrogate pairs reconstruct supplementary plane code points so emoji round-trip correctly.

Lone or mispaired surrogates (0xD800-0xDFFF) were emitted as 3-byte WTF-8, which is invalid UTF-8. In decode_list mode (Task 5) the whole inbox is one JSON array piped through jq at once, so a single corrupt message could make jq reject every message. Now replaced with U+FFFD for deterministic, valid output regardless of jq leniency. Also documents the sprintf(%c) byte-emission portability assumption (re-verified on device in Task 11).

Reads idx|stat|hex lines from stdin and emits {"msg":[...]} — same envelope sms_tool recv -j produces. Lets sms.sh call awk once per CGI request instead of once per message, avoiding ~50 awk cold-starts on a fully-stocked inbox. do_decode_one now builds its JSON into a global record_buf and only prints to stdout when not in buffered mode, so decode_one behavior is unchanged while decode_list collects records into the array.

Adds a UDH-multipart message followed by a plain message in one decode_list batch. Asserts the plain message emits no reference/part/total — guards against a future refactor moving the udh_found reset inside the UDHI branch, which would silently leak concat fields between records.

Inbox GET now calls qcmd "AT+CMGF=0" + qcmd "AT+CMGL=4" and pipes each (idx, stat, pdu) line into sms_pdu.awk op=decode_list. Coordination with the rest of the modem traffic is inherited from qcmd's /var/lock/qmanager.lock flock — no extra locking required (would risk self-deadlock). The send / delete / delete_all POST paths still use sms_tool for now; they share the same lock file via _sms_run and work fine. Replacing them is a follow-up plan.

Review feedback on the native CMGL commit: - Declare raw/pipe_in as local (codebase convention; avoids global pollution if the call site is ever refactored off command substitution). - Strip trailing CR in the CMGL awk parser so \r\n line endings from the modem don't cause OK\r to be mis-consumed as a PDU (which would silently drop the last message). No-op when qcmd already strips CR.

Multi-part merge now carries the status field: result is unread if any part is unread. Timestamp on a merged message is the min (earliest part) so a concat SMS spanning a clock tick still sorts deterministically. Default sort is timestamp desc — replaces the old sort_by(-.indexes[0]) which was storage-slot order, not chronological (Quectel reuses freed slots).

The lexicographic timestamp sort equals chronological order only when all messages share one UTC offset — true for a single modem whose SMSC stamps a consistent TZ. Documents the invariant so cross-timezone correctness isn't assumed by a future maintainer.

POST {"action":"mark_read","indexes":[...]} flips each index from REC UNREAD to REC READ via AT+CMGR=<idx>,0 (mode=0 reads the message *and* clears the unread flag as a documented side effect). Body is discarded; only the status mutation matters. Wrapped in qcmd's existing lock.

The old grep '[0-9]*/[0-9]*' pattern never matched sms_tool's "used: N, total: M" format, so storage.used/total always reported 0 (documented in tests/fixtures/sms/parity/README.md). Native AT+CPMS? returns N,M in a parseable form and stays consistent with the rest of the native pipeline.

Reflects the native backend contract: status is required (consumers must handle "read"|"unread"), timestamp is ISO 8601 with TZ offset so lexicographic sort matches chronological order. Frontend filter UI / mark-read wiring is a follow-up plan.

Aligns the consumer-facing type doc with the CGI source comment — the lexicographic==chronological claim holds only when all messages share one SMSC timezone offset (true for a single modem).

Author-time PDU encoder used to derive the decode/ fixtures (GSM-7 + UCS-2, SMS-DELIVER framing, optional concat UDH). Not shipped to the device and not invoked by the test harness — purely for regenerating known-good fixture hex.

Live-device run on the RM551E test modem: content parity is byte-clean across all 51 merged messages vs the pre-migration sms_tool baseline; storage bug fixed (73/255); newest-first sort confirmed; mark_read plumbing verified end to end; lock coordination clean. BusyBox awk confirmed to emit UTF-8 bytes via sprintf("%c") identically to GAWK. One gap: the inbox had no unread messages, so the actual unread->read flip could not be observed (plumbing proven).

Final-review hardening. The /^ERROR/ guard didn't catch +CME ERROR: / +CMS ERROR: (they start with +). Harmless when an error is the sole response, but a +CMGL: header followed by a +CME ERROR: (storage corruption mid-enumeration) would consume the error line as PDU hex and surface a ghost message. Now skipped, so the corrupt slot is dropped cleanly. Also: renumber duplicate GET handler comment labels and mark the not-yet-implemented encode_* ops as PLANNED.

Defense-in-depth on the delete and mark_read POST handlers (both loop over a client-supplied indexes array): - Reject empty/non-numeric indexes via a case guard, counting each as a failure so an all-bad request reports partial_failure rather than success. - delete now uses a PID-qualified temp file ($$) to match mark_read, closing a concurrent-request race on the previously-static /tmp path. The injection surface was already safe (idx is double-quoted; qcmd/atcli_smd11 write a single AT line), so this is hardening, not a vuln fix.

dr-dolomite added 28 commits May 28, 2026 06:41

test(sms): tidy parity baselines — trailing newline + diff guidance

83dbff0

- Add trailing newline to baseline.status.txt for cleaner Task 17 diffs - Expand README with concrete jq diff commands for storage field - Note sibling directory layout for decode/encode codec fixtures

test(sms): bootstrap PDU codec test harness with first decode fixture

56a1b4e

test(sms): mark sms_pdu_test.sh executable in git index

b30f084

fix(sms-codec): scope udhi/mti as locals in do_decode_one

3a20332

feat(sms-codec): surface read/unread status from CMGL stat byte

548d2e5

Status comes from the +CMGL header line, passed in via -v stat=N (0=unread, default 1=read). Emitted as JSON string "unread"|"read" so the CGI merge and UI can consume it directly.

docs(sms-types): note same-TZ assumption in timestamp JSDoc

e0eba48

Aligns the consumer-facing type doc with the CGI source comment — the lexicographic==chronological claim holds only when all messages share one SMSC timezone offset (true for a single modem).

test(sms): add build_pdu.py fixture generator

6d009ec

Author-time PDU encoder used to derive the decode/ fixtures (GSM-7 + UCS-2, SMS-DELIVER framing, optional concat UDH). Not shipped to the device and not invoked by the test harness — purely for regenerating known-good fixture hex.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Native SMS inbox backend (read/unread, ISO 8601, multi-part, UCS-2)#22

Native SMS inbox backend (read/unread, ISO 8601, multi-part, UCS-2)#22
dr-dolomite wants to merge 28 commits into
development-homefrom
feature/sms-native-backend

dr-dolomite commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dr-dolomite commented May 28, 2026

Summary

Scope / out of scope

Test Plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant