Native SMS inbox backend (read/unread, ISO 8601, multi-part, UCS-2)#22
Open
dr-dolomite wants to merge 28 commits into
Frozen snapshots of the current sms_tool-based pipeline taken from the live test modem (RM551EGL, 2026-05-28) for use as diff targets in Task 17 (Phase 1) and Task 32 (final) parity verification during the native QMI SMS migration.
- Add trailing newline to baseline.status.txt for cleaner Task 17 diffs
- Expand README with concrete jq diff commands for storage field
- Note sibling directory layout for decode/encode codec fixtures
Implements Task 2 of the SMS native backend plan.

Codec changes (scripts/usr/lib/qmanager/sms_pdu.awk):
- hex2dec(), swap_pair(), decode_digits(): hex and semi-octet helpers
- decode_scts(): 7-octet timestamp → "MM/DD/YY HH:MM:SS"
- decode_gsm7_address(): stub returning [ALPHA_TODO]; full GSM-7 in Task 3
- pop(): consumes N hex chars from front of pdu state string
- do_decode_one(): main driver; skips SMSC, reads TPDU first octet (UDHI/MTI), decodes OA length + TON + digit/alpha address, PID, DCS, SCTS, UDL, stubs content as raw UD hex remainder
- json_str(): JSON string escaper
- Driver block wires op=decode_one to read hex lines and call do_decode_one()

Fixture 02 (tests/fixtures/sms/decode/02_header_only.*):
- PDU captured from live device (SM storage index 10, AT+CMGR=10)
- Sender: 639686973969 (TON 0x91 international E.164; a different OA path from fixture 01's TON 0xD0 alphanumeric sender)
- Message is multi-part (UDHI=1); Task 5 will parse UDH; for now content = raw UD hex
- Expected JSON uses "index":0 (no -v idx= passed; real propagation in Task 6)
- Fixture 02 PASSES fully with this implementation

Test runner hardening (tests/sms_pdu_test.sh), carried forward from Task 1 review:
- Changed shebang from #!/bin/sh to #!/usr/bin/env bash (file uses bash extensions: set -eu, process substitution <(...))
- Added jq and awk preflight guards (fail fast with clear message when missing)

Known partial-pass state:
- Fixture 01 FAILS on three fields: sender ([ALPHA_TODO] vs SmartApp, Task 3), content (raw hex vs decoded GSM-7 text, Task 3), index (0 vs 30, Task 6)
- Timestamp for fixture 01 matches correctly
- Both failures resolve in their respective tasks
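For readers unfamiliar with semi-octet encoding, here is a minimal Python sketch of what the swap_pair() / decode_digits() helpers are described as doing. The function names come from the commit message; the signatures and the digit_count parameter are assumptions, and the real awk code may differ.

```python
def swap_pair(pair: str) -> str:
    """Swap the two hex characters of one semi-octet pair, e.g. '36' -> '63'."""
    return pair[1] + pair[0]


def decode_digits(hex_str: str, digit_count: int) -> str:
    """Decode swapped-BCD address digits.

    digit_count is the address length in digits; an odd-length number is
    padded with a trailing 'F' nibble, which the slice below drops.
    """
    swapped = "".join(swap_pair(hex_str[i:i + 2]) for i in range(0, len(hex_str), 2))
    return swapped[:digit_count]


# Fixture 02's sender, re-encoded by hand as swapped BCD for illustration.
print(decode_digits("366968799396", 12))
```

The 12-digit sender from fixture 02 needs no filler; a 5-digit number such as 12345 would be carried as 2143F5.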
Device BusyBox awk bitwise capability check (2026-05-28):
echo "" | awk 'BEGIN{print and(0xF0,0x0F),or(1,2),lshift(1,4),rshift(16,2)}'
Output: 0 3 16 4 — built-in ops confirmed present.
Using arithmetic-only wrappers (and_int/or_int/lshift_int/rshift_int) regardless,
for portability across BusyBox awk builds and GAWK on the test host.
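A rough sketch of what arithmetic-only wrappers can look like, written in Python for readability. The four names are taken from the commit message; the bodies are an assumption about the approach (division, modulo, and multiplication only), not a copy of the awk source.

```python
def lshift_int(x: int, n: int) -> int:
    """Left shift using multiplication only."""
    return x * (2 ** n)


def rshift_int(x: int, n: int) -> int:
    """Right shift using integer division only."""
    return x // (2 ** n)


def and_int(a: int, b: int) -> int:
    """Bitwise AND built from mod/div, one bit per iteration."""
    result, bit = 0, 1
    while a > 0 and b > 0:
        if a % 2 == 1 and b % 2 == 1:
            result += bit
        a, b, bit = a // 2, b // 2, bit * 2
    return result


def or_int(a: int, b: int) -> int:
    """Bitwise OR built from mod/div, one bit per iteration."""
    result, bit = 0, 1
    while a > 0 or b > 0:
        if a % 2 == 1 or b % 2 == 1:
            result += bit
        a, b, bit = a // 2, b // 2, bit * 2
    return result


# Same four probes as the on-device capability check.
print(and_int(0xF0, 0x0F), or_int(1, 2), lshift_int(1, 4), rshift_int(16, 2))
```

The probe values mirror the device check, so the wrappers should print the same 0 3 16 4.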
Changes:
- sms_pdu.awk: add gsm7_init() (direct array assignments — avoids the split()
comma-separator collision at septet 44), gsm7_unpack() with UDH skip-alignment,
arithmetic bitwise helpers, and decode_gsm7_address() for TON 0xD0 senders.
Wire DCS dispatch in do_decode_one(): alphabet=0→GSM-7, 2→raw hex (Task 4),
else raw hex. All new locals properly scoped; bytes[] is a local awk array.
- fixture 03: synthetic single-part GSM-7 PDU (no live single-part messages
remain in inbox; all available slots are multipart concat fragments). Sender
+1234567890, text "Hello from QManager! This is a GSM-7 test message."
- fixture 04: synthetic GSM-7 PDU with extension-table characters [ ] ^.
Sender +9876543210, text "GSM7 ext: [brackets] and ^caret^."
- fixture 01 expected: index set to 0 (was 30), sender decoded to "SmartApp",
content matches real GSM-7 decode. PASS.
- fixture 02 expected: DCS=0x00 so decoder now unpacks GSM-7 body with UDH
skip-alignment (udhl=5 bytes → skip 7 septets). Content is Lorem Ipsum
test message. Task 5 will update once UDH concat fields are parsed. PASS.
All 4 fixtures pass: passed=4 failed=0.
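The skip-alignment rule (udhl=5 → 6 header bytes = 48 bits → 7 septets) can be sketched as follows. This is an illustrative Python version, not the awk implementation: gsm7_unpack is named after the commit's function, but its signature, the has_udh flag, and the table constants are assumptions made for the example.

```python
# GSM 03.38 default alphabet, indexed by septet value (index 44 is the comma).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table, reached via the 0x1B escape septet.
GSM7_EXT = {0x0A: "\f", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
            0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "€"}


def gsm7_unpack(ud_hex: str, udl: int, has_udh: bool = False) -> str:
    """Unpack udl septets from the user data, skipping UDH septets if present."""
    data = bytes.fromhex(ud_hex)
    bits = int.from_bytes(data, "little")  # septet i lives at bits [7i, 7i+7)
    skip = 0
    if has_udh:
        udhl = data[0]
        # Header occupies (udhl + 1) octets; round up to a septet boundary.
        skip = ((udhl + 1) * 8 + 6) // 7
    out, escape = [], False
    for i in range(skip, udl):
        septet = (bits >> (7 * i)) & 0x7F
        if escape:
            # Unknown extension codes fall back to the basic table.
            out.append(GSM7_EXT.get(septet, GSM7_BASIC[septet]))
            escape = False
        elif septet == 0x1B:
            escape = True
        else:
            out.append(GSM7_BASIC[septet])
    return "".join(out)


print(gsm7_unpack("E8329BFD4697D9EC37", 10))                 # plain body
print(gsm7_unpack("050003AB02019069", 9, has_udh=True))      # body after a 6-byte UDH
```

Treating the octets as one little-endian integer keeps the septet arithmetic to a single shift and mask; the awk code has to do the same thing byte by byte with the arithmetic helpers.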
Status comes from the +CMGL header line, passed in via -v stat=N (0=unread, default 1=read). Emitted as JSON string "unread"|"read" so the CGI merge and UI can consume it directly.
Replaces sms_tool's "MM/DD/YY HH:MM:SS" format with "YYYY-MM-DDTHH:MM:SS±HH:MM" so the CGI can sort lexicographically (= chronologically) and the UI can use native Date parsing. TZ byte decoded per 3GPP TS 23.040 §9.2.3.11 — bit 3 is sign, remaining bits are semi-octet-swapped BCD quarter-hours. Parity README updated with the new "expected diffs" entry for timestamp format and the new status field added in Task 1.
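A small Python sketch of the SCTS conversion described above, assuming the hardcoded 20xx century that the follow-up commit documents. decode_scts is the commit's function name; the signature and the sample timestamps are made up for illustration.

```python
def decode_scts(scts_hex: str) -> str:
    """Convert a 7-octet SCTS (14 hex chars) to ISO 8601 with a TZ offset."""
    # The six date/time fields are semi-octet swapped BCD.
    yy, mo, dd, hh, mi, ss = (scts_hex[i + 1] + scts_hex[i] for i in range(0, 12, 2))
    tz_octet = int(scts_hex[12:14], 16)
    sign = "-" if tz_octet & 0x08 else "+"          # bit 3 carries the sign
    # Tens digit is the low nibble minus the sign bit; units digit is the high nibble.
    quarters = (tz_octet & 0x07) * 10 + (tz_octet >> 4)
    off_h, off_m = divmod(quarters * 15, 60)
    return f"20{yy}-{mo}-{dd}T{hh}:{mi}:{ss}{sign}{off_h:02d}:{off_m:02d}"


print(decode_scts("62508241035423"))  # TZ octet 0x23: 32 quarter-hours east
print(decode_scts("6250824103540A"))  # TZ octet 0x0A: 20 quarter-hours west
```

With both samples sharing a wall-clock time, the only difference in the output is the offset suffix (+08:00 versus -05:00).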
Two review-feedback fixes on the ISO 8601 SCTS commit:
- Parity README bullets referenced `.msg[]` but baseline.cgi.json uses `.messages` at the top level. Update labels (jq commands were already correct).
- Document the hardcoded 20xx century prefix in decode_scts.
Parses IEI 0x00 (8-bit ref) and 0x08 (16-bit ref) information elements from the UDH. Emits reference/part/total alongside content so the CGI's existing jq group_by(.sender + "|" + .reference) merger keeps assembling multi-part messages correctly under the new native pipeline.
Review feedback on the UDH concat commit:
- The doc comment claimed do_decode_one clears the globals; actually the function self-clears at entry. Corrected so the Task 5 decode_list author isn't misled into pre-clearing at the call site.
- Loop guard pos+3<=length makes the 'need a full IEI+IEDL header' invariant explicit and avoids a spurious partial-byte read on a malformed UDHL.
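The UDH walk can be sketched like this in Python. parse_udh_concat is a hypothetical name, and the guard is expressed in byte offsets rather than the awk code's hex-character positions, so the constant differs from pos+3<=length while serving the same purpose.

```python
def parse_udh_concat(ud_hex: str):
    """Return {'reference', 'part', 'total'} from a concat IE, or None."""
    data = bytes.fromhex(ud_hex)
    udhl = data[0]
    end = min(1 + udhl, len(data))       # never read past a malformed UDHL
    pos = 1
    while pos + 2 <= end:                # need a full IEI + IEDL header
        iei, iedl = data[pos], data[pos + 1]
        body = data[pos + 2:min(pos + 2 + iedl, end)]
        if iei == 0x00 and len(body) == 3:       # 8-bit reference
            return {"reference": body[0], "total": body[1], "part": body[2]}
        if iei == 0x08 and len(body) == 4:       # 16-bit reference
            return {"reference": (body[0] << 8) | body[1],
                    "total": body[2], "part": body[3]}
        pos += 2 + iedl                  # skip unrelated information elements
    return None


print(parse_udh_concat("050003AB0201"))
print(parse_udh_concat("06080412340302"))
```

A truncated header such as 0300 falls out of the loop immediately and yields None instead of reading a partial element.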
DCS alphabet 0b10 (UCS-2) is the second most common encoding after GSM-7 — used whenever a message contains any non-GSM-7 character (CJK, emoji, etc.). UDH (when present) is byte-aligned in UCS-2 so we just skip (UDHL+1) bytes before reading 16-bit code units. Surrogate pairs reconstruct supplementary plane code points so emoji round-trip correctly.
Lone or mispaired surrogates (0xD800-0xDFFF) were emitted as 3-byte WTF-8, which is invalid UTF-8. In decode_list mode (Task 5) the whole inbox is one JSON array piped through jq at once, so a single corrupt message could make jq reject every message. Now replaced with U+FFFD for deterministic, valid output regardless of jq leniency. Also documents the sprintf(%c) byte-emission portability assumption (re-verified on device in Task 11).
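To make the surrogate handling concrete, here is a hedged Python sketch of UCS-2 BE decoding with pair reconstruction and U+FFFD substitution. ucs2_to_text and its has_udh flag are illustrative names, not the awk function's interface.

```python
def ucs2_to_text(ud_hex: str, has_udh: bool = False) -> str:
    """Decode big-endian 16-bit code units; never emit a lone surrogate."""
    data = bytes.fromhex(ud_hex)
    if has_udh:
        data = data[data[0] + 1:]        # UDH is byte-aligned: skip UDHL + 1 bytes
    units = [(data[i] << 8) | data[i + 1] for i in range(0, len(data) - 1, 2)]
    out, i = [], 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Valid pair: rebuild the supplementary-plane code point.
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            out.append("\ufffd")         # lone or mispaired surrogate
            i += 1
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)


text = ucs2_to_text("D83DDE00")          # U+1F600 as a surrogate pair
print(text.encode("utf-8").hex())        # always encodable: no surrogates survive
```

Because every surrogate is either paired or replaced, the result always encodes to valid UTF-8, which is the property the jq pipeline depends on.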
Reads idx|stat|hex lines from stdin and emits {"msg":[...]} — same envelope
sms_tool recv -j produces. Lets sms.sh call awk once per CGI request instead
of once per message, avoiding ~50 awk cold-starts on a fully-stocked inbox.
do_decode_one now builds its JSON into a global record_buf and only prints to
stdout when not in buffered mode, so decode_one behavior is unchanged while
decode_list collects records into the array.
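The batching idea, sketched in Python under stated assumptions: decode_one is passed in as a callable standing in for the real PDU decoder, and the stat mapping follows the earlier commit (0 = unread, anything else = read). Only the idx|stat|hex input shape and the {"msg":[...]} envelope are taken from the commit message.

```python
import json


def decode_list(lines, decode_one):
    """Decode many 'idx|stat|hex' lines in one pass and wrap them in one envelope."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        idx, stat, pdu_hex = line.split("|", 2)
        record = decode_one(pdu_hex)     # hypothetical per-PDU decoder
        record["index"] = int(idx)
        record["status"] = "unread" if stat == "0" else "read"
        records.append(record)
    return json.dumps({"msg": records}, ensure_ascii=False)


# Stub decoder for demonstration only; the real one parses the PDU.
print(decode_list(["10|1|AABB", "11|0|CCDD"], lambda h: {"content": h.lower()}))
```

One process handles the whole inbox, which is the point of the change: per-message start-up cost disappears while the per-record output stays the same.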
Adds a UDH-multipart message followed by a plain message in one decode_list batch. Asserts the plain message emits no reference/part/total — guards against a future refactor moving the udh_found reset inside the UDHI branch, which would silently leak concat fields between records.
Inbox GET now calls qcmd "AT+CMGF=0" + qcmd "AT+CMGL=4" and pipes each (idx, stat, pdu) line into sms_pdu.awk op=decode_list. Coordination with the rest of the modem traffic is inherited from qcmd's /var/lock/qmanager.lock flock — no extra locking required (would risk self-deadlock). The send / delete / delete_all POST paths still use sms_tool for now; they share the same lock file via _sms_run and work fine. Replacing them is a follow-up plan.
Review feedback on the native CMGL commit:
- Declare raw/pipe_in as local (codebase convention; avoids global pollution if the call site is ever refactored off command substitution).
- Strip trailing CR in the CMGL awk parser so \r\n line endings from the modem don't cause OK\r to be mis-consumed as a PDU (which would silently drop the last message). No-op when qcmd already strips CR.
Multi-part merge now carries the status field: result is unread if any part is unread. Timestamp on a merged message is the min (earliest part) so a concat SMS spanning a clock tick still sorts deterministically. Default sort is timestamp desc — replaces the old sort_by(-.indexes[0]) which was storage-slot order, not chronological (Quectel reuses freed slots).
The lexicographic timestamp sort equals chronological order only when all messages share one UTC offset — true for a single modem whose SMSC stamps a consistent TZ. Documents the invariant so cross-timezone correctness isn't assumed by a future maintainer.
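The merge rules from the two commits above (unread if any part is unread, earliest part's timestamp, newest-first sort) can be sketched in Python. The actual merger is a jq expression in the CGI; merge_messages and the sample records here are illustrative assumptions.

```python
def merge_messages(msgs):
    """Group concat parts by sender + reference, then sort newest first."""
    groups = {}
    for m in msgs:
        if "reference" in m:
            key = ("concat", m["sender"], m["reference"])
        else:
            key = ("single", m["index"])
        groups.setdefault(key, []).append(m)

    merged = []
    for parts in groups.values():
        parts.sort(key=lambda p: p.get("part", 1))
        merged.append({
            "sender": parts[0]["sender"],
            "content": "".join(p["content"] for p in parts),
            "timestamp": min(p["timestamp"] for p in parts),   # earliest part
            "status": "unread" if any(p["status"] == "unread" for p in parts) else "read",
            "indexes": [p["index"] for p in parts],
        })
    # Plain string sort: chronological only while all offsets are identical.
    merged.sort(key=lambda m: m["timestamp"], reverse=True)
    return merged


inbox = [
    {"index": 5, "sender": "+1", "content": "world", "status": "read",
     "timestamp": "2026-05-28T10:00:01+08:00", "reference": 7, "part": 2, "total": 2},
    {"index": 4, "sender": "+1", "content": "hello ", "status": "unread",
     "timestamp": "2026-05-28T10:00:00+08:00", "reference": 7, "part": 1, "total": 2},
    {"index": 9, "sender": "+2", "content": "later", "status": "read",
     "timestamp": "2026-05-28T11:00:00+08:00"},
]
for message in merge_messages(inbox):
    print(message["timestamp"], message["status"], message["content"])
```

The invariant matters because string order ignores the offset: 2026-05-28T10:00:00+08:00 sorts after 2026-05-28T09:00:00+00:00 even though it is seven hours earlier in UTC.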
POST {"action":"mark_read","indexes":[...]} flips each index from REC UNREAD
to REC READ via AT+CMGR=<idx>,0 (mode=0 reads the message *and* clears the
unread flag as a documented side effect). Body is discarded; only the status
mutation matters. Wrapped in qcmd's existing lock.
The old grep '[0-9]*/[0-9]*' pattern never matched sms_tool's "used: N, total: M" format, so storage.used/total always reported 0 (documented in tests/fixtures/sms/parity/README.md). Native AT+CPMS? returns N,M in a parseable form and stays consistent with the rest of the native pipeline.
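A short Python illustration of both halves of this fix. The "used: N, total: M" shape is quoted from the commit message, but the full sample lines, the "ME" storage name, and parse_cpms itself are assumptions for the example; the +CPMS? response layout follows the standard AT command form.

```python
import re


def parse_cpms(response: str):
    """Pull used/total for the first storage out of an AT+CPMS? response."""
    m = re.search(r'\+CPMS:\s*"[^"]*",(\d+),(\d+)', response)
    if not m:
        return None                      # assumption: caller decides the fallback
    return {"used": int(m.group(1)), "total": int(m.group(2))}


# The old pattern requires a literal '/', which this format never contains.
old_match = re.search(r"[0-9]*/[0-9]*", "used: 73, total: 255")
print(old_match)

print(parse_cpms('+CPMS: "ME",73,255,"ME",73,255,"ME",73,255\r\nOK'))
```

The first print shows why the old code fell through to 0/0: the pattern can never match a line without a slash.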
Reflects the native backend contract: status is required (consumers must handle "read"|"unread"), timestamp is ISO 8601 with TZ offset so lexicographic sort matches chronological order. Frontend filter UI / mark-read wiring is a follow-up plan.
Aligns the consumer-facing type doc with the CGI source comment — the lexicographic==chronological claim holds only when all messages share one SMSC timezone offset (true for a single modem).
Author-time PDU encoder used to derive the decode/ fixtures (GSM-7 + UCS-2, SMS-DELIVER framing, optional concat UDH). Not shipped to the device and not invoked by the test harness — purely for regenerating known-good fixture hex.
Live-device run on the RM551E test modem: content parity is byte-clean across
all 51 merged messages vs the pre-migration sms_tool baseline; storage bug
fixed (73/255); newest-first sort confirmed; mark_read plumbing verified end
to end; lock coordination clean. BusyBox awk confirmed to emit UTF-8 bytes via
sprintf("%c") identically to GAWK. One gap: the inbox had no unread messages,
so the actual unread->read flip could not be observed (plumbing proven).
Final-review hardening. The /^ERROR/ guard didn't catch +CME ERROR: / +CMS ERROR: (they start with +). Harmless when an error is the sole response, but a +CMGL: header followed by a +CME ERROR: (storage corruption mid-enumeration) would consume the error line as PDU hex and surface a ghost message. Now skipped, so the corrupt slot is dropped cleanly. Also: renumber duplicate GET handler comment labels and mark the not-yet-implemented encode_* ops as PLANNED.
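A Python sketch of the hardened CMGL parsing loop, combining the CR strip from the earlier review fix with the +CME/+CMS guard described here. parse_cmgl and the sample modem output are made up for illustration; the real parser is an awk block inside sms.sh.

```python
import re

HEADER = re.compile(r"^\+CMGL:\s*(\d+),(\d+),")
ERROR_PREFIXES = ("ERROR", "+CME ERROR", "+CMS ERROR")


def parse_cmgl(raw: str):
    """Turn AT+CMGL=4 output into 'idx|stat|hex' lines for the codec."""
    rows, pending = [], None
    for line in raw.split("\n"):
        line = line.rstrip("\r")                 # tolerate \r\n from the modem
        header = HEADER.match(line)
        if header:
            pending = (header.group(1), header.group(2))
            continue
        if line == "":
            continue
        if line == "OK" or line.startswith(ERROR_PREFIXES):
            pending = None                       # drop a header with no PDU
            continue
        if pending:
            rows.append(f"{pending[0]}|{pending[1]}|{line}")
            pending = None
    return rows


sample = ("+CMGL: 10,1,,23\r\n0791AA\r\n"
          "+CMGL: 11,0,,23\r\n+CME ERROR: 500\r\n"
          "+CMGL: 12,0,,23\r\n0791BB\r\n\r\nOK\r\n")
print(parse_cmgl(sample))
```

Slot 11 in the sample has a header but an error where its PDU should be, so it is dropped rather than surfacing as a ghost message.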
Defense-in-depth on the delete and mark_read POST handlers (both loop over a client-supplied indexes array):
- Reject empty/non-numeric indexes via a case guard, counting each as a failure so an all-bad request reports partial_failure rather than success.
- delete now uses a PID-qualified temp file ($$) to match mark_read, closing a concurrent-request race on the previously-static /tmp path.

The injection surface was already safe (idx is double-quoted; qcmd/atcli_smd11 write a single AT line), so this is hardening, not a vuln fix.
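The validation rule reads roughly like this when sketched in Python. The shell code uses a case guard; partition_indexes and request_status are hypothetical names, and only the partial_failure/success outcome is taken from the commit message.

```python
def partition_indexes(indexes):
    """Split client-supplied indexes into usable integers and a failure count."""
    valid, failed = [], 0
    for idx in indexes:
        text = str(idx)
        # Accept plain ASCII digits only; empty strings and signs are rejected.
        if text != "" and text.isascii() and text.isdigit():
            valid.append(int(text))
        else:
            failed += 1
    return valid, failed


def request_status(failed: int) -> str:
    """Any rejected or failed index downgrades the whole request."""
    return "success" if failed == 0 else "partial_failure"


valid, failed = partition_indexes(["3", "", "x1", "12", "-4"])
print(valid, failed, request_status(failed))
```

Counting rejects as failures is what keeps an all-bad request from reporting success with zero work done.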
Summary
Replaces the SMS inbox read path with a native shell + awk pipeline (no `sms_tool` binary for reads), keeping the backend localized. Adds the inbox features requested: read/unread status, newest-first sort, and correct multi-part + non-Latin/emoji decoding.
- `scripts/usr/lib/qmanager/sms_pdu.awk`: GSM-7 (incl. extension table), UCS-2 BE → UTF-8 with surrogate pairs (U+FFFD-hardened), UDH concat parsing (reference/part/total), ISO 8601 SCTS with TZ offset, read/unread `status` from the CMGL stat byte, and a batched `decode_list` op (one awk invocation per request, not per message).
- `scripts/www/cgi-bin/quecmanager/cellular/sms.sh`: inbox GET now runs `qcmd AT+CMGF=0` + `AT+CMGL=4` piped through the codec, with locking inherited from `qcmd`'s flock (no self-lock) and a CR-hardened, `+CME`/`+CMS ERROR`-guarded parser. Merge carries status (unread-if-any-part) and default-sorts newest-first by timestamp. New `mark_read` POST action (`AT+CMGR=<idx>,0`). Fixed the long-standing storage `0/0` bug (now `AT+CPMS?`).
- `types/sms.ts`: `status: "read" | "unread"` required; `timestamp` documented as ISO 8601.

Scope / out of scope
- `send`/`delete`/`delete_all` still use `sms_tool` (intentional; separate future plan).
- `delete` + `mark_read` hardened with numeric-index validation + consistent temp-file naming.

Test Plan
- `bash tests/sms_pdu_test.sh` → `passed=13 failed=0` (13 fixtures: GSM-7, UCS-2 BMP + emoji surrogate, UDH concat, lone-surrogate, ISO 8601 TZ, batched decode_list, no-stale-concat invariant)
- `sh -n` clean on `sms.sh`; `bunx tsc --noEmit` adds zero new errors
- Live-device run: content parity byte-clean vs the `sms_tool` baseline across all 51 merged messages; BusyBox `sprintf("%c")` UTF-8 emission confirmed; storage fixed to `73/255`; newest-first confirmed; `mark_read` plumbing verified end-to-end; lock coordination clean. Full record in `tests/fixtures/sms/parity/README.md`.

🤖 Generated with Claude Code