Align safe-string codec with CSM keyword-based %HH escaping and CSM-Wiki VI documented APIs #1
Merged
11 commits
8cad3f8 feat: add reversible safe string hex codec with tests and design doc (Copilot)
5cd480c chore: remove pycache artifacts from safe string module (Copilot)
7c283a5 feat: align safe string codec with CSM keyword escaping rules (Copilot)
55b6bd9 test: clarify keyword-escape behavior and add unicode adjacency coverage (Copilot)
d324143 chore: apply review nits for escape formatting and test naming (Copilot)
5450d56 Update safe_string_codec/test_safe_string_codec.py (nevstop)
38c606e feat: align safe string API naming and parameters with CSM-Wiki VI docs (Copilot)
357d269 chore: refine wording for original argument output docstring (Copilot)
585d73b refactor: keep only CSM-Wiki documented safe string functions (Copilot)
a8a3e42 docs: add flowcharts for safe string encode/decode functions (Copilot)
3907f2c docs: refine flowchart wording consistency (Copilot)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| from .safe_string_codec import make_string_arguments_safe, revert_arguments_safe_string | ||
|
|
||
| __all__ = [ | ||
| "make_string_arguments_safe", | ||
| "revert_arguments_safe_string", | ||
| ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,160 @@ | ||
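| # Original task prompt | ||
|
|
||
| I need to design a pair of safe-string conversion functions: one replaces special characters in a string with hex codes so the result can be used as a safe string in an application, and the other converts a safe string back to the ordinary string. | ||
|
|
||
| Please help me design this conversion algorithm. It should be efficient and thorough, support conversion of any string, and pay particular attention to corner cases. | ||
|
|
||
| Please generate the Python code and test code, and record this prompt together with the design rationale, the corner cases considered, and other detailed notes. Put all the files in a suitably named folder under the root directory. | ||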
|
|
||
| --- | ||
|
|
||
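| # Design goals | ||
|
|
||
| 1. **Reversible**: any input string can be recovered exactly. | ||
| 2. **Safe**: consistent with the CSM argument keyword rules, avoiding message-parsing conflicts. | ||
| 3. **Efficient**: a single linear scan, O(n) time complexity. | ||
| 4. **Comprehensive**: covers CSM keyword characters, control characters, Unicode, malformed input, and other edge cases. | ||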
|
|
||
| # Encoding algorithm | ||
|
|
||
| ## 0) Alignment with CSM-Wiki VI names and parameters | ||
|
|
||
| - `CSM - Make String Arguments Safe.vi` | ||
| - `Argument String` | ||
| - `Ignore Argument Type(F)` | ||
| - `Safe Argument String` | ||
| - `CSM - Revert Arguments-Safe String.vi` | ||
| - `Safe Argument String` | ||
| - `Force Convert (F)` | ||
| - `Origin Argument String` | ||
|
|
||
| The Python implementation provides interfaces with the same semantics: | ||
|
|
||
| - `make_string_arguments_safe(argument_string, ignore_argument_type=False)` | ||
| - `revert_arguments_safe_string(safe_argument_string, force_convert=False)` | ||
|
|
||
| ## 1) Keyword source | ||
|
|
||
| According to the keyword description of `CSM - Revert Arguments-Safe String.vi`, the relevant keyword patterns are: | ||
|
|
||
| - `->` | ||
| - `->|` | ||
| - `-@` | ||
| - `-&` | ||
| - `<-` | ||
| - `\r` | ||
| - `\n` | ||
| - `//` | ||
| - `>>` | ||
| - `>>>` | ||
| - `;` | ||
| - `,` | ||
|
|
||
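| Therefore, the set of characters that take part in keywords is: `-|@&<>\r\n/;,`. | ||
|
|
||
| ## 2) Input layer | ||
|
|
||
| The input is a Python `str`, scanned linearly character by character. | ||
|
|
||
| ## 3) Output character strategy | ||
|
|
||
| - If a character belongs to the keyword character set `-|@&<>\r\n/;,`, escape it as `%HH` (two uppercase hexadecimal digits) | ||
| - `%` itself is also escaped, as `%25`, so that decoding is unambiguous | ||
| - All other characters are output as-is (including ordinary text and Unicode) | ||
| - Escaping is **conservative and per-character** (not full-token matching): any character in the set is escaped, regardless of context | ||
|
|
||
| For example: | ||
| - `->` -> `%2D%3E` | ||
| - `>` -> `%3E` (escaped even when it appears on its own) | ||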
| - `;` -> `%3B` | ||
| - `,` -> `%2C` | ||
| - `%` -> `%25` | ||
|
|
||
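| ## 4) Decoding strategy | ||
|
|
||
| Linear scan, character by character: | ||
| - On `%`, exactly two hexadecimal digits must follow; they are converted to the corresponding character | ||
| - Any character other than `%` is output as-is | ||
| - An incomplete `%` escape or invalid hexadecimal digits raise `ValueError` | ||
|
|
||
| ## 5) Flowcharts for the two functions | ||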
|
|
||
| ### `make_string_arguments_safe(argument_string, ignore_argument_type=False)` | ||
|
|
||
| ```mermaid | ||
| flowchart TD | ||
| A[Start] --> B{Is argument_string a str?} | ||
| B -- No --> E1[Raise TypeError] --> Z[End] | ||
| B -- Yes --> C{Is ignore_argument_type a bool?} | ||
| C -- No --> E2[Raise TypeError] --> Z | ||
| C -- Yes --> D[Scan argument_string character by character] | ||
| D --> F{Character in -|@&<>\\r\\n/;, or % ?} | ||
| F -- Yes --> G[Output %HH in uppercase hex] | ||
| F -- No --> H[Output character as-is] | ||
| G --> I{More characters?} | ||
| H --> I | ||
| I -- Yes --> F | ||
| I -- No --> J[Join into safe_argument_string] | ||
| J --> K{ignore_argument_type ?} | ||
| K -- Yes --> L[Return safe_argument_string] --> Z | ||
| K -- No --> M[Return <SAFESTR> + safe_argument_string] --> Z | ||
| ``` | ||
|
|
||
| ### `revert_arguments_safe_string(safe_argument_string, force_convert=False)` | ||
|
|
||
| ```mermaid | ||
| flowchart TD | ||
| A[Start] --> B{Is safe_argument_string a str?} | ||
| B -- No --> E1[Raise TypeError] --> Z[End] | ||
| B -- Yes --> C{Is force_convert a bool?} | ||
| C -- No --> E2[Raise TypeError] --> Z | ||
| C -- Yes --> D{Starts with <SAFESTR>?} | ||
| D -- Yes --> E[Strip prefix to get encoded_text] | ||
| D -- No --> F{force_convert ?} | ||
| F -- No --> G[Return safe_argument_string unchanged] --> Z | ||
| F -- Yes --> H[encoded_text = safe_argument_string] | ||
| E --> I[Scan encoded_text character by character] | ||
| H --> I | ||
| I --> J{Is current character % ?} | ||
| J -- No --> K[Append character to result as-is] | ||
| J -- Yes --> L{Two more characters follow?} | ||
| L -- No --> E3[Raise ValueError: incomplete escape] --> Z | ||
| L -- Yes --> M{Both are hex digits?} | ||
| M -- No --> E4[Raise ValueError: invalid hex] --> Z | ||
| M -- Yes --> N[Decode %HH to a character and append to result] | ||
| K --> O{More characters?} | ||
| N --> O | ||
| O -- Yes --> J | ||
| O -- No --> P[Return the joined original string] --> Z | ||
| ``` | ||
|
|
||
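| # Why the scheme is unambiguous | ||
|
|
||
| - `%` is the only escape prefix, and every escape has a fixed length of 3 (`%HH`). | ||
| - `%` itself is encoded as `%25`, so it cannot collide with the original text. | ||
| - The decoder can strictly validate the format, avoiding silent corruption. | ||
|
|
||
| # Complexity and efficiency | ||
|
|
||
| - Encoding: a single pass over the characters, O(n) | ||
| - Decoding: a single pass over the characters, O(n) | ||
| - Only a lightweight list of string fragments is built and joined, so memory overhead is manageable | ||
|
|
||
| # Corner cases covered | ||
|
|
||
| 1. Empty string | ||
| 2. Plain ASCII text (must not be rewritten) | ||
| 3. CSM keyword characters and combinations (`->`, `//`, `>>>`, `;`, `,`, etc.) | ||
| 4. The escape prefix character `%` itself | ||
| 5. Control characters: `\r`, `\n`, and other non-keyword control characters (such as `\t`, `\x00`) | ||
| 6. Unicode: Chinese text, emoji | ||
| 7. Very long mixed strings | ||
| 8. Full ASCII round trip (0x00-0x7F) | ||
| 9. Malformed input handling: incomplete escapes, non-hexadecimal digits | ||
| 10. Argument type errors (non-`str`) | ||
|
|
||
| # Code and test files | ||
|
|
||
| - `safe_string_codec/safe_string_codec.py`: core encode/decode implementation | ||
| - `safe_string_codec/test_safe_string_codec.py`: comprehensive `unittest` tests | ||
| - `safe_string_codec/__init__.py`: public exports | ||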
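The per-character escaping rule from the design notes can be illustrated with a translation table. This is a minimal sketch of an alternative formulation, not the loop merged in this PR; the names `KEYWORD_CHARS`, `ESCAPE_TABLE`, and `encode_sketch` are hypothetical, and the `<SAFESTR>` prefix handling is left out.

```python
# Hypothetical sketch: per-character %HH escaping via str.translate.
KEYWORD_CHARS = "-|@&<>\r\n/;,"  # characters drawn from the documented CSM keywords
ESCAPE = "%"                      # the escape prefix itself must also be escaped

# Map each code point to its two-digit uppercase hex escape.
ESCAPE_TABLE = {ord(ch): f"%{ord(ch):02X}" for ch in KEYWORD_CHARS + ESCAPE}


def encode_sketch(text: str) -> str:
    """Escape keyword characters and '%'; leave everything else untouched."""
    return text.translate(ESCAPE_TABLE)


print(encode_sketch("->"))       # %2D%3E
print(encode_sketch("100%;ok"))  # 100%25%3Bok
print(encode_sketch("中文😀"))    # 中文😀 (unchanged)
```

Because the table is keyed on single code points, the sketch shares the context-free property described above: a lone `>` is escaped exactly like the `>` inside `->`.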
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,95 @@ | ||
| """Safe string conversion utilities. | ||
|
|
||
| This module provides two functions: | ||
| - make_string_arguments_safe: CSM - Make String Arguments Safe.vi equivalent. | ||
| - revert_arguments_safe_string: CSM - Revert Arguments-Safe String.vi equivalent. | ||
|
|
||
| The escaping behavior follows CSM keyword-safe conventions: | ||
| - Escape CSM keyword characters using ``%HH`` uppercase hex. | ||
| - ``%`` is also escaped to keep decoding unambiguous. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| _HEX_DIGITS = set("0123456789ABCDEFabcdef") | ||
| _ESCAPE = "%" | ||
| _SAFE_STRING_TYPE = "<SAFESTR>" | ||
|
|
||
|
|
||
| # Character-based conservative escaping for CSM keyword safety. | ||
| # We escape all characters that appear in documented keyword patterns | ||
| # (->, ->|, -@, -&, <-, \r, \n, //, >>, >>>, ;, ,), regardless of context. | ||
| _CSM_KEYWORD_CHARS = set("-|@&<>\r\n/;,") | ||
| _ESCAPED_CHARS = _CSM_KEYWORD_CHARS | {_ESCAPE} | ||
|
|
||
|
|
||
| def make_string_arguments_safe(argument_string: str, ignore_argument_type: bool = False) -> str: | ||
| """CSM - Make String Arguments Safe.vi equivalent. | ||
|
|
||
| Args: | ||
| argument_string: String argument. | ||
| ignore_argument_type: If True, do not prepend ``<SAFESTR>``. | ||
|
|
||
| Returns: | ||
| Safe argument string. | ||
| """ | ||
| if not isinstance(argument_string, str): | ||
| raise TypeError("argument_string must be str") | ||
| if not isinstance(ignore_argument_type, bool): | ||
| raise TypeError("ignore_argument_type must be bool") | ||
|
|
||
| encoded_parts: list[str] = [] | ||
| for ch in argument_string: | ||
| if ch in _ESCAPED_CHARS: | ||
| encoded_parts.append(f"{_ESCAPE}{ord(ch):02X}") | ||
| else: | ||
| encoded_parts.append(ch) | ||
| safe_argument_string = "".join(encoded_parts) | ||
| if ignore_argument_type: | ||
| return safe_argument_string | ||
| return f"{_SAFE_STRING_TYPE}{safe_argument_string}" | ||
|
|
||
|
|
||
| def revert_arguments_safe_string(safe_argument_string: str, force_convert: bool = False) -> str: | ||
| """CSM - Revert Arguments-Safe String.vi equivalent. | ||
|
|
||
| Args: | ||
| safe_argument_string: Safe string argument. | ||
| force_convert: Convert even when argument type is not ``SAFESTR``. | ||
|
|
||
| Returns: | ||
| Original argument string. | ||
|
|
||
| Raises: | ||
| ValueError: If input is malformed. | ||
| """ | ||
| if not isinstance(safe_argument_string, str): | ||
| raise TypeError("safe_argument_string must be str") | ||
| if not isinstance(force_convert, bool): | ||
| raise TypeError("force_convert must be bool") | ||
|
|
||
| encoded_text = safe_argument_string | ||
| if safe_argument_string.startswith(_SAFE_STRING_TYPE): | ||
| encoded_text = safe_argument_string[len(_SAFE_STRING_TYPE) :] | ||
| elif not force_convert: | ||
| return safe_argument_string | ||
|
|
||
| result: list[str] = [] | ||
| i = 0 | ||
| length = len(encoded_text) | ||
|
|
||
| while i < length: | ||
| ch = encoded_text[i] | ||
| if ch == _ESCAPE: | ||
| if i + 2 >= length: | ||
| raise ValueError("Malformed safe string: incomplete escape sequence") | ||
| h1, h2 = encoded_text[i + 1], encoded_text[i + 2] | ||
| if h1 not in _HEX_DIGITS or h2 not in _HEX_DIGITS: | ||
| raise ValueError("Malformed safe string: invalid hex escape") | ||
| result.append(chr(int(h1 + h2, 16))) | ||
| i += 3 | ||
| continue | ||
|
|
||
| result.append(ch) | ||
| i += 1 | ||
|
|
||
| return "".join(result) | ||
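The prefix and `force_convert` behavior of the decoder can be tried out with a short, self-contained sketch. It re-implements the decoding compactly with `re` so the block runs on its own; the regex approach is a swapped-in technique rather than the merged loop, it raises one combined error message instead of two, and `decode_sketch` and `revert_sketch` are hypothetical names.

```python
import re

SAFE_PREFIX = "<SAFESTR>"
# Either a complete %HH escape (group 1 is set) or a stray '%' (group 1 is None).
_ESCAPE_OR_STRAY = re.compile(r"%([0-9A-Fa-f]{2})|%")


def decode_sketch(encoded: str) -> str:
    """Decode %HH escapes, rejecting any '%' not followed by two hex digits."""
    def replace(match) -> str:
        if match.group(1) is None:
            raise ValueError("Malformed safe string: bad or incomplete escape")
        return chr(int(match.group(1), 16))
    return _ESCAPE_OR_STRAY.sub(replace, encoded)


def revert_sketch(value: str, force_convert: bool = False) -> str:
    """Mirror the prefix handling: decode only prefixed input unless forced."""
    if value.startswith(SAFE_PREFIX):
        return decode_sketch(value[len(SAFE_PREFIX):])
    return decode_sketch(value) if force_convert else value


print(revert_sketch("<SAFESTR>A%2D%3EB"))             # A->B
print(revert_sketch("A%2D%3EB"))                      # A%2D%3EB (no prefix, not forced)
print(revert_sketch("A%2D%3EB", force_convert=True))  # A->B
```

One consequence of the prefix rule worth noting: unprefixed input is returned untouched unless `force_convert=True`, so malformed escapes in unprefixed text are only detected when conversion is forced.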
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| import unittest | ||
|
|
||
| from safe_string_codec import make_string_arguments_safe, revert_arguments_safe_string | ||
|
|
||
|
|
||
| class SafeStringCodecTests(unittest.TestCase): | ||
| def test_package_level_imports_are_available(self): | ||
| self.assertEqual(make_string_arguments_safe.__name__, "make_string_arguments_safe") | ||
| self.assertEqual( | ||
| revert_arguments_safe_string.__name__, "revert_arguments_safe_string" | ||
| ) | ||
|
|
||
| def test_empty_string(self): | ||
| self.assertEqual(make_string_arguments_safe("", ignore_argument_type=True), "") | ||
| self.assertEqual(revert_arguments_safe_string("", force_convert=True), "") | ||
|
|
||
| def test_make_string_arguments_safe_default_adds_type_prefix(self): | ||
| safe = make_string_arguments_safe("A->B") | ||
| self.assertTrue(safe.startswith("<SAFESTR>")) | ||
| self.assertEqual(revert_arguments_safe_string(safe), "A->B") | ||
|
|
||
| def test_make_string_arguments_safe_ignore_type(self): | ||
| safe = make_string_arguments_safe("A->B", ignore_argument_type=True) | ||
| self.assertEqual(safe, "A%2D%3EB") | ||
|
|
||
| def test_revert_without_force_and_without_prefix_keeps_input(self): | ||
| self.assertEqual(revert_arguments_safe_string("A%2D%3EB"), "A%2D%3EB") | ||
|
|
||
| def test_revert_force_convert_without_prefix_decodes(self): | ||
| self.assertEqual( | ||
| revert_arguments_safe_string("A%2D%3EB", force_convert=True), "A->B" | ||
| ) | ||
|
|
||
| def test_ascii_alphanumeric_and_space_kept(self): | ||
| original = "AbcXYZ019_. hello" | ||
| safe = make_string_arguments_safe(original, ignore_argument_type=True) | ||
| self.assertEqual(safe, original) | ||
| self.assertEqual(revert_arguments_safe_string(safe, force_convert=True), original) | ||
|
|
||
| def test_csm_keyword_characters_are_encoded(self): | ||
| original = "->| -@ -& <-\r\n// >> >>> ;," | ||
| safe = make_string_arguments_safe(original, ignore_argument_type=True) | ||
| self.assertNotIn("->", safe) | ||
| self.assertNotIn(">>", safe) | ||
| self.assertNotEqual(safe, original) | ||
| self.assertEqual(revert_arguments_safe_string(safe, force_convert=True), original) | ||
|
|
||
| def test_percent_character_is_always_encoded(self): | ||
| original = "%" | ||
| safe = make_string_arguments_safe(original, ignore_argument_type=True) | ||
| self.assertEqual(safe, "%25") | ||
| self.assertEqual(revert_arguments_safe_string(safe, force_convert=True), original) | ||
|
|
||
| def test_unicode_chinese_and_emoji(self): | ||
| original = "中文😀" | ||
| safe = make_string_arguments_safe(original, ignore_argument_type=True) | ||
| self.assertEqual(revert_arguments_safe_string(safe, force_convert=True), original) | ||
|
|
||
| def test_unicode_with_adjacent_keywords(self): | ||
| original = "前缀->中文😀//后缀" | ||
| safe = make_string_arguments_safe(original, ignore_argument_type=True) | ||
| self.assertNotIn("->", safe) | ||
| self.assertNotIn("//", safe) | ||
| self.assertEqual(revert_arguments_safe_string(safe, force_convert=True), original) | ||
|
|
||
| def test_control_and_null_characters(self): | ||
| original = "line1\nline2\t\x00end" | ||
| safe = make_string_arguments_safe(original, ignore_argument_type=True) | ||
| self.assertEqual(revert_arguments_safe_string(safe, force_convert=True), original) | ||
|
|
||
| def test_long_mixed_string(self): | ||
| original = "A" * 1000 + "🚀" + "\x00" + "終" | ||
| safe = make_string_arguments_safe(original, ignore_argument_type=True) | ||
| self.assertEqual(revert_arguments_safe_string(safe, force_convert=True), original) | ||
|
|
||
| def test_roundtrip_for_all_ascii_values(self): | ||
| original = "".join(chr(i) for i in range(128)) | ||
| safe = make_string_arguments_safe(original, ignore_argument_type=True) | ||
| restored = revert_arguments_safe_string(safe, force_convert=True) | ||
| self.assertEqual(restored, original) | ||
|
|
||
| def test_decode_rejects_incomplete_escape(self): | ||
| with self.assertRaises(ValueError): | ||
| revert_arguments_safe_string("abc%", force_convert=True) | ||
|
|
||
| def test_decode_rejects_bad_hex(self): | ||
| with self.assertRaises(ValueError): | ||
| revert_arguments_safe_string("abc%G1", force_convert=True) | ||
|
|
||
| def test_type_errors(self): | ||
| with self.assertRaises(TypeError): | ||
| make_string_arguments_safe(None) # type: ignore[arg-type] | ||
| with self.assertRaises(TypeError): | ||
| revert_arguments_safe_string(None) # type: ignore[arg-type] | ||
| with self.assertRaises(TypeError): | ||
| make_string_arguments_safe("ok", ignore_argument_type=None) # type: ignore[arg-type] | ||
| with self.assertRaises(TypeError): | ||
| revert_arguments_safe_string("ok", force_convert=None) # type: ignore[arg-type] | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| unittest.main() |
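The fixed cases in this suite could be complemented by a seeded randomized round-trip check. The following is a minimal sketch under stated assumptions: `encode` and `decode` are compact stand-ins for the PR's functions (no `<SAFESTR>` prefix, no validation) so the block runs on its own, and the alphabet, sample count, and maximum length are arbitrary choices.

```python
import random
import re

ESCAPED = set("-|@&<>\r\n/;,%")


def encode(text: str) -> str:
    # Stand-in encoder: %HH for keyword characters and '%'.
    return "".join(f"%{ord(c):02X}" if c in ESCAPED else c for c in text)


def decode(text: str) -> str:
    # Stand-in decoder; assumes well-formed input produced by encode().
    return re.sub(r"%([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), text)


def check_roundtrip(samples: int = 500, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    alphabet = "ab25 %->|@&</;,\r\n\t\x00中😀"
    for _ in range(samples):
        original = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 40)))
        safe = encode(original)
        # No raw keyword character may survive encoding.
        assert not (set(safe) & (ESCAPED - {"%"})), repr(safe)
        assert decode(safe) == original, repr(original)


check_roundtrip()
```

Including the digits `2` and `5` next to `%` in the alphabet deliberately exercises inputs such as `%25`, where a decoder that rescanned its own output would corrupt the text.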
PR description says the diff should be documentation-only/minimal doc update, but this change introduces new runtime code and a full unittest suite (codec implementation + exports). Please update the PR description/scope (or split into separate PRs) so reviewers and release notes match what’s actually being merged.
@copilot apply changes based on this feedback
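The PR description scope has been adjusted per this feedback and now matches what is actually being merged (implementation + tests + documentation, not just a minimal documentation update). This round only updates the PR description; there are no new code changes. Current branch commit: 3907f2c. No UI changes, so screenshots are not applicable.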