[Bug] TOML string outputs are not properly escaped#6000
[Bug] TOML string outputs are not properly escaped#6000eric-forte-elastic wants to merge 12 commits into
Conversation
Bug - GuidelinesThese guidelines serve as a reminder set of considerations when addressing a bug in the code. Documentation and Context
Code Standards and Practices
Testing
Additional Checks
|
shashank-elastic
left a comment
There was a problem hiding this comment.
Local Testing works as expected
Testing 🟢
detection-rules on 5182-bug-toml-string-outputs-are-not-properly-escaped [$?] is 📦 v1.6.29 via 🐍 v3.12.8 (.venv) on ☁️ shashank.suryanarayana@elastic.co
❯ python -m detection_rules import-rules-to-repo sample_rule.ndjson --required-only
Loaded config file: /Users/shashankks/elastic_workspace/detection-rules/.detection-rules-cfg.json
█▀▀▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄▄▄ ▄ ▄ █▀▀▄ ▄ ▄ ▄ ▄▄▄ ▄▄▄
█ █ █▄▄ █ █▄▄ █ █ █ █ █ █▀▄ █ █▄▄▀ █ █ █ █▄▄ █▄▄
█▄▄▀ █▄▄ █ █▄▄ █▄▄ █ ▄█▄ █▄█ █ ▀▄█ █ ▀▄ █▄▄█ █▄▄ █▄▄ ▄▄█
[+] Building rule for /Users/shashankks/elastic_workspace/detection-rules/rules/sample_escaped_backslash_in_query_string_filter.toml
1 results exported
1 rules converted
0 exceptions exported
0 actions connectors exported
detection-rules on 5182-bug-toml-string-outputs-are-not-properly-escaped [$?] is 📦 v1.6.29 via 🐍 v3.12.8 (.venv) on ☁️ shashank.suryanarayana@elastic.co
❯
|
Updated testing for 0e034aa |
https://github.com/elastic/detection-rules into 5182-bug-toml-string-outputs-are-not-properly-escaped
|
With the current implementation of #6046, we also resolve the primary issue affecting the community member who reported the bug. However, another bug specific to the escape character handling on multi line strings should be opened in its place if we use that PR to address it. |
| # (as long as ``"""`` doesn't appear, which is guarded above via ``TRIPLE_DQ not in v``), | ||
| # but backslashes must be escaped so that e.g. ``Hello\:World`` is serialized as | ||
| # ``Hello\\:World`` -- otherwise ``\:`` is an invalid TOML escape sequence (issue #5182). | ||
| escaped_lines = [line.replace("\\", "\\\\") for line in lines] |
There was a problem hiding this comment.
This will break the fields "note", "setup", "description", "osquery" if they are multiline as they are somehow additionally escaped below in lines 253-274.
Same issue occurred in PR #6074
Query needs to be handled in another way or, from my perspective preferred, multiline should be escaped in general (like in this PR or #6074) but lines 253-274 needs to be removed. As I don't know what impact this might have I will wait for some further feedback for now.
There was a problem hiding this comment.
Removing lines 253-274 without performing a fairly broad refactor of our toml_write process would cause a number of issues in our current pipeline.
The fields you mentioned will always be additionally escaped when written to TOML to counteract the escape invocation that is currently present in toml_write. Having json.dumps creates an issue as you described in #6074; however, adding an additional escape does not necessarily break these other fields. Their values change, but it does not inherently cause them to not function correctly (as the json dumps does). This is only of particular importance to os_query, as it is a functional field rather than a descriptive one. However, given that in the current bug is that os_query is broken when it goes to multi-line (sufficient length), this change is intended as it can address the issue.
There was a problem hiding this comment.
Not sure about the os_query stuff, haven't had such an example so far but you are correct this might be the only field with impact on functionality.
Nevertheless if you use json dumps or your approach above (currently I don't see a difference in things I tested) you will add more escape characters than needed and which will be removed.
e.g. if the input of note field (on system 1) is: "some \ example" you will end up with a note field (on system 2) of: "some \ example"
This distorts the result (even it has no impact on functionality) and will drive you nuts if you compare two rules on two systems.
There was a problem hiding this comment.
100% agree, generally speaking I am going to close this PR and put together a refactor proposal which should resolve this issue in light of:
#6000 (comment)
Pull Request
Issue link(s):
Resolves #5182
Related to: #6045
Summary - What I changed
In
RuleTomlEncoder.dump_str, the multiline branch now escapes backslashes per line before emitting the"""…"""block. Literal newlines and single"characters remain valid inside triple-DQ basic strings and are left alone. Only\is doubled so that a source\:becomes TOML\\:, which the TOML parser decodes back to the original\:.Before (invalid TOML):
After (valid TOML, same decoded value):
The PR also adds this filter to the test_toml.jdson to test for this creating invalid toml.
This formating update also addresses the core bug of #6045 where newlines are introduced going from JSON -> TOML -> JSON. With this PR merged, only the TOML will show the newline. This is not ideal, but it makes this a cosmetic issue rather than breaking a DaC pipeline. For full fix, see #6046
Left is new desired output, and the right is the old output.

Alternatives
Also considered alternatives such as trying to use
NESTED_PRESERVED_FIELD_NAMESor similar approaches as other formatting PRs. However, none were found that would reach the core issue. In the case ofNESTED_PRESERVED_FIELD_NAMES, among other reasons we have a hard-coded skip for any key namedquerynot just the base rulequery(which perhaps should be updated in another PR).How To Test
Either create a rule with a filter similar to that of the issue or use the supplied test ndjson.
issue_5182_rule.ndjson.txt
Once downloaded, run an import rule to repo command to see the newly created TOML and verify.
Also run
make test-remote-clito ensure no formatting issues on testing rules.Details
Checklist
bug,enhancement,schema,maintenance,Rule: New,Rule: Deprecation,Rule: Tuning,Hunt: New, orHunt: Tuningso guidelines can be generatedmeta:rapid-mergelabel if planning to merge within 24 hoursContributor checklist