refactor recursive comment grammar rules with external scanner #8
I've been working on a new language extension for the Zed editor for Django templates. I tried using this tree-sitter grammar, but kept running into crashes. The logs from the editor were no help, but after some fumbling around I narrowed it down to the comment rules.
Both `unpaired_comment` and `paired_comment` use recursive patterns that I think are the cause of the issue, possibly because Zed extensions get compiled to WASM (though that's just a hunch; I have no concrete evidence that's the core issue).

The problematic patterns:

- `unpaired_comment`: `repeat(seq(alias($.unpaired_comment, ""), repeat(/.|\s/)))`
- `paired_comment`: `repeat(seq(alias($.paired_comment, ""), repeat(/.|\s/)))`

To fix this, I made two changes: a small one to unpaired comments and a large one to paired comments.

For unpaired comments, I changed to a simple `token()` pattern. Django just ignores everything between `{#` and `#}`, so no recursion is needed.

For paired comments, I added an external C scanner inspired by tree-sitter-liquid, but took a different approach to preserve the original parsing behavior. The scanner uses depth tracking to find the balanced closing `{% endcomment %}`, incrementing depth when it sees nested `{% comment %}` tags and decrementing for `{% endcomment %}`. This maintains the exact same tree structure as the original grammar (a single comment node), just without the recursive patterns that caused crashes.
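
For readers who want to see the depth-tracking idea in isolation, here is a minimal sketch. It is not the scanner from this PR: it works on a plain NUL-terminated string rather than the tree-sitter lexer API, the names `match_tag` and `find_comment_end` are hypothetical, and the tag matching is deliberately simplified (it does not handle `%}` inside quoted tag arguments).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if a tag of the form `{% <keyword> ... %}` starts at s,
 * return its length in bytes; otherwise return 0. */
static size_t match_tag(const char *s, const char *keyword) {
    if (s[0] != '{' || s[1] != '%') return 0;
    const char *p = s + 2;
    while (*p == ' ' || *p == '\t') p++;          /* skip padding after `{%` */
    size_t klen = strlen(keyword);
    if (strncmp(p, keyword, klen) != 0) return 0;
    p += klen;
    /* The keyword must end here, so `commentary` does not match `comment`. */
    if (isalnum((unsigned char)*p) || *p == '_') return 0;
    const char *close = strstr(p, "%}");
    if (close == NULL) return 0;                  /* unterminated tag */
    return (size_t)(close + 2 - s);
}

/* src is the text immediately after an opening `{% comment %}` tag.
 * Returns the byte offset of the balancing `{% endcomment %}` tag,
 * or -1 if the comment is never closed. */
long find_comment_end(const char *src) {
    assert(src != NULL);
    int depth = 1;                                /* already inside one comment */
    const char *p = src;
    while (*p != '\0') {
        size_t n;
        if ((n = match_tag(p, "endcomment")) > 0) {
            if (--depth == 0) return (long)(p - src);
            p += n;                               /* closed a nested comment */
        } else if ((n = match_tag(p, "comment")) > 0) {
            depth++;                              /* entered a nested comment */
            p += n;
        } else {
            p++;                                  /* ordinary comment content */
        }
    }
    return -1;
}
```

The loop is iterative, so nesting depth only costs an integer counter instead of parser recursion. For example, `find_comment_end("hello {% endcomment %}")` returns 6, the offset where the closing tag begins, and an input whose nested `{% comment %}` is never balanced returns -1.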