-
-
Notifications
You must be signed in to change notification settings - Fork 25
Allow extension handlers to fall through to default logic #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Allow extension handlers to fall through to default logic #45
Conversation
|
Hey! Can you expand on your actual use case? Sounds like a lot of work to overwrite all handlers. Why not reach for a |
|
Sure!
My specific use case is that I am working on a Markdown editor and so I need to know the locations in code of many token types. I could just use I could handle the
The
If extensions defined after this one define handlers for the same tokens, they would fall through in the same way:
Here's a very simplified example (in practice I will handle many more token types and have a more generic handler): function handleQuoteMarker(token) {
this.stack.findLast(n => n.type === 'quote')?.children.push({
type: "quoteMarker",
position: {
end: token.end,
start: token.start,
},
});
return false // allow the parser to still handle this token, if necessary, to build quote nodes
};
const tree = fromMarkdown(markdown, "utf8", {
mdastExtensions: [{
exit: {
blockQuoteMarker: handleQuoteMarker
}
}],
});
That would also work by introducing new events that are less likely to conflict, although I would still be concerned about the possibility that the package internals or other extensions might start to depend on these in the future. I still wouldn't want my extension to override any default functionality, just augment it. Perhaps the design for these new events could specify that all registered handlers for a token are always called in a consistent order, regardless of conflicts across extensions? Would it be confusing though if these work in a different way from |
|
Hello @wooorm, I'm still interested in shipping this feature or some form of it if possible. We have been working around it by patching the installed NPM dependency but of course that's definitely not ideal. Any chance we might be able to get this across the line? Happy to iterate on it and change the approach if you prefer something else. |
|
Ah, very cool that GH is building exciting things on this :) Thanks for your patience! From your OG goal above, maybe the tokens are indeed what you should use. To get around: “that would require inefficiently tokenizing the document twice”, perhaps an alternative is to get access to the original tokens from here?
That’s interesting! What is this
Handlers (in the existing sense) are supposed to create something, mutate the result, and can be overwritten to do things differently. They assume control. From your But, your example seems to be about a new node, so I don’t quite see how a fall through is needed, as no extension will handle it? Sorry for the wait and all the questions! |
How would we get access to the original tokens though? There's no token information on the resulting AST, so when we walk the AST, how could we locate the tokens that 'belong' to each node? In other words, say we build an I guess with your "listeners" concept we could record each token as it is encountered as a side effect, so that we'd have the tokens saved for reuse when walking the AST. That would still be a rather complicated / expensive solution even without the retokenization step:
That's a typo in my example - it should be
I think this summarizes well what the goal is with this change - in addition to supporting handlers that create something, it would add support for handlers that mutate something that was already created. The only way to do that at present is to re-walk the AST after creating it, which has the problem of lack of access to the original tokens as mentioned earlier.
In my opinion this is an acceptable risk. It's not unlike the risk of a handler that introduces a fully custom new node type -- the parent still might not expect that node to be there but we accept that risk when creating such low-level extensions.
Sorry again for the confusion here! Definitely meant for this to be the built-in node. |
|
If you are interested, here's the actual full source code of the micromark extension we have written to insert /** All Micromark token types that we convert to delimiter nodes. */
const delimiterTokenTypes = [
'blockQuotePrefix',
'strongSequence',
'emphasisSequence',
'codeTextSequence',
'codeTextPadding',
'strikethroughSequence',
'thematicBreakSequence',
'codeFencedFence',
'atxHeadingSequence',
'whitespace',
'setextHeadingLine',
'labelMarker',
'labelImageMarker',
'resource',
'resourceDestination',
'linePrefix',
'listItemIndent',
'listItemPrefix',
'taskListCheck',
'tableCellDivider',
'tableDelimiterRow',
'tableDelimiter',
'autolinkMarker',
'hardBreakEscape',
'escapeMarker',
'characterReference',
]
/** Util that returns the lowest open node on the stack - the one that we should push children to. */
function getParent(context) {
// First look for any open delimiters in the delimiter stack
if (context.data.delimiterStack?.length) return context.data.delimiterStack.at(-1)!
const parent = context.stack.at(-1)
// micromark uses Fragment nodes as temporary buckets to push things to. For most nodes we can just push to the fragment
// and mdast will keep our data when it moves those nodes to the correct parent. But for Literal nodes (nodes with
// a 'value' property), it will discard nodes we push to the fragment as it builds the plain string value. So in
// that case we push to the fragment's parent (the Literal node) instead.
if (parent?.type === 'fragment') {
const grandparent = context.stack.at(-2)
if (grandparent && 'value' in grandparent) return grandparent
}
return parent
}
/** Create a delimiter node and push it to the currently open node. */
const handleDelimiterEnter = function (this, token) {
const node = {
type: 'delimiter',
tokenType: token.type,
raw: this.sliceSerialize(token),
position: {
start: token.start,
end: token.end,
},
}
// Normally, we could just enter the delimiter using the built-in helper: `this.enter(delimiter, token)`
// However, this would push the delimiter onto the main stack, which breaks micromark's expectations.
// To work around this, we maintain a separate `delimiterStack` just for delimiters.
const parent = getParent(this)
if (parent) {
parent.children ??= []
parent.children.push(node)
this.data.delimiterStack ??= []
this.data.delimiterStack.push(node)
}
// Return false to fall through to the default logic to allow mdast to still create regular nodes from this token
// (only works with our custom patch!)
return false
}
/** Remove the delimiter from the stack to close it out. */
const handleDelimiterExit = function (this, token) {
const node = this.data.delimiterStack?.pop()
// This should be impossible, because we should never exit a node we didn't just enter
if (node?.tokenType !== token.type) throw new Error(`Expected open '${token.type}' delimiter on stack`)
return false
}
export function mdastDelimitersExtension() {
return {
enter: Object.fromEntries(delimiterTokenTypes.map(type => [type, handleDelimiterEnter])),
exit: Object.fromEntries(delimiterTokenTypes.map(type => [type, handleDelimiterExit])),
}
} |
Initial checklist
Description of changes
TLDR: Adds support for extensions'
enterandexithandlers to returnfalseto fall back to earlier extensions or the default logic.Context
I am working on a project where I'd like to attach additional information to nodes based on the information already in the tokenization output. In this particular case, I'm trying to record data about the specific syntax that formed a node in the tree (essentially building a sort of concrete syntax tree on top of the AST).
While building an extension that provides handlers for the relevant syntax tokens and attaches them to nodes in the stack, I noticed that I was breaking the built-in parsing functionality. Upon digging I realized that this was because my
enterhandlers were overriding and disabling the default handlers, even though I only wanted to encounter the tokens without doing the parsing and tree building myself.I realized that a very useful feature of extension handlers would be the ability to 'fall through' to the default behavior if desired. To me the most intuitive approach for this was to support returning
falseto indicate that a token was not handled by the handler. This is a flexible API that allows for handlers to fall through conditionally, allowing them to choose to only take over the handling of a token in certain circumstances.Since the code change here is fairly straightforward, I chose to go ahead and open a PR rather than start with an issue. I am very open to feedback here and happy to change the approach or reject the change entirely if you don't think it's useful.