update ABAP lexer #2260

Open
mlaggner wants to merge 4 commits into rouge-ruby:maint.update-abap from mlaggner:maint.update-abap

@mlaggner
Contributor

This updates the existing PR #2257:

  • Updated the list of keywords and builtins (removed false positives, added RAP syntax tokens)
    • Added the source(s) of the keywords to the header of the lexer
  • Fixed the regular expressions for comment detection (removed the trailing newline, because it was visually distracting)
  • Added the two types of comments to the demo page

- rule %r/".*/, Comment::Single
- rule %r(^\*.*), Comment::Multiline
+ rule %r/"[^\r\n]*/, Comment::Single
+ rule %r/^\*[^\r\n]*/, Comment::Single
Member

Without the /m flag, . in regexes doesn't typically match \n anyways.
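
The point about the /m flag can be checked in plain Ruby. This is only a small sketch; the two-line sample string is made up for illustration and is not taken from the PR.

```ruby
# Hypothetical two-line ABAP snippet (an assumption for illustration).
source = %(" a comment\nWRITE 'x'.)

# Without /m, `.` stops at the line feed, so the match ends with the comment text.
single_line = source[/".*/]

# With /m, `.` also matches "\n", so the match runs to the end of the input.
multi_line = source[/".*/m]

p single_line
p multi_line
```

In other words, the original `".*` pattern already excludes the trailing "\n" from the comment match.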

Contributor Author

Okay, I was just misled by the visual demo (http://localhost:9292/abap), which shows a newline after every comment, whereas the overview page (http://localhost:9292/) does not show them.

After comparing with some other languages, I see that the same thing happens for almost every other language too, so I think this will work without the newline-excluding regexp. Thanks for the hint.

Member

Can you show me a screenshot of "shows a newline after every comment"?

Contributor Author

@mlaggner, Mar 26, 2026

  1. I added a devcontainer (because I am not allowed to install Ruby on my work device): mcr.microsoft.com/devcontainers/ruby:3
  2. Ran puma and opened http://localhost:9292/
  3. On the overview page, the snippets look good (for ABAP and other languages; just look at the single-line comments):
     [image] [image]
  4. Then, looking at the detail pages, I see line feeds after every single-line comment (which are not in the source code!).

ABAP (https://github.com/rouge-ruby/rouge/blob/main/spec/visual/samples/abap):
[image]

C-Sharp (https://github.com/rouge-ruby/rouge/blob/main/spec/visual/samples/csharp):
[image]

Member

What browser are you viewing these in? I definitely don't see those in Firefox on Mac. I'm kind of shocked that a browser would render a lone \r as a full newline in 2026.
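
One way a lone \r could end up in the output is sketched below. It assumes the sample file has CRLF line endings, which the thread does not confirm. The two patterns are the comment rules quoted earlier in this conversation; the sample string is hypothetical.

```ruby
# Hypothetical sample with CRLF line endings (an assumption, not confirmed in the thread).
crlf_source = %(" a comment\r\nWRITE 'x'.\r\n)

dot_rule     = /".*/         # first comment pattern quoted in the thread
charset_rule = /"[^\r\n]*/   # replacement pattern quoted in the thread

# Ruby's `.` excludes only "\n", so the carriage return becomes part of the match.
dot_match = crlf_source[dot_rule]

# The explicit character class stops before both "\r" and "\n".
charset_match = crlf_source[charset_rule]

p dot_match
p charset_match
```

If that assumption holds, the `".*` rule would place the carriage return inside the comment token, which a browser might then render as an extra line break.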

# https://help.sap.com/doc/abapdocu_758_index_htm/7.58/en-US/index.htm?file=abenabap_shortref.htm
# https://help.sap.com/doc/abapdocu_758_index_htm/7.58/en-us/abenbuilt_in_functions_overview.htm
# https://help.sap.com/doc/abapdocu_758_index_htm/7.58/en-US/abencds_language_elements.htm
# https://help.sap.com/doc/abapdocu_758_index_htm/7.58/en-us/abenabap_words.htm
Member

Are these keyword/builtin lists the same as in the sources? If so, I might take a moment to write a docs parser for it.

Contributor Author

The second and the last link probably contain everything, but I cannot guarantee it 100%, since SAP mixes some built-in data types into the list of keywords. I would not consider those keywords, although they may be keywords in a different context. (This is the pain in the ass with ABAP/SAP: some words are keywords in a special context but just names/variables in another. It is not as simple as C, Java...)

Since those lists barely change (a few new keywords may appear with major SAP releases), the content of the list is rather stable, whereas the layout of the page may not be. Since I am not an expert in writing docs parsers and the list is complete for the latest SAP release, I do not see the need to invest more work here than necessary.

Member

What I'm asking is - did you manually edit the contents of these pages to source the lists? Or are these lists just the lists of keywords from the docs?

Contributor Author

@mlaggner, Mar 26, 2026

It's not 1:1 the same, since SAP mixes some built-in data types into the last list (https://help.sap.com/doc/abapdocu_758_index_htm/7.58/en-us/abenabap_words.htm). These are not keywords per se, but depending on the usage, some of them may be keywords in a special context 🤦
E.g. CHAR is a built-in data type (though not like the string types you know from other languages), while CL_DBI_UTILITIES is actually an SAP-delivered class. I am trying to mimic the syntax highlighting of the ABAP editor here...

Your approach is still good (though out of my scope here): I saw that I still missed some SQL functions in the long list of KEYWORDS, which shows me that maintaining it manually is already hard. On the other hand, a script/parser pulls in false positives. I don't know which is the lesser evil.

Contributor Author

Okay, I found another set of documentation, for ABAP in S/4HANA Cloud, which contains additional keywords that are not available in on-premise ABAP (https://help.sap.com/doc/abapdocu_cp_index_htm/CLOUD/en-US/ABENABAP_REFERENCE.html)...

I think I will use my vacation to think about a parser. Where would you put such a parser? Are there examples in this project?

Member

I see, I can take a look. The other doc parsers are in tasks/builtins/*.rake. Some of them use simple regexes but the newer apache.rake (for httpd) one brings in Nokogiri to parse their XML docs. I would be more than happy to write one if there's a good source, and if the docs parser needs to hard-code a list of exceptions to filter out, that's fine by me too, presuming it's a smaller list that's easier to maintain.
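
As a rough illustration of that idea, here is a sketch of a regex-based extractor with a hard-coded exception list. Everything in it is hypothetical: the `extract_keywords` helper, the `EXCLUDED_WORDS` constant, the assumed one-word-per-link markup, and the sample hrefs are invented for illustration and do not reflect the real SAP pages or the existing rake tasks. The two excluded names are the non-keyword examples mentioned earlier in this thread.

```ruby
require 'set'

# Hypothetical exception list: CHAR (a built-in data type) and CL_DBI_UTILITIES
# (an SAP-delivered class) are the examples named in the thread.
EXCLUDED_WORDS = Set.new(%w[CHAR CL_DBI_UTILITIES]).freeze

# Assumed markup: one upper-case word per <a> element. The real pages may differ.
def extract_keywords(html, excluded: EXCLUDED_WORDS)
  html.scan(%r{<a[^>]*>\s*([A-Z][A-Z0-9_-]*)\s*</a>})
      .flatten                                   # scan returns one array per match
      .uniq
      .reject { |word| excluded.include?(word) } # drop the known false positives
      .sort
end

# Made-up sample input; the hrefs are placeholders.
sample_html = <<~HTML
  <a href="abapappend.htm">APPEND</a>
  <a href="abenchar.htm">CHAR</a>
  <a href="abapwrite.htm">WRITE</a>
  <a href="abencl.htm">CL_DBI_UTILITIES</a>
  <a href="abapappend.htm">APPEND</a>
HTML

keywords = extract_keywords(sample_html)
p keywords
```

Keeping the exceptions in a single constant means that context-dependent words can be reviewed in one place, whichever way the decision on each of them goes.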
