Description
The program below works as expected when reading from stdin, but segfaults when it is instead lexing a buffer.
The key thing about this example is that one of the rules uses input() to gobble from "!" to EOF (yes, it looks as if I could use a "!".* pattern, but that doesn't produce the intended results in the real case; the lexer needs to balance braces, and if I hit EOF when trying to do that, I want to recover gracefully).
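To make the intended brace-balancing behaviour concrete, here is a minimal sketch in plain C, outside flex. It assumes a character source that returns 0 once the input is exhausted, which is what input() appears to do in flex 2.6.4. The names next_char_fn, scan_balanced, scan_result and next_from_string are hypothetical and are not part of flex or of the real lexer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for flex's input(): returns the next character,
 * or 0 once the input is exhausted. */
typedef int (*next_char_fn)(void *ctx);

enum scan_result { SCAN_BALANCED, SCAN_HIT_EOF };

/* Called just after an opening '{' has been read. Consumes characters
 * until the matching '}' is found or the input runs out. *consumed
 * receives the number of characters read in either case. */
static enum scan_result scan_balanced(next_char_fn next, void *ctx,
                                      size_t *consumed)
{
    assert(next != NULL && consumed != NULL);
    int depth = 1;      /* the caller has already seen one '{' */
    size_t n = 0;
    int c;
    while ((c = next(ctx)) != 0) {
        n++;
        if (c == '{') {
            depth++;
        } else if (c == '}' && --depth == 0) {
            *consumed = n;
            return SCAN_BALANCED;
        }
    }
    /* End of input before the braces balanced: report it to the caller
     * instead of treating it as a fatal error. */
    *consumed = n;
    return SCAN_HIT_EOF;
}

/* Example character source: a cursor over a NUL-terminated string.
 * ctx points at a `const char *` that is advanced on each call. */
static int next_from_string(void *ctx)
{
    const char **p = ctx;
    if (**p == '\0')
        return 0;
    return (unsigned char)*(*p)++;
}
```

With a cursor over "a{b}c}rest" the function would report SCAN_BALANCED after six characters, and with "a{b" it would report SCAN_HIT_EOF, which is the case where I want to recover gracefully.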
When run, reading from stdin, I get
$ flex -o eof.c eof.lex
$ cc -o eof eof.c
$ echo -n 'one two!three four' | ./eof
word:<one>
-> 1
-> 2
word:<two>
-> 1
buf=<three four>
-> 3
That's fine, but when I instead ./eof 'one two !three four', which scans the contents of a buffer set up by yy_scan_string, I get identical program output, followed by a segfault inside yy_get_next_buffer.
I can't work out which part of the flex manual is telling me I should expect that to happen.
The sequence of events seems to be that the lexer is finding its way to the end of file, as expected (and an <<EOF>> action confirms this), but not stopping there, despite the presence of the noyywrap option, and collapsing when it can't find a ‘next’ buffer.
Points:
- Option -d doesn't illuminate.
- It is, of course, a little hard to follow what the generated code is doing, but looking at the location of the segfault, it is indeed around the place where the code is checking for yywrap, so it should be getting the message that there is no more input coming.
- The only real illustration of using input() in the flex manual is in a case where hitting EOF is reported as an error. Here, I'm doing essentially the same as in that example, but regarding EOF as an acceptable end of the scan.
- The same behaviour appears when using a reentrant scanner.
- It's worth noting that input() returns 0, not EOF, at EOF, despite what Sect. 8 illustrates (cf. flex repo issue, and links there), and despite the rather mysterious note about a ‘“real” end-of-file’ in Sect. 20. I have a suspicion that this remark in Sect. 20 is telling me something terribly important, but I can't work out what.
- This is with flex 2.6.4 and clang 15 on macOS, and 2.6.4 and gcc on (a RHEL-derived) Linux (I can confirm the precise gcc version if that would be helpful, but this doesn't look obviously compiler dependent).
Program:
ALPHABETIC [a-zA-Z]
WS [^a-zA-Z!]
%option noyywrap nounput
%%
{ALPHABETIC}+ {
    printf("word:<%s>\n", yytext);
    return 1;
}
{WS}+ {
    return 2;
}
"!" { // gobble to end of input
    char buf[80];
    for (int idx=0; (buf[idx] = input()); idx++) /* empty */ ;
    printf("buf=<%s>\n", buf);
    // YY_FLUSH_BUFFER; /* makes no difference */
    return 3;
}
%%
int main(int argc, char** argv)
{
    switch (argc) {
    case 1: break;
    case 2:
        yy_scan_string(argv[1]);
        break;
    default:
        fprintf(stderr, "Usage: %s [string]\n", argv[0]);
        exit(1);
    }
    int token;
    while ((token = yylex()) != 0) {
        printf("-> %d\n", token);
    }
}
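As an aside on the gobble loop in the "!" rule: it has no bound on the 80-byte buffer, and it relies on input() returning 0 at end of input. A more defensive variant might look like the sketch below, written as plain C so it can be tried outside flex. This is not a fix for the segfault; it only guards the buffer and accepts either 0 or EOF as the end marker. The names next_char_fn, is_end_of_input, gobble_to_end and next_from_string are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical stand-in for flex's input(). */
typedef int (*next_char_fn)(void *ctx);

/* Accept both end markers: 0 (what flex 2.6.4 appears to return)
 * and EOF (what the manual's example tests for). */
static int is_end_of_input(int c)
{
    return c == 0 || c == EOF;
}

/* Drains the source to its end. Stores at most cap-1 characters in buf,
 * always NUL-terminates it, and returns the total number of characters
 * consumed, which may be larger than what fitted in buf. */
static size_t gobble_to_end(next_char_fn next, void *ctx,
                            char *buf, size_t cap)
{
    assert(next != NULL && buf != NULL && cap > 0);
    size_t stored = 0;
    size_t seen = 0;
    int c;
    while (!is_end_of_input(c = next(ctx))) {
        if (stored + 1 < cap)
            buf[stored++] = (char)c;   /* keep room for the NUL */
        seen++;
    }
    buf[stored] = '\0';
    return seen;
}

/* Example character source: a cursor over a NUL-terminated string. */
static int next_from_string(void *ctx)
{
    const char **p = ctx;
    if (**p == '\0')
        return 0;
    return (unsigned char)*(*p)++;
}
```

Inside the scanner, the idea would be to pass a small wrapper that ignores ctx and calls input(); I have only exercised the sketch with the string cursor shown here, not inside a generated scanner.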