Fix dune build #34

Open
mgree wants to merge 388 commits into master from fix-dune-build
mgree (Collaborator) commented Apr 29, 2026

Trying to fix the dune build... and recover from some old squash merges that make rebasing history hard.

herbertx and others added 30 commits June 8, 2024 11:43
Use tee(2) to peek at pipes in order to avoid reading one byte at
a time.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove the meta detection in expandmeta and rely on the detection
in expmeta instead.

Replace the open-coded meta detection with one based on strpbrk.
This is slightly inaccurate with bracket expressions but the
difference is minor (only affecting patterns with an unquoted
']').

Move int_pending to the end of the loop so that it is only executed
after some work has been done.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If the directory pointer is not a directory, a symlink or an unknown
entity, do not recurse into expmeta.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling pungetc upon PEOF must cause the next pgetc call to return
PEOF.  This was broken by the multi-byte pungetc patch.  Fix it by
adding the EOF logic to pgetc.

Note that pungetn will always disregard the PEOF.

Fixes: 2c92409 ("input: Allow MB_LEN_MAX calls to pungetc")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Bail out of getmbc if the first character is PEOF.

Fixes: 6c44f4e ("parser: Add support for multi-byte characters")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move the rare case of a literal dollar sign to the end of the
parsesub block.  This eliminates a duplicate USTPUTC call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eliminate the first chkeofmark branch by moving the CTLVAR to the
end of the parsesub block and always doing STADJUST.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for $' quoting, including \u and \U.  The code is shared
with printf, so printf (both format and %b) will recognise the new
escape codes (except \c) too.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When leading white spaces are detected in ifsbreakup ifsspc needs
to be cleared.

Reported-by: Martijn Dekker <martijn@inlv.org>
Fixes: c0674f4 ("expand: Support multi-byte characters during field splitting")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
On 22-06-2024 at 15:25, Martijn Dekker wrote:
> memrchr(3) is non-standard, and has been ported from glibc to FreeBSD, NetBSD
> and OpenBSD, but not to macOS, at least as of 12.7.5. So we need a test for
> it. As far as I can tell, *name is a zero-terminated C string, so it should
> work to use strrchr(3) as a fallback.

Reading the code more closely, that's nonsense, because 'p' does not point to
the end of the string if metacharacters are found.

Guess the best we can do is provide a simple local fallback implementation of
memrchr(3). Patch v2 attached.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
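
A minimal sketch of the kind of local fallback the message above describes, for platforms that lack memrchr(3). The name `fallback_memrchr` is hypothetical and this is not necessarily the code in the attached patch; it simply scans the first n bytes backwards.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for memrchr(3): return a pointer to the last
 * occurrence of byte c within the first n bytes of s, or NULL if absent.
 * Unlike strrchr(3), it never looks past s + n.
 */
void *fallback_memrchr(const void *s, int c, size_t n)
{
    const unsigned char *p = (const unsigned char *)s + n;

    while (n--) {
        if (*--p == (unsigned char)c)
            return (void *)p;
    }
    return NULL;
}
```

Because the length is explicit, the search stops at the caller's chosen end point, which is what strrchr(3) cannot do when 'p' does not point to the end of the string.
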
MBCHAR should be preserved in argstr if the EXP_MBCHAR bit is
set.  This broke case statements.

Reported-by: Martijn Dekker <martijn@inlv.org>
Fixes: 6c44f4e ("parser: Add support for multi-byte characters")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The function dollarsq_escape may read past the current escape
code in order to provide enough data to the underlying escape
code processing function.  This is OK because we will call unget
to return any unused characters.  However, if this occurs at
the end of a quoted string, this may prompt the user for more
input which is wrong.

Fix this by terminating the loop whenever we see a single quote.
Even if this is an escaped single quote and thus does not indicate
the end of the whole quoted string, it's still OK because no single
escape code can continue after a single quote.

Reported-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Fixes: 776424a ("parser: Add dollar single quote")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
117067  s1 = s2  True if the strings s1 and s2 are identical; otherwise, false.
117068  s1 != s2 True if the strings s1 and s2 are not identical; otherwise, false.
117069  s1 > s2  True if s1 collates after s2 in the current locale; otherwise, false.
117070  s1 < s2  True if s1 collates before s2 in the current locale; otherwise, false.

"identical" does not mean "collate equally";
this is the difference between sort | uniq and sort -u

Fixes: 597850a ("shell: Use strcoll instead of strcmp where applicable")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
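
The distinction quoted above can be sketched in C as follows. The helper names are hypothetical and this is not dash's test(1) code; the point is only that `=` and `!=` want strcmp(3), while `<` and `>` want strcoll(3).

```c
#include <string.h>

/* s1 = s2: byte-for-byte identity, independent of the locale. */
int str_identical(const char *s1, const char *s2)
{
    return strcmp(s1, s2) == 0;
}

/* s1 < s2: s1 collates before s2 in the current locale (LC_COLLATE). */
int str_collates_before(const char *s1, const char *s2)
{
    return strcoll(s1, s2) < 0;
}

/* s1 > s2: s1 collates after s2 in the current locale (LC_COLLATE). */
int str_collates_after(const char *s1, const char *s2)
{
    return strcoll(s1, s2) > 0;
}
```

Two different strings may collate equally in some locales, so using strcoll(3) for `=` could report distinct strings as identical.
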
For everything but the first component of a pipeline, the input
needs to be reset because it is no longer equal to that of the
parent shell.

Reported-by: arĉi <arcxi@dismail.de>
Fixes: b1864ee ("input: Use lseek on stdin when possible")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For background jobs where the stdin is redirected to /dev/null,
a reset_input may be needed in future.  For the time being there
is no reason to do this as all possible states for stdin will work
correctly with /dev/null.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
117027  pathname1 −nt pathname2
117028    True if pathname1 resolves to an existing file and pathname2 cannot be resolved, or if
117029    both resolve to existing files and pathname1 is newer than pathname2 according to
117030    their last data modification timestamps; otherwise, false.
117031  pathname1 −ot pathname2
117032    True if pathname2 resolves to an existing file and pathname1 cannot be resolved, or if
117033    both resolve to existing files and pathname1 is older than pathname2 according to
117034    their last data modification timestamps; otherwise, false.

The correct output is
  $ [ 2024 -nt 2023 ] && echo yes
  yes
  $ [ 2023 -nt 2024 ] && echo yes
  $ [ 2023 -nt ENOENT ] && echo yes
  yes
  $ [ ENOENT -nt 2024 ] && echo yes
and
  $ [ 2024 -ot 2023 ] && echo yes
  $ [ 2023 -ot 2024 ] && echo yes
  yes
  $ [ 2023 -ot ENOENT ] && echo yes
  $ [ ENOENT -ot 2024 ] && echo yes
  yes
but dash currently prints only the first yes in each block.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
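
The POSIX wording quoted above can be sketched with stat(2) as below. The function names are hypothetical, and as a simplifying assumption the sketch compares `st_mtime` at whole-second granularity only; it is not dash's implementation.

```c
#include <sys/stat.h>

/* pathname1 -nt pathname2, following the quoted POSIX text. */
int file_newer(const char *p1, const char *p2)
{
    struct stat s1, s2;

    if (stat(p1, &s1) != 0)
        return 0;       /* pathname1 must resolve for -nt to be true */
    if (stat(p2, &s2) != 0)
        return 1;       /* pathname1 exists, pathname2 does not */
    return s1.st_mtime > s2.st_mtime;
}

/* pathname1 -ot pathname2, following the quoted POSIX text. */
int file_older(const char *p1, const char *p2)
{
    struct stat s1, s2;

    if (stat(p2, &s2) != 0)
        return 0;       /* pathname2 must resolve for -ot to be true */
    if (stat(p1, &s1) != 0)
        return 1;       /* pathname2 exists, pathname1 does not */
    return s1.st_mtime < s2.st_mtime;
}
```

The two functions are mirror images: each is true when only "its" file exists, which is the case the message above reports as missing.
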
As can be seen in the `man` page for `el_set`, using `EL_PROMPT_ESC` for
the op is the same as `EL_PROMPT`, but it allows escape characters to be
expanded in the prompt the same way they are when used with `echo` or
`printf(1)`.

As far as I know, this is not specified by POSIX, but neither is the
emacs editing mode (please correct me if I am wrong), so I think this is
a justified change to make it align with the behaviour of `echo` and
`printf(1)`.

Given that this is not specified by POSIX, there isn't much of a
precedent for what the value of the start/stop character should be. From
what I have seen, 0o001 is common, so that is what I have included in
the patch, but it may not be the most fitting. Taking a look at how
ASCII defines its control characters, I believe any characters between
0o034 and 0o037 may be a more suitable choice, but this could be up for
debate.

Signed-off-by: Sebastien Peterson-Boudreau <sebastien.peterson.boudreau@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A lot of scripts (in particular, autoconf) rely on echo keeping
undefined backslash sequences intact.  Preserve this behaviour by
only interpreting the few sequences required for dollar single quote.

Reported-by: Дилян Палаузов <dilyan.palauzov@aegee.org>
Fixes: 776424a ("parser: Add dollar single quote")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The jump table is unnecessarily large for a function that is
not performance-critical.  Move some of the cases out of the
switch statement to reduce its size.

Move the value = ch assignment to the common path.

Merge the code for '\a', '\b' and '\f'.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When our own pmatch is used, loc2 is unused in scanleft/right
when quotes is true.  However, it is still needed when quotes
is false.

Fix the scanleft/right code so that loc2 is always updated (so
it will be garbage when quotes is true) but only returned depending
on the value of quotes.

Fixes: c5bf970 ("expand: Add multi-byte support to pmatch")
Reported-by: Johannes Altmanninger <aclopte@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With C23 and LTO, we get the following warning (or error if promoted to such):
```
src/builtins.c:28:5: error: type of ‘timescmd’ does not match original declaration [-Werror=lto-type-mismatch]
   28 | int timescmd(int, char **);
      |     ^
src/bltin/times.c:15:5: note: type mismatch in parameter 1
src/bltin/times.c:15:5: note: type ‘void’ should match type ‘int’
```

Make the two consistent. This didn't show up before because pre-C23
had unprototyped functions.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
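
For illustration, a consistent declaration and definition pair might look like the sketch below. The prototype is the one quoted from src/builtins.c in the warning; the function body is a placeholder, not dash's times builtin.

```c
/* Prototype, as quoted from src/builtins.c in the warning above. */
int timescmd(int, char **);

/*
 * Definition spelled with the same parameter list. In C23 an empty
 * parameter list "()" means "(void)", so a definition written that way
 * would no longer match the prototype.
 */
int timescmd(int argc, char **argv)
{
    (void)argc;     /* unused in this placeholder body */
    (void)argv;
    return 0;
}
```
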
Johannes Altmanninger <aclopte@gmail.com> wrote:
> I noticed another regression in c5bf970 (expand: Add multi-byte
> support to pmatch, 2024-06-02).
>
> This command now prints "abc-def" but used to print "ef".
>
>        x=abc-def
>        y="${x##*d}"
>        echo "$y"

Fix this by setting s to the correct value in scanright based
on FNMATCH_IS_ENABLED.

Fixes: c5bf970 ("expand: Add multi-byte support to pmatch")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
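
As a reference for the expected result of `${x##*d}`, here is a small sketch of "remove the longest matching prefix" built on fnmatch(3). The function name is hypothetical, and this brute-force loop is not how dash's scanright works; it only pins down the semantics.

```c
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a pointer into s just past the longest prefix that matches
 * pattern, or s itself if no prefix matches (or if allocation fails).
 */
const char *strip_longest_prefix(const char *s, const char *pattern)
{
    size_t len = strlen(s);
    char *buf = malloc(len + 1);
    const char *result = s;

    if (buf == NULL)
        return s;

    /* Try prefixes from longest to shortest, including the empty one. */
    for (size_t i = len; ; i--) {
        memcpy(buf, s, i);
        buf[i] = '\0';
        if (fnmatch(pattern, buf, 0) == 0) {
            result = s + i;
            break;
        }
        if (i == 0)
            break;
    }
    free(buf);
    return result;
}
```

With s set to "abc-def" and the pattern "*d", the longest matching prefix is "abc-d", leaving "ef" as in the report above.
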
Jan Pechanec <Jan.Pechanec@oracle.com> wrote:
>
> thank you for working on dash.  I was testing it recently and it worked
> really well.
>
> However, I noticed the dash code from github does filename pattern
> matching even for code like "[ x = x ] && echo ok".  I believe the
> unquoted space after '[' should not trigger pattern matching but rather
> only to invoke the test/[ utility, as before.  It seems it works fine
> though and only doing some extra unneeded work which may not be
> immediately noticeable.
>
> dash installed on my Oracle Linux 9:
>
> janp:len49:~/_INST/dash$ strings /usr/bin/dash | grep dash
> dash-0.5.11.5-4.el9.x86_64.debug
> janp:len49:~/_INST/dash$ time dash -c 'i=0; while :; do : $((i=i+1)); [ $i -eq 500000 ] && break; done'
>
> real    0m0.752s
> user    0m0.748s
> sys     0m0.002s
>
> dash from github (commit b3e38ad) takes
> way more time to do the same thing:
>
> janp:len49:~/_INST/dash$ time ./src/dash -c 'i=0; while :; do : $((i=i+1)); [ $i -eq 500000 ] && break; done'
>
> real    0m4.202s
> user    0m1.361s
> sys     0m2.804s
>
> For the latter, strace shows open, fstat, getdents*, and close system
> calls for each iteration and it depends on number of files in the
> current directory.  With more files, it takes more time:
>
> janp:len49:/etc$ time ~/_INST/dash/src/dash -c 'i=0; while :; do : $((i=i+1)); [ $i -eq 500000 ] && break; done'
> real    0m15.591s
> user    0m5.704s
> sys     0m9.828s
>
> If I change [ to test, the dash github version behaves as before, and
> possibly even faster:
>
> janp:len49:~/_INST/dash$ time ~/_INST/dash/src/dash -c 'i=0; while :; do : $((i=i+1)); test $i -eq 500000 && break; done'
>
> real    0m0.662s
> user    0m0.659s
> sys     0m0.002s
>
> Even bash would be faster than the current github version of dash:
>
> janp:len49:~/_INST/dash$ time bash -c 'i=0; while :; do : $((i=i+1)); [ $i -eq 500000 ] && break; done'
> real    0m1.943s
> user    0m1.939s
> sys     0m0.002s

Fix performance regression for idiomatic "[ ... ]" expression by
adding a bypass for a literal "]" in pathname expansion.

Reported-by: Jan Pechanec <Jan.Pechanec@oracle.com>
Fixes: 8d0eca2 ("expand: Rewrite expmeta meta detection")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
…g in pmatch

strpbrk() accepts two null-terminated string arguments. stop[] is a char
array that is not null-terminated but is still passed as the second
argument to strpbrk. This causes a buffer over-read, which is detected by
AddressSanitizer.

This commit adds an explicit null terminator to the end of the array.

Signed-off-by: Zurab Kvachadze <zurabid2016@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
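
A small sketch of the pitfall, assuming a brace-initialised array. The function name and the characters in `stop` are hypothetical; only the explicit terminator is the point.

```c
#include <string.h>

/* Return a pointer to the first "stop" character in s, or NULL. */
const char *first_stop_char(const char *s)
{
    /*
     * A brace-initialised char array gets no implicit terminator, so
     * the trailing '\0' must be spelled out before the array is handed
     * to strpbrk(3), which expects a null-terminated string.
     */
    static const char stop[] = { '*', '?', '[', '\0' };

    return strpbrk(s, stop);
}
```
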
Move the stop array closer to the strpbrk(3) call in pmatch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ensure that the EOF state is reset in reset_input as otherwise
the new stdin may be treated as empty.

Reported-by: Nathan Royce <nroycea+kernel@gmail.com>
Fixes: 69786bc ("input: Fix pungetc on PEOF")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As pointed out by Denys Vlasenko, we can avoid blocking signals on
vfork() by making the signal handler of a vfork child immediately
return. This saves a syscall.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Recent versions of VxWorks support fork() and as a result can support dash.
For example, to cross-compile for IA with this patch applied and your VSB environment (aka sysroot) sourced:

./configure --build=x86_64-pc-linux-gnu --host=x86_64-wrs-vxworks --prefix=/usr \
    CC=wr-cc CXX=wr-c++ LD=wr-ld AR=wr-ar NM=wr-nm OBJCOPY=wr-objcopy OBJDUMP=wr-objdump RANLIB=wr-ranlib READELF=wr-readelf SIZE=wr-size STRIP=wr-strip \
    ac_cv_func_faccessat=no    \
    CFLAGS="-DJOBS=0 "

make install DESTDIR=${VSB}/usr/3pp/develop

For other architectures update your <host> appropriately.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
procargs(int argc, char **argv)

argc is used in just one place:
        if (argc > 0)
                xargv++;

Trivially replaceable by if (xargv[0] != NULL), so we can avoid passing
this argument.

        char **xargv;
        xargv = argv;

xargv is always equal to argv, so why have a separate variable?

        const char *xminusc;
        xminusc = minusc;

Similar situation with xminusc being equal to minusc
during the range where it is live, they diverge here:

        if (xminusc) {
                minusc = *xargv++;

but after this, xminusc is not used.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
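
The argc-free check can be sketched in isolation as follows. The helper name is hypothetical and this is not the procargs code itself; it relies on argv being terminated by a NULL pointer, as it is for main().

```c
#include <stddef.h>

/*
 * Hypothetical helper: step past argv[0] only when it exists, without
 * consulting argc. Returns a pointer to the first real argument slot.
 */
char **skip_argv0(char **argv)
{
    if (argv[0] != NULL)
        argv++;
    return argv;
}
```
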
mgree and others added 30 commits April 28, 2026 20:23
Signed-off-by: Michael Greenberg <michael.greenberg@stevens.edu>
There are now three quoting modes in pretty printing: unquoted, quoted (only escape special characters, including `"`), and heredoc (only escape special characters, excluding `"`).

We now no longer treat `!` as an escaped character, which is correct on non-interactive shells but will break interactive scripts. Longer term, we need to know when pretty-printing who is consuming our output. But right now the only real client is non-interactive, so here we are.
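
The three modes can be pictured with a small sketch in C. Everything here is an assumption for illustration: the names are hypothetical, the project's pretty-printer is not this code, and the character sets are guesses based on which characters the shell treats specially in each context. The only details taken from the description above are that `"` is escaped in quoted mode but not in heredoc mode, and that `!` is never escaped.

```c
#include <string.h>

enum quote_mode { UNQUOTED, QUOTED, HEREDOC };

/* Return 1 if c should be backslash-escaped when printed in mode. */
int needs_escape(enum quote_mode mode, char c)
{
    const char *special;

    switch (mode) {
    case QUOTED:
        special = "$`\\\"";     /* assumed set, includes '"' */
        break;
    case HEREDOC:
        special = "$`\\";       /* assumed set, excludes '"' */
        break;
    case UNQUOTED:
    default:
        /* assumed set: characters that need quoting outside quotes */
        special = "|&;<>()$`\\\"' \t\n*?[#~=%";
        break;
    }
    /* strchr would match the terminator for c == '\0', so guard it. */
    return c != '\0' && strchr(special, c) != NULL;
}
```
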
Directly invoking `setup.py` was causing a build failure on macOS; using `pip3` solves the problem.

Signed-off-by: Michael Greenberg <michael.greenberg@stevens.edu>
Signed-off-by: Michael Greenberg <michael@greenberg.science>
Fix up escaping of `$`; revise tests to support.
---------

Signed-off-by: Bolun Thompson <bolunthompson@ucla.edu>
Fix: Nested shell in subshell

Signed-off-by: Bolun Thompson <bolunthompson@ucla.edu>