Skip to content

Conversation

@daviesrob
Copy link
Member

Fixes various issues in the kmemmem() interface and its
kstrstr() and kstrnstr() wrappers:

  • Calling kmemmem(), kstrstr() or kstrnstr() with a zero-length pattern could lead to a zero-length allocation in the ksBM_prep() function and, if that returns a non-NULL suff pointer, an invalid write to suff[-1]. Thanks to Harrison Green for reporting this.

    Fixed by catching zero-length patterns and returning the correct result - a pointer to the start of the input.

    As an additional optimisation, patterns of length 1 are now passed to memchr() as it's likely to be more efficient in that case.

    Finally if the pattern is longer that the input, NULL is returned early as it can never match. This is also useful for some of the other changes, as it ensures subtracting the two lengths will never give a negative result.

  • The kmemmem() interface uses int for the sizes of the pattern and input buffers, but the kstrstr() wrapper passed in the output of strlen() which could overflow.

    The implementation is renamed and altered to use size_t for the length of the str buffer. kmemmem() is now a simple wrapper around it that handles the possibility of n being negative. The length of the pat buffer is left as an int as otherwise the type and size of the prep array would need to be changed. This would be wasteful as the pattern is very likely to be short.

  • It was difficult to tell the difference between the algorithm failing because it couldn't allocate memory, or the pattern simply not being present (both returned NULL).

    This is fixed by providing Karp-Rabin as a backup algorithm. It's fairly simple and doesn't need to allocate any memory so should always work. It's also used to cover the unlikely case of the kstrstr() pattern being longer than INT_MAX.

  • kstrnstr() did not check for a NUL in the first n bytes of its input. This could lead to it finding false matches beyond the strict end of its input, or possibly reads of uninitialised memory.

    Fixed by checking for a NUL in the first n bytes and adjusting n if one is found. It also returns NULL if the pattern is longer than the revised n which both avoids unnecessary work and ensures the pattern length <= INT_MAX.

test/test_kstring.c is updated to add some simple kmemmem, kstrstr, and kstrnstr tests. It's also made to include kstring.c directly so that it can unit test non-exported internal functions. This allows some tests for the Karp-Rabin implementation to be added.

Fixes various issues in the kmemmem() interface and its
kstrstr() and kstrnstr() wrappers:

 * Calling kmemmem(), kstrstr() or kstrnstr() with a zero-length
   pattern could lead to a zero-length allocation in the
   ksBM_prep() function and, if that returns a non-NULL `suff`
   pointer, an invalid write to suff[-1].  Thanks to
   Harrison Green for reporting this.

   Fixed by catching zero-length patterns and returning the
   correct result - a pointer to the start of the input.

   As an additional optimisation, patterns of length 1 are
   now passed to memchr() as it's likely to be more efficient
   in that case.

   Finally if the pattern is longer that the input, NULL is
   returned early as it can never match.  This is also useful
   for some of the other changes, as it ensures subtracting
   the two lengths will never give a negative result.

 * The kmemmem() interface uses `int` for the sizes of the
   pattern and input buffers, but the kstrstr() wrapper
   passed in the output of strlen() which could overflow.

   The implementation is renamed and altered to use `size_t`
   for the length of the `str` buffer.  kmemmem() is now a
   simple wrapper around it that handles the possibility
   of `n` being negative.  The length of the `pat` buffer is
   left as an int as otherwise the type and size of the `prep`
   array would need to be changed.  This would be wasteful
   as the pattern is very likely to be short.

 * It was difficult to tell the difference between the algorithm
   failing because it couldn't allocate memory, or the pattern
   simply not being present (both returned NULL).

   This is fixed by providing Karp-Rabin as a backup algorithm.
   It's fairly simple and doesn't need to allocate any memory
   so should always work.  It's also used to cover the unlikely
   case of the kstrstr() pattern being longer than INT_MAX.

 * kstrnstr() did not check for a NUL in the first n bytes of
   its input.  This could lead to it finding false matches
   beyond the strict end of its input, or possibly reads of
   uninitialised memory.

   Fixed by checking for a NUL in the first n bytes and
   adjusting n if one is found.  It also returns NULL
   if the pattern is longer than the revised n which both
   avoids unnecessary work and ensures n <= INT_MAX.
Adjusts test_kstring to directly include kstring.c so that it
can unit test non-exported internal functions.
@whitwham whitwham merged commit 1274083 into samtools:develop Jan 19, 2026
9 checks passed
@daviesrob daviesrob deleted the kmemmem branch January 19, 2026 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants