I noticed that menu structures built from `<ul>` nodes show up alongside the desired text body in libextract's results. This could be mitigated by defining a minimum threshold on node content length, possibly something like 15 characters. I'm happy to open a PR if this improvement is wanted, though I'm not sure whether the project is still alive.
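
To make the idea concrete, here is a rough sketch of the kind of filter I have in mind. It does not use libextract's internals; it uses only the standard library's `xml.etree.ElementTree` on a well-formed snippet, and the names `MIN_TEXT_LENGTH`, `long_enough`, and `filter_text_nodes` are made up for illustration. Where such a check would actually hook into libextract is something I'd still need to work out.

```python
import xml.etree.ElementTree as ET

# Threshold suggested above; the name and value are assumptions, not libextract settings.
MIN_TEXT_LENGTH = 15


def long_enough(node, min_length=MIN_TEXT_LENGTH):
    """Return True if the node's own stripped text has at least min_length characters."""
    text = (node.text or "").strip()
    return len(text) >= min_length


def filter_text_nodes(root, min_length=MIN_TEXT_LENGTH):
    """Collect the stripped text of every node that passes the length threshold."""
    return [
        node.text.strip()
        for node in root.iter()
        if long_enough(node, min_length)
    ]


# Small well-formed example: a menu made of short <li> items plus one real paragraph.
snippet = """
<div>
  <ul><li>Home</li><li>About</li><li>Contact</li></ul>
  <p>This paragraph is the actual article body we want to keep.</p>
</div>
"""

root = ET.fromstring(snippet)
kept = filter_text_nodes(root)
print(kept)
```

In this example the short menu entries ("Home", "About", "Contact") fall below the threshold and are dropped, while the paragraph is kept. A fixed cutoff like this would also drop legitimately short text such as brief headings, so the value should probably be configurable.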