Host discovery is slow on medium/large target ranges #16

@bandrel

Description

Observed on branch feature/scanning-rework.

Internal host discovery (_dual_internal_host_discovery in spoonmap.py) dominates
wall time before any port scan begins. On /18 it takes ~12 min; on /16 it takes
~35 min; on /14 it takes ~45 min. Breaking down where the time goes:

Bottlenecks

1. Two masscan sweeps run sequentially

_calibrate_internal_source_port (spoonmap.py:337-412) runs sweep #1 with
-g 88 (Kerberos source port), then sweep #2 with no source port. The sweeps
run back-to-back, not in parallel, which doubles the masscan portion of
discovery time on every range size.

2. nmap -sn pass runs serially after both masscan sweeps

For target sets ≤ HOST_DISCOVERY_NMAP_THRESHOLD = 65_536,
_nmap_host_discovery (spoonmap.py:518-568) runs after both masscan sweeps
complete. nmap uses kernel-bound sockets and could run concurrently with
masscan without RST/cookie interference, but currently doesn't.
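
A minimal sketch of overlapping the two tools, assuming each is launched as a plain subprocess. The helper names run_tool and run_concurrently are hypothetical and not taken from spoonmap.py; the demo uses stand-in commands so it runs without masscan or nmap installed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_tool(cmd):
    """Run one external command and return (returncode, stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def run_concurrently(cmds):
    """Launch every command at once; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(cmds))) as pool:
        return list(pool.map(run_tool, cmds))


# Stand-ins for the real argument lists (masscan sweep, nmap -sn pass).
demo_cmds = [
    [sys.executable, "-c", "print('masscan stand-in')"],
    [sys.executable, "-c", "print('nmap stand-in')"],
]
for returncode, stdout in run_concurrently(demo_cmds):
    print(returncode, stdout.strip())
```

With this shape, wall time for the overlapped stage would be roughly the slower of the two commands rather than their sum, provided the tools do not contend for bandwidth.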

3. Fixed --wait 3 on every masscan invocation

Each of the two calibration sweeps appends --wait 3 (spoonmap.py:298, 376).
That fixed 3 s per sweep is material for small ranges, where the scan itself
completes in seconds.
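
To put a number on the fixed wait, here is a rough calculation using the masscan figures from the estimate table in this issue (~12 s for /24, ~22 min for /16). The function name fixed_wait_share is hypothetical.

```python
def fixed_wait_share(masscan_seconds, sweeps=2, wait_seconds=3):
    """Return (total fixed wait, its share of the masscan portion)."""
    wait_total = sweeps * wait_seconds
    return wait_total, wait_total / masscan_seconds


# /24: both sweeps together take roughly 12 s.
print(*fixed_wait_share(12))          # 6 0.5

# /16: both sweeps together take roughly 22 min.
wait_total, share = fixed_wait_share(22 * 60)
print(wait_total, f"{share:.2%}")     # 6 0.45%
```

On these estimates the wait is about half of the masscan time on a /24 and well under 1% on a /16.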

4. Calibration sweep is a full second pass over the entire range

The no-source-port sweep re-scans every target on every port to compare host
counts against the -g 88 sweep. There is no caching or partial reuse between
the two sweeps even though they're scanning the same address space.
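
One way to judge how redundant the second pass is would be to measure the overlap between the two result sets. This is only a sketch: compare_sweeps is a hypothetical helper, and it assumes each sweep's live hosts are already available as a list of IP strings.

```python
def compare_sweeps(hosts_g88, hosts_default):
    """Summarize how the -g 88 sweep and the default sweep differ."""
    g88, default = set(hosts_g88), set(hosts_default)
    return {
        "g88_count": len(g88),
        "default_count": len(default),
        "both": sorted(g88 & default),          # rediscovered by sweep #2
        "only_g88": sorted(g88 - default),      # seen only with source port 88
        "only_default": sorted(default - g88),  # seen only without it
    }


summary = compare_sweeps(
    ["10.0.0.1", "10.0.0.2"],
    ["10.0.0.2", "10.0.0.3"],
)
print(summary["g88_count"], summary["default_count"], summary["both"])
```

If "both" covers nearly all hosts on real runs, most of the second sweep's time goes to rediscovering hosts the first sweep already found.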

Estimated wall time

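| Range | Hosts   | Masscan ×2             | + nmap -sn | Total    |
|-------|---------|------------------------|------------|----------|
| /24   | 256     | ~12 s                  | ~20 s      | ~30 s    |
| /22   | 1,024   | ~26 s                  | ~60 s      | ~1.5 min |
| /20   | 4,096   | ~88 s                  | ~2–3 min   | ~4 min   |
| /18   | 16,384  | ~5.5 min               | ~5–10 min  | ~12 min  |
| /16   | 65,536  | ~22 min                | ~10–15 min | ~35 min  |
| /14   | 262,144 | ~44 min (5-port trim)  | skipped    | ~45 min  |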
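
The Hosts column and the "skipped" entry follow from the prefix length and the threshold quoted above. A short check, assuming IPv4 and that the nmap pass is gated purely on the address count (the function names are hypothetical):

```python
HOST_DISCOVERY_NMAP_THRESHOLD = 65_536  # value quoted in this issue


def address_count(prefix_len):
    """Number of IPv4 addresses in a /prefix_len range."""
    return 2 ** (32 - prefix_len)


def nmap_pass_runs(prefix_len):
    """True when the range is small enough for the nmap -sn pass."""
    return address_count(prefix_len) <= HOST_DISCOVERY_NMAP_THRESHOLD


for prefix_len in (24, 22, 20, 18, 16, 14):
    print(f"/{prefix_len}", address_count(prefix_len), nmap_pass_runs(prefix_len))
```

A /16 sits exactly at the threshold, so it still pays for the nmap pass, while a /14 does not.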

Note on parallelizing masscan sweeps

Two masscan instances on the same NIC do not share TCP state (raw sockets
for TX, libpcap for RX), but they do require kernel RST suppression for
both source-port ranges simultaneously — -g 88 plus masscan's randomized
default range. Without that, the parallel version silently loses results to
kernel RSTs originating from the other instance's traffic. Not a blocker,
but a sharp edge worth noting before pursuing parallelization as a fix.
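
For illustration, the suppression rules for both instances could be generated together so that neither range is forgotten. This sketch assumes Linux iptables and follows the pattern of dropping inbound TCP to the scanner's source ports; rst_suppression_rules is a hypothetical helper, and "40000:60000" is a placeholder for the second instance's range that should be checked against masscan's documentation.

```python
def rst_suppression_rules(dport_specs):
    """Build iptables argument lists that drop inbound TCP to the given ports.

    Each spec is a single port ("88") or an iptables range ("40000:60000").
    """
    return [
        ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", spec, "-j", "DROP"]
        for spec in dport_specs
    ]


# Port 88 for the -g 88 instance; the second range is a placeholder.
for rule in rst_suppression_rules(["88", "40000:60000"]):
    print(" ".join(rule))
```

A wide drop range can overlap the host's ephemeral ports and disturb ordinary outbound connections, so pinning the second instance to a narrow, explicit source port may be preferable to covering a randomized range.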
