
Conversation

@dogancanbakir (Member) commented Jan 6, 2026

Summary

Adds support for storing httpx scan results directly to databases (MongoDB, PostgreSQL, MySQL) with both CLI flags and YAML config file options.

  • MongoDB: Stores results as BSON documents with indexes on timestamp, url, host, status_code, and tech
  • PostgreSQL: Creates table with 62 columns matching Result struct fields, uses JSONB for complex types and TEXT[] for arrays
  • MySQL: Similar schema to PostgreSQL with JSON type for complex fields and appropriate TEXT types

Features

  • Batched writes for performance (configurable batch size, default 100)
  • Auto-flush with configurable interval (default 1 minute)
  • Individual columns for each Result field (not a single JSON blob)
  • Environment variable support: HTTPX_DB_CONNECTION_STRING
  • Option to omit raw request/response data (-rdbor)
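
A minimal sketch of the batch-and-flush behavior described above, assuming a buffered channel feeding a background goroutine (all names here are illustrative assumptions, not the exact internal/db/writer.go code):

// Illustrative sketch only; names are assumptions, not the exact
// internal/db/writer.go implementation.
package db

import (
	"context"
	"time"

	"github.com/projectdiscovery/httpx/runner"
)

// Writer sketch: a buffered channel feeding a background loop.
type Writer struct {
	data      chan runner.Result
	batchSize int           // default 100
	interval  time.Duration // default 1 minute
}

func (w *Writer) run(ctx context.Context) {
	batch := make([]runner.Result, 0, w.batchSize)
	ticker := time.NewTicker(w.interval)
	defer ticker.Stop()
	for {
		select {
		case r, ok := <-w.data:
			if !ok { // channel closed: flush the remainder and exit
				w.flush(batch)
				return
			}
			batch = append(batch, r)
			if len(batch) >= w.batchSize { // size-triggered flush
				w.flush(batch)
				batch = batch[:0]
			}
		case <-ticker.C: // interval-triggered auto-flush
			w.flush(batch)
			batch = batch[:0]
		case <-ctx.Done(): // graceful shutdown flushes buffered results
			w.flush(batch)
			return
		}
	}
}

func (w *Writer) flush(batch []runner.Result) {
	if len(batch) == 0 {
		return
	}
	// hand the batch to the database adapter's InsertBatch (elided)
}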

New CLI Flags (OUTPUT group)

Flag                            Description
-rdb, -result-db                Enable database storage
-rdbc, -result-db-config        Path to YAML config file
-rdbt, -result-db-type          Database type (mongodb, postgres, mysql)
-rdbcs, -result-db-conn         Connection string
-rdbn, -result-db-name          Database name (default: httpx)
-rdbtb, -result-db-table        Table/collection name (default: results)
-rdbbs, -result-db-batch-size   Batch size (default: 100)
-rdbor, -result-db-omit-raw     Omit raw request/response data

Usage Examples

# MongoDB
httpx -l hosts.txt -rdb -rdbt mongodb -rdbcs "mongodb://localhost:27017"

# PostgreSQL
httpx -l hosts.txt -rdb -rdbt postgres -rdbcs "postgres://user:pass@localhost:5432/httpx"

# MySQL
httpx -l hosts.txt -rdb -rdbt mysql -rdbcs "user:pass@tcp(localhost:3306)/httpx"

# Using config file
httpx -l hosts.txt -rdb -rdbc /path/to/db-config.yaml

# Using environment variable
export HTTPX_DB_CONNECTION_STRING="mongodb://localhost:27017"
httpx -l hosts.txt -rdb -rdbt mongodb

Example Config File (db-config.yaml)

type: mongodb
connection-string: "mongodb://localhost:27017"
database-name: httpx
table-name: results
batch-size: 100
flush-interval: 1m
omit-raw: false
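
For illustration, loading such a file with the environment-variable fallback might look like the sketch below (assumes gopkg.in/yaml.v3 and an EnvConnectionString constant; the real LoadConfigFromFile in internal/db/config.go may differ in detail):

package db

import (
	"fmt"
	"os"
	"time"

	"gopkg.in/yaml.v3"
)

const EnvConnectionString = "HTTPX_DB_CONNECTION_STRING"

// Config mirrors the YAML keys shown above; field types are a sketch.
type Config struct {
	Type             string `yaml:"type"`
	ConnectionString string `yaml:"connection-string"`
	DatabaseName     string `yaml:"database-name"`
	TableName        string `yaml:"table-name"`
	BatchSize        int    `yaml:"batch-size"`
	FlushInterval    string `yaml:"flush-interval"` // e.g. "1m"
	OmitRaw          bool   `yaml:"omit-raw"`
}

func LoadConfigFromFile(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("could not read config: %w", err)
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("could not parse config: %w", err)
	}
	// Fall back to HTTPX_DB_CONNECTION_STRING when the file omits it.
	if cfg.ConnectionString == "" {
		cfg.ConnectionString = os.Getenv(EnvConnectionString)
	}
	// Durations arrive as strings in this sketch; validate early.
	if cfg.FlushInterval != "" {
		if _, err := time.ParseDuration(cfg.FlushInterval); err != nil {
			return nil, fmt.Errorf("invalid flush-interval: %w", err)
		}
	}
	return &cfg, nil
}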

Testing with Docker

1. Start Database Containers

# MongoDB
docker run -d --name httpx-mongo -p 27017:27017 mongo:latest

# PostgreSQL
docker run -d --name httpx-postgres -p 5432:5432 -e POSTGRES_PASSWORD=password -e POSTGRES_DB=httpx postgres:latest

# MySQL
docker run -d --name httpx-mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=password -e MYSQL_DATABASE=httpx mysql:latest

2. Test httpx with Each Database

# Test MongoDB
echo "https://example.com" | go run ./cmd/httpx -rdb -rdbt mongodb -rdbcs "mongodb://localhost:27017"

# Test PostgreSQL
echo "https://example.com" | go run ./cmd/httpx -rdb -rdbt postgres -rdbcs "postgres://postgres:password@localhost:5432/httpx?sslmode=disable"

# Test MySQL
echo "https://example.com" | go run ./cmd/httpx -rdb -rdbt mysql -rdbcs "root:password@tcp(localhost:3306)/httpx"

3. Verify Results in Database

# MongoDB - check results
docker exec -it httpx-mongo mongosh --eval "db.getSiblingDB('httpx').results.find().pretty()"

# PostgreSQL - check results
docker exec -it httpx-postgres psql -U postgres -d httpx -c "SELECT url, status_code, title FROM results;"

# MySQL - check results
docker exec -it httpx-mysql mysql -uroot -ppassword -e "SELECT url, status_code, title FROM httpx.results;"

4. Cleanup

docker rm -f httpx-mongo httpx-postgres httpx-mysql

Test plan

  • Test MongoDB storage with Docker container
  • Test PostgreSQL storage with Docker container
  • Test MySQL storage with Docker container
  • Verify all Result fields are stored correctly
  • Verify batching works as expected
  • Verify graceful shutdown flushes remaining results

Closes #1973
Closes #2360
Closes #2361
Closes #2362

Summary by CodeRabbit

  • New Features

    • Optional database output for storing results (MongoDB, PostgreSQL, MySQL) with configurable type, connection, DB/table name, batch size, flush interval, omit-raw option, and graceful shutdown; runs alongside existing result handlers.
  • New Features (CLI)

    • New CLI flags and config support to enable/configure database output.
  • Documentation

    • CLI help text reformatted and clarified (spacing, alignment); protocol options and proxy/header/TLS descriptions updated; some flags hidden/renamed for clarity.


The auto-assign bot requested a review from dwisiswant0 on January 6, 2026 at 09:54
@coderabbitai (bot) commented Jan 6, 2026

Walkthrough

Adds pluggable database result storage (MongoDB, PostgreSQL, MySQL), CLI flags and config loading, a registry/factory, batched background writer, schema management, and runner integration to persist scan results.

Changes

  • Core DB infra & config (internal/db/db.go, internal/db/config.go, internal/db/writer.go):
    Adds database registry/factory, Config/Options with YAML loading, defaults and validation, and a Writer that batches results, runs a background flush loop, and provides lifecycle, logging, and Close semantics. Attention: validation/defaulting and background shutdown correctness.
  • MongoDB adapter (internal/db/mongodb.go):
    New MongoDB implementation: client connect/ping, collection selection, index creation, result→BSON conversion, and InsertMany batch writes. Attention: index definitions and BSON marshaling approach.
  • PostgreSQL adapter (internal/db/postgres.go):
    New PostgreSQL implementation: connection management, EnsureSchema (table/indexes), transactional batch inserts, JSONB/array serialization, and error handling. Attention: parameter binding, pq.Array usage, and schema SQL.
  • MySQL adapter (internal/db/mysql.go):
    New MySQL implementation: connection, schema creation, transactional multi-row inserts, JSON marshaling for nested fields, prepared statements, and error handling. Attention: multi-row INSERT construction and identifier quoting.
  • Runner & CLI integration (runner/options.go, cmd/httpx/httpx.go):
    Adds runner Options fields and CLI flags for DB config (--result-db, --result-db-config, --result-db-type, --result-db-conn, etc.); setupDatabaseOutput wires the writer into OnResult/OnClose. Attention: callback wrapping and error handling during startup.
  • Deps & docs (go.mod, README.md):
    Adds DB drivers and related deps; README/help text reformatting and flag description updates. Attention: dependency additions and help text alignment only.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Runner as httpx Runner
    participant Writer as DB Writer
    participant DB as Database (Mongo/Postgres/MySQL)

    CLI->>Runner: start with --result-db / --result-db-config
    Runner->>Writer: NewWriter(ctx, cfg) and register callbacks

    loop per scan result
        Runner->>Writer: OnResult(result) (callback)
        Writer->>Writer: enqueue result (buffered channel)
    end

    alt flush by batch size or interval
        Writer->>DB: InsertBatch(batch)
        DB-->>Writer: success / error
        Writer->>Writer: update counters/log
    end

    CLI->>Runner: finish / interrupt
    Runner->>Writer: OnClose() -> Writer.Close()
    Writer->>Writer: cancel, flush remaining
    Writer->>DB: final InsertBatch
    Writer->>DB: Close connection

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 I hop through flags and YAML lines,
Batching results in tidy fines,
Mongo, Postgres, MySQL in queue,
Flush and store each precious view,
I twitch my nose—your data grew.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: coverage is 18.75%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Description Check: check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check: the title clearly and concisely summarizes the main feature added: database output support for storing scan results.
  • Linked Issues Check: the pull request fully implements all linked issue requirements: adds the -rdb CLI flag [#1973], a MongoDB backend with indexing [#2360], a PostgreSQL backend with a JSONB schema [#2361], and a MySQL backend with a JSON schema [#2362].
  • Out of Scope Changes Check: all changes directly support the database output feature. README updates document the new flags, dependencies support database connectivity, and the internal/db package implements the database layer.



@coderabbitai (bot) left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI Agents
In @internal/db/mysql.go:
- Around line 47-152: EnsureSchema currently injects m.cfg.TableName directly
into the CREATE TABLE SQL; validate or safely quote the identifier before
interpolation to prevent SQL injection (e.g., enforce a safe regex like
^[A-Za-z0-9_]+$ on m.cfg.TableName and/or wrap the identifier in backticks after
rejecting/escaping any backticks). Update mysqlDatabase.EnsureSchema to use the
validated/quoted table name variable when building the schema string (replace
direct use of m.cfg.TableName). Also add a matching index on url (e.g., add
`INDEX idx_url (url)`) to the CREATE TABLE statement for parity with the
Postgres schema.
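
A possible shape for that helper, as a sketch (the PR's actual quoteIdentifier may differ):

package db

import (
	"fmt"
	"regexp"
	"strings"
)

var mysqlIdentRe = regexp.MustCompile(`^[A-Za-z0-9_]+$`)

// quoteIdentifier validates a MySQL identifier against a safe
// character set and wraps it in backticks. Doubling embedded
// backticks is belt-and-braces here, since the regex already
// rejects them.
func quoteIdentifier(name string) (string, error) {
	if !mysqlIdentRe.MatchString(name) {
		return "", fmt.Errorf("invalid identifier: %q", name)
	}
	return "`" + strings.ReplaceAll(name, "`", "``") + "`", nil
}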

In @internal/db/postgres.go:
- Around line 48-161: The SQL builds in the schema variable interpolate
p.cfg.TableName directly (and into index names like idx_%s_timestamp and later
INSERT queries), which risks SQL injection; update the code to quote identifiers
using pq.QuoteIdentifier(p.cfg.TableName) (or an equivalent safe
identifier-quoting helper) everywhere the table name or derived index names are
inserted (including schema, index names, and any INSERT/SELECT/DROP/ALTER
statements) so that the table and index identifiers are safely escaped before
passing into fmt.Sprintf.

In @internal/db/writer.go:
- Around line 130-137: The Close method races with sends from the callback
returned by GetWriterCallback: after calling w.cancel() but before close(w.data)
a resumed callback can see both w.ctx.Done() and w.data ready and attempt to
send into a closed channel, causing a panic; update GetWriterCallback (the
sending goroutine) to check w.closed (or use an atomic/boolean helper like
w.isClosed()) before selecting/send and bail out if closed, or instead change
Close to close w.data before/atomically with cancellation (or use a mutex to
synchronize) so that the callback never attempts a send after closure; modify
either Writer.Close, GetWriterCallback, or both to perform this
closed-check/synchronization using the existing w.closed, w.cancel, w.data, and
w.ctx symbols.
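
One way to realize that synchronization, sketched with a mutex-guarded closed flag (assumes the Writer carries mu sync.RWMutex and closed bool alongside the existing data, ctx, and cancel fields; the PR's actual fix may differ):

// Sketch: a callback that never sends after Close has started.
func (w *Writer) GetWriterCallback() func(r runner.Result) {
	return func(r runner.Result) {
		w.mu.RLock()
		defer w.mu.RUnlock()
		if w.closed { // Close has started; drop the result
			return
		}
		select {
		case w.data <- r:
		case <-w.ctx.Done(): // unblocks in-flight sends during Close
		}
	}
}

func (w *Writer) Close() {
	w.cancel() // release any sender blocked on a full channel
	w.mu.Lock()
	w.closed = true
	w.mu.Unlock()
	// Safe now: no callback can pass the closed check and send.
	close(w.data)
}
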
🧹 Nitpick comments (12)
internal/db/config.go (1)

39-49: Consider stricter validation for numeric fields.

The current validation only checks database type and connection string. BatchSize and FlushInterval are validated indirectly through ApplyDefaults (which replaces non-positive values with defaults). While this works, it means explicitly setting negative values gets silently corrected rather than rejected.

Consider adding explicit validation to reject negative values as invalid input, which would make configuration errors more apparent to users.

🔎 Example stricter validation
 func (c *Config) Validate() error {
 	if !c.Type.IsValid() {
 		return fmt.Errorf("invalid database type: %s (supported: %v)", c.Type, SupportedDatabases())
 	}
 
 	if c.ConnectionString == "" {
 		return fmt.Errorf("connection string is required")
 	}
+
+	if c.BatchSize < 0 {
+		return fmt.Errorf("batch size cannot be negative")
+	}
+
+	if c.FlushInterval < 0 {
+		return fmt.Errorf("flush interval cannot be negative")
+	}
 
 	return nil
 }
internal/db/mongodb.go (1)

115-127: Consider direct BSON marshaling for efficiency.

The JSON round-trip (marshal to JSON, then unmarshal to BSON) is simple and reliable but adds overhead. For scanning workloads with moderate throughput, this approach is acceptable and prioritizes maintainability.

If performance profiling reveals this as a bottleneck, consider using BSON tags on the runner.Result struct and marshaling directly to BSON.
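
For reference, the round-trip pattern looks roughly like this (a sketch; the actual resultToDocument may differ in detail):

package db

import (
	"encoding/json"

	"github.com/projectdiscovery/httpx/runner"
	"go.mongodb.org/mongo-driver/bson"
)

// resultToDocument sketch: reuse the Result JSON tags, then parse
// the JSON back into a BSON document.
func resultToDocument(r *runner.Result) (bson.M, error) {
	data, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	var doc bson.M
	// second argument false = relaxed extended JSON; plain JSON parses fine
	if err := bson.UnmarshalExtJSON(data, false, &doc); err != nil {
		return nil, err
	}
	return doc, nil
}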

internal/db/writer.go (2)

67-71: Consider also omitting RawHeaders when omitRaw is enabled.

The omitRaw option clears Raw, Request, and ResponseBody, but RawHeaders (which contains the raw HTTP response headers) is not cleared. If the intent is to reduce storage of raw HTTP data, this field should likely be included.

🔎 Proposed fix
 		if w.omitRaw {
 			r.Raw = ""
 			r.Request = ""
 			r.ResponseBody = ""
+			r.RawHeaders = ""
 		}

96-101: Failed batch data is discarded without retry.

When InsertBatch fails, the error is logged but the batch is cleared at line 103, permanently losing those results. For reliability, consider implementing a retry mechanism or dead-letter queue for failed batches.
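
A sketch of the retry idea with capped exponential backoff (attempt counts and delays are arbitrary choices, not part of the PR; imports elided):

// Sketch: retry a failed batch a few times before giving up.
func (w *Writer) insertWithRetry(ctx context.Context, batch []runner.Result) error {
	const maxAttempts = 3
	backoff := time.Second
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = w.db.InsertBatch(ctx, batch); err == nil {
			return nil
		}
		if attempt == maxAttempts {
			break
		}
		select {
		case <-time.After(backoff):
			backoff *= 2 // 1s, 2s, ...
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("insert batch failed after %d attempts: %w", maxAttempts, err)
}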

internal/db/db.go (2)

22-29: IsValid() and SupportedDatabases() may diverge if new types are added.

Both functions enumerate the valid database types independently. If a new database type is added, both must be updated. Consider deriving one from the other to ensure consistency.

🔎 Proposed fix
+var supportedDatabases = []DatabaseType{MongoDB, PostgreSQL, MySQL}
+
 func (d DatabaseType) IsValid() bool {
-	switch d {
-	case MongoDB, PostgreSQL, MySQL:
-		return true
-	default:
-		return false
+	for _, db := range supportedDatabases {
+		if d == db {
+			return true
+		}
 	}
+	return false
 }

 func SupportedDatabases() []DatabaseType {
-	return []DatabaseType{MongoDB, PostgreSQL, MySQL}
+	return supportedDatabases
 }

Also applies to: 68-70


43-49: Registry map lacks synchronization, but safe given init-only writes.

The registry map is written to only during init() calls, which are guaranteed to run sequentially before main(). This is safe, but adding a comment documenting this assumption would help future maintainers.

runner/options.go (1)

680-801: No validation added for database options.

ValidateOptions() doesn't validate the new database-related flags. Consider adding validation to ensure:

  • If -rdb is enabled, either -rdbc or (-rdbt + -rdbcs) must be provided
  • Database type is one of the supported values
internal/db/postgres.go (2)

218-227: JSON marshaling errors are silently ignored.

All json.Marshal calls ignore the error return value. While marshaling struct types rarely fails, nil pointer dereferences or custom MarshalJSON implementations could fail. Consider at minimum logging errors or using a helper that returns []byte("null") on failure.

🔎 Proposed helper pattern
func mustMarshalJSON(v interface{}) []byte {
    data, err := json.Marshal(v)
    if err != nil {
        return []byte("null")
    }
    return data
}

171-252: Consider using COPY for better batch insert performance.

For PostgreSQL, using COPY protocol (via pq.CopyIn) is significantly faster for bulk inserts compared to prepared statement execution in a loop. This could improve performance for large batch sizes.
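
The COPY pattern with lib/pq looks roughly like this (a sketch, not the PR's code; the column list is abbreviated and assumes Result exposes URL, Host, and StatusCode fields):

package db

import (
	"context"
	"database/sql"

	"github.com/lib/pq"
	"github.com/projectdiscovery/httpx/runner"
)

// copyInsert sketches bulk loading via the COPY protocol.
func copyInsert(ctx context.Context, db *sql.DB, table string, results []runner.Result) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	// Column list abbreviated; the real table has many more columns.
	stmt, err := tx.Prepare(pq.CopyIn(table, "url", "host", "status_code"))
	if err != nil {
		return err
	}
	for _, r := range results {
		if _, err := stmt.Exec(r.URL, r.Host, r.StatusCode); err != nil {
			return err
		}
	}
	if _, err := stmt.Exec(); err != nil { // final Exec flushes the COPY buffer
		return err
	}
	if err := stmt.Close(); err != nil {
		return err
	}
	return tx.Commit()
}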

internal/db/mysql.go (3)

56-60: VARCHAR length constraints may cause data truncation.

Several VARCHAR columns have fixed limits that may truncate real-world data:

  • host VARCHAR(255) - most hostnames fit, but some edge cases exist
  • method VARCHAR(10) - custom methods could exceed this
  • port VARCHAR(10) - reasonable
  • scheme VARCHAR(10) - reasonable

Consider using TEXT for method or increasing the limit to handle custom HTTP methods.


209-227: JSON marshaling errors silently ignored.

Same issue as PostgreSQL implementation - all json.Marshal errors are discarded. Apply the same fix pattern as suggested for postgres.go.


162-252: Consider multi-value INSERT for better MySQL batch performance.

MySQL supports multi-value INSERT syntax (INSERT INTO t VALUES (...), (...), ...) which is faster than executing prepared statements in a loop. This could significantly improve batch insert performance.
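
A sketch of that multi-value construction (three columns for brevity; assumes the table name was already validated and quoted, and that Result exposes URL, Host, and StatusCode):

package db

import (
	"context"
	"database/sql"
	"fmt"
	"strings"

	"github.com/projectdiscovery/httpx/runner"
)

// multiValueInsert builds one INSERT with a (?, ?, ?) group per row.
func multiValueInsert(ctx context.Context, db *sql.DB, table string, results []runner.Result) error {
	if len(results) == 0 {
		return nil
	}
	placeholders := make([]string, 0, len(results))
	args := make([]interface{}, 0, len(results)*3)
	for _, r := range results {
		placeholders = append(placeholders, "(?, ?, ?)")
		args = append(args, r.URL, r.Host, r.StatusCode)
	}
	query := fmt.Sprintf("INSERT INTO %s (url, host, status_code) VALUES %s",
		table, strings.Join(placeholders, ", "))
	_, err := db.ExecContext(ctx, query, args...)
	return err
}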

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc2c7a2 and e64dfcb.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • README.md
  • cmd/httpx/httpx.go
  • go.mod
  • internal/db/config.go
  • internal/db/db.go
  • internal/db/mongodb.go
  • internal/db/mysql.go
  • internal/db/postgres.go
  • internal/db/writer.go
  • runner/options.go
🧰 Additional context used
🧬 Code graph analysis (5)
internal/db/writer.go (4)
internal/db/db.go (2)
  • Database (31-41)
  • NewDatabase (51-66)
internal/db/config.go (1)
  • Config (23-37)
runner/types.go (1)
  • Result (35-105)
runner/options.go (1)
  • OnResultCallback (51-51)
internal/db/postgres.go (5)
internal/db/db.go (4)
  • Register (47-49)
  • PostgreSQL (14-14)
  • Database (31-41)
  • DatabaseType (10-10)
internal/db/config.go (1)
  • Config (23-37)
runner/types.go (2)
  • Result (35-105)
  • Trace (107-123)
common/httpx/csp.go (1)
  • CSPData (25-28)
common/httpx/proto.go (1)
  • HTTP2 (8-8)
internal/db/db.go (2)
runner/types.go (1)
  • Result (35-105)
internal/db/config.go (1)
  • Config (23-37)
internal/db/mysql.go (5)
internal/db/db.go (4)
  • Register (47-49)
  • MySQL (15-15)
  • Database (31-41)
  • DatabaseType (10-10)
internal/db/config.go (1)
  • Config (23-37)
runner/types.go (2)
  • Result (35-105)
  • Trace (107-123)
common/httpx/csp.go (1)
  • CSPData (25-28)
common/httpx/proto.go (1)
  • HTTP2 (8-8)
internal/db/config.go (1)
internal/db/db.go (2)
  • DatabaseType (10-10)
  • SupportedDatabases (68-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Functional Test (macOS-latest)
  • GitHub Check: Functional Test (windows-latest)
  • GitHub Check: Functional Test (ubuntu-latest)
  • GitHub Check: Lint Test
  • GitHub Check: Analyze (go)
  • GitHub Check: release-test
🔇 Additional comments (19)
README.md (2)

100-166: LGTM - Documentation alignment improvements.

The cosmetic adjustments to spacing and alignment improve readability and consistency across the CLI help sections.


206-213: LGTM - Database output flags properly documented.

The new database-related flags (-rdb, -rdbc, -rdbt, etc.) are clearly documented with appropriate defaults and descriptions, aligning well with the PR objectives to add database persistence support.

internal/db/config.go (3)

11-21: LGTM - Well-chosen defaults.

The default values are sensible: batch size of 100 balances throughput and memory, 1-minute flush interval prevents data loss, and the naming conventions are clear.


69-91: LGTM - Correct configuration loading flow.

The function properly handles YAML parsing, environment variable fallback for connection strings, default application, and validation in the correct order. Error wrapping provides good context.


104-125: LGTM - Consistent CLI-to-config conversion.

The ToConfig method correctly mirrors the file-based loading logic, ensuring both configuration paths (file vs. CLI flags) go through the same validation and defaulting pipeline.

cmd/httpx/httpx.go (2)

68-69: LGTM - Correct integration placement.

The database output setup is called at the appropriate point in the initialization sequence, after asset upload setup and before runner creation, allowing it to configure the result callbacks properly.


151-208: LGTM - Correct callback chaining and error handling.

The implementation properly:

  • Loads configuration from file or CLI options with appropriate error handling
  • Chains callbacks to preserve existing handlers (asset upload, etc.)
  • Orders operations correctly: existing OnResult callback executes first, writer closes before existing OnClose
  • Uses Fatal for setup errors, which is appropriate at initialization time
internal/db/mongodb.go (3)

15-28: LGTM - Standard factory registration pattern.

The init-based registration integrates cleanly with the database registry, and the constructor properly defers connection establishment to the explicit Connect() call.


30-59: LGTM - Robust connection handling with appropriate timeouts.

The Connect method uses reasonable timeouts (10s for connection/selection, ping verification) and Close properly handles cleanup with a 5s timeout to prevent blocking during shutdown.


61-87: No action needed. MongoDB's CreateMany() is idempotent—re-creating indexes that already exist with identical definitions is a no-op and succeeds without error. The index definitions are appropriate for the query patterns, and the implementation will work correctly on subsequent application runs.
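
For reference, index creation in that style with the mongo-driver looks roughly like this (a sketch; the exact definitions in internal/db/mongodb.go may differ):

package db

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// ensureIndexes sketches idempotent index creation with CreateMany.
func ensureIndexes(ctx context.Context, coll *mongo.Collection) error {
	models := []mongo.IndexModel{
		{Keys: bson.D{{Key: "timestamp", Value: -1}}}, // recent-first queries
		{Keys: bson.D{{Key: "url", Value: 1}}},
		{Keys: bson.D{{Key: "host", Value: 1}}},
		{Keys: bson.D{{Key: "status_code", Value: 1}}},
		// sparse: not all results carry technology detection data
		{Keys: bson.D{{Key: "tech", Value: 1}}, Options: options.Index().SetSparse(true)},
	}
	_, err := coll.Indexes().CreateMany(ctx, models)
	return err
}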

go.mod (1)

55-61: Dependency versions are appropriate; no critical security concerns identified.

The versions in go.mod are current and safe:

  • mysql (v1.9.3), lib/pq (v1.10.9): Latest stable releases. CVE-2025-24787 and CVE-2025-1094/4207 are in upstream components (WhoDB and PostgreSQL libpq C library respectively), not these Go drivers.
  • mongo-driver (v1.17.6): On the maintained v1.x branch; v2.4 is available but v1.x remains actively supported.
  • mapstructure/v2 (v2.4.0): Confirmed as the patched version that fixed the information-leak vulnerability (present in versions < 2.4.0).
  • gocsv and publicsuffix-go: Latest versions with no reported security vulnerabilities.
internal/db/writer.go (2)

26-59: LGTM!

The NewWriter function properly initializes the database connection, ensures schema, and starts the background goroutine with appropriate lifecycle management.


146-148: LGTM!

Simple atomic counter accessor.

internal/db/db.go (2)

31-41: LGTM!

The Database interface is well-defined with clear lifecycle methods and batch insert support.


51-66: LGTM!

Good defensive validation before factory invocation.

runner/options.go (2)

354-361: LGTM!

New database configuration fields are appropriately added to the Options struct.


498-505: Environment variable HTTPX_DB_CONNECTION_STRING is properly wired.

The flag description for -rdbcs correctly mentions the environment variable. While the flag definition doesn't use goflags.EnvVar, the environment variable is properly handled as a fallback in internal/db/config.go (lines 81 and 115): if no connection string is provided via the flag, the code reads from HTTPX_DB_CONNECTION_STRING via os.Getenv(). This is a valid design pattern where the flag takes precedence and the environment variable serves as a fallback.

Likely an incorrect or invalid review comment.

internal/db/postgres.go (1)

26-38: LGTM!

Connect implementation properly opens connection and pings to verify connectivity.

internal/db/mysql.go (1)

26-38: LGTM!

Connect implementation correctly opens and pings MySQL database.

@Mzack9999 (Member) left a comment

Overall LGTM, with some minor suggestions. Also, what do you think about moving the queries to sqlc for easier maintenance later (move the queries into an sqlc file and generate the glue code with sqlc generate)?

- Fix SQL injection in postgres.go using pq.QuoteIdentifier for table/index names
- Fix SQL injection in mysql.go using custom quoteIdentifier function
- Fix race condition in writer.go by checking closed state before channel send
- Add missing idx_url index in mysql.go for parity with PostgreSQL schema
- Include RawHeaders in omitRaw check for consistency
@coderabbitai (bot) left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @internal/db/config.go:
- Around line 104-121: The Options.ToConfig() implementation doesn't fall back
to the environment variable for ConnectionString like LoadConfigFromFile() does;
update ToConfig() (method Options.ToConfig) to read
os.Getenv(EnvConnectionString) when o.ConnectionString is empty before
constructing cfg so CLI-based config honors the HTTPX_DB_CONNECTION_STRING env
var; ensure you reference EnvConnectionString and set cfg.ConnectionString
accordingly prior to cfg.ApplyDefaults()/cfg.Validate().

In @internal/db/postgres.go:
- Around line 225-235: The json.Marshal calls inside the loop over results (for
_, r := range results) currently discard errors (e.g., hashJSON, _ :=
json.Marshal(r.Hashes), asnJSON, _ := json.Marshal(r.ASN), etc.); change each
marshal to capture the error, check it, and either return the error from the
enclosing function or log a warning and continue (depending on desired
behavior). Specifically, replace each pattern like XJSON, _ := json.Marshal(...)
with XJSON, err := json.Marshal(...); if err != nil { return fmt.Errorf("marshal
%s for record %v: %w", "<fieldName>", err) } (or call logger.Warnf and continue
for non-fatal fields), ensuring every field (Hashes, ASN, TLSData, CSPData,
ResponseHeaders, Extracts, Chain, KnowledgeBase, LinkRequest, Trace) has its
marshal error handled rather than discarded.
🧹 Nitpick comments (3)
internal/db/config.go (1)

93-102: Consider adding FlushInterval to Options struct for CLI parity with YAML config.

The Config struct includes FlushInterval (line 34) which can be set via YAML config, but the Options struct lacks this field. CLI users cannot configure the flush interval and are limited to the default (1 minute). This may be intentional to keep the CLI simple, but worth confirming.

cmd/httpx/httpx.go (1)

184-187: Consider propagating a cancellable context for database writer initialization.

Using context.Background() means there's no way to cancel the database connection/schema setup if it hangs. For a CLI tool this is generally acceptable, but for long-running or interrupt-sensitive scenarios, passing a context derived from the signal handler could improve responsiveness.
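
A sketch of that suggestion using signal.NotifyContext (the db.NewWriter call shape is taken from the walkthrough above; surrounding code is illustrative and imports context, os, os/signal, and syscall are elided):

// Sketch: derive a cancellable context from interrupt signals so a
// hung connection attempt can be aborted with Ctrl-C.
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer stop()

writer, err := db.NewWriter(ctx, dbConfig)
if err != nil {
	gologger.Fatal().Msgf("Could not create database writer: %s", err)
}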

internal/db/mongodb.go (1)

115-127: JSON round-trip conversion is functional but has overhead.

The resultToDocument function marshals to JSON then unmarshals to BSON. This works correctly and ensures field names match JSON tags, but incurs double serialization overhead. For high-volume scenarios, consider using bson.Marshal directly with appropriate BSON struct tags, or accept this trade-off for code simplicity and consistency with the JSON output format.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f010d6 and c1c1fd0.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • README.md
  • cmd/httpx/httpx.go
  • go.mod
  • internal/db/config.go
  • internal/db/db.go
  • internal/db/mongodb.go
  • internal/db/mysql.go
  • internal/db/postgres.go
  • internal/db/writer.go
  • runner/options.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/db/db.go
  • runner/options.go
  • internal/db/mysql.go
  • go.mod
🧰 Additional context used
🧬 Code graph analysis (3)
internal/db/mongodb.go (3)
internal/db/db.go (4)
  • Register (47-49)
  • MongoDB (13-13)
  • Database (31-41)
  • DatabaseType (10-10)
internal/db/config.go (1)
  • Config (23-37)
runner/types.go (1)
  • Result (35-108)
cmd/httpx/httpx.go (4)
internal/db/config.go (3)
  • Options (93-102)
  • Config (23-37)
  • LoadConfigFromFile (69-91)
runner/options.go (1)
  • Options (177-373)
internal/db/writer.go (2)
  • Writer (13-20)
  • NewWriter (22-54)
runner/types.go (1)
  • Result (35-108)
internal/db/config.go (1)
internal/db/db.go (2)
  • DatabaseType (10-10)
  • SupportedDatabases (68-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Functional Test (ubuntu-latest)
  • GitHub Check: Functional Test (macOS-latest)
  • GitHub Check: Functional Test (windows-latest)
  • GitHub Check: release-test
  • GitHub Check: Analyze (go)
🔇 Additional comments (16)
README.md (1)

206-213: LGTM! New database output flags are well-documented.

The new CLI flags for database output (-rdb, -rdbc, -rdbt, -rdbcs, -rdbn, -rdbtb, -rdbbs, -rdbor) are clearly documented with appropriate descriptions and default values that align with the implementation in internal/db/config.go.

internal/db/config.go (2)

39-49: LGTM!

Validation correctly ensures a valid database type and non-empty connection string. The error messages are helpful, including the list of supported databases.


51-67: LGTM!

ApplyDefaults() correctly sets sensible defaults for missing configuration values and handles the <= 0 check for numeric fields.

internal/db/postgres.go (3)

47-176: LGTM! Schema design is comprehensive and well-structured.

The table schema correctly maps all Result struct fields with appropriate PostgreSQL types:

  • JSONB for complex nested objects (hashes, ASN, TLS, etc.)
  • TEXT[] for string arrays
  • BYTEA for screenshot bytes
  • Appropriate indexes including a GIN index on tech for efficient array queries

The use of pq.QuoteIdentifier for table and index names properly prevents SQL injection.


183-189: Transaction rollback in defer is correctly implemented.

The defer tx.Rollback() pattern is correct - after a successful tx.Commit(), the rollback becomes a no-op. This ensures cleanup on any error path.


26-38: LGTM! Connection handling is correct.

Properly opens connection, verifies with PingContext, and stores the handle. Error messages are descriptive.

cmd/httpx/httpx.go (3)

189-196: LGTM! Callback chaining correctly preserves existing handlers.

The implementation properly captures and invokes the existing OnResult callback before the database writer callback, ensuring both PDCP upload and database output can work together.


198-205: LGTM! Close order ensures database flush before upstream close.

Calling writer.Close() before existingClose() is the correct order - this ensures all buffered results are flushed to the database before any upstream resources (like PDCP upload) are closed.


161-182: LGTM! Configuration loading with file vs CLI fallback is well-structured.

The logic correctly prioritizes config file when specified, otherwise builds configuration from CLI options. Both paths properly handle errors with fatal logging.

internal/db/mongodb.go (3)

61-87: LGTM! Index strategy is appropriate for common query patterns.

The indexes on timestamp (descending for recent-first queries), url, host, and status_code cover typical filtering needs. The sparse index on tech is a good choice since not all results will have technology detection data.


30-50: LGTM! Connection setup with appropriate timeouts.

The 10-second connect and server selection timeouts are reasonable defaults. The ping verification ensures the connection is actually usable before returning success.


89-109: LGTM! Batch insertion with proper error propagation.

Unlike the PostgreSQL implementation, errors from resultToDocument are properly checked and propagated. The early return on empty results avoids unnecessary database calls.

internal/db/writer.go (4)

22-54: LGTM!

Clean initialization sequence with proper resource cleanup on EnsureSchema failure. The batcher configuration using BatchSize and FlushInterval from config is appropriate.


93-106: LGTM!

Proper idempotent close using CompareAndSwap, correct shutdown sequence with Stop followed by WaitDone, and appropriate final stats logging.


108-110: LGTM!

Thread-safe accessor for the results counter.


72-91: No issue here — batcher.Append is safe to call after Stop.

The batcher package from github.com/projectdiscovery/utils is designed for concurrent use and implements Append and Stop with well-defined thread-safe semantics. Calling Append after Stop has been called is safe and does not result in undefined behavior. The callback can safely execute in parallel with Close() without risk.

Likely an incorrect or invalid review comment.

Comment on lines +104 to +121
func (o *Options) ToConfig() (*Config, error) {
	cfg := &Config{
		Type:             DatabaseType(o.Type),
		ConnectionString: o.ConnectionString,
		DatabaseName:     o.DatabaseName,
		TableName:        o.TableName,
		BatchSize:        o.BatchSize,
		OmitRaw:          o.OmitRaw,
	}

	cfg.ApplyDefaults()

	if err := cfg.Validate(); err != nil {
		return nil, err
	}

	return cfg, nil
}

⚠️ Potential issue | 🟡 Minor

Missing environment variable fallback for ConnectionString in ToConfig().

LoadConfigFromFile() (lines 80-82) falls back to os.Getenv(EnvConnectionString) when the connection string is empty, but ToConfig() does not apply the same fallback. This inconsistency means users relying on CLI options cannot use the HTTPX_DB_CONNECTION_STRING environment variable.

Proposed fix
 func (o *Options) ToConfig() (*Config, error) {
+	connStr := o.ConnectionString
+	if connStr == "" {
+		connStr = os.Getenv(EnvConnectionString)
+	}
+
 	cfg := &Config{
 		Type:             DatabaseType(o.Type),
-		ConnectionString: o.ConnectionString,
+		ConnectionString: connStr,
 		DatabaseName:     o.DatabaseName,
 		TableName:        o.TableName,
 		BatchSize:        o.BatchSize,
 		OmitRaw:          o.OmitRaw,
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

func (o *Options) ToConfig() (*Config, error) {
	cfg := &Config{
		Type:             DatabaseType(o.Type),
		ConnectionString: o.ConnectionString,
		DatabaseName:     o.DatabaseName,
		TableName:        o.TableName,
		BatchSize:        o.BatchSize,
		OmitRaw:          o.OmitRaw,
	}
	cfg.ApplyDefaults()
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return cfg, nil
}

After:

func (o *Options) ToConfig() (*Config, error) {
	connStr := o.ConnectionString
	if connStr == "" {
		connStr = os.Getenv(EnvConnectionString)
	}
	cfg := &Config{
		Type:             DatabaseType(o.Type),
		ConnectionString: connStr,
		DatabaseName:     o.DatabaseName,
		TableName:        o.TableName,
		BatchSize:        o.BatchSize,
		OmitRaw:          o.OmitRaw,
	}
	cfg.ApplyDefaults()
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return cfg, nil
}
🤖 Prompt for AI Agents
In @internal/db/config.go around lines 104 - 121, The Options.ToConfig()
implementation doesn't fall back to the environment variable for
ConnectionString like LoadConfigFromFile() does; update ToConfig() (method
Options.ToConfig) to read os.Getenv(EnvConnectionString) when o.ConnectionString
is empty before constructing cfg so CLI-based config honors the
HTTPX_DB_CONNECTION_STRING env var; ensure you reference EnvConnectionString and
set cfg.ConnectionString accordingly prior to
cfg.ApplyDefaults()/cfg.Validate().

Comment on lines +56 to +70
func (w *Writer) flush(batch []runner.Result) {
	if len(batch) == 0 {
		return
	}

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	if err := w.db.InsertBatch(ctx, batch); err != nil {
		gologger.Error().Msgf("Failed to insert batch to database: %v", err)
	} else {
		w.counter.Add(int64(len(batch)))
		gologger.Verbose().Msgf("Inserted %d results to database (total: %d)", len(batch), w.counter.Load())
	}
}

⚠️ Potential issue | 🟠 Major

Failed batch insertions result in data loss.

When InsertBatch fails, the batch is logged but discarded without retry or fallback. For a scan results persistence feature, this could silently lose data during transient database issues.

Consider:

  1. Adding a configurable retry mechanism with exponential backoff.
  2. Falling back to a local file buffer for failed batches.
  3. At minimum, tracking failed result counts in Stats().

@coderabbitai (bot) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 205-214: The README claims support for http2 and http3 but
ValidateOptions in runner/options.go (function ValidateOptions) currently only
allows "unknown" and "http11"; update the validation to accept "http2" and
"http3" (and mark experimental if needed) by adding those values to the allowed
protocol set in ValidateOptions, and ensure any flag help text/usage strings
(the protocol flag handling) reflect the new valid values; alternatively, if you
prefer to not enable them yet, remove http2 and http3 from README/protocol flag
help text so docs match the current ValidateOptions behavior.

Comment on lines +205 to +214
-pr, -protocol string                    protocol to use (unknown, http11, http2, http3)
-fepp, -filter-error-page-path string    path to store filtered error pages (default "filtered_error_page.json")
-lof, -list-output-fields                list available output field names for filtering
-eof, -exclude-output-fields string[]    exclude specified output fields from results
-rdb, -result-db                         store results in database
-rdbc, -result-db-config string          path to database config file
-rdbt, -result-db-type string            database type (mongodb, postgres, mysql)
-rdbcs, -result-db-conn string           database connection string (env: HTTPX_DB_CONNECTION_STRING)
-rdbn, -result-db-name string            database name (default "httpx")
-rdbtb, -result-db-table string          table/collection name (default "results")
-rdbbs, -result-db-batch-size int        batch size for database inserts (default 100)
-rdbor, -result-db-omit-raw              omit raw request/response data from database

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for protocol-related validation and constants
rg -n "ValidateOptions|protocol" -S -g'*.go' | head -50

Repository: projectdiscovery/httpx

Length of output: 4320


🏁 Script executed:

# Search for specific protocol values
rg -n "http3|http2|HTTP2|HTTP3|UNKNOWN|HTTP11|http11" -S -g'*.go' | head -100

Repository: projectdiscovery/httpx

Length of output: 2626


🏁 Script executed:

# Look for protocol type/enum definitions
rg -n "type.*Protocol|const.*Protocol" -S -g'*.go'

Repository: projectdiscovery/httpx

Length of output: 48


🏁 Script executed:

# Check README lines 205-214
sed -n '205,214p' README.md

Repository: projectdiscovery/httpx

Length of output: 927


🏁 Script executed:

# Get full ValidateOptions function to see complete validation logic
rg -A 50 "func (options \*Options) ValidateOptions" runner/options.go | head -70

Repository: projectdiscovery/httpx

Length of output: 48


🏁 Script executed:

# Get the ValidateOptions function more explicitly
sed -n '694,850p' runner/options.go

Repository: projectdiscovery/httpx

Length of output: 6007


Protocol validation does not support http2 and http3 despite documentation claiming they are available. ValidateOptions at runner/options.go:816 only accepts unknown and http11, but README.md lists all four protocols. The flag help text also incorrectly lists http2 and http3 as [experimental] options. Either enable these protocols in the validation logic or remove them from documentation.

🤖 Prompt for AI Agents
In `@README.md` around lines 205 - 214, The README claims support for http2 and
http3 but ValidateOptions in runner/options.go (function ValidateOptions)
currently only allows "unknown" and "http11"; update the validation to accept
"http2" and "http3" (and mark experimental if needed) by adding those values to
the allowed protocol set in ValidateOptions, and ensure any flag help text/usage
strings (the protocol flag handling) reflect the new valid values;
alternatively, if you prefer to not enable them yet, remove http2 and http3 from
README/protocol flag help text so docs match the current ValidateOptions
behavior.
