
Conversation

@dogancanbakir (Member) commented Jan 6, 2026

Summary

Adds support for storing httpx scan results directly to databases (MongoDB, PostgreSQL, MySQL) with both CLI flags and YAML config file options.

  • MongoDB: Stores results as BSON documents with indexes on timestamp, url, host, status_code, and tech
  • PostgreSQL: Creates table with 62 columns matching Result struct fields, uses JSONB for complex types and TEXT[] for arrays
  • MySQL: Similar schema to PostgreSQL with JSON type for complex fields and appropriate TEXT types

Features

  • Batched writes for performance (configurable batch size, default 100)
  • Auto-flush with configurable interval (default 1 minute)
  • Individual columns for each Result field (not a single JSON blob)
  • Environment variable support: HTTPX_DB_CONNECTION_STRING
  • Option to omit raw request/response data (-rdbor)
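
A minimal sketch of the batch-and-flush behavior described above, assuming a buffered channel feeding a background goroutine (all names here are illustrative assumptions, not the exact internal/db/writer.go code):

// Illustrative sketch only; names are assumptions, not the exact
// internal/db/writer.go implementation.
package db

import (
	"context"
	"time"

	"github.com/projectdiscovery/httpx/runner"
)

// Writer sketch: a buffered channel feeding a background loop.
type Writer struct {
	data      chan runner.Result
	batchSize int           // default 100
	interval  time.Duration // default 1 minute
}

func (w *Writer) run(ctx context.Context) {
	batch := make([]runner.Result, 0, w.batchSize)
	ticker := time.NewTicker(w.interval)
	defer ticker.Stop()
	for {
		select {
		case r, ok := <-w.data:
			if !ok { // channel closed: flush the remainder and exit
				w.flush(batch)
				return
			}
			batch = append(batch, r)
			if len(batch) >= w.batchSize { // size-triggered flush
				w.flush(batch)
				batch = batch[:0]
			}
		case <-ticker.C: // interval-triggered auto-flush
			w.flush(batch)
			batch = batch[:0]
		case <-ctx.Done(): // graceful shutdown flushes buffered results
			w.flush(batch)
			return
		}
	}
}

func (w *Writer) flush(batch []runner.Result) {
	if len(batch) == 0 {
		return
	}
	// hand the batch to the database adapter's InsertBatch (elided)
}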

New CLI Flags (OUTPUT group)

Flag                            Description
-rdb, -result-db                Enable database storage
-rdbc, -result-db-config        Path to YAML config file
-rdbt, -result-db-type          Database type (mongodb, postgres, mysql)
-rdbcs, -result-db-conn         Connection string
-rdbn, -result-db-name          Database name (default: httpx)
-rdbtb, -result-db-table        Table/collection name (default: results)
-rdbbs, -result-db-batch-size   Batch size (default: 100)
-rdbor, -result-db-omit-raw     Omit raw request/response data

Usage Examples

# MongoDB
httpx -l hosts.txt -rdb -rdbt mongodb -rdbcs "mongodb://localhost:27017"

# PostgreSQL
httpx -l hosts.txt -rdb -rdbt postgres -rdbcs "postgres://user:pass@localhost:5432/httpx"

# MySQL
httpx -l hosts.txt -rdb -rdbt mysql -rdbcs "user:pass@tcp(localhost:3306)/httpx"

# Using config file
httpx -l hosts.txt -rdb -rdbc /path/to/db-config.yaml

# Using environment variable
export HTTPX_DB_CONNECTION_STRING="mongodb://localhost:27017"
httpx -l hosts.txt -rdb -rdbt mongodb

Example Config File (db-config.yaml)

type: mongodb
connection-string: "mongodb://localhost:27017"
database-name: httpx
table-name: results
batch-size: 100
flush-interval: 1m
omit-raw: false
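
For illustration, loading such a file with the environment-variable fallback might look like the sketch below (assumes gopkg.in/yaml.v3 and an EnvConnectionString constant; the real LoadConfigFromFile in internal/db/config.go may differ in detail):

package db

import (
	"fmt"
	"os"
	"time"

	"gopkg.in/yaml.v3"
)

const EnvConnectionString = "HTTPX_DB_CONNECTION_STRING"

// Config mirrors the YAML keys shown above; field types are a sketch.
type Config struct {
	Type             string `yaml:"type"`
	ConnectionString string `yaml:"connection-string"`
	DatabaseName     string `yaml:"database-name"`
	TableName        string `yaml:"table-name"`
	BatchSize        int    `yaml:"batch-size"`
	FlushInterval    string `yaml:"flush-interval"` // e.g. "1m"
	OmitRaw          bool   `yaml:"omit-raw"`
}

func LoadConfigFromFile(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("could not read config: %w", err)
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("could not parse config: %w", err)
	}
	// Fall back to HTTPX_DB_CONNECTION_STRING when the file omits it.
	if cfg.ConnectionString == "" {
		cfg.ConnectionString = os.Getenv(EnvConnectionString)
	}
	// Durations arrive as strings in this sketch; validate early.
	if cfg.FlushInterval != "" {
		if _, err := time.ParseDuration(cfg.FlushInterval); err != nil {
			return nil, fmt.Errorf("invalid flush-interval: %w", err)
		}
	}
	return &cfg, nil
}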

Testing with Docker

1. Start Database Containers

# MongoDB
docker run -d --name httpx-mongo -p 27017:27017 mongo:latest

# PostgreSQL
docker run -d --name httpx-postgres -p 5432:5432 -e POSTGRES_PASSWORD=password -e POSTGRES_DB=httpx postgres:latest

# MySQL
docker run -d --name httpx-mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=password -e MYSQL_DATABASE=httpx mysql:latest

2. Test httpx with Each Database

# Test MongoDB
echo "https://example.com" | go run ./cmd/httpx -rdb -rdbt mongodb -rdbcs "mongodb://localhost:27017"

# Test PostgreSQL
echo "https://example.com" | go run ./cmd/httpx -rdb -rdbt postgres -rdbcs "postgres://postgres:password@localhost:5432/httpx?sslmode=disable"

# Test MySQL
echo "https://example.com" | go run ./cmd/httpx -rdb -rdbt mysql -rdbcs "root:password@tcp(localhost:3306)/httpx"

3. Verify Results in Database

# MongoDB - check results
docker exec -it httpx-mongo mongosh --eval "db.getSiblingDB('httpx').results.find().pretty()"

# PostgreSQL - check results
docker exec -it httpx-postgres psql -U postgres -d httpx -c "SELECT url, status_code, title FROM results;"

# MySQL - check results
docker exec -it httpx-mysql mysql -uroot -ppassword -e "SELECT url, status_code, title FROM httpx.results;"

4. Cleanup

docker rm -f httpx-mongo httpx-postgres httpx-mysql

Test plan

  • Test MongoDB storage with Docker container
  • Test PostgreSQL storage with Docker container
  • Test MySQL storage with Docker container
  • Verify all Result fields are stored correctly
  • Verify batching works as expected
  • Verify graceful shutdown flushes remaining results

Closes #1973
Closes #2360
Closes #2361
Closes #2362

Summary by CodeRabbit

  • New Features

    • Optional database output for storing results (MongoDB, PostgreSQL, MySQL) with configurable type, connection, DB/table name, batch size, flush interval, omit-raw option, and graceful shutdown; runs alongside existing result handlers.
  • New Features (CLI)

    • New CLI flags and config support to enable/configure database output.
  • Documentation

    • CLI help text reformatted and clarified (spacing, alignment); protocol options and proxy/header/TLS descriptions updated; some flags hidden/renamed for clarity.


The auto-assign bot requested a review from dwisiswant0 on January 6, 2026 at 09:54
@coderabbitai (bot) commented Jan 6, 2026

Walkthrough

Adds pluggable database result storage (MongoDB, PostgreSQL, MySQL), CLI flags and config loading, a registry/factory, batched background writer, schema management, and runner integration to persist scan results.

Changes

  • Core DB infra & config (internal/db/db.go, internal/db/config.go, internal/db/writer.go):
    Adds database registry/factory, Config/Options with YAML loading, defaults and validation, and a Writer that batches results, runs a background flush loop, and provides lifecycle, logging, and Close semantics. Attention: validation/defaulting and background shutdown correctness.
  • MongoDB adapter (internal/db/mongodb.go):
    New MongoDB implementation: client connect/ping, collection selection, index creation, result→BSON conversion, and InsertMany batch writes. Attention: index definitions and BSON marshaling approach.
  • PostgreSQL adapter (internal/db/postgres.go):
    New PostgreSQL implementation: connection management, EnsureSchema (table/indexes), transactional batch inserts, JSONB/array serialization, and error handling. Attention: parameter binding, pq.Array usage, and schema SQL.
  • MySQL adapter (internal/db/mysql.go):
    New MySQL implementation: connection, schema creation, transactional multi-row inserts, JSON marshaling for nested fields, prepared statements, and error handling. Attention: multi-row INSERT construction and identifier quoting.
  • Runner & CLI integration (runner/options.go, cmd/httpx/httpx.go):
    Adds runner Options fields and CLI flags for DB config (--result-db, --result-db-config, --result-db-type, --result-db-conn, etc.); setupDatabaseOutput wires the writer into OnResult/OnClose. Attention: callback wrapping and error handling during startup.
  • Deps & docs (go.mod, README.md):
    Adds DB drivers and related deps; README/help text reformatting and flag description updates. Attention: dependency additions and help text alignment only.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Runner as httpx Runner
    participant Writer as DB Writer
    participant DB as Database (Mongo/Postgres/MySQL)

    CLI->>Runner: start with --result-db / --result-db-config
    Runner->>Writer: NewWriter(ctx, cfg) and register callbacks

    loop per scan result
        Runner->>Writer: OnResult(result) (callback)
        Writer->>Writer: enqueue result (buffered channel)
    end

    alt flush by batch size or interval
        Writer->>DB: InsertBatch(batch)
        DB-->>Writer: success / error
        Writer->>Writer: update counters/log
    end

    CLI->>Runner: finish / interrupt
    Runner->>Writer: OnClose() -> Writer.Close()
    Writer->>Writer: cancel, flush remaining
    Writer->>DB: final InsertBatch
    Writer->>DB: Close connection

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 I hop through flags and YAML lines,
Batching results in tidy fines,
Mongo, Postgres, MySQL in queue,
Flush and store each precious view,
I twitch my nose—your data grew.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: coverage is 18.75%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Description Check: check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check: the title clearly and concisely summarizes the main feature added: database output support for storing scan results.
  • Linked Issues Check: the pull request fully implements all linked issue requirements: adds the -rdb CLI flag [#1973], a MongoDB backend with indexing [#2360], a PostgreSQL backend with a JSONB schema [#2361], and a MySQL backend with a JSON schema [#2362].
  • Out of Scope Changes Check: all changes directly support the database output feature. README updates document the new flags, dependencies support database connectivity, and the internal/db package implements the database layer.



@coderabbitai (bot) left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI Agents
In @internal/db/mysql.go:
- Around line 47-152: EnsureSchema currently injects m.cfg.TableName directly
into the CREATE TABLE SQL; validate or safely quote the identifier before
interpolation to prevent SQL injection (e.g., enforce a safe regex like
^[A-Za-z0-9_]+$ on m.cfg.TableName and/or wrap the identifier in backticks after
rejecting/escaping any backticks). Update mysqlDatabase.EnsureSchema to use the
validated/quoted table name variable when building the schema string (replace
direct use of m.cfg.TableName). Also add a matching index on url (e.g., add
`INDEX idx_url (url)`) to the CREATE TABLE statement for parity with the
Postgres schema.
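
A possible shape for that helper, as a sketch (the PR's actual quoteIdentifier may differ):

package db

import (
	"fmt"
	"regexp"
	"strings"
)

var mysqlIdentRe = regexp.MustCompile(`^[A-Za-z0-9_]+$`)

// quoteIdentifier validates a MySQL identifier against a safe
// character set and wraps it in backticks. Doubling embedded
// backticks is belt-and-braces here, since the regex already
// rejects them.
func quoteIdentifier(name string) (string, error) {
	if !mysqlIdentRe.MatchString(name) {
		return "", fmt.Errorf("invalid identifier: %q", name)
	}
	return "`" + strings.ReplaceAll(name, "`", "``") + "`", nil
}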

In @internal/db/postgres.go:
- Around line 48-161: The SQL builds in the schema variable interpolate
p.cfg.TableName directly (and into index names like idx_%s_timestamp and later
INSERT queries), which risks SQL injection; update the code to quote identifiers
using pq.QuoteIdentifier(p.cfg.TableName) (or an equivalent safe
identifier-quoting helper) everywhere the table name or derived index names are
inserted (including schema, index names, and any INSERT/SELECT/DROP/ALTER
statements) so that the table and index identifiers are safely escaped before
passing into fmt.Sprintf.

In @internal/db/writer.go:
- Around line 130-137: The Close method races with sends from the callback
returned by GetWriterCallback: after calling w.cancel() but before close(w.data)
a resumed callback can see both w.ctx.Done() and w.data ready and attempt to
send into a closed channel, causing a panic; update GetWriterCallback (the
sending goroutine) to check w.closed (or use an atomic/boolean helper like
w.isClosed()) before selecting/send and bail out if closed, or instead change
Close to close w.data before/atomically with cancellation (or use a mutex to
synchronize) so that the callback never attempts a send after closure; modify
either Writer.Close, GetWriterCallback, or both to perform this
closed-check/synchronization using the existing w.closed, w.cancel, w.data, and
w.ctx symbols.
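
One way to realize that synchronization, sketched with a mutex-guarded closed flag (assumes the Writer carries mu sync.RWMutex and closed bool alongside the existing data, ctx, and cancel fields; the PR's actual fix may differ):

// Sketch: a callback that never sends after Close has started.
func (w *Writer) GetWriterCallback() func(r runner.Result) {
	return func(r runner.Result) {
		w.mu.RLock()
		defer w.mu.RUnlock()
		if w.closed { // Close has started; drop the result
			return
		}
		select {
		case w.data <- r:
		case <-w.ctx.Done(): // unblocks in-flight sends during Close
		}
	}
}

func (w *Writer) Close() {
	w.cancel() // release any sender blocked on a full channel
	w.mu.Lock()
	w.closed = true
	w.mu.Unlock()
	// Safe now: no callback can pass the closed check and send.
	close(w.data)
}
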
🧹 Nitpick comments (12)
internal/db/config.go (1)

39-49: Consider stricter validation for numeric fields.

The current validation only checks database type and connection string. BatchSize and FlushInterval are validated indirectly through ApplyDefaults (which replaces non-positive values with defaults). While this works, it means explicitly setting negative values gets silently corrected rather than rejected.

Consider adding explicit validation to reject negative values as invalid input, which would make configuration errors more apparent to users.

🔎 Example stricter validation
 func (c *Config) Validate() error {
 	if !c.Type.IsValid() {
 		return fmt.Errorf("invalid database type: %s (supported: %v)", c.Type, SupportedDatabases())
 	}
 
 	if c.ConnectionString == "" {
 		return fmt.Errorf("connection string is required")
 	}
+
+	if c.BatchSize < 0 {
+		return fmt.Errorf("batch size cannot be negative")
+	}
+
+	if c.FlushInterval < 0 {
+		return fmt.Errorf("flush interval cannot be negative")
+	}
 
 	return nil
 }
internal/db/mongodb.go (1)

115-127: Consider direct BSON marshaling for efficiency.

The JSON round-trip (marshal to JSON, then unmarshal to BSON) is simple and reliable but adds overhead. For scanning workloads with moderate throughput, this approach is acceptable and prioritizes maintainability.

If performance profiling reveals this as a bottleneck, consider using BSON tags on the runner.Result struct and marshaling directly to BSON.
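
For reference, the round-trip pattern looks roughly like this (a sketch; the actual resultToDocument may differ in detail):

package db

import (
	"encoding/json"

	"github.com/projectdiscovery/httpx/runner"
	"go.mongodb.org/mongo-driver/bson"
)

// resultToDocument sketch: reuse the Result JSON tags, then parse
// the JSON back into a BSON document.
func resultToDocument(r *runner.Result) (bson.M, error) {
	data, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	var doc bson.M
	// second argument false = relaxed extended JSON; plain JSON parses fine
	if err := bson.UnmarshalExtJSON(data, false, &doc); err != nil {
		return nil, err
	}
	return doc, nil
}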

internal/db/writer.go (2)

67-71: Consider also omitting RawHeaders when omitRaw is enabled.

The omitRaw option clears Raw, Request, and ResponseBody, but RawHeaders (which contains the raw HTTP response headers) is not cleared. If the intent is to reduce storage of raw HTTP data, this field should likely be included.

🔎 Proposed fix
 		if w.omitRaw {
 			r.Raw = ""
 			r.Request = ""
 			r.ResponseBody = ""
+			r.RawHeaders = ""
 		}

96-101: Failed batch data is discarded without retry.

When InsertBatch fails, the error is logged but the batch is cleared at line 103, permanently losing those results. For reliability, consider implementing a retry mechanism or dead-letter queue for failed batches.
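
A sketch of the retry idea with capped exponential backoff (attempt counts and delays are arbitrary choices, not part of the PR; imports elided):

// Sketch: retry a failed batch a few times before giving up.
func (w *Writer) insertWithRetry(ctx context.Context, batch []runner.Result) error {
	const maxAttempts = 3
	backoff := time.Second
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = w.db.InsertBatch(ctx, batch); err == nil {
			return nil
		}
		if attempt == maxAttempts {
			break
		}
		select {
		case <-time.After(backoff):
			backoff *= 2 // 1s, 2s, ...
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("insert batch failed after %d attempts: %w", maxAttempts, err)
}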

internal/db/db.go (2)

22-29: IsValid() and SupportedDatabases() may diverge if new types are added.

Both functions enumerate the valid database types independently. If a new database type is added, both must be updated. Consider deriving one from the other to ensure consistency.

🔎 Proposed fix
+var supportedDatabases = []DatabaseType{MongoDB, PostgreSQL, MySQL}
+
 func (d DatabaseType) IsValid() bool {
-	switch d {
-	case MongoDB, PostgreSQL, MySQL:
-		return true
-	default:
-		return false
+	for _, db := range supportedDatabases {
+		if d == db {
+			return true
+		}
 	}
+	return false
 }

 func SupportedDatabases() []DatabaseType {
-	return []DatabaseType{MongoDB, PostgreSQL, MySQL}
+	return supportedDatabases
 }

Also applies to: 68-70


43-49: Registry map lacks synchronization, but safe given init-only writes.

The registry map is written to only during init() calls, which are guaranteed to run sequentially before main(). This is safe, but adding a comment documenting this assumption would help future maintainers.

runner/options.go (1)

680-801: No validation added for database options.

ValidateOptions() doesn't validate the new database-related flags. Consider adding validation to ensure:

  • If -rdb is enabled, either -rdbc or (-rdbt + -rdbcs) must be provided
  • Database type is one of the supported values
internal/db/postgres.go (2)

218-227: JSON marshaling errors are silently ignored.

All json.Marshal calls ignore the error return value. While marshaling struct types rarely fails, nil pointer dereferences or custom MarshalJSON implementations could fail. Consider at minimum logging errors or using a helper that returns []byte("null") on failure.

🔎 Proposed helper pattern
func mustMarshalJSON(v interface{}) []byte {
    data, err := json.Marshal(v)
    if err != nil {
        return []byte("null")
    }
    return data
}

171-252: Consider using COPY for better batch insert performance.

For PostgreSQL, using COPY protocol (via pq.CopyIn) is significantly faster for bulk inserts compared to prepared statement execution in a loop. This could improve performance for large batch sizes.
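
The COPY pattern with lib/pq looks roughly like this (a sketch, not the PR's code; the column list is abbreviated and assumes Result exposes URL, Host, and StatusCode fields):

package db

import (
	"context"
	"database/sql"

	"github.com/lib/pq"
	"github.com/projectdiscovery/httpx/runner"
)

// copyInsert sketches bulk loading via the COPY protocol.
func copyInsert(ctx context.Context, db *sql.DB, table string, results []runner.Result) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	// Column list abbreviated; the real table has many more columns.
	stmt, err := tx.Prepare(pq.CopyIn(table, "url", "host", "status_code"))
	if err != nil {
		return err
	}
	for _, r := range results {
		if _, err := stmt.Exec(r.URL, r.Host, r.StatusCode); err != nil {
			return err
		}
	}
	if _, err := stmt.Exec(); err != nil { // final Exec flushes the COPY buffer
		return err
	}
	if err := stmt.Close(); err != nil {
		return err
	}
	return tx.Commit()
}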

internal/db/mysql.go (3)

56-60: VARCHAR length constraints may cause data truncation.

Several VARCHAR columns have fixed limits that may truncate real-world data:

  • host VARCHAR(255) - most hostnames fit, but some edge cases exist
  • method VARCHAR(10) - custom methods could exceed this
  • port VARCHAR(10) - reasonable
  • scheme VARCHAR(10) - reasonable

Consider using TEXT for method or increasing the limit to handle custom HTTP methods.


209-227: JSON marshaling errors silently ignored.

Same issue as PostgreSQL implementation - all json.Marshal errors are discarded. Apply the same fix pattern as suggested for postgres.go.


162-252: Consider multi-value INSERT for better MySQL batch performance.

MySQL supports multi-value INSERT syntax (INSERT INTO t VALUES (...), (...), ...) which is faster than executing prepared statements in a loop. This could significantly improve batch insert performance.
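
A sketch of that multi-value construction (three columns for brevity; assumes the table name was already validated and quoted, and that Result exposes URL, Host, and StatusCode):

package db

import (
	"context"
	"database/sql"
	"fmt"
	"strings"

	"github.com/projectdiscovery/httpx/runner"
)

// multiValueInsert builds one INSERT with a (?, ?, ?) group per row.
func multiValueInsert(ctx context.Context, db *sql.DB, table string, results []runner.Result) error {
	if len(results) == 0 {
		return nil
	}
	placeholders := make([]string, 0, len(results))
	args := make([]interface{}, 0, len(results)*3)
	for _, r := range results {
		placeholders = append(placeholders, "(?, ?, ?)")
		args = append(args, r.URL, r.Host, r.StatusCode)
	}
	query := fmt.Sprintf("INSERT INTO %s (url, host, status_code) VALUES %s",
		table, strings.Join(placeholders, ", "))
	_, err := db.ExecContext(ctx, query, args...)
	return err
}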

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc2c7a2 and e64dfcb.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • README.md
  • cmd/httpx/httpx.go
  • go.mod
  • internal/db/config.go
  • internal/db/db.go
  • internal/db/mongodb.go
  • internal/db/mysql.go
  • internal/db/postgres.go
  • internal/db/writer.go
  • runner/options.go
🧰 Additional context used
🧬 Code graph analysis (5)
internal/db/writer.go (4)
internal/db/db.go (2)
  • Database (31-41)
  • NewDatabase (51-66)
internal/db/config.go (1)
  • Config (23-37)
runner/types.go (1)
  • Result (35-105)
runner/options.go (1)
  • OnResultCallback (51-51)
internal/db/postgres.go (5)
internal/db/db.go (4)
  • Register (47-49)
  • PostgreSQL (14-14)
  • Database (31-41)
  • DatabaseType (10-10)
internal/db/config.go (1)
  • Config (23-37)
runner/types.go (2)
  • Result (35-105)
  • Trace (107-123)
common/httpx/csp.go (1)
  • CSPData (25-28)
common/httpx/proto.go (1)
  • HTTP2 (8-8)
internal/db/db.go (2)
runner/types.go (1)
  • Result (35-105)
internal/db/config.go (1)
  • Config (23-37)
internal/db/mysql.go (5)
internal/db/db.go (4)
  • Register (47-49)
  • MySQL (15-15)
  • Database (31-41)
  • DatabaseType (10-10)
internal/db/config.go (1)
  • Config (23-37)
runner/types.go (2)
  • Result (35-105)
  • Trace (107-123)
common/httpx/csp.go (1)
  • CSPData (25-28)
common/httpx/proto.go (1)
  • HTTP2 (8-8)
internal/db/config.go (1)
internal/db/db.go (2)
  • DatabaseType (10-10)
  • SupportedDatabases (68-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Functional Test (macOS-latest)
  • GitHub Check: Functional Test (windows-latest)
  • GitHub Check: Functional Test (ubuntu-latest)
  • GitHub Check: Lint Test
  • GitHub Check: Analyze (go)
  • GitHub Check: release-test
🔇 Additional comments (19)
README.md (2)

100-166: LGTM - Documentation alignment improvements.

The cosmetic adjustments to spacing and alignment improve readability and consistency across the CLI help sections.


206-213: LGTM - Database output flags properly documented.

The new database-related flags (-rdb, -rdbc, -rdbt, etc.) are clearly documented with appropriate defaults and descriptions, aligning well with the PR objectives to add database persistence support.

internal/db/config.go (3)

11-21: LGTM - Well-chosen defaults.

The default values are sensible: batch size of 100 balances throughput and memory, 1-minute flush interval prevents data loss, and the naming conventions are clear.


69-91: LGTM - Correct configuration loading flow.

The function properly handles YAML parsing, environment variable fallback for connection strings, default application, and validation in the correct order. Error wrapping provides good context.


104-125: LGTM - Consistent CLI-to-config conversion.

The ToConfig method correctly mirrors the file-based loading logic, ensuring both configuration paths (file vs. CLI flags) go through the same validation and defaulting pipeline.

cmd/httpx/httpx.go (2)

68-69: LGTM - Correct integration placement.

The database output setup is called at the appropriate point in the initialization sequence, after asset upload setup and before runner creation, allowing it to configure the result callbacks properly.


151-208: LGTM - Correct callback chaining and error handling.

The implementation properly:

  • Loads configuration from file or CLI options with appropriate error handling
  • Chains callbacks to preserve existing handlers (asset upload, etc.)
  • Orders operations correctly: existing OnResult callback executes first, writer closes before existing OnClose
  • Uses Fatal for setup errors, which is appropriate at initialization time
internal/db/mongodb.go (3)

15-28: LGTM - Standard factory registration pattern.

The init-based registration integrates cleanly with the database registry, and the constructor properly defers connection establishment to the explicit Connect() call.


30-59: LGTM - Robust connection handling with appropriate timeouts.

The Connect method uses reasonable timeouts (10s for connection/selection, ping verification) and Close properly handles cleanup with a 5s timeout to prevent blocking during shutdown.


61-87: No action needed. MongoDB's CreateMany() is idempotent—re-creating indexes that already exist with identical definitions is a no-op and succeeds without error. The index definitions are appropriate for the query patterns, and the implementation will work correctly on subsequent application runs.
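
For reference, index creation in that style with the mongo-driver looks roughly like this (a sketch; the exact definitions in internal/db/mongodb.go may differ):

package db

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// ensureIndexes sketches idempotent index creation with CreateMany.
func ensureIndexes(ctx context.Context, coll *mongo.Collection) error {
	models := []mongo.IndexModel{
		{Keys: bson.D{{Key: "timestamp", Value: -1}}}, // recent-first queries
		{Keys: bson.D{{Key: "url", Value: 1}}},
		{Keys: bson.D{{Key: "host", Value: 1}}},
		{Keys: bson.D{{Key: "status_code", Value: 1}}},
		// sparse: not all results carry technology detection data
		{Keys: bson.D{{Key: "tech", Value: 1}}, Options: options.Index().SetSparse(true)},
	}
	_, err := coll.Indexes().CreateMany(ctx, models)
	return err
}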

go.mod (1)

55-61: Dependency versions are appropriate; no critical security concerns identified.

The versions in go.mod are current and safe:

  • mysql (v1.9.3), lib/pq (v1.10.9): Latest stable releases. CVE-2025-24787 and CVE-2025-1094/4207 are in upstream components (WhoDB and PostgreSQL libpq C library respectively), not these Go drivers.
  • mongo-driver (v1.17.6): On the maintained v1.x branch; v2.4 is available but v1.x remains actively supported.
  • mapstructure/v2 (v2.4.0): Confirmed as the patched version that fixed the information-leak vulnerability (present in versions < 2.4.0).
  • gocsv and publicsuffix-go: Latest versions with no reported security vulnerabilities.
internal/db/writer.go (2)

26-59: LGTM!

The NewWriter function properly initializes the database connection, ensures schema, and starts the background goroutine with appropriate lifecycle management.


146-148: LGTM!

Simple atomic counter accessor.

internal/db/db.go (2)

31-41: LGTM!

The Database interface is well-defined with clear lifecycle methods and batch insert support.


51-66: LGTM!

Good defensive validation before factory invocation.

runner/options.go (2)

354-361: LGTM!

New database configuration fields are appropriately added to the Options struct.


498-505: Environment variable HTTPX_DB_CONNECTION_STRING is properly wired.

The flag description for -rdbcs correctly mentions the environment variable. While the flag definition doesn't use goflags.EnvVar, the environment variable is properly handled as a fallback in internal/db/config.go (lines 81 and 115): if no connection string is provided via the flag, the code reads from HTTPX_DB_CONNECTION_STRING via os.Getenv(). This is a valid design pattern where the flag takes precedence and the environment variable serves as a fallback.

Likely an incorrect or invalid review comment.

internal/db/postgres.go (1)

26-38: LGTM!

Connect implementation properly opens connection and pings to verify connectivity.

internal/db/mysql.go (1)

26-38: LGTM!

Connect implementation correctly opens and pings MySQL database.

@Mzack9999 (Member) left a comment

Overall LGTM, with some minor suggestions. Also, what do you think about moving the queries to sqlc for easier maintenance later (move the queries into an sqlc file and generate the glue code with sqlc generate)?

- Fix SQL injection in postgres.go using pq.QuoteIdentifier for table/index names
- Fix SQL injection in mysql.go using custom quoteIdentifier function
- Fix race condition in writer.go by checking closed state before channel send
- Add missing idx_url index in mysql.go for parity with PostgreSQL schema
- Include RawHeaders in omitRaw check for consistency
@coderabbitai (bot) left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @internal/db/config.go:
- Around line 104-121: The Options.ToConfig() implementation doesn't fall back
to the environment variable for ConnectionString like LoadConfigFromFile() does;
update ToConfig() (method Options.ToConfig) to read
os.Getenv(EnvConnectionString) when o.ConnectionString is empty before
constructing cfg so CLI-based config honors the HTTPX_DB_CONNECTION_STRING env
var; ensure you reference EnvConnectionString and set cfg.ConnectionString
accordingly prior to cfg.ApplyDefaults()/cfg.Validate().

In @internal/db/postgres.go:
- Around line 225-235: The json.Marshal calls inside the loop over results (for
_, r := range results) currently discard errors (e.g., hashJSON, _ :=
json.Marshal(r.Hashes), asnJSON, _ := json.Marshal(r.ASN), etc.); change each
marshal to capture the error, check it, and either return the error from the
enclosing function or log a warning and continue (depending on desired
behavior). Specifically, replace each pattern like XJSON, _ := json.Marshal(...)
with XJSON, err := json.Marshal(...); if err != nil { return fmt.Errorf("marshal
%s for record %v: %w", "<fieldName>", err) } (or call logger.Warnf and continue
for non-fatal fields), ensuring every field (Hashes, ASN, TLSData, CSPData,
ResponseHeaders, Extracts, Chain, KnowledgeBase, LinkRequest, Trace) has its
marshal error handled rather than discarded.
🧹 Nitpick comments (3)
internal/db/config.go (1)

93-102: Consider adding FlushInterval to Options struct for CLI parity with YAML config.

The Config struct includes FlushInterval (line 34) which can be set via YAML config, but the Options struct lacks this field. CLI users cannot configure the flush interval and are limited to the default (1 minute). This may be intentional to keep the CLI simple, but worth confirming.

cmd/httpx/httpx.go (1)

184-187: Consider propagating a cancellable context for database writer initialization.

Using context.Background() means there's no way to cancel the database connection/schema setup if it hangs. For a CLI tool this is generally acceptable, but for long-running or interrupt-sensitive scenarios, passing a context derived from the signal handler could improve responsiveness.
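
A sketch of that suggestion using signal.NotifyContext (the db.NewWriter call shape is taken from the walkthrough above; surrounding code is illustrative and imports context, os, os/signal, and syscall are elided):

// Sketch: derive a cancellable context from interrupt signals so a
// hung connection attempt can be aborted with Ctrl-C.
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer stop()

writer, err := db.NewWriter(ctx, dbConfig)
if err != nil {
	gologger.Fatal().Msgf("Could not create database writer: %s", err)
}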

internal/db/mongodb.go (1)

115-127: JSON round-trip conversion is functional but has overhead.

The resultToDocument function marshals to JSON then unmarshals to BSON. This works correctly and ensures field names match JSON tags, but incurs double serialization overhead. For high-volume scenarios, consider using bson.Marshal directly with appropriate BSON struct tags, or accept this trade-off for code simplicity and consistency with the JSON output format.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f010d6 and c1c1fd0.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • README.md
  • cmd/httpx/httpx.go
  • go.mod
  • internal/db/config.go
  • internal/db/db.go
  • internal/db/mongodb.go
  • internal/db/mysql.go
  • internal/db/postgres.go
  • internal/db/writer.go
  • runner/options.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/db/db.go
  • runner/options.go
  • internal/db/mysql.go
  • go.mod
🧰 Additional context used
🧬 Code graph analysis (3)
internal/db/mongodb.go (3)
internal/db/db.go (4)
  • Register (47-49)
  • MongoDB (13-13)
  • Database (31-41)
  • DatabaseType (10-10)
internal/db/config.go (1)
  • Config (23-37)
runner/types.go (1)
  • Result (35-108)
cmd/httpx/httpx.go (4)
internal/db/config.go (3)
  • Options (93-102)
  • Config (23-37)
  • LoadConfigFromFile (69-91)
runner/options.go (1)
  • Options (177-373)
internal/db/writer.go (2)
  • Writer (13-20)
  • NewWriter (22-54)
runner/types.go (1)
  • Result (35-108)
internal/db/config.go (1)
internal/db/db.go (2)
  • DatabaseType (10-10)
  • SupportedDatabases (68-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Functional Test (ubuntu-latest)
  • GitHub Check: Functional Test (macOS-latest)
  • GitHub Check: Functional Test (windows-latest)
  • GitHub Check: release-test
  • GitHub Check: Analyze (go)
🔇 Additional comments (16)
README.md (1)

206-213: LGTM! New database output flags are well-documented.

The new CLI flags for database output (-rdb, -rdbc, -rdbt, -rdbcs, -rdbn, -rdbtb, -rdbbs, -rdbor) are clearly documented with appropriate descriptions and default values that align with the implementation in internal/db/config.go.

internal/db/config.go (2)

39-49: LGTM!

Validation correctly ensures a valid database type and non-empty connection string. The error messages are helpful, including the list of supported databases.


51-67: LGTM!

ApplyDefaults() correctly sets sensible defaults for missing configuration values and handles the <= 0 check for numeric fields.

internal/db/postgres.go (3)

47-176: LGTM! Schema design is comprehensive and well-structured.

The table schema correctly maps all Result struct fields with appropriate PostgreSQL types:

  • JSONB for complex nested objects (hashes, ASN, TLS, etc.)
  • TEXT[] for string arrays
  • BYTEA for screenshot bytes
  • Appropriate indexes including a GIN index on tech for efficient array queries

The use of pq.QuoteIdentifier for table and index names properly prevents SQL injection.


183-189: Transaction rollback in defer is correctly implemented.

The defer tx.Rollback() pattern is correct - after a successful tx.Commit(), the rollback becomes a no-op. This ensures cleanup on any error path.


26-38: LGTM! Connection handling is correct.

Properly opens connection, verifies with PingContext, and stores the handle. Error messages are descriptive.

cmd/httpx/httpx.go (3)

189-196: LGTM! Callback chaining correctly preserves existing handlers.

The implementation properly captures and invokes the existing OnResult callback before the database writer callback, ensuring both PDCP upload and database output can work together.


198-205: LGTM! Close order ensures database flush before upstream close.

Calling writer.Close() before existingClose() is the correct order - this ensures all buffered results are flushed to the database before any upstream resources (like PDCP upload) are closed.


161-182: LGTM! Configuration loading with file vs CLI fallback is well-structured.

The logic correctly prioritizes config file when specified, otherwise builds configuration from CLI options. Both paths properly handle errors with fatal logging.

internal/db/mongodb.go (3)

61-87: LGTM! Index strategy is appropriate for common query patterns.

The indexes on timestamp (descending for recent-first queries), url, host, and status_code cover typical filtering needs. The sparse index on tech is a good choice since not all results will have technology detection data.


30-50: LGTM! Connection setup with appropriate timeouts.

The 10-second connect and server selection timeouts are reasonable defaults. The ping verification ensures the connection is actually usable before returning success.


89-109: LGTM! Batch insertion with proper error propagation.

Unlike the PostgreSQL implementation, errors from resultToDocument are properly checked and propagated. The early return on empty results avoids unnecessary database calls.

internal/db/writer.go (4)

22-54: LGTM!

Clean initialization sequence with proper resource cleanup on EnsureSchema failure. The batcher configuration using BatchSize and FlushInterval from config is appropriate.


93-106: LGTM!

Proper idempotent close using CompareAndSwap, correct shutdown sequence with Stop followed by WaitDone, and appropriate final stats logging.


108-110: LGTM!

Thread-safe accessor for the results counter.


72-91: No issue here — batcher.Append is safe to call after Stop.

The batcher package from github.com/projectdiscovery/utils is designed for concurrent use and implements Append and Stop with well-defined thread-safe semantics. Calling Append after Stop has been called is safe and does not result in undefined behavior. The callback can safely execute in parallel with Close() without risk.

Likely an incorrect or invalid review comment.

Comment on lines +104 to +121
func (o *Options) ToConfig() (*Config, error) {
	cfg := &Config{
		Type:             DatabaseType(o.Type),
		ConnectionString: o.ConnectionString,
		DatabaseName:     o.DatabaseName,
		TableName:        o.TableName,
		BatchSize:        o.BatchSize,
		OmitRaw:          o.OmitRaw,
	}

	cfg.ApplyDefaults()

	if err := cfg.Validate(); err != nil {
		return nil, err
	}

	return cfg, nil
}

⚠️ Potential issue | 🟡 Minor

Missing environment variable fallback for ConnectionString in ToConfig().

LoadConfigFromFile() (lines 80-82) falls back to os.Getenv(EnvConnectionString) when the connection string is empty, but ToConfig() does not apply the same fallback. This inconsistency means users relying on CLI options cannot use the HTTPX_DB_CONNECTION_STRING environment variable.

Proposed fix
 func (o *Options) ToConfig() (*Config, error) {
+	connStr := o.ConnectionString
+	if connStr == "" {
+		connStr = os.Getenv(EnvConnectionString)
+	}
+
 	cfg := &Config{
 		Type:             DatabaseType(o.Type),
-		ConnectionString: o.ConnectionString,
+		ConnectionString: connStr,
 		DatabaseName:     o.DatabaseName,
 		TableName:        o.TableName,
 		BatchSize:        o.BatchSize,
 		OmitRaw:          o.OmitRaw,
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

func (o *Options) ToConfig() (*Config, error) {
	cfg := &Config{
		Type:             DatabaseType(o.Type),
		ConnectionString: o.ConnectionString,
		DatabaseName:     o.DatabaseName,
		TableName:        o.TableName,
		BatchSize:        o.BatchSize,
		OmitRaw:          o.OmitRaw,
	}
	cfg.ApplyDefaults()
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return cfg, nil
}

After:

func (o *Options) ToConfig() (*Config, error) {
	connStr := o.ConnectionString
	if connStr == "" {
		connStr = os.Getenv(EnvConnectionString)
	}
	cfg := &Config{
		Type:             DatabaseType(o.Type),
		ConnectionString: connStr,
		DatabaseName:     o.DatabaseName,
		TableName:        o.TableName,
		BatchSize:        o.BatchSize,
		OmitRaw:          o.OmitRaw,
	}
	cfg.ApplyDefaults()
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return cfg, nil
}
🤖 Prompt for AI Agents
In @internal/db/config.go around lines 104 - 121, The Options.ToConfig()
implementation doesn't fall back to the environment variable for
ConnectionString like LoadConfigFromFile() does; update ToConfig() (method
Options.ToConfig) to read os.Getenv(EnvConnectionString) when o.ConnectionString
is empty before constructing cfg so CLI-based config honors the
HTTPX_DB_CONNECTION_STRING env var; ensure you reference EnvConnectionString and
set cfg.ConnectionString accordingly prior to
cfg.ApplyDefaults()/cfg.Validate().

Comment on lines +56 to +70
func (w *Writer) flush(batch []runner.Result) {
	if len(batch) == 0 {
		return
	}

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	if err := w.db.InsertBatch(ctx, batch); err != nil {
		gologger.Error().Msgf("Failed to insert batch to database: %v", err)
	} else {
		w.counter.Add(int64(len(batch)))
		gologger.Verbose().Msgf("Inserted %d results to database (total: %d)", len(batch), w.counter.Load())
	}
}

⚠️ Potential issue | 🟠 Major

Failed batch insertions result in data loss.

When InsertBatch fails, the batch is logged but discarded without retry or fallback. For a scan results persistence feature, this could silently lose data during transient database issues.

Consider:

  1. Adding a configurable retry mechanism with exponential backoff.
  2. Falling back to a local file buffer for failed batches.
  3. At minimum, tracking failed result counts in Stats().

@coderabbitai (bot) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 205-214: The README claims support for http2 and http3 but
ValidateOptions in runner/options.go (function ValidateOptions) currently only
allows "unknown" and "http11"; update the validation to accept "http2" and
"http3" (and mark experimental if needed) by adding those values to the allowed
protocol set in ValidateOptions, and ensure any flag help text/usage strings
(the protocol flag handling) reflect the new valid values; alternatively, if you
prefer to not enable them yet, remove http2 and http3 from README/protocol flag
help text so docs match the current ValidateOptions behavior.

Comment on lines +205 to +214
-pr, -protocol string                    protocol to use (unknown, http11, http2, http3)
-fepp, -filter-error-page-path string    path to store filtered error pages (default "filtered_error_page.json")
-lof, -list-output-fields                list available output field names for filtering
-eof, -exclude-output-fields string[]    exclude specified output fields from results
-rdb, -result-db                         store results in database
-rdbc, -result-db-config string          path to database config file
-rdbt, -result-db-type string            database type (mongodb, postgres, mysql)
-rdbcs, -result-db-conn string           database connection string (env: HTTPX_DB_CONNECTION_STRING)
-rdbn, -result-db-name string            database name (default "httpx")
-rdbtb, -result-db-table string          table/collection name (default "results")
-rdbbs, -result-db-batch-size int        batch size for database inserts (default 100)
-rdbor, -result-db-omit-raw              omit raw request/response data from database

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for protocol-related validation and constants
rg -n "ValidateOptions|protocol" -S -g'*.go' | head -50

Repository: projectdiscovery/httpx

Length of output: 4320


🏁 Script executed:

# Search for specific protocol values
rg -n "http3|http2|HTTP2|HTTP3|UNKNOWN|HTTP11|http11" -S -g'*.go' | head -100

Repository: projectdiscovery/httpx

Length of output: 2626


🏁 Script executed:

# Look for protocol type/enum definitions
rg -n "type.*Protocol|const.*Protocol" -S -g'*.go'

Repository: projectdiscovery/httpx

Length of output: 48


🏁 Script executed:

# Check README lines 205-214
sed -n '205,214p' README.md

Repository: projectdiscovery/httpx

Length of output: 927


🏁 Script executed:

# Get full ValidateOptions function to see complete validation logic
rg -A 50 "func (options \*Options) ValidateOptions" runner/options.go | head -70

Repository: projectdiscovery/httpx

Length of output: 48


🏁 Script executed:

# Get the ValidateOptions function more explicitly
sed -n '694,850p' runner/options.go

Repository: projectdiscovery/httpx

Length of output: 6007


Protocol validation does not support http2 and http3 despite documentation claiming they are available. ValidateOptions at runner/options.go:816 only accepts unknown and http11, but README.md lists all four protocols. The flag help text also incorrectly lists http2 and http3 as [experimental] options. Either enable these protocols in the validation logic or remove them from documentation.

🤖 Prompt for AI Agents
In `@README.md` around lines 205 - 214, The README claims support for http2 and
http3 but ValidateOptions in runner/options.go (function ValidateOptions)
currently only allows "unknown" and "http11"; update the validation to accept
"http2" and "http3" (and mark experimental if needed) by adding those values to
the allowed protocol set in ValidateOptions, and ensure any flag help text/usage
strings (the protocol flag handling) reflect the new valid values;
alternatively, if you prefer to not enable them yet, remove http2 and http3 from
README/protocol flag help text so docs match the current ValidateOptions
behavior.
