feat: add database output support for storing scan results #2363
base: dev
Conversation
Walkthrough

Adds pluggable database result storage (MongoDB, PostgreSQL, MySQL), CLI flags and config loading, a registry/factory, batched background writer, schema management, and runner integration to persist scan results.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI
    participant Runner as httpx Runner
    participant Writer as DB Writer
    participant DB as Database (Mongo/Postgres/MySQL)
    CLI->>Runner: start with --result-db / --result-db-config
    Runner->>Writer: NewWriter(ctx, cfg) and register callbacks
    loop per scan result
        Runner->>Writer: OnResult(result) (callback)
        Writer->>Writer: enqueue result (buffered channel)
    end
    alt flush by batch size or interval
        Writer->>DB: InsertBatch(batch)
        DB-->>Writer: success / error
        Writer->>Writer: update counters/log
    end
    CLI->>Runner: finish / interrupt
    Runner->>Writer: OnClose() -> Writer.Close()
    Writer->>Writer: cancel, flush remaining
    Writer->>DB: final InsertBatch
    Writer->>DB: Close connection
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 4 passed
Actionable comments posted: 3
🤖 Fix all issues with AI Agents
In @internal/db/mysql.go:
- Around line 47-152: EnsureSchema currently injects m.cfg.TableName directly
into the CREATE TABLE SQL; validate or safely quote the identifier before
interpolation to prevent SQL injection (e.g., enforce a safe regex like
^[A-Za-z0-9_]+$ on m.cfg.TableName and/or wrap the identifier in backticks after
rejecting/escaping any backticks). Update mysqlDatabase.EnsureSchema to use the
validated/quoted table name variable when building the schema string (replace
direct use of m.cfg.TableName). Also add a matching index on url (e.g., add
`INDEX idx_url (url)`) to the CREATE TABLE statement for parity with the
Postgres schema.
In @internal/db/postgres.go:
- Around line 48-161: The SQL builds in the schema variable interpolate
p.cfg.TableName directly (and into index names like idx_%s_timestamp and later
INSERT queries), which risks SQL injection; update the code to quote identifiers
using pq.QuoteIdentifier(p.cfg.TableName) (or an equivalent safe
identifier-quoting helper) everywhere the table name or derived index names are
inserted (including schema, index names, and any INSERT/SELECT/DROP/ALTER
statements) so that the table and index identifiers are safely escaped before
passing into fmt.Sprintf.
In @internal/db/writer.go:
- Around line 130-137: The Close method races with sends from the callback
returned by GetWriterCallback: after calling w.cancel() but before close(w.data)
a resumed callback can see both w.ctx.Done() and w.data ready and attempt to
send into a closed channel, causing a panic; update GetWriterCallback (the
sending goroutine) to check w.closed (or use an atomic/boolean helper like
w.isClosed()) before selecting/send and bail out if closed, or instead change
Close to close w.data before/atomically with cancellation (or use a mutex to
synchronize) so that the callback never attempts a send after closure; modify
either Writer.Close, GetWriterCallback, or both to perform this
closed-check/synchronization using the existing w.closed, w.cancel, w.data, and
w.ctx symbols.
🧹 Nitpick comments (12)
internal/db/config.go (1)
39-49: Consider stricter validation for numeric fields. The current validation only checks database type and connection string. BatchSize and FlushInterval are validated indirectly through ApplyDefaults (which replaces non-positive values with defaults). While this works, it means explicitly setting negative values gets silently corrected rather than rejected.
Consider adding explicit validation to reject negative values as invalid input, which would make configuration errors more apparent to users.
🔎 Example stricter validation
```diff
 func (c *Config) Validate() error {
 	if !c.Type.IsValid() {
 		return fmt.Errorf("invalid database type: %s (supported: %v)", c.Type, SupportedDatabases())
 	}
 	if c.ConnectionString == "" {
 		return fmt.Errorf("connection string is required")
 	}
+
+	if c.BatchSize < 0 {
+		return fmt.Errorf("batch size cannot be negative")
+	}
+
+	if c.FlushInterval < 0 {
+		return fmt.Errorf("flush interval cannot be negative")
+	}
 	return nil
 }
```

internal/db/mongodb.go (1)
115-127: Consider direct BSON marshaling for efficiency. The JSON round-trip (marshal to JSON, then unmarshal to BSON) is simple and reliable but adds overhead. For scanning workloads with moderate throughput, this approach is acceptable and prioritizes maintainability.
If performance profiling reveals this as a bottleneck, consider using BSON tags on the runner.Result struct and marshaling directly to BSON.
internal/db/writer.go (2)
67-71: Consider also omitting `RawHeaders` when `omitRaw` is enabled. The `omitRaw` option clears `Raw`, `Request`, and `ResponseBody`, but `RawHeaders` (which contains the raw HTTP response headers) is not cleared. If the intent is to reduce storage of raw HTTP data, this field should likely be included.

🔎 Proposed fix

```diff
 if w.omitRaw {
 	r.Raw = ""
 	r.Request = ""
 	r.ResponseBody = ""
+	r.RawHeaders = ""
 }
```
96-101: Failed batch data is discarded without retry. When `InsertBatch` fails, the error is logged but the batch is cleared at line 103, permanently losing those results. For reliability, consider implementing a retry mechanism or dead-letter queue for failed batches.

internal/db/db.go (2)
22-29: `IsValid()` and `SupportedDatabases()` may diverge if new types are added. Both functions enumerate the valid database types independently. If a new database type is added, both must be updated. Consider deriving one from the other to ensure consistency.

🔎 Proposed fix

```diff
+var supportedDatabases = []DatabaseType{MongoDB, PostgreSQL, MySQL}
+
 func (d DatabaseType) IsValid() bool {
-	switch d {
-	case MongoDB, PostgreSQL, MySQL:
-		return true
-	default:
-		return false
+	for _, db := range supportedDatabases {
+		if d == db {
+			return true
+		}
 	}
+	return false
 }

 func SupportedDatabases() []DatabaseType {
-	return []DatabaseType{MongoDB, PostgreSQL, MySQL}
+	return supportedDatabases
 }
```

Also applies to: 68-70
43-49: Registry map lacks synchronization, but safe given init-only writes. The `registry` map is written to only during `init()` calls, which are guaranteed to run sequentially before `main()`. This is safe, but adding a comment documenting this assumption would help future maintainers.

runner/options.go (1)
680-801: No validation added for database options. `ValidateOptions()` doesn't validate the new database-related flags. Consider adding validation to ensure:

- If `-rdb` is enabled, either `-rdbc` or (`-rdbt` + `-rdbcs`) must be provided
- Database type is one of the supported values
internal/db/postgres.go (2)
218-227: JSON marshaling errors are silently ignored. All `json.Marshal` calls ignore the error return value. While marshaling struct types rarely fails, nil pointer dereferences or custom `MarshalJSON` implementations could fail. Consider at minimum logging errors or using a helper that returns `[]byte("null")` on failure.

🔎 Proposed helper pattern

```go
func mustMarshalJSON(v interface{}) []byte {
	data, err := json.Marshal(v)
	if err != nil {
		return []byte("null")
	}
	return data
}
```
171-252: Consider using COPY for better batch insert performance. For PostgreSQL, the `COPY` protocol (via `pq.CopyIn`) is significantly faster for bulk inserts compared to prepared statement execution in a loop. This could improve performance for large batch sizes.

internal/db/mysql.go (3)
56-60: VARCHAR length constraints may cause data truncation. Several VARCHAR columns have fixed limits that may truncate real-world data:

- `host VARCHAR(255)` - most hostnames fit, but some edge cases exist
- `method VARCHAR(10)` - custom methods could exceed this
- `port VARCHAR(10)` - reasonable
- `scheme VARCHAR(10)` - reasonable

Consider using `TEXT` for `method` or increasing the limit to handle custom HTTP methods.
209-227: JSON marshaling errors silently ignored. Same issue as the PostgreSQL implementation - all `json.Marshal` errors are discarded. Apply the same fix pattern as suggested for postgres.go.
162-252: Consider multi-value INSERT for better MySQL batch performance. MySQL supports multi-value INSERT syntax (`INSERT INTO t VALUES (...), (...), ...`) which is faster than executing prepared statements in a loop. This could significantly improve batch insert performance.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (10)
- README.md
- cmd/httpx/httpx.go
- go.mod
- internal/db/config.go
- internal/db/db.go
- internal/db/mongodb.go
- internal/db/mysql.go
- internal/db/postgres.go
- internal/db/writer.go
- runner/options.go
🧰 Additional context used
🧬 Code graph analysis (5)
internal/db/writer.go (4)
- internal/db/db.go (2): Database (31-41), NewDatabase (51-66)
- internal/db/config.go (1): Config (23-37)
- runner/types.go (1): Result (35-105)
- runner/options.go (1): OnResultCallback (51-51)

internal/db/postgres.go (5)
- internal/db/db.go (4): Register (47-49), PostgreSQL (14-14), Database (31-41), DatabaseType (10-10)
- internal/db/config.go (1): Config (23-37)
- runner/types.go (2): Result (35-105), Trace (107-123)
- common/httpx/csp.go (1): CSPData (25-28)
- common/httpx/proto.go (1): HTTP2 (8-8)

internal/db/db.go (2)
- runner/types.go (1): Result (35-105)
- internal/db/config.go (1): Config (23-37)

internal/db/mysql.go (5)
- internal/db/db.go (4): Register (47-49), MySQL (15-15), Database (31-41), DatabaseType (10-10)
- internal/db/config.go (1): Config (23-37)
- runner/types.go (2): Result (35-105), Trace (107-123)
- common/httpx/csp.go (1): CSPData (25-28)
- common/httpx/proto.go (1): HTTP2 (8-8)

internal/db/config.go (1)
- internal/db/db.go (2): DatabaseType (10-10), SupportedDatabases (68-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Functional Test (macOS-latest)
- GitHub Check: Functional Test (windows-latest)
- GitHub Check: Functional Test (ubuntu-latest)
- GitHub Check: Lint Test
- GitHub Check: Analyze (go)
- GitHub Check: release-test
🔇 Additional comments (19)
README.md (2)
100-166: LGTM - Documentation alignment improvements. The cosmetic adjustments to spacing and alignment improve readability and consistency across the CLI help sections.

206-213: LGTM - Database output flags properly documented. The new database-related flags (`-rdb`, `-rdbc`, `-rdbt`, etc.) are clearly documented with appropriate defaults and descriptions, aligning well with the PR objectives to add database persistence support.

internal/db/config.go (3)
11-21: LGTM - Well-chosen defaults. The default values are sensible: batch size of 100 balances throughput and memory, 1-minute flush interval prevents data loss, and the naming conventions are clear.

69-91: LGTM - Correct configuration loading flow. The function properly handles YAML parsing, environment variable fallback for connection strings, default application, and validation in the correct order. Error wrapping provides good context.

104-125: LGTM - Consistent CLI-to-config conversion. The ToConfig method correctly mirrors the file-based loading logic, ensuring both configuration paths (file vs. CLI flags) go through the same validation and defaulting pipeline.
cmd/httpx/httpx.go (2)
68-69: LGTM - Correct integration placement. The database output setup is called at the appropriate point in the initialization sequence, after asset upload setup and before runner creation, allowing it to configure the result callbacks properly.

151-208: LGTM - Correct callback chaining and error handling. The implementation properly:
- Loads configuration from file or CLI options with appropriate error handling
- Chains callbacks to preserve existing handlers (asset upload, etc.)
- Orders operations correctly: existing OnResult callback executes first, writer closes before existing OnClose
- Uses Fatal for setup errors, which is appropriate at initialization time
internal/db/mongodb.go (3)
15-28: LGTM - Standard factory registration pattern. The init-based registration integrates cleanly with the database registry, and the constructor properly defers connection establishment to the explicit Connect() call.

30-59: LGTM - Robust connection handling with appropriate timeouts. The Connect method uses reasonable timeouts (10s for connection/selection, ping verification) and Close properly handles cleanup with a 5s timeout to prevent blocking during shutdown.

61-87: No action needed. MongoDB's `CreateMany()` is idempotent - re-creating indexes that already exist with identical definitions is a no-op and succeeds without error. The index definitions are appropriate for the query patterns, and the implementation will work correctly on subsequent application runs.

go.mod (1)
55-61: Dependency versions are appropriate; no critical security concerns identified. The versions in go.mod are current and safe:
- mysql (v1.9.3), lib/pq (v1.10.9): Latest stable releases. CVE-2025-24787 and CVE-2025-1094/4207 are in upstream components (WhoDB and PostgreSQL libpq C library respectively), not these Go drivers.
- mongo-driver (v1.17.6): On the maintained v1.x branch; v2.4 is available but v1.x remains actively supported.
- mapstructure/v2 (v2.4.0): Confirmed as the patched version that fixed the information-leak vulnerability (present in versions < 2.4.0).
- gocsv and publicsuffix-go: Latest versions with no reported security vulnerabilities.
internal/db/writer.go (2)
26-59: LGTM! The `NewWriter` function properly initializes the database connection, ensures schema, and starts the background goroutine with appropriate lifecycle management.

146-148: LGTM! Simple atomic counter accessor.
internal/db/db.go (2)
31-41: LGTM! The `Database` interface is well-defined with clear lifecycle methods and batch insert support.

51-66: LGTM! Good defensive validation before factory invocation.
runner/options.go (2)
354-361: LGTM! New database configuration fields are appropriately added to the Options struct.
498-505: Environment variable `HTTPX_DB_CONNECTION_STRING` is properly wired. The flag description for `-rdbcs` correctly mentions the environment variable. While the flag definition doesn't use `goflags.EnvVar`, the environment variable is properly handled as a fallback in `internal/db/config.go` (lines 81 and 115): if no connection string is provided via the flag, the code reads from `HTTPX_DB_CONNECTION_STRING` via `os.Getenv()`. This is a valid design pattern where the flag takes precedence and the environment variable serves as a fallback.

Likely an incorrect or invalid review comment.
internal/db/postgres.go (1)
26-38: LGTM! Connect implementation properly opens connection and pings to verify connectivity.
internal/db/mysql.go (1)
26-38: LGTM! Connect implementation correctly opens and pings MySQL database.
Mzack9999
left a comment
overall lgtm, some minor suggestion plus what do you think about moving the query to sqlc for easier maintenance later (moving the query to sqlc file and generate glue code with sqlc generate)?
Add support for storing httpx scan results directly to databases with both CLI flags and YAML config file options.

Supported databases:
- MongoDB
- PostgreSQL
- MySQL

Features:
- Batched writes for performance (configurable batch size)
- Auto-flush with configurable interval
- Individual columns for each Result field (not JSON blob)
- Support for environment variable HTTPX_DB_CONNECTION_STRING
- Option to omit raw request/response data (-rdbor)

New CLI flags under OUTPUT group:
- -rdb, -result-db: enable database storage
- -rdbc, -result-db-config: path to YAML config file
- -rdbt, -result-db-type: database type (mongodb, postgres, mysql)
- -rdbcs, -result-db-conn: connection string
- -rdbn, -result-db-name: database name (default: httpx)
- -rdbtb, -result-db-table: table/collection name (default: results)
- -rdbbs, -result-db-batch-size: batch size (default: 100)
- -rdbor, -result-db-omit-raw: omit raw data

Closes #1973
Closes #2360
Closes #2361
Closes #2362
- Fix SQL injection in postgres.go using pq.QuoteIdentifier for table/index names
- Fix SQL injection in mysql.go using custom quoteIdentifier function
- Fix race condition in writer.go by checking closed state before channel send
- Add missing idx_url index in mysql.go for parity with PostgreSQL schema
- Include RawHeaders in omitRaw check for consistency
4f010d6 to c1c1fd0
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @internal/db/config.go:
- Around line 104-121: The Options.ToConfig() implementation doesn't fall back
to the environment variable for ConnectionString like LoadConfigFromFile() does;
update ToConfig() (method Options.ToConfig) to read
os.Getenv(EnvConnectionString) when o.ConnectionString is empty before
constructing cfg so CLI-based config honors the HTTPX_DB_CONNECTION_STRING env
var; ensure you reference EnvConnectionString and set cfg.ConnectionString
accordingly prior to cfg.ApplyDefaults()/cfg.Validate().
In @internal/db/postgres.go:
- Around line 225-235: The json.Marshal calls inside the loop over results (for
_, r := range results) currently discard errors (e.g., hashJSON, _ :=
json.Marshal(r.Hashes), asnJSON, _ := json.Marshal(r.ASN), etc.); change each
marshal to capture the error, check it, and either return the error from the
enclosing function or log a warning and continue (depending on desired
behavior). Specifically, replace each pattern like XJSON, _ := json.Marshal(...)
with XJSON, err := json.Marshal(...); if err != nil { return fmt.Errorf("marshal
%s for record %v: %w", "<fieldName>", err) } (or call logger.Warnf and continue
for non-fatal fields), ensuring every field (Hashes, ASN, TLSData, CSPData,
ResponseHeaders, Extracts, Chain, KnowledgeBase, LinkRequest, Trace) has its
marshal error handled rather than discarded.
🧹 Nitpick comments (3)
internal/db/config.go (1)
93-102: Consider adding `FlushInterval` to the `Options` struct for CLI parity with YAML config. The `Config` struct includes `FlushInterval` (line 34) which can be set via YAML config, but the `Options` struct lacks this field. CLI users cannot configure the flush interval and are limited to the default (1 minute). This may be intentional to keep the CLI simple, but worth confirming.

cmd/httpx/httpx.go (1)
184-187: Consider propagating a cancellable context for database writer initialization. Using `context.Background()` means there's no way to cancel the database connection/schema setup if it hangs. For a CLI tool this is generally acceptable, but for long-running or interrupt-sensitive scenarios, passing a context derived from the signal handler could improve responsiveness.

internal/db/mongodb.go (1)
115-127: JSON round-trip conversion is functional but has overhead. The `resultToDocument` function marshals to JSON then unmarshals to BSON. This works correctly and ensures field names match JSON tags, but incurs double serialization overhead. For high-volume scenarios, consider using `bson.Marshal` directly with appropriate BSON struct tags, or accept this trade-off for code simplicity and consistency with the JSON output format.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (10)
- README.md
- cmd/httpx/httpx.go
- go.mod
- internal/db/config.go
- internal/db/db.go
- internal/db/mongodb.go
- internal/db/mysql.go
- internal/db/postgres.go
- internal/db/writer.go
- runner/options.go
🚧 Files skipped from review as they are similar to previous changes (4)
- internal/db/db.go
- runner/options.go
- internal/db/mysql.go
- go.mod
🧰 Additional context used
🧬 Code graph analysis (3)
internal/db/mongodb.go (3)
- internal/db/db.go (4): Register (47-49), MongoDB (13-13), Database (31-41), DatabaseType (10-10)
- internal/db/config.go (1): Config (23-37)
- runner/types.go (1): Result (35-108)

cmd/httpx/httpx.go (4)
- internal/db/config.go (3): Options (93-102), Config (23-37), LoadConfigFromFile (69-91)
- runner/options.go (1): Options (177-373)
- internal/db/writer.go (2): Writer (13-20), NewWriter (22-54)
- runner/types.go (1): Result (35-108)

internal/db/config.go (1)
- internal/db/db.go (2): DatabaseType (10-10), SupportedDatabases (68-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Functional Test (ubuntu-latest)
- GitHub Check: Functional Test (macOS-latest)
- GitHub Check: Functional Test (windows-latest)
- GitHub Check: release-test
- GitHub Check: Analyze (go)
🔇 Additional comments (16)
README.md (1)
206-213: LGTM! New database output flags are well-documented. The new CLI flags for database output (`-rdb`, `-rdbc`, `-rdbt`, `-rdbcs`, `-rdbn`, `-rdbtb`, `-rdbbs`, `-rdbor`) are clearly documented with appropriate descriptions and default values that align with the implementation in `internal/db/config.go`.

internal/db/config.go (2)
39-49: LGTM! Validation correctly ensures a valid database type and non-empty connection string. The error messages are helpful, including the list of supported databases.

51-67: LGTM! `ApplyDefaults()` correctly sets sensible defaults for missing configuration values and handles the `<= 0` check for numeric fields.

internal/db/postgres.go (3)
47-176: LGTM! Schema design is comprehensive and well-structured. The table schema correctly maps all `Result` struct fields with appropriate PostgreSQL types:

- JSONB for complex nested objects (hashes, ASN, TLS, etc.)
- TEXT[] for string arrays
- BYTEA for screenshot bytes
- Appropriate indexes including a GIN index on `tech` for efficient array queries

The use of `pq.QuoteIdentifier` for table and index names properly prevents SQL injection.
183-189: Transaction rollback in defer is correctly implemented. The `defer tx.Rollback()` pattern is correct - after a successful `tx.Commit()`, the rollback becomes a no-op. This ensures cleanup on any error path.
26-38: LGTM! Connection handling is correct. Properly opens connection, verifies with `PingContext`, and stores the handle. Error messages are descriptive.

cmd/httpx/httpx.go (3)
189-196: LGTM! Callback chaining correctly preserves existing handlers. The implementation properly captures and invokes the existing `OnResult` callback before the database writer callback, ensuring both PDCP upload and database output can work together.
198-205: LGTM! Close order ensures database flush before upstream close. Calling `writer.Close()` before `existingClose()` is the correct order - this ensures all buffered results are flushed to the database before any upstream resources (like PDCP upload) are closed.
161-182: LGTM! Configuration loading with file vs CLI fallback is well-structured. The logic correctly prioritizes the config file when specified, otherwise builds configuration from CLI options. Both paths properly handle errors with fatal logging.

internal/db/mongodb.go (3)
61-87: LGTM! Index strategy is appropriate for common query patterns. The indexes on `timestamp` (descending for recent-first queries), `url`, `host`, and `status_code` cover typical filtering needs. The sparse index on `tech` is a good choice since not all results will have technology detection data.
30-50: LGTM! Connection setup with appropriate timeouts. The 10-second connect and server selection timeouts are reasonable defaults. The ping verification ensures the connection is actually usable before returning success.

89-109: LGTM! Batch insertion with proper error propagation. Unlike the PostgreSQL implementation, errors from `resultToDocument` are properly checked and propagated. The early return on empty results avoids unnecessary database calls.

internal/db/writer.go (4)
22-54: LGTM! Clean initialization sequence with proper resource cleanup on `EnsureSchema` failure. The batcher configuration using `BatchSize` and `FlushInterval` from config is appropriate.

93-106: LGTM! Proper idempotent close using `CompareAndSwap`, correct shutdown sequence with `Stop` followed by `WaitDone`, and appropriate final stats logging.

108-110: LGTM! Thread-safe accessor for the results counter.

72-91: No issue here - `batcher.Append` is safe to call after `Stop`. The `batcher` package from `github.com/projectdiscovery/utils` is designed for concurrent use and implements `Append` and `Stop` with well-defined thread-safe semantics. Calling `Append` after `Stop` has been called is safe and does not result in undefined behavior. The callback can safely execute in parallel with `Close()` without risk.

Likely an incorrect or invalid review comment.
```go
func (o *Options) ToConfig() (*Config, error) {
	cfg := &Config{
		Type:             DatabaseType(o.Type),
		ConnectionString: o.ConnectionString,
		DatabaseName:     o.DatabaseName,
		TableName:        o.TableName,
		BatchSize:        o.BatchSize,
		OmitRaw:          o.OmitRaw,
	}

	cfg.ApplyDefaults()

	if err := cfg.Validate(); err != nil {
		return nil, err
	}

	return cfg, nil
}
```
Missing environment variable fallback for ConnectionString in ToConfig().
LoadConfigFromFile() (lines 80-82) falls back to os.Getenv(EnvConnectionString) when the connection string is empty, but ToConfig() does not apply the same fallback. This inconsistency means users relying on CLI options cannot use the HTTPX_DB_CONNECTION_STRING environment variable.
Proposed fix
```diff
 func (o *Options) ToConfig() (*Config, error) {
+	connStr := o.ConnectionString
+	if connStr == "" {
+		connStr = os.Getenv(EnvConnectionString)
+	}
+
 	cfg := &Config{
 		Type:             DatabaseType(o.Type),
-		ConnectionString: o.ConnectionString,
+		ConnectionString: connStr,
 		DatabaseName:     o.DatabaseName,
 		TableName:        o.TableName,
 		BatchSize:        o.BatchSize,
 		OmitRaw:          o.OmitRaw,
 	}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
func (o *Options) ToConfig() (*Config, error) {
	connStr := o.ConnectionString
	if connStr == "" {
		connStr = os.Getenv(EnvConnectionString)
	}

	cfg := &Config{
		Type:             DatabaseType(o.Type),
		ConnectionString: connStr,
		DatabaseName:     o.DatabaseName,
		TableName:        o.TableName,
		BatchSize:        o.BatchSize,
		OmitRaw:          o.OmitRaw,
	}

	cfg.ApplyDefaults()

	if err := cfg.Validate(); err != nil {
		return nil, err
	}

	return cfg, nil
}
```
🤖 Prompt for AI Agents
In @internal/db/config.go around lines 104 - 121, The Options.ToConfig()
implementation doesn't fall back to the environment variable for
ConnectionString like LoadConfigFromFile() does; update ToConfig() (method
Options.ToConfig) to read os.Getenv(EnvConnectionString) when o.ConnectionString
is empty before constructing cfg so CLI-based config honors the
HTTPX_DB_CONNECTION_STRING env var; ensure you reference EnvConnectionString and
set cfg.ConnectionString accordingly prior to
cfg.ApplyDefaults()/cfg.Validate().
```go
func (w *Writer) flush(batch []runner.Result) {
	if len(batch) == 0 {
		return
	}

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	if err := w.db.InsertBatch(ctx, batch); err != nil {
		gologger.Error().Msgf("Failed to insert batch to database: %v", err)
	} else {
		w.counter.Add(int64(len(batch)))
		gologger.Verbose().Msgf("Inserted %d results to database (total: %d)", len(batch), w.counter.Load())
	}
}
```
Failed batch insertions result in data loss.
When InsertBatch fails, the batch is logged but discarded without retry or fallback. For a scan results persistence feature, this could silently lose data during transient database issues.
Consider:
- Adding a configurable retry mechanism with exponential backoff.
- Falling back to a local file buffer for failed batches.
- At minimum, tracking failed result counts in `Stats()`.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 205-214: The README claims support for http2 and http3 but
ValidateOptions in runner/options.go (function ValidateOptions) currently only
allows "unknown" and "http11"; update the validation to accept "http2" and
"http3" (and mark experimental if needed) by adding those values to the allowed
protocol set in ValidateOptions, and ensure any flag help text/usage strings
(the protocol flag handling) reflect the new valid values; alternatively, if you
prefer to not enable them yet, remove http2 and http3 from README/protocol flag
help text so docs match the current ValidateOptions behavior.
```
-pr, -protocol string                  protocol to use (unknown, http11, http2, http3)
-fepp, -filter-error-page-path string  path to store filtered error pages (default "filtered_error_page.json")
-lof, -list-output-fields              list available output field names for filtering
-eof, -exclude-output-fields string[]  exclude specified output fields from results
-rdb, -result-db                       store results in database
-rdbc, -result-db-config string        path to database config file
-rdbt, -result-db-type string          database type (mongodb, postgres, mysql)
-rdbcs, -result-db-conn string         database connection string (env: HTTPX_DB_CONNECTION_STRING)
-rdbn, -result-db-name string          database name (default "httpx")
-rdbtb, -result-db-table string        table/collection name (default "results")
-rdbbs, -result-db-batch-size int      batch size for database inserts (default 100)
-rdbor, -result-db-omit-raw            omit raw request/response data
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Search for protocol-related validation and constants
rg -n "ValidateOptions|protocol" -S -g'*.go' | head -50
```

Repository: projectdiscovery/httpx
Length of output: 4320

🏁 Script executed:

```shell
# Search for specific protocol values
rg -n "http3|http2|HTTP2|HTTP3|UNKNOWN|HTTP11|http11" -S -g'*.go' | head -100
```

Length of output: 2626

🏁 Script executed:

```shell
# Look for protocol type/enum definitions
rg -n "type.*Protocol|const.*Protocol" -S -g'*.go'
```

Length of output: 48

🏁 Script executed:

```shell
# Check README lines 205-214
sed -n '205,214p' README.md
```

Length of output: 927

🏁 Script executed:

```shell
# Get full ValidateOptions function to see complete validation logic
rg -A 50 "func (options \*Options) ValidateOptions" runner/options.go | head -70
```

Length of output: 48

🏁 Script executed:

```shell
# Get the ValidateOptions function more explicitly
sed -n '694,850p' runner/options.go
```

Length of output: 6007
Protocol validation does not support http2 and http3 despite documentation claiming they are available. ValidateOptions at runner/options.go:816 only accepts unknown and http11, but README.md lists all four protocols. The flag help text also incorrectly lists http2 and http3 as [experimental] options. Either enable these protocols in the validation logic or remove them from documentation.
🤖 Prompt for AI Agents
In `@README.md` around lines 205 - 214, The README claims support for http2 and
http3 but ValidateOptions in runner/options.go (function ValidateOptions)
currently only allows "unknown" and "http11"; update the validation to accept
"http2" and "http3" (and mark experimental if needed) by adding those values to
the allowed protocol set in ValidateOptions, and ensure any flag help text/usage
strings (the protocol flag handling) reflect the new valid values;
alternatively, if you prefer to not enable them yet, remove http2 and http3 from
README/protocol flag help text so docs match the current ValidateOptions
behavior.
Summary
Adds support for storing httpx scan results directly to databases (MongoDB, PostgreSQL, MySQL) with both CLI flags and YAML config file options.
Features
- Batched writes for performance (configurable batch size)
- Auto-flush with configurable interval
- Individual columns for each Result field (not JSON blob)
- Support for environment variable `HTTPX_DB_CONNECTION_STRING`
- Option to omit raw request/response data (`-rdbor`)

New CLI Flags (OUTPUT group)
- `-rdb, -result-db`
- `-rdbc, -result-db-config`
- `-rdbt, -result-db-type`
- `-rdbcs, -result-db-conn`
- `-rdbn, -result-db-name`
- `-rdbtb, -result-db-table`
- `-rdbbs, -result-db-batch-size`
- `-rdbor, -result-db-omit-raw`

Usage Examples
Example Config File (db-config.yaml)
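The config file contents are collapsed in this view. A plausible sketch, assuming YAML keys that mirror the documented CLI flags (the key names are assumptions, not confirmed from the PR):

```yaml
# db-config.yaml - hypothetical example; key names mirror the documented
# CLI flags and may differ from the actual implementation.
type: postgres          # mongodb, postgres, or mysql
connection_string: "postgres://user:pass@localhost:5432/httpx?sslmode=disable"
# connection_string may be omitted and supplied via HTTPX_DB_CONNECTION_STRING
database_name: httpx    # default "httpx"
table_name: results     # default "results"
batch_size: 100         # default 100
flush_interval: 1m      # default 1 minute
omit_raw: false         # drop raw request/response data when true
```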
Testing with Docker
1. Start Database Containers
2. Test httpx with Each Database
3. Verify Results in Database
4. Cleanup
Test plan
Closes #1973
Closes #2360
Closes #2361
Closes #2362
Summary by CodeRabbit
New Features
New Features (CLI)
Documentation