Fix Qualys parser collapsing findings with same QID but different por…#14528

Open
tejas0077 wants to merge 53 commits into DefectDojo:bugfix from tejas0077:fix/qualys-port-deduplication

Conversation

@tejas0077
Contributor

@tejas0077 tejas0077 commented Mar 15, 2026

Description
When importing Qualys scan reports, findings with the same QID but
different ports were being collapsed into a single finding, causing
inaccurate vulnerability counts and loss of port-level granularity.
Root cause: the finding title only used QID and vulnerability name,
so findings with the same QID on different ports (e.g. 80, 5985, 9999)
got deduplicated into one finding.
Fix: port is now added directly to the Endpoint (and LocationData for V3) when present. Each QID+port combination gets its own endpoint on the finding. Finding titles and deduplication are completely unchanged.
Fixes #13682
Test results
Manually traced the parser logic. Port is already extracted from the
XML as temp["port_status"] and is now passed to the Endpoint object when present.
Documentation
No documentation changes needed.
Checklist

Bugfix submitted against the bugfix branch.
Meaningful PR name given.
Proper label added.

@valentijnscholten
Member

This will need some consideration/assurance as this will completely break deduplication with existing findings. Posted a comment on #13682

@tejas0077
Contributor Author

Hi @valentijnscholten, thank you for the feedback!

You raise a valid point. Changing the title format will break
deduplication with existing findings since the hash code is
calculated from the title.

A safer approach would be to keep the title unchanged but instead
use the port in the hash code calculation for Qualys specifically,
by adding port to the HASHCODE_FIELDS_PER_SCANNER setting:

```python
"Qualys Scan": ["title", "severity", "vulnerability_ids", "cwe", "port"]
```

This way:

  • Existing finding titles remain unchanged
  • Deduplication correctly separates findings by port
  • No breaking change to existing data

Would you prefer this approach instead? I can update the PR accordingly.

Contributor

@Maffooch Maffooch left a comment

Qualys is a super popular parser, and breaking deduplication would have far reaching impacts. Port is not a field that could be used for deduplication since port is on the endpoint model rather than the finding model.

Something that could be more palatable is to add the port to the endpoint, and then add multiple endpoints to the finding

@tejas0077
Contributor Author

Thanks @Maffooch, that makes sense! I'll rework the fix to keep a single finding per QID but attach multiple endpoints with the respective ports. That way port-level granularity is preserved without touching titles or breaking deduplication. Will update the PR shortly.

@tejas0077
Contributor Author

Hi @Maffooch, I've reworked the fix as suggested. Instead of modifying the title, the port is now added directly to the Endpoint (and LocationData for V3) when present. This way each QID+port combination gets its own endpoint on the finding, port-level granularity is preserved, and existing titles/deduplication are completely unchanged. Please take a look!

@valentijnscholten
Member

The title is still being modified, I believe this has to be removed.

@tejas0077
Contributor Author

Hi @valentijnscholten, thanks for catching that! I've removed the port from the finding title. The title is now back to the original format QID-XXXX | Vulnerability Name and the port is only added to the Endpoint. Please take a look!

@Maffooch
Contributor

Please add some unit tests for this to prevent regressions, add some updates to the release notes, and then I think this should be good!

@Maffooch Maffooch added this to the 2.57.0 milestone Mar 20, 2026
@tejas0077 tejas0077 force-pushed the fix/qualys-port-deduplication branch from 1671d06 to a4e5698 on March 20, 2026 at 17:38
tejas0077 pushed a commit to tejas0077/django-DefectDojo that referenced this pull request Mar 20, 2026
…n fix

- Add test XML with same QID on ports 80, 443, 8080
- Add test verifying each port gets its own endpoint
- Add 2.57.x release notes mentioning the fix

Addresses review feedback from @Maffooch on PR DefectDojo#14528
@github-actions github-actions bot added the docker, settings_changes, apiv2, docs, unittests, ui, helm labels on Mar 20, 2026
@tejas0077
Contributor Author

Hi @Maffooch, I have addressed both your review requests:

Added unit tests: created a test XML file with the same QID (12345) on three different ports (80, 443, 8080), plus a test that verifies each port gets its own separate endpoint, the finding title remains unchanged as QID-12345 | Test Vulnerability, and all 3 ports are correctly captured.
Added release notes: created docs/content/releases/os_upgrading/2.57.md with a note about the Qualys parser fix, referencing issue #13682.

Please take a look!
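A regression test along those lines can be sketched self-contained. The XML fragment and field names below are simplified assumptions for illustration, not the actual Qualys report schema or the DefectDojo test suite:

```python
# Illustrative sketch: parse a minimal, made-up XML fragment and check that
# one QID yields one finding with one entry per port. The real test would
# run the Qualys parser against a full sample report.
import xml.etree.ElementTree as ET

SCAN_XML = """
<SCAN>
  <DETECTION qid="12345" host="10.0.0.5" port="80"/>
  <DETECTION qid="12345" host="10.0.0.5" port="443"/>
  <DETECTION qid="12345" host="10.0.0.5" port="8080"/>
</SCAN>
"""


def parse(xml_text):
    findings = {}
    for det in ET.fromstring(xml_text).iter("DETECTION"):
        qid = det.get("qid")
        entry = findings.setdefault(
            qid, {"title": f"QID-{qid} | Test Vulnerability", "ports": set()},
        )
        if det.get("port"):  # port is optional in real reports
            entry["ports"].add(int(det.get("port")))
    return findings


findings = parse(SCAN_XML)
# Expect: one finding, title unchanged, all three ports captured
```

The assertions to guard against regression would be exactly these three: finding count, title format, and the full port set.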

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@valentijnscholten
Member

@tejas0077 Can you rebase to get rid of the conflicts?

dependabot bot and others added 5 commits March 30, 2026 14:35
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.15.2 to 0.15.4.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.15.2...0.15.4)

---
updated-dependencies:
- dependency-name: ruff
  dependency-version: 0.15.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…mpose.yml) (DefectDojo#14399)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…idator action from v2.0.0 to v2.1.0 (.github/workflows/renovate.yaml) (DefectDojo#14407)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…v1.35.2 (.github/workflows/k8s-tests.yml) (DefectDojo#14417)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…ithub/workflows/k8s-tests.yml) (DefectDojo#14418)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
renovate bot and others added 26 commits March 30, 2026 14:37
…2.12 to v (docker-compose.yml) (DefectDojo#14480)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…Dojo#14434)

* minor changes: django.conf.settings over dojo.settings

* missed bit

* auditlog not used anymore
Co-authored-by: valentijnscholten <4426050+valentijnscholten@users.noreply.github.com>
Co-authored-by: valentijnscholten <valentijnscholten@gmail.com>
* test: add IriusRisk parser sample scan files

Authored by T. Walker - DefectDojo

* feat: add IriusRisk parser stub for auto-discovery

Authored by T. Walker - DefectDojo

* test: add IriusRisk parser unit tests (failing, TDD)

Authored by T. Walker - DefectDojo

* feat: implement IriusRisk CSV threat parser

Authored by T. Walker - DefectDojo

* docs: add IriusRisk parser documentation

Authored by T. Walker - DefectDojo

* fix: address gap analysis findings for IriusRisk parser

- Update test CSVs from 12 to 14 columns (add MITRE reference, STRIDE-LM)
- Parse MITRE reference: CWE-NNN extracts to cwe field, other values to references
- Include STRIDE-LM in description when populated
- Add Critical to severity mapping
- Change static_finding to False per connector spec
- Update documentation to reflect all changes
- Add tests for CWE extraction, references, STRIDE-LM, and Critical severity

Authored by T. Walker - DefectDojo

* fix: remove computed unique_id_from_tool from IriusRisk parser

Per PR review feedback, parsers must not compute unique_id_from_tool.
Removed SHA-256 hash generation and related tests. Deduplication now
relies on DefectDojo's default hashcode algorithm. Updated docs
to reflect the change.

Authored by T. Walker - DefectDojo

* docs: remove parser line numbers from IriusRisk documentation

Per PR review feedback, removed line number references from field
mapping tables and prose sections to reduce maintenance burden
when parser code changes.

Authored by T. Walker - DefectDojo

* fix: increase title truncation threshold from 150 to 500 characters

Per PR review feedback, expanded title field to use more of the
available 511 characters. Added test data with 627-char threat
to verify truncation behavior. Updated docs accordingly.

Authored by T. Walker - DefectDojo

* feat: add hashcode deduplication config for IriusRisk parser

Register IriusRisk Threats Scan in HASHCODE_FIELDS_PER_SCANNER and
DEDUPLICATION_ALGORITHM_PER_PARSER so deduplication uses title and
component_name rather than the legacy algorithm. These stable fields
ensure reimports match existing findings even when risk levels or
countermeasure progress change between scans. Update docs to match.

Authored by T. Walker - DefectDojo

* chore: retrigger CI checks

Authored by T. Walker - DefectDojo

---------

Co-authored-by: Cody Maffucci <46459665+Maffooch@users.noreply.github.com>
…7 (.github/workflows/release-x-manual-docker-containers.yml) (DefectDojo#14451)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…r-compose.yml) (DefectDojo#13582)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* perf: batch duplicate marking in batch deduplication

Instead of saving each duplicate finding individually, collect all
modified findings during a batch deduplication run and flush them in
a single bulk_update call. Original (existing) findings are still
saved individually to preserve auto_now timestamp updates and
post_save signal behavior, but are deduplicated by id so each is
saved at most once per batch.

Reduces DB writes from O(2N) individual saves to 1 bulk_update +
O(unique originals) saves for a batch of N duplicates.

Performance test shows -23 queries on a second import with duplicates.
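The write-batching pattern this commit describes can be illustrated in plain Python. Everything below is a hypothetical stand-in (a fake DB object instead of Django's QuerySet.bulk_update and model save()), showing only the counting argument: duplicates flush in one write, originals save at most once per batch.

```python
# Sketch of the batching pattern: duplicates collected and flushed once via a
# single bulk write; originals deduplicated by id so each is written at most
# once. "FakeDB" is an illustrative stand-in, not Django.
class FakeDB:
    def __init__(self):
        self.bulk_updates = 0
        self.single_saves = 0

    def bulk_update(self, objs):
        self.bulk_updates += 1  # one write covers all duplicates

    def save(self, obj):
        self.single_saves += 1  # per-original write (keeps signals/timestamps)


def run_batch(db, pairs):
    """pairs: (duplicate_id, original_id) matches found in one dedup batch."""
    modified_duplicates = []
    originals_by_id = {}
    for dup_id, orig_id in pairs:
        modified_duplicates.append(dup_id)
        originals_by_id[orig_id] = orig_id  # dedupe originals by id
    db.bulk_update(modified_duplicates)
    for orig in originals_by_id.values():
        db.save(orig)


db = FakeDB()
run_batch(db, [(1, 100), (2, 100), (3, 101), (4, 101), (5, 102)])
# 1 bulk_update + 3 original saves, versus 10 individual saves before (O(2N))
```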

* perf: restrict SELECT columns for batch deduplication via only()

Add Finding.DEDUPLICATION_FIELDS — the union of all Finding fields
needed across every deduplication algorithm — and apply it as an
only() clause in get_finding_models_for_deduplication.

This avoids loading large text columns (description, mitigation,
impact, references, steps_to_reproduce, severity_justification, etc.)
when loading findings for the batch deduplication task, reducing
data transferred from the database without affecting query count.

build_candidate_scope_queryset is intentionally excluded: it is also
used for reimport matching (which accesses severity, numerical_severity
and other fields outside this set) and applying only() there would
cause deferred-field extra queries.

* perf(dedup): defer large text fields on candidate queryset

- Add Finding.DEDUPLICATION_DEFERRED_FIELDS constant listing large text
  columns (description, mitigation, impact, references, etc.) that are
  never read during deduplication or candidate matching.
- Apply .defer(*Finding.DEDUPLICATION_DEFERRED_FIELDS) in
  build_candidate_scope_queryset to avoid loading those columns for the
  potentially large candidate pool fetched per dedup batch.

Reduces deduplication second-import query count from 213 to 183 (-30).

---------

Co-authored-by: Matt Tesauro <mtesauro@gmail.com>
…#14449)

* perf(fp-history): batch false positive history processing

Replaces the N+1 query pattern in false positive history with a single
product-scoped DB query per batch, and switches per-finding save() calls
to QuerySet.update() to eliminate redundant signal overhead.

Changes:
- Extract _fp_candidates_qs() as the single algorithm-dispatch helper
  shared by both single-finding and batch lookup paths
- Add do_false_positive_history_batch() which fetches all FP candidates
  in one query and marks findings with a single UPDATE
- do_false_positive_history() now delegates to the batch function
- post_process_findings_batch (import/reimport) calls the batch function
  instead of a per-finding loop
- _bulk_update_finding_status_and_severity (bulk edit) groups findings
  by (product, dedup_alg) and calls the batch function once per group;
  retroactive reactivation also batched the same way
- Fix dead-code bug in process_false_positive_history: the condition
  finding.false_p and not finding.false_p was always False because
  form.save(commit=False) mutates the finding in place; fixed by
  capturing old_false_p before the form save
- Replace all per-finding save()/save_no_options() in FP history paths
  with QuerySet.update() (bypasses signals identically to the old calls)
- Move all FP history helpers from dojo/utils.py to
  dojo/finding/deduplication.py alongside the matching dedupe helpers

All update() calls carry a comment explaining the signal-bypass
equivalence with the previous save(skip_validation=True) calls.

Adds 4 unit tests covering: batch single-query behaviour, retroactive
batch FP marking, retroactive reactivation (previously dead code), and
the no-reactivation guard.

* perf(fp-history): add .only() to candidate fetch, fix update() comments

Limit _fetch_fp_candidates_for_batch to only the fields actually read
from candidate objects (id, false_p, active, hash_code,
unique_id_from_tool, title, severity), avoiding loading unused columns.

Correct update() comments to clarify that .only() does not constrain
QuerySet.update() — Django generates UPDATE SQL independently — so the
sync requirement is only for fields *read* from candidate objects.

* test(fp-history): assert exact query count in batch tests

assertNumQueries(7) on both batch tests covers: System_Settings,
4 lazy-load chain (test/engagement/product/test_type from findings[0]),
candidates SELECT with .only(), and the bulk UPDATE — fixed regardless
of batch size or number of retroactively marked findings.

* test(fp-history): assert query count stays flat with N affected findings

New test creates 5 pre-existing findings and asserts the batch still
uses exactly 7 queries regardless — proving the old O(N) per-finding
save loop is gone and a single bulk UPDATE covers all affected rows.
…8.0.1 (.github/workflows/rest-framework-tests.yml) (DefectDojo#14490)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…to v0.13.1 (.github/workflows/cancel-outdated-workflow-runs.yml) (DefectDojo#14491)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…0 to v7 (.github/workflows/release-drafter.yml) (DefectDojo#14513)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.15.5 to 0.15.6.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.15.5...0.15.6)

---
updated-dependencies:
- dependency-name: ruff
  dependency-version: 0.15.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…43.76.4 (.github/workflows/renovate.yaml) (DefectDojo#14526)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
… v2.5.3 (.github/workflows/release-x-manual-helm-chart.yml) (DefectDojo#14525)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
… v2.6.1 (.github/workflows/release-x-manual-helm-chart.yml) (DefectDojo#14532)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…ces and publish date. (DefectDojo#14498)

* Support CVSS4 and also import CVSS vectors, references and publish date.

* Fix linter issues
…fectdojo/chart.yaml) (DefectDojo#14509)

* chore(deps): update valkey docker tag from 0.17.1 to v0.18.0 (helm/defectdojo/chart.yaml)

* update Helm documentation

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* deduplication: return modified findings

* fix(lint): remove unnecessary elif after return (RET505)

* update comments
…n fix

- Add test XML with same QID on ports 80, 443, 8080
- Add test verifying each port gets its own endpoint
- Add 2.57.x release notes mentioning the fix

Addresses review feedback from @Maffooch on PR DefectDojo#14528
@tejas0077 tejas0077 force-pushed the fix/qualys-port-deduplication branch from a4e5698 to a998e63 on March 30, 2026 at 18:42
@github-actions
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

@tejas0077
Contributor Author

Hi @valentijnscholten, I've rebased the branch onto the latest bugfix branch. Conflicts are resolved. Please take a look!


Labels

apiv2, docker, docs, helm, parser, settings_changes, ui, unittests


8 participants