
[comp] Production Deploy #2574

Merged
Marfuen merged 2 commits into release from main on Apr 16, 2026

Conversation


github-actions bot commented on Apr 16, 2026

This is an automated pull request to release the candidate branch into production, which will trigger a deployment.
It was created by the [Production PR] action.


Summary by cubic

Adds a trust-portal deep scrape to reliably extract certifications (even from SPA trust centers) and merges them with the Firecrawl Agent results. Also extends timeouts and fixes response parsing to prevent empty or partial assessments.

  • New Features

    • Added a deep-scrape pass that discovers anchors or clicks SPA tabs, aggregates the markdown, extracts certifications, and merges them with the core results (deduped by slug with status priority).
    • Reworked the core agent: the prompt now prioritizes returning trust_center_url, the JSON schema was extracted into its own module, and the seed URLs were expanded for better portal discovery.
    • Increased the task max duration to 30 minutes; added diagnostic logs and persisted complianceBadgesJson and certificationsInAssessmentJson.
  • Bug Fixes

    • Handle non-completed Firecrawl statuses and bump agent timeouts to 25 minutes for core/news; retry once on fetch failures to avoid silent empties.
    • Parse agent payload by scoring candidates so populated .data wins over empty wrappers.
    • Gate and sanitize URLs: pick on-domain sources, skip known third‑party portals, and escape CSS selectors during scraping.

Written for commit 08a3786. Summary will update on new commits.

github-actions Bot and others added 2 commits April 16, 2026 19:53
* fix(vendor): harden firecrawl trust center crawling

* refactor(vendor): export TRUSTED_PORTAL_DOMAINS and add host check helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
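
A host check like the one this commit describes might be sketched as below. The helper and constant names come from the commit message, but the domain values and exact matching rule are assumptions, not the real module:

```typescript
// Known third-party trust-portal hosts. The entries here are placeholder
// examples; the real TRUSTED_PORTAL_DOMAINS list is not shown in this PR.
const TRUSTED_PORTAL_DOMAINS = [
  "trust.example-portal.com",
  "security.example-grc.io",
] as const;

// Returns true when the URL's hostname is a trusted portal domain or one of
// its subdomains. Malformed URLs are never trusted.
function isTrustedPortalHost(rawUrl: string): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false;
  }
  return TRUSTED_PORTAL_DOMAINS.some(
    (domain) => host === domain || host.endsWith(`.${domain}`)
  );
}
```

Matching on `host === domain || host.endsWith("." + domain)` avoids the classic suffix-spoof bug where `trust.example-portal.com.evil.com` would pass a naive `includes` check.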

* feat(vendor): add trust portal section-url discovery helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
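
The section-URL discovery could look roughly like this: keep on-domain anchors whose path hints at a trust-portal section. The function name matches the commit; the keyword list and dedupe rule are assumptions:

```typescript
// Path fragments that suggest a trust-portal section (assumed list).
const SECTION_HINTS = /certif|complian|security|privacy|subprocessor|controls?/i;

// Given the portal landing URL and the anchors found on it, return the
// deduplicated on-domain section URLs worth scraping.
function discoverSectionUrls(baseUrl: string, links: string[]): string[] {
  const base = new URL(baseUrl);
  const seen = new Set<string>();
  const sections: string[] = [];
  for (const link of links) {
    let url: URL;
    try {
      url = new URL(link, base); // resolve relative anchors against the portal
    } catch {
      continue; // skip unparsable hrefs
    }
    if (url.hostname !== base.hostname) continue; // on-domain only
    if (!SECTION_HINTS.test(url.pathname + url.hash)) continue;
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    sections.push(url.href);
  }
  return sections;
}
```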

* feat(vendor): add certification merge helper with status priority

Pure mergeCertifications function dedupes by canonical slug and resolves
status via verified > expired > unknown > not_certified priority, preferring
core URL/dates on ties.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
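
The merge rule described above can be sketched as a pure function. The `Certification` shape is an assumption; the priority order and tie-breaking follow the commit message:

```typescript
type CertStatus = "verified" | "expired" | "unknown" | "not_certified";

interface Certification {
  slug: string; // canonical slug used for deduplication
  status: CertStatus;
  url?: string;
}

// verified > expired > unknown > not_certified
const STATUS_PRIORITY: Record<CertStatus, number> = {
  verified: 3,
  expired: 2,
  unknown: 1,
  not_certified: 0,
};

// Dedupe by slug; a deep-scrape record only replaces the core record when its
// status strictly outranks it, so core URL/dates win on ties.
function mergeCertifications(
  core: Certification[],
  deepScrape: Certification[]
): Certification[] {
  const merged = new Map<string, Certification>();
  for (const cert of core) merged.set(cert.slug, cert);
  for (const cert of deepScrape) {
    const existing = merged.get(cert.slug);
    if (!existing || STATUS_PRIORITY[cert.status] > STATUS_PRIORITY[existing.status]) {
      merged.set(cert.slug, cert);
    }
  }
  return [...merged.values()];
}
```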

* feat(vendor): scaffold trust portal deep-scrape orchestrator with gate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(vendor): implement trust portal deep-scrape orchestrator

Clicks through SPA sidebar sections, concatenates markdown from each,
and extracts certifications via Claude Sonnet 4.6.

* fix(vendor): escape CSS selector values and cover concurrency bound

Add cssEscapeAttr helper to sanitize `\` and `"` inside CSS double-quoted
attribute values in buildSectionScrapeOptions, preventing silent selector
no-ops for anchor slugs containing CSS-reserved characters. Add two new
tests: one verifying the escaping (using `\` which survives URL normalization)
and one confirming mapWithConcurrency covers all items when section count (8)
exceeds SECTION_CONCURRENCY (5).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
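
A minimal sketch of the escaping helper, assuming it targets double-quoted CSS attribute values as the commit states (`anchorSelector` is an illustrative caller, not a name from the PR):

```typescript
// Escape the two characters that can terminate or corrupt a double-quoted CSS
// attribute value. Backslash is replaced first so the added escapes are not
// themselves re-escaped.
function cssEscapeAttr(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Hypothetical caller: build a selector for an anchor slug without letting
// CSS-reserved characters silently break the selector.
function anchorSelector(slug: string): string {
  return `a[href="#${cssEscapeAttr(slug)}"]`;
}
```

Without this, a slug containing `"` would close the attribute string early and the selector would match nothing, which is exactly the silent no-op the commit describes.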

* feat(vendor): run trust portal deep-scrape after core agent

Resolves a source URL (trust center -> security page -> verified cert url),
runs deepScrapeTrustPortal, and merges certifications before returning.
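
The fallback chain can be sketched as below. The function name matches the commit; the input shapes are assumptions, and the real helper additionally gates on the vendor domain and skips known third-party portals (per the summary), which this sketch omits:

```typescript
interface AgentLinks {
  trust_center_url?: string | null;
  security_page_url?: string | null;
}

interface Cert {
  status: string;
  url?: string | null;
}

// Walk the fallback chain: trust center -> security page -> first verified
// certification URL. Returns null when no usable source URL exists.
function pickDeepScrapeSourceUrl(
  links: AgentLinks,
  certifications: Cert[]
): string | null {
  if (links.trust_center_url) return links.trust_center_url;
  if (links.security_page_url) return links.security_page_url;
  const verified = certifications.find((c) => c.status === "verified" && c.url);
  return verified?.url ?? null;
}
```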

* refactor(vendor): extract pickDeepScrapeSourceUrl and tighten extraction prompt

Move pickDeepScrapeSourceUrl into its own module with unit tests so
firecrawl-agent-core.ts drops below the 300-line limit. Also hoist the
Firecrawl Agent JSON schema into firecrawl-agent-schema-json.ts for the
same reason. Tighten the Sonnet 4.6 extraction prompt to explicitly
require evidence_snippet so Claude doesn't silently drop rows.

* feat(vendor): log Agent snapshot, deep-scrape decision, and persisted certs

Adds three diagnostic logs so a trigger.dev run tells the full story:

- "Firecrawl Agent returned — pre-deep-scrape snapshot" dumps the raw
  Agent links, normalized links, and cert types/statuses before the
  deep-scrape decision. Exposes what the LLM actually found.

- Deep-scrape branch logs either "source URL resolved" + merged types,
  "returned no certifications", or "skipped: no usable URL on vendor
  domain" with available links + verified certs — no more silent
  gate decisions.

- "Risk level and badges extracted" now includes the full compliance
  badge payload and the certifications array being persisted to the
  vendor record, so DB-write state is inspectable from logs.

* fix(vendor): json-stringify complex diagnostic log fields

Trigger.dev's OpenTelemetry attribute pipeline strips nested objects
and arrays — keeping only top-level scalars — so rich log payloads
like rawAgentLinks, normalizedLinks, and complianceBadges were being
silently discarded. Serialize them to JSON strings so they survive
the OTel export and surface in the dashboard / MCP span details.
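
The workaround amounts to flattening log fields before they hit the attribute pipeline: scalars pass through, everything nested becomes a JSON string. A sketch (the helper name is an assumption; the real code logs via trigger.dev's logger):

```typescript
// OTel attribute pipelines keep only top-level scalars, so nested objects and
// arrays must be serialized to JSON strings to survive the export.
function flattenForOtel(
  fields: Record<string, unknown>
): Record<string, string | number | boolean> {
  const flat: Record<string, string | number | boolean> = {};
  for (const [key, value] of Object.entries(fields)) {
    if (
      typeof value === "string" ||
      typeof value === "number" ||
      typeof value === "boolean"
    ) {
      flat[key] = value; // scalars pass through unchanged
    } else {
      flat[key] = JSON.stringify(value); // would otherwise be stripped
    }
  }
  return flat;
}
```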

* feat(vendor): rewrite Firecrawl Agent prompt — URL-discovery first

Prior prompt treated trust_center_url as just another field, so when the
Agent failed to extract certifications from a JavaScript SPA (e.g.
ui.com/trust-center) it abandoned the whole output — including the URL
the downstream deep-scrape needs.

New prompt reframes the mission:
- Primary goal: return trust_center_url even when page content is empty
  or SPA-only. Deep-scrape handles rendering; Agent just has to find.
- Explicit numbered URL paths to try when nav discovery fails, including
  third-party portals keyed off the vendor slug.
- Explicit instruction to return URLs of SPA-only pages rather than
  discarding them.
- Stricter output contract marking trust_center_url as REQUIRED when
  any trust/security/compliance surface exists on the vendor domain.
- Bumped maxCredits 2500 → 4000 to give the Agent headroom on sites
  that require multi-hop discovery.

Prompt extracted into firecrawl-agent-prompt.ts to keep core orchestrator
under the 300-line limit.

* chore(vendor): log raw firecrawl agent response for ui.com diagnosis

Adds temporary diagnostic logs capturing:
- agentResponse.success / status / error / keys (before schema parse)
- first 4KB of the raw agentResponse JSON
- first 4KB of parsed.data JSON, plus security_assessment and risk_level

The agent is returning links: null for ubiquiti even after the URL-first
prompt rewrite — need to see what it IS returning to understand whether
it's a fetch block, a model compliance issue, or a parse path we're
missing. Pushes the file to 315 lines; will roll back once diagnosed.

* fix(vendor): handle firecrawl agent processing status + extend timeouts

Discovered via new diagnostic log: the Firecrawl SDK's agent call was
returning status="processing" on ui.com because its internal poll timed
out (360s) before the agent job completed on Firecrawl's side. Our code
only guarded against status="failed", so it silently parsed the empty
response as success — leaving vendor records with no certifications
even when the agent could have found them given more time.

Changes:
- Guard on status !== "completed" instead of just "failed"; log clearly
  when SDK returns while job is still processing so timeouts are
  visible instead of silent.
- Bump agent SDK timeout 360s -> 1500s (25 min) so slow SPA trust
  centers like Ubiquiti have room to finish.
- Bump task maxDuration 10 min -> 30 min to accommodate the longer
  agent call plus deep-scrape + DB writes.
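
The tightened guard can be sketched as follows; the response shape and names are assumptions based on the commit message, not the Firecrawl SDK's actual types:

```typescript
interface AgentResponse {
  status: "completed" | "processing" | "failed";
  data?: unknown;
  error?: string;
}

const AGENT_TIMEOUT_SECONDS = 1500; // was 360; 25 min for slow SPA trust centers

// Guard on status !== "completed" rather than only "failed": a "processing"
// status means the SDK's internal poll gave up before the job finished, and
// parsing its empty body as success is exactly the silent-empty bug.
function assertAgentCompleted(response: AgentResponse): unknown {
  if (response.status !== "completed") {
    throw new Error(
      `Firecrawl agent not completed: status=${response.status} ` +
        `error=${response.error ?? "none"}`
    );
  }
  return response.data;
}
```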

* fix(vendor): score agent payload candidates by populated fields

The firecrawl agent response has a nested shape:
  { success, status, data: { links, certifications, ... }, ... }

extractAgentPayloadCandidates returns [wrapper, wrapper.data] in that
order, and every field in vendorRiskAssessmentAgentSchema is optional.
The wrapper therefore parsed successfully as an empty object and won
the first-match .find() lookup — even though it contained no real
fields. The actual .data payload (with trust_center_url, security
page, privacy policy, etc.) was silently discarded.

Pick the candidate with the most populated schema fields instead of
the first success. This has been a latent bug on main — the ubiquiti
run on v20260415.12 showed the same "found 0 links, 0 certifications"
symptom.
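
The scoring fix can be sketched like this: count populated schema fields per candidate and pick the highest, so a real `.data` payload beats an all-optional wrapper that parses as an empty object. The field list is an assumption; `countPopulatedAgentFields` is the name from the later refactor commit:

```typescript
// Assumed subset of the agent schema's fields; the real schema has more.
const AGENT_FIELDS = [
  "trust_center_url",
  "security_page_url",
  "privacy_policy_url",
  "certifications",
] as const;

// Score a candidate by how many schema fields actually carry data.
function countPopulatedAgentFields(candidate: Record<string, unknown>): number {
  return AGENT_FIELDS.filter((field) => {
    const value = candidate[field];
    if (value == null) return false;
    if (Array.isArray(value)) return value.length > 0;
    return value !== "";
  }).length;
}

// Pick the candidate with the most populated fields instead of the first one
// that parses, so the wrapper can no longer shadow wrapper.data.
function pickBestCandidate(
  candidates: Record<string, unknown>[]
): Record<string, unknown> | undefined {
  return [...candidates].sort(
    (a, b) => countPopulatedAgentFields(b) - countPopulatedAgentFields(a)
  )[0];
}
```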

* fix(vendor): remove invalid maxCredits from scrape calls

Firecrawl's v2 /scrape endpoint rejects maxCredits — that option
belongs to the Agent API, not scrape. We were passing it on both
the initial scrape and the per-section scrapes, and Firecrawl was
returning "Unrecognized key in body", causing the deep-scrape pass
to fail on its very first call.

Replace with `timeout` (2 min per scrape, within Firecrawl's 5-min
cap) which is the scrape v2 equivalent of "budget per call."

* chore(vendor): log raw initial scrape output for section discovery diag

Ubiquiti run finished with sectionCount=0 even though the initial
scrape returned 9891 chars of markdown. Need to see what
firecrawlClient.scrape actually returned in `links` to understand
whether the sidebar items are missing from the response or whether
discoverSectionUrls is wrongly filtering them out.

Logs the first 50 links and the first 2KB of markdown from the initial
scrape. Temporary diagnostic, will trim once the sidebar discovery
strategy is fixed.

* feat(vendor): llm-driven tab discovery for spa trust portals

Ubiquiti's trust center sidebar items are <button>/<div onClick>
elements with no href, so Firecrawl's `links` format returns 0 anchor
URLs for them. URL-based section discovery then had nothing to work
with and the deep-scrape only ever saw the landing tab.

Add a tab-discovery step: when URL-based discovery yields zero
sections, pass the initial markdown to Claude Sonnet 4.6 to identify
sidebar labels, then scrape each one with an executeJavascript
click-by-text action. The click script finds the matching element by
exact textContent, scrolls it into view, and clicks it. Works for any
SPA that has tab labels visible in the rendered markdown — not just
Ubiquiti.

Flow:
  1. Initial scrape -> markdown + links
  2. URL-based discovery (existing, unchanged)
  3. If urlSections.length === 0 and markdown non-empty,
     call identifySidebarTabs to get labels from the LLM
  4. Merge url-based + tab-label sections, dedupe by label, cap at 25
  5. Per-section scrape with click-by-text OR click-by-href
  6. Combine markdown, extract certs, merge

Files:
  new   trust-portal-deep-scrape-tabs.ts  (92 lines)
  edit  trust-portal-deep-scrape.ts       (+70 lines)
  edit  trust-portal-deep-scrape-sections.ts  (+tabLabel field)
  edit  trust-portal-deep-scrape.spec.ts  (1 new test, 3 updated)
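
The click-by-text action might be built along these lines; `buildClickByTextScript` is the name from the later refactor commit, but the exact injected script is an assumption. The label is JSON-stringified so quotes and backslashes cannot break out of the generated code:

```typescript
// Build a script for Firecrawl's executeJavascript action: find the element
// whose exact trimmed textContent matches the sidebar label, scroll it into
// view, and click it. Works for href-less <button>/<div onClick> tabs.
function buildClickByTextScript(label: string): string {
  const safeLabel = JSON.stringify(label.trim()); // safely embed the label
  return `
    (() => {
      const target = ${safeLabel};
      const nodes = document.querySelectorAll("button, [role='tab'], a, div, span");
      for (const node of nodes) {
        if ((node.textContent ?? "").trim() === target) {
          node.scrollIntoView();
          node.click();
          return true;
        }
      }
      return false;
    })();
  `;
}
```

Matching on exact `textContent` rather than a selector is what makes this portable across SPAs: it only requires that the tab label the LLM saw in the rendered markdown also appears verbatim in the DOM.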

* fix(vendor): apply same processing-status + timeout fixes to news agent

firecrawlResearchNews had the exact two bugs we already fixed for
firecrawlResearchCore:

1. Status guard was too loose (only `=== 'failed'`), so when the SDK
   returned `status: 'processing'` (Firecrawl still running the job
   after our SDK poll timed out) we silently proceeded to read
   agentResponse.data.news, got undefined, and logged "no news items."

2. Timeout was 360s while matching agent jobs for slow vendor sites
   routinely take 6+ minutes. Ubiquiti run hit 6m 1s and returned empty,
   matching the timeout boundary almost exactly.

Bump timeout 360s -> 1500s (matches core), guard on `!== 'completed'`,
and add the same diagnostic logs we added to core so future runs surface
the raw agent response + data shape when news comes back empty.

* refactor(vendor): extract payload + scrape-option helpers, trim verbose logs

Post-debugging cleanup. No behavior change.

Files split so both orchestrators drop back under the 300-line rule:
  - firecrawl-agent-payload.ts (58) — asRecord, extractAgentPayloadCandidates,
    countPopulatedAgentFields. Moved out of firecrawl-agent-core.ts so the
    payload-candidate logic can be shared and tested separately.
  - trust-portal-deep-scrape-scrape-options.ts (107) — cssEscapeAttr,
    buildClickByTextScript, buildInitialScrapeOptions, buildSectionScrapeOptions.
    Moved out of trust-portal-deep-scrape.ts so the scrape-option + click-by-text
    JS builders are isolated from the orchestration code.

Log trimming — drop the 4KB agent-response and 2KB markdown-head dumps from
happy-path logs. They were added for live diagnosis and landed big blobs in
every prod run. Keep scalar summary fields. Full raw-response JSON now only
logged on the exceptional "not completed" warning path where it is actually
useful, not on every successful run.

File line counts:
  firecrawl-agent-core.ts         315 -> 296
  trust-portal-deep-scrape.ts     383 -> 293
  firecrawl-agent-news.ts         172 -> 158

67/67 tests still pass.

---------

Co-authored-by: Mariano Fuentes <marfuen98@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

vercel Bot commented Apr 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project                Deployment  Actions           Updated (UTC)
comp-framework-editor  Ready       Preview, Comment  Apr 16, 2026 8:10pm

2 Skipped Deployments

Project           Deployment  Updated (UTC)
app (staging)     Skipped     Apr 16, 2026 8:10pm
portal (staging)  Skipped     Apr 16, 2026 8:10pm


cubic-dev-ai bot left a comment


No issues found across 19 files

Requires human review: This PR contains significant business logic changes, including a new deep-scrape feature, reworked core agent prompts, and increased task timeouts, requiring human review.

Marfuen merged commit a840682 into release on Apr 16, 2026
14 checks passed
@claudfuen

🎉 This PR is included in version 3.23.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
