
@AkshitGarg054 AkshitGarg054 commented Jan 12, 2026

Closes #228

📝 Description

This PR improves the repository indexing flow to prevent repositories from getting stuck indefinitely in a pending state and allows safe recovery from failures or backend crashes.
The change ensures that indexing can always be retried unless a repository has already been successfully indexed.

🔧 Problem Solved

Previously, a repository could remain stuck in the pending state if:

  • the code-graph backend crashed,
  • the network dropped mid-request, or
  • no response was returned from /analyze_repo.

In such cases, subsequent /index_repository requests would only show “indexing in progress” and never restart indexing, leaving users blocked.

🔧 Solution

The indexing flow is updated to treat pending and failed states as recoverable, while keeping completed as a terminal state.
Key changes:

completed --> indexing is blocked (no re-index)
failed --> indexing is restarted
pending --> indexing is restarted (covers slow jobs and backend crashes)

Users are told up front that indexing may take 30–35 minutes.
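
The status rules above can be sketched as a small decision function. This is an illustration only, not the code in repo_service.py: the `Action` enum and `decide_action` name are hypothetical, and `None` is assumed to mean that no DB row exists yet.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical names for the three outcomes described in this PR."""
    START = "start"      # no row yet: create one and begin indexing
    RESTART = "restart"  # pending or failed: reset the row and begin again
    BLOCK = "block"      # completed: refuse to re-index


def decide_action(status: Optional[str]) -> Action:
    """Map a stored indexing status to the action to take."""
    if status is None:
        return Action.START
    if status == "completed":
        return Action.BLOCK
    # Every other status (pending, failed) is treated as recoverable.
    return Action.RESTART
```

Treating everything except completed as retryable means a row left in pending by a crash can never block the user permanently.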

FINAL INDEXING FLOW:

Case 1: New repository (never indexed before)

  • User runs /index_repository.
  • Discord immediately shows: “Indexing started. This can take 30–35 minutes.”
  • Backend creates a new DB row with status = pending.
  • Backend calls /analyze_repo.
    If indexing succeeds: DB status = completed, and the user sees “Repository Indexed”.
    If indexing fails: DB status = failed, and the user sees “Indexing Failed”.

Case 2: Repository already indexed (completed)

  • User runs /index_repository.
  • Backend checks DB.
  • Status is completed.
  • User sees: “Repository already indexed”.

Case 3: Repository in failed state

  • User runs /index_repository.
  • Backend finds status = failed.
  • Backend resets the row: status = pending, last_error = null.
  • /analyze_repo is called again, and the user sees “Indexing started again. This can take 30–35 minutes.”

Case 4: Repository in pending state (including backend crash)

  • User runs /index_repository.
  • Backend finds status = pending.
    This could mean either that indexing is slow or that the backend crashed earlier.
  • Backend resets the row and restarts indexing.
    User sees: “Indexing started again. Please wait 30–35 minutes.”
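
The four cases can be walked through with an in-memory stand-in for the database. This is a rough sketch under stated assumptions: the dict replaces the real table, `begin_indexing` and `trigger_analysis` are hypothetical names, and `updated_at` is an assumed column name. Only `indexing_status` and `last_error` come from the review summary of this PR.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Optional

Row = Dict[str, Optional[str]]


def begin_indexing(
    db: Dict[str, Row],
    full_name: str,
    trigger_analysis: Callable[[str], None],
) -> str:
    """Return the message shown to the user and (re)start indexing if allowed."""
    row = db.get(full_name)

    # Case 2: completed is terminal, so nothing is restarted.
    if row is not None and row["indexing_status"] == "completed":
        return "Repository already indexed"

    if row is None:
        # Case 1: first-time indexing.
        message = "Indexing started. This can take 30–35 minutes."
    else:
        # Cases 3 and 4: failed or pending rows are reset and retried.
        message = "Indexing started again. This can take 30–35 minutes."

    db[full_name] = {
        "indexing_status": "pending",
        "last_error": None,
        "updated_at": datetime.now(timezone.utc).isoformat(),  # assumed column
    }
    trigger_analysis(full_name)  # stands in for the call to /analyze_repo
    return message
```

For example, calling `begin_indexing(db, "org/repo", print)` twice in a row on an empty `db` returns the "started" message first and the "started again" message second, because the row is still pending on the second call.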

✅ Checklist

  • I have read the contributing guidelines.
  • [ ] I have added tests that prove my fix is effective or that my feature works.
  • I have added necessary documentation (if applicable).
  • Any dependent changes have been merged and published in downstream modules.

Summary by CodeRabbit

  • Bug Fixes

    • Clean up expired verification tokens before creating new verification sessions.
    • Allow pending or failed repository indexing to be restarted instead of returned as an error.
  • Improvements

    • Preserve “verification pending” prompts for still-valid tokens with timezone-aware expiry checks.
    • Clarified indexing messages: updated failure wording and added expected duration guidance (~30–35 minutes).


coderabbitai bot commented Jan 12, 2026

📝 Walkthrough

Adds server-side cleanup of expired verification tokens before creating sessions, makes pending repository indexing retryable (reset to pending, clear errors, retrigger), and adds timezone-aware verification token expiry checks plus updated Discord messaging.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Token cleanup & verification expiry: backend/app/services/auth/verification.py, backend/integrations/discord/cogs.py | create_verification_session now calls cleanup_expired_tokens() before session creation; Discord cog normalizes verification_token_expires_at (string or datetime) to UTC and preserves the "Verification Pending" prompt when token remains valid. |
| Repository indexing retry logic: backend/app/services/codegraph/repo_service.py | Non-completed indexing statuses (including pending and failed) are now treated as retryable: reset indexing_status to pending, clear last_error, update timestamp, and restart indexing; completed still returns an error. |
| Discord messaging updates: backend/integrations/discord/cogs.py | Indexing messages now mention 30–35 minute duration for large repos, change failure wording from "Could not index" to "Indexing did not complete", and remove the prior "pending" tip branch. |

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant DiscordCog
  participant AuthService
  participant DB
  participant RepoService
  participant Analyzer

  User->>DiscordCog: Request to index repo / verify
  DiscordCog->>AuthService: Check verification_token and expires_at
  AuthService->>DB: cleanup_expired_tokens()
  AuthService-->>DiscordCog: token valid? (yes/no)
  alt token valid
    DiscordCog-->>User: "Verification Pending"
  else token expired or none
    DiscordCog->>AuthService: create_verification_session()
    AuthService->>DB: create session record
    AuthService-->>DiscordCog: session created
  end

  User->>RepoService: index_repo(repo)
  RepoService->>DB: fetch repo record
  alt record not found
    RepoService->>DB: insert record (pending)
    RepoService->>Analyzer: trigger /analyze_repo
  else record found & completed
    RepoService-->>User: error (already indexed)
  else record found & not completed (pending/failed)
    RepoService->>DB: set indexing_status = pending, clear last_error, touch timestamp
    RepoService->>Analyzer: retrigger /analyze_repo
    RepoService-->>User: indexing restarted
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • smokeyScraper

Poem

🐰 I hopped through tokens, swept the stale,
Gave stuck indexes a brand-new trail.
Timezones tidy, expiries in sight,
Retries hum softly through day and night.
— a cheerful rabbit, nibbling code delights 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main fix: enabling repo indexing to recover from a stuck pending state by allowing retries, which directly addresses issue #228. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #228 are implemented: pending state is now retryable, status resets to pending, errors clear, and /analyze_repo re-triggers for both pending and failed states. |
| Out of Scope Changes check | ✅ Passed | The changes in verification.py token cleanup and Discord cogs message updates support the core objective by enabling fresh verification attempts and improving user feedback about indexing duration. |


🧹 Recent nitpick comments
backend/integrations/discord/cogs.py (1)

38-46: Consider using logger instead of print and renaming to avoid shadowing.

Two minor observations:

  1. Lines 42 and 44 use print statements while the rest of the file uses logger for consistency.
  2. The method name cleanup_expired_tokens shadows the imported function with the same name (line 19). This works due to Python's name resolution but is confusing.
♻️ Suggested improvements
 @tasks.loop(minutes=5)
-async def cleanup_expired_tokens(self):
+async def cleanup_expired_tokens_task(self):
     """Periodic cleanup of expired verification tokens"""
     try:
-        print("--> Running token cleanup task...")
+        logger.debug("Running token cleanup task...")
         await cleanup_expired_tokens()
-        print("--> Token cleanup task finished.")
+        logger.debug("Token cleanup task finished.")
     except Exception as e:
         logger.error(f"Error during token cleanup: {e}")

Note: If renaming, update cog_load (line 33) and cog_unload (line 36) accordingly.


📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e48d281 and 2c38e09.

📒 Files selected for processing (1)
  • backend/integrations/discord/cogs.py
🔇 Additional comments (4)
backend/integrations/discord/cogs.py (4)

6-6: LGTM!

Standard library import correctly added for the timezone-aware expiry handling logic.


132-166: LGTM! Well-implemented expiry handling with proper edge case coverage.

The verification token expiry logic correctly handles:

  • String parsing via fromisoformat (line 138)
  • Native datetime objects (line 140)
  • Naive datetimes normalized to UTC (lines 142-143)
  • Missing expiry treated as expired (lines 150-152) — good defensive coding

This ensures tokens without an expiry field won't remain permanently valid.
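
The expiry handling listed above can be sketched as a single helper. This is a minimal illustration, not the code in cogs.py: the function name `is_token_expired` and the injectable `now` parameter are assumptions, and naive datetimes are assumed to have been stored in UTC, as the review describes.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def is_token_expired(
    expires_at: Union[str, datetime, None],
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the token should no longer be treated as valid."""
    # Missing expiry is treated as expired, so a token can never be valid forever.
    if not expires_at:
        return True

    # Accept either an ISO 8601 string or a datetime object.
    if isinstance(expires_at, str):
        expires_at_dt = datetime.fromisoformat(expires_at)
    else:
        expires_at_dt = expires_at

    # Naive values are assumed to be UTC and made timezone-aware.
    if expires_at_dt.tzinfo is None:
        expires_at_dt = expires_at_dt.replace(tzinfo=timezone.utc)

    current = now if now is not None else datetime.now(timezone.utc)
    return expires_at_dt <= current
```

One caveat worth checking against the stored format: before Python 3.11, `datetime.fromisoformat` does not accept a trailing `Z`, so strings written that way would need the suffix replaced with `+00:00` first.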


221-229: LGTM!

The newline separator between the sentences is now correctly placed, and the messaging aligns with the PR objective of informing users that indexing may take 30–35 minutes for large repositories.


275-286: LGTM!

The updated error message "Indexing did not complete" is more accurate than "Could not index" — it better reflects scenarios where indexing started but was interrupted (pending/failed states), aligning with the new retry-from-pending behavior.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @backend/app/services/codegraph/repo_service.py:
- Around line 71-83: Fix the typo in the comment message ("faild" → "failed")
where the restart log is emitted; update the logged string near the logger.info
call that says "Restarting indexing for {repo_info['full_name']} (previous
status: {status})" or any adjacent inline comment in RepoService to use "failed"
instead of "faild" so the log and comments are spelled correctly; verify the
change around the logger.info and the subsequent
supabase.table("indexed_repositories").update(...) block for consistency.

In @backend/integrations/discord/cogs.py:
- Around line 221-222: The two adjacent string literals used in the Discord
embed are concatenated without a separating space/newline, producing "...30-35
minutes.Please wait..."; update the embed text in cogs.py so the sentences are
separated (e.g., add a trailing space to the first literal, insert "\n" between
them, or combine into one string with proper spacing) where the embed message
for "⏳ Note: For large repositories..." is defined.
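
The concatenation issue flagged above comes from Python joining adjacent string literals with nothing in between. The snippet below reproduces it with placeholder text; the message wording is illustrative and not copied from cogs.py.

```python
# Adjacent string literals are joined at compile time with no separator.
broken = (
    "This can take 30-35 minutes."
    "Please wait."
)

# Ending the first literal with "\n" (or a space) keeps the sentences apart.
fixed = (
    "This can take 30-35 minutes.\n"
    "Please wait."
)

print(repr(broken))
print(repr(fixed))
```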
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa66519 and efb6913.

📒 Files selected for processing (3)
  • backend/app/services/auth/verification.py
  • backend/app/services/codegraph/repo_service.py
  • backend/integrations/discord/cogs.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/auth/verification.py (1)
backend/integrations/discord/cogs.py (1)
  • cleanup_expired_tokens (39-46)
🔇 Additional comments (4)
backend/app/services/auth/verification.py (1)

39-40: LGTM - Good addition of token cleanup before session creation.

This ensures expired tokens are cleaned from the database before creating new verification sessions, preventing stale data accumulation. The ordering (DB cleanup first, then in-memory cleanup) is appropriate.

backend/app/services/codegraph/repo_service.py (1)

84-93: LGTM - New repository insertion path.

The insertion logic for new repositories is unchanged and correctly sets initial state to pending with no error.

backend/integrations/discord/cogs.py (2)

132-162: LGTM - Timezone-aware expiry validation.

The logic correctly handles both string and datetime types for expires_at, makes naive datetimes timezone-aware (assuming UTC storage convention), and allows users to retry verification when their previous token has expired.


271-293: LGTM - Error handling and messaging.

The updated error embed and tip logic appropriately handles different failure scenarios. The removal of the obsolete "pending" tip aligns with the new restart behavior where pending states are now recoverable.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/integrations/discord/cogs.py`:
- Around line 133-148: The code currently leaves a token valid if
verification_token exists but verification_token_expires_at is None; update the
expiry check around expires_at/verification_token_expires_at so that if
expires_at is falsy (None) you mark is_expired = True (i.e., treat missing
expiry as expired) instead of leaving is_expired False — modify the block
handling expires_at/expires_at_dt (variables: expires_at, expires_at_dt,
is_expired, verification_token/verification_token_expires_at) to include an else
branch that sets is_expired = True.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efb6913 and e48d281.

📒 Files selected for processing (2)
  • backend/app/services/codegraph/repo_service.py
  • backend/integrations/discord/cogs.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/services/codegraph/repo_service.py
🔇 Additional comments (2)
backend/integrations/discord/cogs.py (2)

221-222: LGTM - Concatenation issue resolved.

The newline character at the end of line 221 properly separates the sentences in the Discord embed. This addresses the PR objective of informing users about the 30-35 minute indexing time.


273-276: LGTM - Improved error messaging.

The updated description "Indexing did not complete" is more accurate than "Could not index" given the new retry-from-pending behavior, where indexing may have started but not finished.


Development

Successfully merging this pull request may close these issues.

BUG : Repository indexing can get stuck in pending with no recovery path