Fix : Repo Indexing Stuck in Pending State (#228) #232

AkshitGarg054 · 2026-01-12T18:03:22Z

Closes #228

📝 Description

This PR improves the repository indexing flow to prevent repositories from getting stuck indefinitely in a pending state and to allow safe recovery from failures or backend crashes.
The change ensures that indexing can always be retried unless a repository has already been successfully indexed.

🔧 Problem Solved

Previously, repositories could remain stuck in the pending state if:

The code-graph backend crashed,
The network dropped mid-request, or
No response was returned from /analyze_repo.

In such cases, subsequent /index_repository requests would only show “indexing in progress” and never restart indexing, leaving users blocked.

🔧 Solution

The indexing flow is updated to treat pending and failed states as recoverable, while keeping completed as a terminal state.
Key changes:

completed --> indexing is blocked (no re-index)
failed --> indexing is restarted
pending --> indexing is restarted (covers slow jobs and backend crashes)

The user is told up front that indexing may take 30–35 minutes.
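The status rules above amount to a small decision function. This is an illustrative sketch only: `IndexingStatus` and `should_start_indexing` are hypothetical names, not identifiers from `repo_service.py`, and `None` is assumed to mean "no DB row exists yet".

```python
from enum import Enum
from typing import Optional


class IndexingStatus(str, Enum):
    # Status values as described in this PR; the enum itself is hypothetical.
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"


def should_start_indexing(status: Optional[str]) -> bool:
    """True if /index_repository should (re)trigger /analyze_repo."""
    if status is None:
        # No row yet: a brand-new repository (Case 1).
        return True
    # Only completed is terminal; pending and failed are recoverable.
    # An unknown status string raises ValueError here.
    return IndexingStatus(status) is not IndexingStatus.COMPLETED
```

Raising on an unrecognized status surfaces bad data instead of silently retrying it; that strictness is a choice made in this sketch, not something the PR specifies.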

FINAL INDEXING FLOW:

Case 1: New repository (never indexed before)

User runs /index_repository.
Discord immediately shows: “Indexing started. This can take 30–35 minutes.”
Backend creates a new DB row with status = pending.
Backend calls /analyze_repo.
If indexing succeeds:
DB --> status = completed, user sees “Repository Indexed”.
If indexing fails:
DB --> status = failed, user sees “Indexing Failed”.
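A minimal sketch of the Case 1 lifecycle, assuming a plain dict in place of the `indexed_repositories` table and a caller-supplied `analyze_repo` callable in place of the request to /analyze_repo. `index_new_repository` is a hypothetical helper; the real service writes to the database (the review mentions a `supabase.table("indexed_repositories")` call).

```python
from typing import Callable, Dict


def index_new_repository(
    db: Dict[str, dict],
    full_name: str,
    analyze_repo: Callable[[str], None],
) -> str:
    """Case 1: insert a pending row, run the analyzer, record the outcome."""
    # A brand-new row starts as pending with no error recorded.
    db[full_name] = {"indexing_status": "pending", "last_error": None}
    try:
        # Stand-in for the request to /analyze_repo; raising means failure.
        analyze_repo(full_name)
    except Exception as exc:
        db[full_name].update(indexing_status="failed", last_error=str(exc))
        return "Indexing Failed"
    db[full_name]["indexing_status"] = "completed"
    return "Repository Indexed"
```

If the process itself dies after the insert but before either update, neither branch runs and the row is left as pending. That is exactly the stuck state Case 4 recovers from.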

Case 2: Repository already indexed (completed)

User runs /index_repository.
Backend checks DB.
Status is completed.
User sees: “Repository already indexed”.

Case 3: Repository in failed state

User runs /index_repository.
Backend finds status = failed.
Backend resets: status = pending, last_error = null.
/analyze_repo is called again, and the user sees “Indexing started again. This can take 30–35 minutes.”

Case 4: Repository in pending state (including backend crash)

User runs /index_repository.
Backend finds status = pending.
This could mean either that indexing is slow or that the backend crashed earlier.
Backend resets the row and restarts indexing.
User sees: “Indexing started again. Please wait 30–35 minutes.”
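Cases 3 and 4 share one reset step, which CodeRabbit's walkthrough for this PR describes as resetting `indexing_status` to `pending`, clearing `last_error`, and updating the timestamp. A sketch of the resulting update payload, assuming the timestamp column is named `updated_at` (the real column name is not stated) and using the hypothetical helper name `build_retry_update`:

```python
from datetime import datetime, timezone
from typing import Optional


def build_retry_update(current_status: str, now: Optional[datetime] = None) -> Optional[dict]:
    """Return the row update that restarts indexing, or None if no retry is allowed."""
    if current_status == "completed":
        # Completed is terminal: the caller should answer "already indexed".
        return None
    now = now or datetime.now(timezone.utc)
    return {
        "indexing_status": "pending",   # back to pending for both failed and pending
        "last_error": None,             # clear any error from the previous attempt
        "updated_at": now.isoformat(),  # assumed column name for the timestamp
    }
```

A dictionary like this is the kind of payload one would pass to an update call such as the `supabase.table("indexed_repositories").update(...)` block mentioned in the review, before re-triggering /analyze_repo.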

✅ Checklist

I have read the contributing guidelines.
[ ] I have added tests that prove my fix is effective or that my feature works.
I have added necessary documentation (if applicable).
Any dependent changes have been merged and published in downstream modules.

Summary by CodeRabbit

Bug Fixes
- Clean up expired verification tokens before creating new verification sessions.
- Allow pending or failed repository indexing to be restarted instead of returned as an error.
Improvements
- Preserve “verification pending” prompts for still-valid tokens with timezone-aware expiry checks.
- Clarified indexing messages: updated failure wording and added expected duration guidance (~30–35 minutes).


coderabbitai · 2026-01-12T18:03:34Z

📝 Walkthrough


Adds server-side cleanup of expired verification tokens before creating sessions, makes pending repository indexing retryable (reset to pending, clear errors, retrigger), and adds timezone-aware verification token expiry checks plus updated Discord messaging.

Changes

Cohort / File(s)	Summary
Token cleanup & verification expiry `backend/app/services/auth/verification.py`, `backend/integrations/discord/cogs.py`	`create_verification_session` now calls `cleanup_expired_tokens()` before session creation; Discord cog normalizes `verification_token_expires_at` (string or datetime) to UTC and preserves the "Verification Pending" prompt when token remains valid.
Repository indexing retry logic `backend/app/services/codegraph/repo_service.py`	Non-completed indexing statuses (including `pending` and `failed`) are now treated as retryable: reset `indexing_status` to `pending`, clear `last_error`, update timestamp, and restart indexing; `completed` still returns an error.
Discord messaging updates `backend/integrations/discord/cogs.py`	Indexing messages now mention 30–35 minute duration for large repos, change failure wording from "Could not index" to "Indexing did not complete", and remove the prior "pending" tip branch.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant DiscordCog
  participant AuthService
  participant DB
  participant RepoService
  participant Analyzer

  User->>DiscordCog: Request to index repo / verify
  DiscordCog->>AuthService: Check verification_token and expires_at
  AuthService->>DB: cleanup_expired_tokens()
  AuthService-->>DiscordCog: token valid? (yes/no)
  alt token valid
    DiscordCog-->>User: "Verification Pending"
  else token expired or none
    DiscordCog->>AuthService: create_verification_session()
    AuthService->>DB: create session record
    AuthService-->>DiscordCog: session created
  end

  User->>RepoService: index_repo(repo)
  RepoService->>DB: fetch repo record
  alt record not found
    RepoService->>DB: insert record (pending)
    RepoService->>Analyzer: trigger /analyze_repo
  else record found & completed
    RepoService-->>User: error (already indexed)
  else record found & not completed (pending/failed)
    RepoService->>DB: set indexing_status = pending, clear last_error, touch timestamp
    RepoService->>Analyzer: retrigger /analyze_repo
    RepoService-->>User: indexing restarted
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

BugFix: Refactor Discord bot to use slash commands and use ephemeral message #112: Related changes to Discord cog's verification flow and token cleanup/expiry handling.

Suggested reviewers

smokeyScraper

Poem

🐰 I hopped through tokens, swept the stale,
Gave stuck indexes a brand-new trail.
Timezones tidy, expiries in sight,
Retries hum softly through day and night.
— a cheerful rabbit, nibbling code delights 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main fix: enabling repo indexing to recover from a stuck pending state by allowing retries, which directly addresses issue `#228`.
Linked Issues check	✅ Passed	All coding requirements from issue `#228` are implemented: pending state is now retryable, status resets to pending, errors clear, and /analyze_repo re-triggers for both pending and failed states.
Out of Scope Changes check	✅ Passed	The changes in verification.py token cleanup and Discord cogs message updates support the core objective by enabling fresh verification attempts and improving user feedback about indexing duration.


🧹 Recent nitpick comments

backend/integrations/discord/cogs.py (1)
38-46: Consider using logger instead of print and renaming to avoid shadowing.

Two minor observations:

Lines 42 and 44 use print statements, while the rest of the file uses logger; switching them would keep logging consistent.

The method name cleanup_expired_tokens shadows the imported function with the same name (line 19). This works due to Python's name resolution but is confusing.
♻️ Suggested improvements
 @tasks.loop(minutes=5)
-async def cleanup_expired_tokens(self):
+async def cleanup_expired_tokens_task(self):
     """Periodic cleanup of expired verification tokens"""
     try:
-        print("--> Running token cleanup task...")
+        logger.debug("Running token cleanup task...")
         await cleanup_expired_tokens()
-        print("--> Token cleanup task finished.")
+        logger.debug("Token cleanup task finished.")
     except Exception as e:
         logger.error(f"Error during token cleanup: {e}")
Note: If renaming, update cog_load (line 33) and cog_unload (line 36) accordingly.
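The claim that the same-named method "works due to Python's name resolution" can be checked in isolation. This sketch drops `@tasks.loop` so it runs without discord.py; `VerificationCog` is a hypothetical class, and the module-level `cleanup_expired_tokens` stands in for the imported function. A bare name inside a method is looked up in locals, then module globals, then builtins, and never in the class body, so the call reaches the module function instead of recursing.

```python
import asyncio

calls = []


async def cleanup_expired_tokens():
    # Module-level function, standing in for the imported helper.
    calls.append("module")


class VerificationCog:
    async def cleanup_expired_tokens(self):
        # The bare name below resolves to the module-level function,
        # because class attributes are not part of a method's scope chain.
        calls.append("method")
        await cleanup_expired_tokens()


asyncio.run(VerificationCog().cleanup_expired_tokens())
```

After the run, `calls` records the method first and then the module function, confirming there is no recursion. Renaming the method still helps readers, since `self.cleanup_expired_tokens` and `cleanup_expired_tokens` look interchangeable but are not.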

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e48d281 and 2c38e09.

📒 Files selected for processing (1)

backend/integrations/discord/cogs.py

🔇 Additional comments (4)

backend/integrations/discord/cogs.py (4)

6-6: LGTM!

Standard library import correctly added for the timezone-aware expiry handling logic.

132-166: LGTM! Well-implemented expiry handling with proper edge case coverage.

The verification token expiry logic correctly handles:

String parsing via fromisoformat (line 138)

Native datetime objects (line 140)

Naive datetimes normalized to UTC (lines 142-143)

Missing expiry treated as expired (lines 150-152) — good defensive coding

This ensures tokens without an expiry field won't remain permanently valid.
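The expiry rules listed above can be sketched as one function. This is a reconstruction from the review's description, not the code in `cogs.py`; `is_token_expired` is a hypothetical name, naive timestamps are assumed to be stored in UTC (as the review notes), and counting the exact expiry instant as expired is a choice made here.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def is_token_expired(
    expires_at: Union[str, datetime, None],
    now: Optional[datetime] = None,
) -> bool:
    """Timezone-aware expiry check; a missing expiry counts as expired."""
    if not expires_at:
        # No expiry stored: treat as expired so the token can't stay valid forever.
        return True
    if isinstance(expires_at, str):
        expires_at_dt = datetime.fromisoformat(expires_at)
    else:
        expires_at_dt = expires_at
    if expires_at_dt.tzinfo is None:
        # Naive values are assumed to be UTC, matching the storage convention.
        expires_at_dt = expires_at_dt.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return expires_at_dt <= now
```

On Python versions before 3.11, `datetime.fromisoformat` rejects a trailing `Z`, so strings in that form would need normalizing first.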

221-229: LGTM!

The newline separator between the sentences is now correctly placed, and the messaging aligns with the PR objective of informing users that indexing may take 30–35 minutes for large repositories.

275-286: LGTM!

The updated error message "Indexing did not complete" is more accurate than "Could not index" — it better reflects scenarios where indexing started but was interrupted (pending/failed states), aligning with the new retry-from-pending behavior.


coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In @backend/app/services/codegraph/repo_service.py:
- Around line 71-83: Fix the typo in the comment message ("faild" → "failed")
where the restart log is emitted; update the logged string near the logger.info
call that says "Restarting indexing for {repo_info['full_name']} (previous
status: {status})" or any adjacent inline comment in RepoService to use "failed"
instead of "faild" so the log and comments are spelled correctly; verify the
change around the logger.info and the subsequent
supabase.table("indexed_repositories").update(...) block for consistency.

In @backend/integrations/discord/cogs.py:
- Around line 221-222: The two adjacent string literals used in the Discord
embed are concatenated without a separating space/newline, producing "...30-35
minutes.Please wait..."; update the embed text in cogs.py so the sentences are
separated (e.g., add a trailing space to the first literal, insert "\n" between
them, or combine into one string with proper spacing) where the embed message
for "⏳ Note: For large repositories..." is defined.
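The bug described here comes from Python's implicit concatenation of adjacent string literals, which inserts nothing between them. A minimal illustration; the message text is a placeholder, since only its opening words and the joined fragment are quoted above:

```python
# Adjacent literals are joined with no separator, so the sentences run together.
broken = (
    "⏳ Note: For large repositories, indexing can take 30-35 minutes."
    "Please wait..."
)

# Ending the first literal with "\n" puts the second sentence on its own line.
fixed = (
    "⏳ Note: For large repositories, indexing can take 30-35 minutes.\n"
    "Please wait..."
)
```

The later review confirms the fix took the newline route.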

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa66519 and efb6913.

📒 Files selected for processing (3)

backend/app/services/auth/verification.py
backend/app/services/codegraph/repo_service.py
backend/integrations/discord/cogs.py

🧰 Additional context used

🧬 Code graph analysis (1)

backend/app/services/auth/verification.py (1)

backend/integrations/discord/cogs.py (1)

cleanup_expired_tokens (39-46)

🔇 Additional comments (4)

backend/app/services/auth/verification.py (1)

39-40: LGTM - Good addition of token cleanup before session creation.

This ensures expired tokens are cleaned from the database before creating new verification sessions, preventing stale data accumulation. The ordering (DB cleanup first, then in-memory cleanup) is appropriate.
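A simplified sketch of the cleanup-then-create ordering, assuming an in-memory dict of token to expiry time in place of the database table. The real `cleanup_expired_tokens()` is awaited with no arguments (as in the nitpick diff above), and the service also has a separate in-memory cleanup step; here both are collapsed into one synchronous function that takes `now` explicitly so the behavior is easy to test. The token format and the `ttl` parameter are assumptions.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical stand-in for the verification-token storage: token -> expiry.
tokens: Dict[str, datetime] = {}


def cleanup_expired_tokens(now: datetime) -> int:
    """Remove every token whose expiry is at or before `now`; return the count."""
    expired = [tok for tok, exp in tokens.items() if exp <= now]
    for tok in expired:
        del tokens[tok]
    return len(expired)


def create_verification_session(ttl: timedelta, now: Optional[datetime] = None) -> str:
    """Purge expired tokens first, then issue a new one valid for `ttl`."""
    now = now or datetime.now(timezone.utc)
    cleanup_expired_tokens(now)
    token = secrets.token_urlsafe(16)
    tokens[token] = now + ttl
    return token
```

Because cleanup runs on every session creation, stale tokens are bounded by how often users start verification, independent of the periodic background task.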

backend/app/services/codegraph/repo_service.py (1)

84-93: LGTM - New repository insertion path.

The insertion logic for new repositories is unchanged and correctly sets initial state to pending with no error.

backend/integrations/discord/cogs.py (2)

132-162: LGTM - Timezone-aware expiry validation.

The logic correctly handles both string and datetime types for expires_at, makes naive datetimes timezone-aware (assuming UTC storage convention), and allows users to retry verification when their previous token has expired.

271-293: LGTM - Error handling and messaging.

The updated error embed and tip logic appropriately handles different failure scenarios. The removal of the obsolete "pending" tip aligns with the new restart behavior where pending states are now recoverable.


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@backend/integrations/discord/cogs.py`:
- Around line 133-148: The code currently leaves a token valid if
verification_token exists but verification_token_expires_at is None; update the
expiry check around expires_at/verification_token_expires_at so that if
expires_at is falsy (None) you mark is_expired = True (i.e., treat missing
expiry as expired) instead of leaving is_expired False — modify the block
handling expires_at/expires_at_dt (variables: expires_at, expires_at_dt,
is_expired, verification_token/verification_token_expires_at) to include an else
branch that sets is_expired = True.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efb6913 and e48d281.

📒 Files selected for processing (2)

backend/app/services/codegraph/repo_service.py
backend/integrations/discord/cogs.py

🚧 Files skipped from review as they are similar to previous changes (1)

backend/app/services/codegraph/repo_service.py

🔇 Additional comments (2)

backend/integrations/discord/cogs.py (2)

221-222: LGTM - Concatenation issue resolved.

The newline character at the end of line 221 properly separates the sentences in the Discord embed. This addresses the PR objective of informing users about the 30-35 minute indexing time.

273-276: LGTM - Improved error messaging.

The updated description "Indexing did not complete" is more accurate than "Could not index" given the new retry-from-pending behavior, where indexing may have started but not finished.


AkshitGarg054 added 2 commits January 10, 2026 16:31

github-verification-stuck-fixed

276366d

fixed restart indexing from pending

efb6913

coderabbitai bot reviewed Jan 12, 2026


backend/app/services/codegraph/repo_service.py (resolved)

backend/integrations/discord/cogs.py (outdated, resolved)

fixed typo

e48d281

coderabbitai bot reviewed Jan 14, 2026


backend/integrations/discord/cogs.py (resolved)

fix: treat missing expiry as expired

2c38e09
