
Conversation


@SwenSchaeferjohann SwenSchaeferjohann commented Nov 27, 2025

Overview

  • Summary of changes

Testing

  • Testing performed to validate the changes

Summary by CodeRabbit

  • New Features

    • Added SQLite database backend support alongside existing PostgreSQL compatibility.
  • Improvements

    • Enhanced token data parsing with expanded discriminator support for multiple token variants.
    • Improved numeric precision handling for large integer values across database backends.
  • Bug Fixes

    • Optimized balance retrieval queries for better performance across different database backends.



coderabbitai bot commented Nov 27, 2025

Walkthrough

This PR introduces comprehensive SQLite support across the application by adding conditional database backend detection and branching logic. New utility functions and query models handle string-based balance representations for SQLite while maintaining numeric types for PostgreSQL. Schema migrations and data persistence operations now differentiate between backends. Token parsing recognizes additional discriminators.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration**<br>`.gitignore`, `Cargo.toml` | Adds ignore patterns for `.cursor` and `**/photon.log`; version bumped to 0.51.1. |
| **API Method - Balance Retrieval**<br>`src/api/method/get_compressed_account_balance.rs`, `src/api/method/get_compressed_balance_by_owner.rs`, `src/api/method/get_compressed_token_account_balance.rs`, `src/api/method/get_compressed_token_balances_by_owner.rs` | Introduces SQLite-aware branching on balance retrieval: the SQLite path uses string-based models (`*ModelString`) with `parse_balance_string`; the PostgreSQL path uses numeric models with `parse_decimal`. Conditional logic adapts the query model and parsing to the backend. |
| **API Method - Token Holders**<br>`src/api/method/get_compressed_mint_token_holders.rs` | Adds backend-specific data fetching: the SQLite branch uses `OwnerBalanceModelString` with `parse_balance_string`; the non-SQLite branch uses `OwnerBalanceModel` with `parse_decimal`. Duplicate query logic is preserved across branches. |
| **Utilities**<br>`src/api/method/utils.rs` | Adds public helpers `is_sqlite` and `parse_balance_string`; introduces new `FromQueryResult` models (`BalanceModelString`, `LamportModelString`, `MintBalanceModel`, `MintBalanceModelString`, `OwnerBalanceModel`, `OwnerBalanceModelString`) for string-based balance handling; imports `DatabaseBackend` from SeaORM. |
| **Token Parsing**<br>`src/common/typedefs/account/v1.rs`, `src/common/typedefs/account/v2.rs` | Expands `parse_token_data` logic to recognize additional token discriminators (v1, v2, sha_flat) alongside the owner check; returns `None` if the discriminator patterns don't match. |
| **Data Persistence**<br>`src/ingester/persist/mod.rs` | Introduces dual-backend persistence: the SQLite path uses raw SQL with TEXT storage for numeric values; the PostgreSQL path continues with SeaORM and numeric types. Updates the `execute_account_update_query_and_update_balances` signature to include `AccountType` and `ModificationType` parameters. Reworks `append_output_accounts` and `persist_token_accounts` with backend-aware SQL construction. Adds helpers (`option_bytes_to_sqlite`, `option_i64_to_sql`, `option_i32_to_sql`, `bool_to_sqlite`) for SQLite conversion. |
| **Schema Migrations**<br>`src/migration/migrations/standard/m20220101_000001_init.rs`, `src/migration/migrations/standard/m20250206_000007_init.rs` | For the SQLite branch: changes column types from REAL to TEXT for large integers (`accounts.lamports`, `accounts.discriminator`, `token_accounts.amount`, `owner_balances.lamports`, `token_owner_balances.amount`) to preserve precision. PostgreSQL path unchanged. |
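The SQLite conversion helpers listed for `src/ingester/persist/mod.rs` could look roughly like the sketch below. The names (`option_i64_to_sql`, `bool_to_sqlite`) come from the summary above, but the bodies are assumptions for illustration, not the PR's actual code.

```rust
// Hypothetical sketches of two of the SQLite formatting helpers named in
// the summary; the real implementations in the PR may differ.

/// Render an optional i64 as a SQL literal: the number itself, or NULL.
fn option_i64_to_sql(v: Option<i64>) -> String {
    match v {
        Some(n) => n.to_string(),
        None => "NULL".to_string(),
    }
}

/// SQLite has no native BOOLEAN; booleans are conventionally stored as 0/1.
fn bool_to_sqlite(b: bool) -> i64 {
    if b { 1 } else { 0 }
}

fn main() {
    assert_eq!(option_i64_to_sql(Some(42)), "42");
    assert_eq!(option_i64_to_sql(None), "NULL");
    assert_eq!(bool_to_sqlite(true), 1);
    println!("helpers behave as expected");
}
```

Helpers like these keep raw-SQL construction readable, though parameterized binding would avoid string interpolation entirely.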

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–75 minutes

Areas requiring extra attention:

  • SQL generation and data persistence layer (src/ingester/persist/mod.rs): Raw SQL construction for SQLite with TEXT-based numeric storage; verify correctness of arithmetic operations and precision preservation across both backends.
  • Balance retrieval branching consistency (multiple get_compressed_* files): Ensure consistent handling of string parsing across all methods and equivalent behavior between SQLite and PostgreSQL paths.
  • Schema migrations (m20220101_000001_init.rs, m20250206_000007_init.rs): Verify column type transitions preserve data correctly during migration and that both up/down paths are symmetric.
  • Token discriminator expansion (src/common/typedefs/account/*.rs): Confirm new discriminator checks (v1, v2, sha_flat) are correct and don't inadvertently filter valid token data.

Poem

🐰 A rabbit's hop through databases bright,
SQLite paths now shine alongside postgres light,
Text for numbers, precision preserved with care,
Schema migrations branching everywhere,
Two backends dancing, one query to declare! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description uses only placeholder text without providing any actual details about changes, testing approach, or implementation specifics. | Fill in the 'Summary of changes' and 'Testing performed' sections with concrete details about the modifications made and how they were validated. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main objective of the changeset: storing large numbers as text in SQLite to preserve precision across multiple files. |

✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch `swen/sqlite-strings`


@SwenSchaeferjohann changed the title from "Swen/sqlite strings" to "chore: store large numbers as text in sqlite" on Nov 27, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
src/migration/migrations/standard/m20220101_000001_init.rs (1)

15-24: Consider extracting the duplicated execute_sql helper.

This helper function is duplicated across multiple migration files (m20220101_000001_init.rs, m20240914_000005_init.rs, m20241008_000006_init.rs, m20250206_000007_init.rs, custom20252201_000001_init.rs). Consider extracting it to a shared module to reduce duplication.

```rust
// In a shared module like src/migration/utils.rs:
pub async fn execute_sql(manager: &SchemaManager<'_>, sql: &str) -> Result<(), DbErr> {
    manager
        .get_connection()
        .execute(Statement::from_string(
            manager.get_database_backend(),
            sql.to_string(),
        ))
        .await?;
    Ok(())
}
```
src/api/method/utils.rs (1)

38-44: Edge case: parse_balance_string may produce confusing error message.

If the input string is "123.456", after splitting on '.', value becomes "123". If parsing fails, the error message shows "123" not the original "123.456", which could confuse debugging. Consider preserving the original value in the error message.

```diff
 pub fn parse_balance_string(value: &str) -> Result<u64, PhotonApiError> {
+    let original = value;
     // Handle potential decimal point from Decimal serialization
     let value = value.split('.').next().unwrap_or(value);
     value
         .parse::<u64>()
-        .map_err(|_| PhotonApiError::UnexpectedError(format!("Invalid balance: {}", value)))
+        .map_err(|_| PhotonApiError::UnexpectedError(format!("Invalid balance: {}", original)))
 }
```
src/common/typedefs/account/v2.rs (1)

44-57: Identical discriminator logic duplicated from v1.rs.

The parse_token_data method in both Account (v1.rs) and AccountV2 (v2.rs) contains identical discriminator detection logic. Consider extracting this to a shared helper function.

```rust
// In a shared location, e.g., src/common/typedefs/token_data.rs or a new utils module:
pub fn is_token_account(owner: &Pubkey, discriminator: u64) -> bool {
    let bytes = discriminator.to_le_bytes();
    let is_v1 = bytes == [2, 0, 0, 0, 0, 0, 0, 0];
    let is_v2 = bytes == [0, 0, 0, 0, 0, 0, 0, 3];
    let is_sha_flat = bytes == [0, 0, 0, 0, 0, 0, 0, 4];

    *owner == COMPRESSED_TOKEN_PROGRAM && (is_v1 || is_v2 || is_sha_flat)
}

// Then in both v1.rs and v2.rs:
pub fn parse_token_data(&self) -> Result<Option<TokenData>, IngesterError> {
    match self.data.as_ref() {
        Some(data) if is_token_account(&self.owner.0, data.discriminator.0) => {
            let token_data = TokenData::try_from_slice(data.data.0.as_slice())
                .map_err(|e| IngesterError::ParserError(format!("Failed to parse token data: {:?}", e)))?;
            Ok(Some(token_data))
        }
        _ => Ok(None),
    }
}
```
src/api/method/get_compressed_balance_by_owner.rs (1)

27-49: SQLite branching logic looks correct.

Both paths correctly handle the balance retrieval with appropriate model types and parsing functions. The logic is consistent with the pattern used across this PR.

Consider extracting the common query builder logic into a helper to reduce duplication, as the only differences are the model type and parse function. This would improve maintainability when query logic needs to change.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de9080f and 743bebe.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • .gitignore (1 hunks)
  • Cargo.toml (1 hunks)
  • src/api/method/get_compressed_account_balance.rs (2 hunks)
  • src/api/method/get_compressed_balance_by_owner.rs (2 hunks)
  • src/api/method/get_compressed_mint_token_holders.rs (2 hunks)
  • src/api/method/get_compressed_token_account_balance.rs (2 hunks)
  • src/api/method/get_compressed_token_balances_by_owner.rs (2 hunks)
  • src/api/method/utils.rs (4 hunks)
  • src/common/typedefs/account/v1.rs (1 hunks)
  • src/common/typedefs/account/v2.rs (1 hunks)
  • src/ingester/persist/mod.rs (6 hunks)
  • src/migration/migrations/standard/m20220101_000001_init.rs (1 hunks)
  • src/migration/migrations/standard/m20250206_000007_init.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
src/api/method/get_compressed_balance_by_owner.rs (1)
src/api/method/utils.rs (3)
  • is_sqlite (46-48)
  • parse_balance_string (38-44)
  • parse_decimal (31-36)
src/api/method/get_compressed_mint_token_holders.rs (1)
src/api/method/utils.rs (4)
  • is_sqlite (46-48)
  • parse_balance_string (38-44)
  • parse_decimal (31-36)
  • filter (279-296)
src/api/method/get_compressed_token_account_balance.rs (1)
src/api/method/utils.rs (3)
  • is_sqlite (46-48)
  • parse_balance_string (38-44)
  • parse_decimal (31-36)
src/api/method/get_compressed_account_balance.rs (1)
src/api/method/utils.rs (3)
  • is_sqlite (46-48)
  • parse_balance_string (38-44)
  • parse_decimal (31-36)
src/migration/migrations/standard/m20220101_000001_init.rs (4)
src/migration/migrations/standard/m20250206_000007_init.rs (1)
  • execute_sql (10-19)
src/migration/migrations/custom/custom20252201_000001_init.rs (1)
  • execute_sql (12-21)
src/migration/migrations/standard/m20240914_000005_init.rs (1)
  • execute_sql (9-18)
src/migration/migrations/standard/m20241008_000006_init.rs (1)
  • execute_sql (9-18)
src/api/method/utils.rs (2)
src/common/typedefs/unsigned_integer.rs (1)
  • value (55-56)
tests/integration_tests/utils.rs (2)
  • format (260-261)
  • conn (282-291)
src/api/method/get_compressed_token_balances_by_owner.rs (3)
src/api/method/utils.rs (4)
  • is_sqlite (46-48)
  • parse_balance_string (38-44)
  • parse_decimal (31-36)
  • filter (279-296)
tests/integration_tests/utils.rs (1)
  • conn (282-291)
src/api/method/get_compressed_mint_token_holders.rs (4)
  • token_owner_balances (57-57)
  • token_owner_balances (78-78)
  • token_owner_balances (85-103)
  • token_owner_balances (105-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (17)
Cargo.toml (1)

10-10: Version bump looks appropriate for the SQLite precision fix.

Patch version increment aligns with the nature of this change—a fix/improvement for SQLite backend without breaking API changes.

.gitignore (1)

16-18: Sensible gitignore additions.

Ignoring .cursor (editor config) and **/photon.log (log files) keeps development artifacts out of version control.

src/migration/migrations/standard/m20250206_000007_init.rs (2)

42-43: Good: Using TEXT for large integers preserves precision in SQLite.

SQLite's REAL type loses precision beyond 2^53 (JavaScript's safe integer limit). Storing as TEXT and parsing in application code is the correct approach for u64 values like lamports.
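The precision claim is easy to check in isolation. The standalone sketch below (plain Rust, no project code) shows that a u64 just past 2^53 survives a text round-trip but not a round-trip through an IEEE 754 double, which is what SQLite's REAL storage amounts to.

```rust
fn main() {
    // 2^53 + 1 is the first integer a 64-bit float cannot represent exactly.
    let lamports: u64 = (1u64 << 53) + 1;

    // Round-trip through f64 (equivalent to SQLite REAL storage): lossy.
    let via_real = lamports as f64 as u64;
    assert_ne!(via_real, lamports); // 9007199254740993 rounds to 9007199254740992

    // Round-trip through a string (TEXT storage): exact.
    let via_text: u64 = lamports.to_string().parse().unwrap();
    assert_eq!(via_text, lamports);

    println!("real: {via_real}, text: {via_text}");
}
```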


244-245: Down migration correctly mirrors the TEXT column types.

The rollback schema maintains consistency with the up migration's TEXT storage for lamports and discriminator.

src/migration/migrations/standard/m20220101_000001_init.rs (1)

278-303: Comprehensive SQLite precision handling for all large integer columns.

All columns that could exceed SQLite's safe integer limit (2^53) are correctly changed to TEXT: lamports, discriminator, and amount across accounts, token_accounts, owner_balances, and token_owner_balances tables.

src/api/method/utils.rs (2)

46-48: Simple and clear backend detection.


335-368: New string-based models for SQLite are well-structured.

The *ModelString variants mirror the Decimal-based models and provide clear separation for backend-specific query handling.

src/ingester/persist/mod.rs (5)

316-339: Helper functions for SQLite SQL formatting are well-implemented.

Clean separation of concerns with dedicated functions for nullable values and boolean conversion. The hex formatting for bytes matches SQLite's blob literal syntax.


373-381: Good: Backend-aware balance parsing from query results.

SQLite returns TEXT that needs string parsing, while PostgreSQL returns Decimal directly. The fallback to Decimal::ZERO on parse failure is reasonable for robustness.


417-430: SQLite arithmetic on TEXT columns uses correct CAST pattern.

The double CAST (CAST(CAST(... AS INTEGER) + ... AS TEXT)) correctly handles arithmetic on TEXT-stored numbers and preserves the TEXT storage format.
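A minimal sketch of how such an update expression might be assembled; `balance_update_expr` is a hypothetical helper for illustration, not the PR's actual SQL builder.

```rust
/// Build a SQLite-safe balance update expression: the column stores numbers
/// as TEXT, so cast to INTEGER for arithmetic, then back to TEXT for storage.
/// Hypothetical helper; the PR's real SQL construction may differ.
fn balance_update_expr(column: &str, delta: i64) -> String {
    format!("CAST(CAST({col} AS INTEGER) + {delta} AS TEXT)", col = column)
}

fn main() {
    let expr = balance_update_expr("lamports", 500);
    assert_eq!(expr, "CAST(CAST(lamports AS INTEGER) + 500 AS TEXT)");
    println!("{expr}");
}
```

Note that SQLite's INTEGER is a signed 64-bit value, so this pattern is safe only while balances stay below i64::MAX; beyond that, application-side arithmetic on the parsed strings would be needed.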


481-555: SQLite raw SQL path correctly handles all account fields.

The raw SQL construction for SQLite properly quotes TEXT values (lamports, discriminator) and uses appropriate formatters for all field types. The ON CONFLICT DO NOTHING matches the SeaORM equivalent.


579-628: Token accounts SQLite path mirrors the accounts pattern consistently.

The same TEXT quoting pattern for amount and proper formatting for all other fields ensures consistency between the two persistence paths.

src/common/typedefs/account/v1.rs (1)

36-49: Discriminator patterns are internally consistent; external specification verification cannot be completed.

The code in src/common/typedefs/account/v1.rs correctly encodes token discriminators:

  • v1_token: [2, 0, 0, 0, 0, 0, 0, 0] = little-endian u64 value 2
  • v2_token: [0, 0, 0, 0, 0, 0, 0, 3] = little-endian u64 value 3 << 56
  • sha_flat_token: [0, 0, 0, 0, 0, 0, 0, 4] = little-endian u64 value 4 << 56

These checks are:

  • Identical in both v1.rs and v2.rs (consistent duplication)
  • Properly using to_le_bytes() to convert UnsignedInteger (u64 wrapper) to 8-byte little-endian arrays
  • No contradictions found in the codebase

However, the original review's request to verify these against Light Protocol's official token format specifications cannot be completed—no accessible external documentation confirms these are the canonical discriminator values. The codebase provides no comments explaining why these specific values represent each token variant.
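The byte-pattern claims above can be verified mechanically with `to_le_bytes`; this standalone check uses only the numeric values stated in the comment.

```rust
fn main() {
    // v1 discriminator: u64 value 2 in little-endian.
    assert_eq!(2u64.to_le_bytes(), [2, 0, 0, 0, 0, 0, 0, 0]);
    // v2 discriminator: u64 value 3 << 56; the high byte lands last in LE order.
    assert_eq!((3u64 << 56).to_le_bytes(), [0, 0, 0, 0, 0, 0, 0, 3]);
    // sha_flat discriminator: u64 value 4 << 56.
    assert_eq!((4u64 << 56).to_le_bytes(), [0, 0, 0, 0, 0, 0, 0, 4]);
    println!("discriminator byte patterns are internally consistent");
}
```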

src/api/method/get_compressed_token_balances_by_owner.rs (1)

85-123: LGTM!

The SQLite-aware branching correctly uses the appropriate model types and parsing functions. Both paths maintain identical query semantics (filtering, ordering, limiting) while adapting to the database-specific balance representation.

src/api/method/get_compressed_account_balance.rs (1)

19-41: LGTM!

The SQLite-aware branching correctly handles single account balance retrieval. The .map().transpose()?.unwrap_or(0) pattern is idiomatic for handling Option<Result<T, E>> and correctly defaults to 0 when no account is found.
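The idiom called out here can be illustrated standalone; `balance_or_zero` and its string input are hypothetical stand-ins for the real query model, chosen only to show the `Option`/`Result` plumbing.

```rust
// Standalone illustration of the `.map(..).transpose()?.unwrap_or(0)` idiom,
// using plain string parsing in place of the real query result type.
fn balance_or_zero(row: Option<&str>) -> Result<u64, std::num::ParseIntError> {
    // Option<Result<u64, E>> becomes Result<Option<u64>, E> via transpose;
    // `?` propagates a parse error, and unwrap_or(0) handles "no row found".
    Ok(row.map(|s| s.parse::<u64>()).transpose()?.unwrap_or(0))
}

fn main() {
    assert_eq!(balance_or_zero(Some("1000")), Ok(1000)); // row present, valid
    assert_eq!(balance_or_zero(None), Ok(0));            // no row: default 0
    assert!(balance_or_zero(Some("not-a-number")).is_err()); // bad data: error
    println!("transpose idiom behaves as described");
}
```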

src/api/method/get_compressed_mint_token_holders.rs (1)

84-124: LGTM!

The SQLite-aware branching correctly implements the token holders query with proper ordering (descending by amount, then owner) preserved in both paths. The model types and parsing functions are appropriately selected based on the database backend.

src/api/method/get_compressed_token_account_balance.rs (1)

37-59: LGTM!

The SQLite-aware branching correctly handles single token account balance retrieval, consistent with the pattern used in get_compressed_account_balance.rs. The .map().transpose()?.unwrap_or(0) handling is appropriate.

Comment on lines +498 to +502
```rust
let lamports = format!("'{}'", account.account.lamports.0);
let discriminator = match account.account.data.as_ref() {
    Some(d) => format!("'{}'", d.discriminator.0),
    None => "NULL".to_string(),
};
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Search for discriminator handling in write path (persist/mod.rs)
echo "=== Discriminator in persist/mod.rs ==="
rg -n "discriminator" src/ingester/persist/mod.rs -A3 -B3

echo ""
echo "=== Discriminator in api/method/ directory ==="
rg -n "discriminator" src/api/method/ -A3 -B3
```

Repository: Lightprotocol/photon

Length of output: 4330


🏁 Script executed:

```shell
#!/bin/bash
# Check the full context around the conditional logic (lines 485-530)
echo "=== Full context of insert logic ==="
sed -n '485,535p' src/ingester/persist/mod.rs

echo ""
echo "=== Search for database schema definition ==="
find . -name "*.sql" -o -name "*migration*" -o -name "*schema*" | head -20

echo ""
echo "=== Search for discriminator column definition ==="
rg -n "discriminator" --type sql
```

Repository: Lightprotocol/photon

Length of output: 4346


🏁 Script executed:

```shell
#!/bin/bash
# List migration files
echo "=== Migration files ==="
ls -la src/migration/migrations/

echo ""
echo "=== Check for discriminator schema definition ==="
cat src/migration/migrations/*.sql 2>/dev/null | grep -A5 -B5 "discriminator" || echo "No SQL files found"

echo ""
echo "=== Search for balance-related logic ==="
rg -n "balance" src/ingester/persist/mod.rs -A3 -B3

echo ""
echo "=== Search for discriminator usage in balance or calculations ==="
rg -n "discriminator" src/ -A2 -B2 | grep -A5 -B5 "balance\|where\|WHERE\|select\|SELECT" || echo "No balance-related discriminator usage found"
```

Repository: Lightprotocol/photon

Length of output: 5377


🏁 Script executed:

```shell
#!/bin/bash
# Search for where discriminator values are parsed/read after database retrieval
echo "=== How discriminator is extracted from query results ==="
rg -n "try_get.*discriminator|discriminator.*try_get" src/ -A2 -B2

echo ""
echo "=== Check api/method/utils.rs for discriminator handling ==="
sed -n '120,140p' src/api/method/utils.rs

echo ""
echo "=== Search for discriminator comparisons or operations ==="
rg -n "discriminator" src/api/ -A3 -B3 | head -80

echo ""
echo "=== Check the complete migration file for schema details ==="
ls -la src/migration/migrations/standard/
```

Repository: Lightprotocol/photon

Length of output: 4175


🏁 Script executed:

```shell
#!/bin/bash
# Search for discriminator usage in proof generation and API responses
echo "=== Discriminator usage in responses ==="
rg -n "discriminator" src/api/ -B3 -A3 | grep -v "^--$" | head -60

echo ""
echo "=== Check how query results are mapped to response struct ==="
rg -n "CompressedAccountData\|TokenAccountData" src/api/method/ -A10 | head -80

echo ""
echo "=== Look for any string trimming or quote stripping logic ==="
rg -n "trim|strip|replace.*['\"]" src/ | grep -i discrimin

echo ""
echo "=== Verify actual discriminator storage in SQLite path ==="
grep -n "discriminator.0" src/ingester/persist/mod.rs
```

Repository: Lightprotocol/photon

Length of output: 2389


🏁 Script executed:

```shell
#!/bin/bash
# Search for query result mapping and deserialization
echo "=== Search for map_row or result deserialization ==="
rg -n "map_row|from_row|TryFrom.*Row" src/api/method/ -A5

echo ""
echo "=== Look for CompressedAccountData struct definition and construction ==="
rg -n "impl.*CompressedAccountData\|CompressedAccountData {" src/ -A15 | head -100

echo ""
echo "=== Check if there's a custom from_row or deserialization ==="
rg -n "fn.*row.*discriminator|discriminator.*from" src/ -B5 -A5

echo ""
echo "=== Search in utils.rs for how data is constructed from query ==="
sed -n '140,200p' src/api/method/utils.rs
```

Repository: Lightprotocol/photon

Length of output: 4183


🏁 Script executed:

```shell
#!/bin/bash
# Find accounts entity/model
echo "=== Find accounts model definition ==="
find . -name "*accounts*" -path "*/entities/*" -o -name "*accounts*" -path "*/model*" | head -10

echo ""
echo "=== Search for TryInto for accounts ==="
rg -n "impl.*TryInto.*accounts\|impl.*TryFrom.*accounts" src/ -A20 | head -100

echo ""
echo "=== Look for CompressedAccountData in entire codebase ==="
rg -n "CompressedAccountData" src/ -B2 -A5

echo ""
echo "=== Check the accounts table schema in migration ==="
grep -A30 "CREATE TABLE.*accounts" src/migration/migrations/standard/m20250206_000007_init.rs
```

Repository: Lightprotocol/photon

Length of output: 5295


Critical data type mismatch in discriminator storage: [u8; 8] array truncated to single byte or numeric value

The discriminator field is defined as [u8; 8] (8-byte fixed array) in CompressedAccountData, but both database paths store it incorrectly:

  • SQLite path (line 500): format!("'{}'", d.discriminator.0) stores only the first byte as a quoted TEXT string (e.g., '123'), losing the remaining 7 bytes
  • PostgreSQL path (line 528): Decimal::from(x.discriminator.0) converts the first byte to a numeric type, incompatible with the byte array

When retrieved, SQLite returns the quoted ASCII representation (e.g., '123' as bytes) instead of the original 8-byte discriminator, causing proof generation and validation failures. Both implementations need to store the complete 8-byte discriminator array consistently.

🤖 Prompt for AI Agents
In src/ingester/persist/mod.rs around lines 498–502, the discriminator
(CompressedAccountData.discriminator: [u8;8]) is being stored incorrectly (only
the first byte as a quoted string for SQLite and converted to a numeric for
Postgres); change both DB paths to persist the entire 8-byte array consistently
(not via format!("'{}'", ...) or Decimal::from(...)). For SQLite write the raw 8
bytes as a BLOB (or a hex/base64 string consistently encoded) and bind the byte
slice directly in the insert/update statement; for Postgres use BYTEA (or the
same hex/base64 string) and bind the full 8-byte slice instead of converting to
Decimal. Update the SQL schema/columns if needed, use parameterized binding (no
string interpolation), and ensure read path decodes the stored bytes back to
[u8;8] the same way.


@ananas-block ananas-block left a comment


We should add integration tests.

