Skip to content

Latest commit

 

History

History
401 lines (310 loc) · 10.4 KB

File metadata and controls

401 lines (310 loc) · 10.4 KB

SQL Analyzer - LLM-Powered Automated Fixing

This guide explains how to use the LLM-powered automated fixing feature to let AI suggest and apply fixes to SQL best practice violations.

Quick Start

# 1. Create .env file with your LLM API key (repo root)
cd c:\Projects\Analyzer
copy .env.example .env
# Edit .env and add your API key

# 2. Analyze and fix a SQL file
cd SqlAnalyzer\bin\Release\net8.0
SqlAnalyzer.exe --file "path\to\your\file.sql" --fix

What This Does

The --fix flag sends your SQL code and all detected violations to an LLM (OpenAI GPT-4 by default), which:

  1. Analyzes each violation in context
  2. Suggests specific fixes for each issue
  3. Rewrites the SQL code with improvements
  4. Documents what changed and why
  5. Saves the fixed SQL to a new file

Configuration

Create .env File

Create a .env file at the repository root (same folder as Analyzer.sln):

# OpenAI Configuration
LLM_API_ENDPOINT=https://api.openai.com/v1
LLM_API_KEY=sk-your-actual-api-key-here
LLM_MODEL=gpt-4

Configuration Options

Variable Description Example
LLM_API_ENDPOINT API endpoint URL https://api.openai.com/v1
LLM_API_KEY Your API key sk-...
LLM_MODEL Model to use gpt-4, gpt-4-turbo, gpt-3.5-turbo

Azure OpenAI

For Azure OpenAI, use:

LLM_API_ENDPOINT=https://your-resource.openai.azure.com
LLM_API_KEY=your-azure-api-key
LLM_MODEL=gpt-4

Custom OpenAI-Compatible APIs

The analyzer works with any OpenAI-compatible API:

LLM_API_ENDPOINT=https://your-custom-api.com/v1
LLM_API_KEY=your-api-key
LLM_MODEL=your-model-name

Usage Examples

Basic Fix

SqlAnalyzer.exe --file query.sql --fix

Output:

  • query.fixed.sql - Fixed SQL code
  • Console output showing changes made

With JSON Report

SqlAnalyzer.exe --file query.sql --fix --json

Output:

  • query.fixed.sql - Fixed SQL code
  • query.fix-report.json - Detailed JSON report with:
    • Original violations
    • LLM response (changes made, violations addressed)
    • Output file path

Batch Processing (Coming Soon)

# Currently --fix only works with --file, not --directory
# Directory support planned for future release
SqlAnalyzer.exe --directory "C:\Scripts" --fix  # Not yet supported

Output Files

Fixed SQL File

Naming: original-name.fixed.sql

Example:

  • Input: GetUser.sql
  • Output: GetUser.fixed.sql

JSON Report (with --json flag)

Naming: original-name.fix-report.json

Example structure:

{
  "success": true,
  "fileName": "GetUser.sql",
  "originalViolations": [
    {
      "ruleName": "AvoidSelectStar",
      "description": "SELECT * should be replaced with explicit column list",
      "severity": "Warning",
      "line": 5,
      "column": 1
    }
  ],
  "llmFix": {
    "success": true,
    "fixedSql": "SELECT UserId, UserName, Email FROM Users WHERE UserId = @UserId",
    "changesMade": [
      "Replaced SELECT * with explicit column list (UserId, UserName, Email)",
      "Changed = NULL to IS NULL for proper null comparison"
    ],
    "violationsAddressed": [
      "AvoidSelectStar",
      "NullEqualityComparison"
    ],
    "notes": "All violations have been addressed. The code now follows SQL best practices."
  },
  "outputFile": "GetUser.fixed.sql"
}

Example Session

PS> .\SqlAnalyzer.exe --file BadQuery.sql --fix

SQL Server Best Practices Analyzer
====================================

Analyzing file: BadQuery.sql

Found 5 violation(s):

[Warning] AvoidSelectStar
  SELECT * should be replaced with explicit column list
  Location: Line 5, Column 1

[Error] MissingWhereClause
  UPDATE without WHERE will affect all rows
  Location: Line 10, Column 1

[Warning] MissingSchemaQualification
  Table 'Users' should be qualified as 'dbo.Users'
  Location: Line 5, Column 15

[Error] NullEqualityComparison
  Use IS NULL instead of = NULL
  Location: Line 7, Column 20

[Warning] AvoidNolock
  NOLOCK hint can cause dirty reads
  Location: Line 5, Column 20

Sending to LLM (gpt-4) for automated fixing...

✓ LLM successfully generated fixes!

Changes made:
  • Replaced SELECT * with explicit column list
  • Added WHERE clause with UserId parameter
  • Added schema qualification (dbo.Users)
  • Changed = NULL to IS NULL
  • Removed NOLOCK hint, added READ COMMITTED isolation level

Violations addressed:
  ✓ AvoidSelectStar
  ✓ MissingWhereClause
  ✓ MissingSchemaQualification
  ✓ NullEqualityComparison
  ✓ AvoidNolock

Notes:
  All violations have been addressed. Added proper error handling with TRY/CATCH
  and transaction management. The code now follows SQL Server best practices.

✓ Fixed SQL saved to: BadQuery.fixed.sql

What the LLM Fixes

The LLM is prompted to address all detected violations:

Performance Issues

  • SELECT * → Explicit column lists
  • NOLOCK hints → Proper isolation levels
  • Functions on indexed columns → Sargable predicates
  • Correlated subqueries → JOINs or CTEs

Security Issues

  • SQL injection risks → Parameterization
  • Missing error handling → TRY/CATCH blocks
  • Missing transactions → BEGIN/COMMIT/ROLLBACK

Code Quality

  • Missing WHERE clauses → Safe WHERE conditions
  • = NULL → IS NULL
  • Old-style joins → ANSI JOIN syntax
  • Missing schema qualification → dbo.TableName

Design Issues

  • Deprecated types (TEXT, NTEXT) → VARCHAR(MAX), NVARCHAR(MAX)
  • MONEY type → DECIMAL(19,4)

Cost Considerations

LLM API calls cost money! Estimates:

File Size Tokens GPT-4 Cost GPT-3.5 Cost
Small (100 lines) ~3K $0.09 $0.006
Medium (500 lines) ~10K $0.30 $0.020
Large (2000 lines) ~30K $0.90 $0.060

Tips to reduce costs:

  • Use gpt-3.5-turbo for simple fixes
  • Fix critical violations manually first
  • Use --summary to preview violations before fixing
  • Fix high-impact violations only

Troubleshooting

"Error: .env file not found"

Solution: Create .env file at the repository root (same folder as Analyzer.sln). The CLI automatically loads it from there.

  cd c:\Projects\Analyzer
  copy .env.example .env
  # Edit .env with your API key

"Error: .env file is missing required settings"

Solution: Ensure your .env has both LLM_API_ENDPOINT and LLM_API_KEY

LLM_API_ENDPOINT=https://api.openai.com/v1
LLM_API_KEY=sk-your-key-here

"Error calling LLM: Unauthorized"

Solution: Check your API key is valid

"Error calling LLM: Timeout"

Solution: LLM took too long (>2 minutes)

  • Try a faster model (gpt-3.5-turbo)
  • Split large files into smaller chunks
  • Check your network connection

"LLM fix failed: Invalid JSON response"

Solution: LLM returned malformed output

  • This is rare with GPT-4
  • Try again (LLM responses can vary)
  • Report the issue with your SQL file

Fixed SQL has syntax errors

Solution: Re-analyze the fixed file

SqlAnalyzer.exe --file query.fixed.sql

If violations remain, the LLM may need better context. Consider:

  • Adding comments explaining business logic
  • Simplifying complex queries
  • Fixing critical issues manually first

Limitations

Current Limitations

  • ✅ Works with: Single SQL files (--file)
  • ❌ Not yet: Directory analysis (--directory)
  • ❌ Not yet: Execution plans, Extended Events, DACPACs
  • ❌ Not yet: Interactive review before applying fixes

What LLM Cannot Fix

  • Schema-dependent issues: Missing indexes, invalid columns (needs actual database)
  • Business logic errors: LLM doesn't know your requirements
  • Performance without execution plans: LLM can't see runtime statistics
  • Production-specific issues: Connection strings, server names, etc.

Best Practices

DO:

  • Review fixed SQL before deploying
  • Test in development first
  • Keep original files as backup (analyzer does this automatically)
  • Use version control
  • Start with small/simple files

DON'T:

  • Blindly apply fixes to production
  • Fix files with sensitive data (API sends to OpenAI!)
  • Expect LLM to understand complex business rules
  • Use on files with inline credentials

Security & Privacy

Data Transmission

⚠️ IMPORTANT: When using --fix, your SQL code is sent to the configured LLM API (OpenAI by default).

What's sent:

  • Your entire SQL file content
  • All detected violations
  • File name

What's NOT sent:

  • Your API key (used for authentication only)
  • Other files in your directory
  • Database connection strings (unless they're in the SQL file!)

Recommendations

  • Remove sensitive data from SQL files before fixing
  • Use self-hosted LLM for proprietary code
  • Review .fixed.sql files before committing to source control
  • Consider local-only analysis for sensitive environments (don't use --fix)

Programmatic Usage

You can also use the LLM fix service in your own code:

using SqlAnalyzer;

var service = new SqlAnalysisService(sqlVersion: 160);
var result = service.AnalyzeSql(sqlCode);

if (result.Violations.Any())
{
    var llmService = new LlmFixService(
        apiEndpoint: "https://api.openai.com/v1",
        apiKey: "your-key",
        model: "gpt-4"
    );

    var fixResponse = await llmService.GetFixedSqlAsync(
        sqlCode, 
        "query.sql", 
        result.Violations
    );

    if (fixResponse.Success)
    {
        Console.WriteLine(fixResponse.FixedSql);
    }
}

Future Enhancements

Planned features:

  • 🔮 Directory batch fixing (--directory --fix)
  • 🔮 Interactive review mode (--fix --interactive)
  • 🔮 Dry-run mode (--fix --dry-run)
  • 🔮 Custom fix templates
  • 🔮 Local LLM support (Ollama, LLaMA, etc.)
  • 🔮 Cost estimation before fixing
  • 🔮 Incremental fixes (fix one violation type at a time)

Support


Quick Reference Card:

# Setup (one-time)
cd c:\Projects\Analyzer
copy .env.example .env
# Edit .env with your API key (keep .env in the repo root)

# Fix a SQL file
SqlAnalyzer.exe --file query.sql --fix

# With JSON report
SqlAnalyzer.exe --file query.sql --fix --json

# Check what would be fixed (no LLM call)
SqlAnalyzer.exe --file query.sql --summary

Happy fixing! 🚀