SQL Analyzer - LLM-Powered Automated Fixing

This guide explains how to use the LLM-powered automated fixing feature to let AI suggest and apply fixes to SQL best practice violations.

Quick Start

# 1. Create .env file with your LLM API key (repo root)
cd c:\Projects\Analyzer
copy .env.example .env
# Edit .env and add your API key

# 2. Analyze and fix a SQL file
cd SqlAnalyzer\bin\Release\net8.0
SqlAnalyzer.exe --file "path\to\your\file.sql" --fix

What This Does

The --fix flag sends your SQL code and all detected violations to an LLM (OpenAI GPT-4 by default), which:

Analyzes each violation in context
Suggests specific fixes for each issue
Rewrites the SQL code with improvements
Documents what changed and why
Saves the fixed SQL to a new file

Configuration

Create .env File

Create a .env file at the repository root (same folder as Analyzer.sln):

# OpenAI Configuration
LLM_API_ENDPOINT=https://api.openai.com/v1
LLM_API_KEY=sk-your-actual-api-key-here
LLM_MODEL=gpt-4

Configuration Options

Variable	Description	Example
`LLM_API_ENDPOINT`	API endpoint URL	`https://api.openai.com/v1`
`LLM_API_KEY`	Your API key	`sk-...`
`LLM_MODEL`	Model to use	`gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`

Azure OpenAI

For Azure OpenAI, use:

LLM_API_ENDPOINT=https://your-resource.openai.azure.com
LLM_API_KEY=your-azure-api-key
LLM_MODEL=gpt-4

Custom OpenAI-Compatible APIs

The analyzer works with any OpenAI-compatible API:

LLM_API_ENDPOINT=https://your-custom-api.com/v1
LLM_API_KEY=your-api-key
LLM_MODEL=your-model-name

Usage Examples

Basic Fix

SqlAnalyzer.exe --file query.sql --fix

Output:

query.fixed.sql - Fixed SQL code
Console output showing changes made

With JSON Report

SqlAnalyzer.exe --file query.sql --fix --json

Output:

query.fixed.sql - Fixed SQL code
query.fix-report.json - Detailed JSON report with:
- Original violations
- LLM response (changes made, violations addressed)
- Output file path

Batch Processing (Coming Soon)

# Currently --fix only works with --file, not --directory
# Directory support planned for future release
SqlAnalyzer.exe --directory "C:\Scripts" --fix  # Not yet supported

Output Files

Fixed SQL File

Naming: original-name.fixed.sql

Example:

Input: GetUser.sql
Output: GetUser.fixed.sql

JSON Report (with --json flag)

Naming: original-name.fix-report.json

Example structure:

{
  "success": true,
  "fileName": "GetUser.sql",
  "originalViolations": [
    {
      "ruleName": "AvoidSelectStar",
      "description": "SELECT * should be replaced with explicit column list",
      "severity": "Warning",
      "line": 5,
      "column": 1
    }
  ],
  "llmFix": {
    "success": true,
    "fixedSql": "SELECT UserId, UserName, Email FROM Users WHERE UserId = @UserId",
    "changesMade": [
      "Replaced SELECT * with explicit column list (UserId, UserName, Email)",
      "Changed = NULL to IS NULL for proper null comparison"
    ],
    "violationsAddressed": [
      "AvoidSelectStar",
      "NullEqualityComparison"
    ],
    "notes": "All violations have been addressed. The code now follows SQL best practices."
  },
  "outputFile": "GetUser.fixed.sql"
}

Example Session

PS> .\SqlAnalyzer.exe --file BadQuery.sql --fix

SQL Server Best Practices Analyzer
====================================

Analyzing file: BadQuery.sql

Found 5 violation(s):

[Warning] AvoidSelectStar
  SELECT * should be replaced with explicit column list
  Location: Line 5, Column 1

[Error] MissingWhereClause
  UPDATE without WHERE will affect all rows
  Location: Line 10, Column 1

[Warning] MissingSchemaQualification
  Table 'Users' should be qualified as 'dbo.Users'
  Location: Line 5, Column 15

[Error] NullEqualityComparison
  Use IS NULL instead of = NULL
  Location: Line 7, Column 20

[Warning] AvoidNolock
  NOLOCK hint can cause dirty reads
  Location: Line 5, Column 20

Sending to LLM (gpt-4) for automated fixing...

✓ LLM successfully generated fixes!

Changes made:
  • Replaced SELECT * with explicit column list
  • Added WHERE clause with UserId parameter
  • Added schema qualification (dbo.Users)
  • Changed = NULL to IS NULL
  • Removed NOLOCK hint, added READ COMMITTED isolation level

Violations addressed:
  ✓ AvoidSelectStar
  ✓ MissingWhereClause
  ✓ MissingSchemaQualification
  ✓ NullEqualityComparison
  ✓ AvoidNolock

Notes:
  All violations have been addressed. Added proper error handling with TRY/CATCH
  and transaction management. The code now follows SQL Server best practices.

✓ Fixed SQL saved to: BadQuery.fixed.sql

What the LLM Fixes

The LLM is prompted to address all detected violations:

Performance Issues

SELECT * → Explicit column lists
NOLOCK hints → Proper isolation levels
Functions on indexed columns → Sargable predicates
Correlated subqueries → JOINs or CTEs

Security Issues

SQL injection risks → Parameterization
Missing error handling → TRY/CATCH blocks
Missing transactions → BEGIN/COMMIT/ROLLBACK

Code Quality

Missing WHERE clauses → Safe WHERE conditions
= NULL → IS NULL
Old-style joins → ANSI JOIN syntax
Missing schema qualification → dbo.TableName

Design Issues

Deprecated types (TEXT, NTEXT) → VARCHAR(MAX), NVARCHAR(MAX)
MONEY type → DECIMAL(19,4)

Cost Considerations

LLM API calls cost money! Estimates:

File Size	Tokens	GPT-4 Cost	GPT-3.5 Cost
Small (100 lines)	~3K	$0.09	$0.006
Medium (500 lines)	~10K	$0.30	$0.020
Large (2000 lines)	~30K	$0.90	$0.060

Tips to reduce costs:

Use gpt-3.5-turbo for simple fixes
Fix critical violations manually first
Use --summary to preview violations before fixing
Fix high-impact violations only

Troubleshooting

"Error: .env file not found"

Solution: Create .env file at the repository root (same folder as Analyzer.sln). The CLI automatically loads it from there.

  cd c:\Projects\Analyzer
  copy .env.example .env
  # Edit .env with your API key

"Error: .env file is missing required settings"

Solution: Ensure your .env has both LLM_API_ENDPOINT and LLM_API_KEY

LLM_API_ENDPOINT=https://api.openai.com/v1
LLM_API_KEY=sk-your-key-here

"Error calling LLM: Unauthorized"

Solution: Check your API key is valid

OpenAI: https://platform.openai.com/api-keys
Azure: https://portal.azure.com

"Error calling LLM: Timeout"

Solution: LLM took too long (>2 minutes)

Try a faster model (gpt-3.5-turbo)
Split large files into smaller chunks
Check your network connection

"LLM fix failed: Invalid JSON response"

Solution: LLM returned malformed output

This is rare with GPT-4
Try again (LLM responses can vary)
Report the issue with your SQL file

Fixed SQL has syntax errors

Solution: Re-analyze the fixed file

SqlAnalyzer.exe --file query.fixed.sql

If violations remain, the LLM may need better context. Consider:

Adding comments explaining business logic
Simplifying complex queries
Fixing critical issues manually first

Limitations

Current Limitations

✅ Works with: Single SQL files (--file)
❌ Not yet: Directory analysis (--directory)
❌ Not yet: Execution plans, Extended Events, DACPACs
❌ Not yet: Interactive review before applying fixes

What LLM Cannot Fix

Schema-dependent issues: Missing indexes, invalid columns (needs actual database)
Business logic errors: LLM doesn't know your requirements
Performance without execution plans: LLM can't see runtime statistics
Production-specific issues: Connection strings, server names, etc.

Best Practices

✅ DO:

Review fixed SQL before deploying
Test in development first
Keep original files as backup (analyzer does this automatically)
Use version control
Start with small/simple files

❌ DON'T:

Blindly apply fixes to production
Fix files with sensitive data (API sends to OpenAI!)
Expect LLM to understand complex business rules
Use on files with inline credentials

Security & Privacy

Data Transmission

⚠️ IMPORTANT: When using --fix, your SQL code is sent to the configured LLM API (OpenAI by default).

What's sent:

Your entire SQL file content
All detected violations
File name

What's NOT sent:

Your API key (used for authentication only)
Other files in your directory
Database connection strings (unless they're in the SQL file!)

Recommendations

Remove sensitive data from SQL files before fixing
Use self-hosted LLM for proprietary code
Review .fixed.sql files before committing to source control
Consider local-only analysis for sensitive environments (don't use --fix)

Programmatic Usage

You can also use the LLM fix service in your own code:

using SqlAnalyzer;

var service = new SqlAnalysisService(sqlVersion: 160);
var result = service.AnalyzeSql(sqlCode);

if (result.Violations.Any())
{
    var llmService = new LlmFixService(
        apiEndpoint: "https://api.openai.com/v1",
        apiKey: "your-key",
        model: "gpt-4"
    );

    var fixResponse = await llmService.GetFixedSqlAsync(
        sqlCode, 
        "query.sql", 
        result.Violations
    );

    if (fixResponse.Success)
    {
        Console.WriteLine(fixResponse.FixedSql);
    }
}

Future Enhancements

Planned features:

🔮 Directory batch fixing (--directory --fix)
🔮 Interactive review mode (--fix --interactive)
🔮 Dry-run mode (--fix --dry-run)
🔮 Custom fix templates
🔮 Local LLM support (Ollama, LLaMA, etc.)
🔮 Cost estimation before fixing
🔮 Incremental fixes (fix one violation type at a time)

Support

Issues: GitHub Issues
Documentation: README.md
Command Reference: COPILOT_COMMAND.md

Quick Reference Card:

# Setup (one-time)
cd c:\Projects\Analyzer
copy .env.example .env
# Edit .env with your API key (keep .env in the repo root)

# Fix a SQL file
SqlAnalyzer.exe --file query.sql --fix

# With JSON report
SqlAnalyzer.exe --file query.sql --fix --json

# Check what would be fixed (no LLM call)
SqlAnalyzer.exe --file query.sql --summary

Happy fixing! 🚀

FilesExpand file tree

LLM_FIX_GUIDE.md

Latest commit

History