AlgoFoe/tinyh2

TinyH2 - A Lightweight Embedded SQL DB Engine

Overview

TinyH2 is a lightweight, embeddable SQL database engine written from scratch in pure Java. It is designed to be included in any Java application as a simple JAR file, providing robust database functionality without requiring a separate server process.

Inspired by engines like H2, TinyH2 supports a rich subset of SQL, features B-Tree indexing for performance, and implements full ACID transactions with multi-threaded concurrency control (including deadlock detection). This makes it a useful tool for projects that need simple, reliable, persistent data storage inside the application itself. Data is stored locally in human-readable CSV files, simplifying inspection and debugging.

Features Implemented

This engine is not just a simple parser; it implements the core features of a modern relational database:

  • SQL Command Support:
    • DDL: CREATE TABLE (including IF NOT EXISTS), CREATE INDEX
    • DML: INSERT INTO, SELECT (including *), UPDATE, DELETE FROM
    • Transaction Control: BEGIN TRANSACTION (or implicit start via setAutoCommit(false)), COMMIT, ROLLBACK
  • Query Capabilities:
    • Complex WHERE clauses with =, !=, >, <, >=, <=, LIKE, IN, AND, OR.
    • GROUP BY for aggregation.
    • ORDER BY with ASC and DESC support.
    • Aggregate Functions: COUNT, SUM, AVG, MIN, MAX.
  • Data Types & Constraints:
    • Supports INT/INTEGER, TEXT, VARCHAR(n), DECIMAL(p,s), BOOLEAN, and more. Values are validated against the declared column type.
    • Enforces PRIMARY KEY (implies UNIQUE and NOT NULL), UNIQUE, NOT NULL, and FOREIGN KEY constraints.
    • AUTO_INCREMENT support for primary keys, with the next ID managed reliably across sessions and rollbacks in _meta.txt files.
  • High-Performance Indexing:
    • B-Tree Indexes: Uses persistent B-Trees (serialized to .btr.idx files) for fast key-based lookups and sorted retrieval.
    • Automatic Indexing: Automatically creates indexes on PRIMARY KEY and UNIQUE columns.
    • Query Optimization: Automatically uses available B-Tree indexes to significantly accelerate:
      • WHERE column = value lookups.
      • ORDER BY column operations (completely avoiding in-memory sorts).
    • LRU Caching: Manages an in-memory, fixed-size Least Recently Used (LRU) cache for B-Tree indexes to minimize disk I/O for frequently accessed indexes.
  • Full ACID Transactions:
    • Atomicity & Durability: Implements a Write-Ahead Log (WAL) (wal.log). All changes (inserts, updates, deletes) are written to the log before data files are touched, guaranteeing that the database can recover to a consistent state after a crash. Includes basic recovery logic on startup.
    • Isolation (Multi-Threaded): Provides session-based, isolated transactions using an in-memory LockManager. This allows multiple threads within the same application to use the database concurrently without corrupting data or seeing partial changes from other threads. It implements a Read Committed isolation level (transactions only see committed data).
    • Concurrency Control: Uses row-level locking (based on primary key or an internal identifier) for UPDATE and DELETE, and table-level locking for INSERT.
    • Deadlock Detection: The LockManager builds a waits-for graph in real-time to detect deadlocks between transactions. When a cycle is found, it automatically chooses a "victim" transaction, rolls it back, and throws a DeadlockException, allowing other transactions to proceed.
  • JDBC-Style API:
    • Provides a clean, intuitive public API via api.TinyH2, api.Connection, and api.ResultSet, designed to feel familiar to developers who have used JDBC.
    • ResultSet object includes a built-in .print() method for easy console output.
  • Command-Line Interface (CLI):
    • Includes an interactive CLI for direct database interaction and testing.
    • Supports executing SQL commands from script files as well.
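The fixed-size LRU index cache described above can be sketched with a `LinkedHashMap` in access order. This is an illustrative sketch only; the class and method names below are hypothetical and not part of TinyH2's actual `IndexManager`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a fixed-size LRU cache for B-Tree indexes, similar in
// spirit to what an index manager might do. Names are hypothetical.
public class LruIndexCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruIndexCache(int capacity) {
        // accessOrder = true: iteration order follows recency of access
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once capacity is exceeded;
        // a real implementation would persist a dirty B-Tree to its
        // .btr.idx file before dropping it from memory.
        return size() > capacity;
    }
}
```

The `accessOrder` constructor flag is what turns the map's iteration order from insertion order into recency order, making eviction of the least recently used entry a one-line override.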

How It Works: Architecture

TinyH2 follows a well-defined layered architecture:

  1. API Layer (api package): The public interface. Developers interact with TinyH2.getConnection(), Connection, and ResultSet. This layer hides the internal engine complexity.
  2. Session Layer (engine.Session): Manages a single client connection and holds their TransactionContext. It acts as an intermediary between the API and the core engine.
  3. Parsing Layer (parser package): The SQLParser uses regular expressions for tokenization and a recursive-descent strategy to build an Abstract Syntax Tree (AST) object (like SelectQuery, InsertQuery, TransactionQuery) representing the SQL command.
  4. Execution Layer (engine.DatabaseEngine): The central coordinator. It receives the AST from the Session, manages shared resources (like LockManager, LogManager, metadata cache), orchestrates transaction steps, decides query execution plans (e.g., use index or full scan), and interacts with lower layers.
  5. Transaction Layer (transaction package):
    • LockManager: Manages in-memory locks (synchronized, wait, notifyAll) for rows/tables and implements the waits-for graph for deadlock detection.
    • LogManager: Appends all change operations (INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE) to the wal.log for durability.
    • TransactionContext: A per-session object that holds the transaction ID, inTransaction status, the in-memory transactionBuffer (pending table changes), and the set of currently held locks.
  6. Indexing Layer (index package):
    • IndexManager: Manages the LRU cache of B-Trees, loading them from disk (.btr.idx files) or cache, and saving them.
    • BTree: The core B-Tree implementation (nodes, keys, values=offsets), including insertion, search, and ordered traversal methods.
  7. Storage Layer (storage package):
    • TableStorage: The lowest layer, responsible for the physical reading and writing of schema (_schema.csv) and data (_data.csv) files using Apache Commons CSV. This layer is only touched when a transaction is committed or when tables are initially loaded.
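The waits-for graph used by the LockManager (layer 5 above) can be illustrated with a small cycle-detection sketch: an edge A → B means "transaction A is waiting for a lock held by transaction B", and a cycle means deadlock. The class and method names below are hypothetical, not TinyH2's actual API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of deadlock detection on a waits-for graph.
public class WaitsForGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    // Record that `waiter` is blocked on a lock held by `holder`.
    public void addWait(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    // Remove a transaction entirely, e.g. after it is rolled back as a victim.
    public void removeTransaction(String txn) {
        edges.remove(txn);
        edges.values().forEach(s -> s.remove(txn));
    }

    // Depth-first search: is `start` part of a cycle?
    public boolean hasCycleFrom(String start) {
        return dfs(start, start, new HashSet<>());
    }

    private boolean dfs(String start, String current, Set<String> visited) {
        if (!visited.add(current)) return false; // already explored
        for (String next : edges.getOrDefault(current, Set.of())) {
            if (next.equals(start) || dfs(start, next, visited)) return true;
        }
        return false;
    }
}
```

In a real lock manager this check would run whenever a transaction is about to block; on a positive result, one transaction in the cycle is chosen as the victim, rolled back, and its edges removed so the others can proceed.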

How to Use TinyH2 (API Guide)

Embedding TinyH2 in your own Java application is designed to be straightforward, mimicking the standard JDBC flow.

1. Add the JAR

Build the project with mvn clean package. This creates a self-contained "fat JAR" in the target/ directory (e.g., tinyh2-1.0-SNAPSHOT.jar). Add this JAR to your application's classpath (e.g., put it in a lib folder and add it in your IDE or build script).

2. Get a Connection

Always start by getting a Connection object using the static TinyH2.getConnection() method. Use a try-with-resources statement to ensure the connection is properly closed (which also handles rolling back any uncommitted transactions).

import api.Connection;
import api.ResultSet;
import api.TinyH2;
import java.sql.SQLException;

// ...

try (Connection conn = TinyH2.getConnection()) {
    // Your database code goes here...
    System.out.println("Connected!");
} catch (SQLException e) {
    System.err.println("Error: " + e.getMessage());
}

3. Basic Operations (Auto-Commit)

By default, the connection is in auto-commit mode. Every statement is its own transaction and is saved immediately.

try (Connection conn = TinyH2.getConnection()) {
    conn.executeUpdate("CREATE TABLE users (id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(50), age INT)");

    conn.executeUpdate("INSERT INTO users (name, age) VALUES ('Alice', 30)");

    ResultSet rs = conn.executeQuery("SELECT * FROM users");
    rs.print(); // Prints a formatted table to the console

} catch (SQLException e) { /* ... handle error ... */ }

4. Manual Transactions (COMMIT / ROLLBACK)

This is the standard pattern for safe, multi-statement operations, as demonstrated in TestTinyH2.java:

  1. Disable auto-commit: conn.setAutoCommit(false); (This implicitly begins the transaction).
  2. Execute your sequence of executeUpdate calls.
  3. If all succeed, make them permanent: conn.commit();.
  4. Call conn.rollback() in a catch block to undo all changes if an error occurs.
  5. Always re-enable auto-commit in a finally block if you turned it off: conn.setAutoCommit(true);.

Connection conn = null;
try {
    conn = TinyH2.getConnection();

    // 1. Start manual transaction mode
    conn.setAutoCommit(false);
    System.out.println("Auto-commit disabled. Transaction is active.");

    try {
        // 2. Execute multiple statements. These are only in memory.
        conn.executeUpdate("INSERT INTO users (id, name, age) VALUES (7, 'David', 28)");
        conn.executeUpdate("INSERT INTO users (id, name, age) VALUES (8, 'Eve', 32)");

        System.out.println("\nData within transaction (before commit):");
        ResultSet rs = conn.executeQuery("SELECT * FROM users");
        rs.print();

        // 3. Make the changes permanent
        conn.commit();
        System.out.println("Transaction committed.");

    } catch (SQLException e) {
        // 4. If anything fails, undo everything
        System.err.println("Transaction failed, rolling back: " + e.getMessage());
        if (conn != null) {
            try {
                conn.rollback();
                System.out.println("Transaction rolled back.");
            } catch (SQLException ex) {
                System.err.println("Rollback failed: " + ex.getMessage());
            }
        }
    } finally {
        // 5. Always restore auto-commit
        if (conn != null) {
            try {
                conn.setAutoCommit(true);
                System.out.println("\nAuto-commit re-enabled.");
            } catch (SQLException ex) {
                System.err.println("Failed to restore auto-commit: " + ex.getMessage());
            }
        }
    }
} catch (SQLException e) {
    System.err.println("Initial connection error: " + e.getMessage());
} finally {
    // Close connection if obtained
    if (conn != null) {
        try {
            conn.close();
        } catch (SQLException ex) {
            System.err.println("Failed to close connection: " + ex.getMessage());
        }
    }
}

Limitations & Future Work

While TinyH2 implements a substantial set of database features, it is still a lightweight engine and has several areas for potential improvement.

  • Inter-Process Locking: The current LockManager is designed for multi-threading within a single JVM process. It is not safe to have multiple independent applications (separate processes) accessing the same database files simultaneously. Implementing robust inter-process locking (e.g., using OS-level file locks) would be required for this use case.

  • WAL Recovery: The LogManager includes basic "redo" logic for committed transactions found in the WAL upon startup. However, it lacks robust "undo" logic for transactions that were incomplete or explicitly rolled back, which is necessary for full crash recovery guarantees in all scenarios (especially for UPDATEs).

  • Checkpointing: The Write-Ahead Log (wal.log) currently grows indefinitely. A checkpointing mechanism is needed to periodically flush all dirty data pages (or in this case, commit buffered tables) to the main data files and then safely truncate the log.

  • Advanced Query Optimization: The optimizer primarily uses indexes for direct equality lookups (WHERE col = val) and single-column ORDER BY. It does not support:
    • Index usage for range queries (>, <, LIKE), JOIN operations, or OR conditions.
    • Cost-based optimization to choose between multiple indexes or between an index and a full scan.
    • Composite index optimization.

  • Isolation Levels: Implements a basic Read Committed isolation level with exclusive locks only. More advanced levels (e.g., Serializable) and optimizations such as shared/read locks are not supported.

  • Limited SQL Features: Does not support JOIN, subqueries, views, triggers, stored procedures, ALTER TABLE, DROP TABLE, or DROP INDEX.

  • Data Type Variety: Supports common types but lacks more specialized ones such as TIMESTAMP WITH TIME ZONE, INTERVAL, or complex BLOB/CLOB handling.
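As a sketch of the inter-process locking mentioned above, one common approach is an OS-level advisory lock via java.nio.channels.FileLock, taken on a lock file in the database directory. This is an assumption about a possible design, not existing TinyH2 code, and the lock-file name is hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch: guard the database directory with an OS-level
// advisory file lock so only one JVM process opens it at a time.
public class ProcessLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    public ProcessLock(Path lockFile) throws IOException {
        channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        lock = channel.tryLock(); // null if another process holds the lock
        if (lock == null) {
            channel.close();
            throw new IOException("Database is locked by another process");
        }
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

The engine would acquire the lock once at startup (e.g. on a `db.lock` file next to the data files) and release it on shutdown; a second process attempting to open the same database would fail fast instead of silently corrupting the CSV and index files.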