TinyH2 is a lightweight, embeddable SQL database engine written from scratch entirely in Java. It is designed to be included in any Java application as a simple JAR file, providing robust database functionality without requiring a separate server process.
Inspired by engines like H2, TinyH2 supports a rich subset of SQL, features B-Tree indexing for performance, and implements full ACID transactions with multi-threaded concurrency control (including deadlock detection). This makes it a powerful tool for projects needing simple, reliable, and persistent data storage within the application itself. Data is stored locally in human-readable CSV files, which simplifies inspection and debugging.
This engine is not just a simple parser; it implements the core features of a modern relational database:
- SQL Command Support:
  - DDL: `CREATE TABLE` (including `IF NOT EXISTS`), `CREATE INDEX`
  - DML: `INSERT INTO`, `SELECT` (including `*`), `UPDATE`, `DELETE FROM`
  - Transaction Control: `BEGIN TRANSACTION` (or implicit start via `setAutoCommit(false)`), `COMMIT`, `ROLLBACK`
- Query Capabilities (see the example after this list):
  - Complex `WHERE` clauses with `=`, `!=`, `>`, `<`, `>=`, `<=`, `LIKE`, `IN`, `AND`, `OR`.
  - `GROUP BY` for aggregation.
  - `ORDER BY` with `ASC` and `DESC` support.
  - Aggregate Functions: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`.
- Data Types & Constraints:
  - Supports `INT`/`INTEGER`, `TEXT`, `VARCHAR(n)`, `DECIMAL(p,s)`, `BOOLEAN`, etc. Data validation is performed based on type.
  - Enforces `PRIMARY KEY` (implies `UNIQUE` and `NOT NULL`), `UNIQUE`, `NOT NULL`, and `FOREIGN KEY` constraints.
  - `AUTO_INCREMENT` support for primary keys, with the next ID managed reliably across sessions and rollbacks in `_meta.txt` files.
- High-Performance Indexing:
  - B-Tree Indexes: Uses persistent B-Trees (serialized to `.btr.idx` files) for fast key-based lookups and sorted retrieval.
  - Automatic Indexing: Automatically creates indexes on `PRIMARY KEY` and `UNIQUE` columns.
  - Query Optimization: Automatically uses available B-Tree indexes to significantly accelerate `WHERE column = value` lookups and `ORDER BY column` operations (completely avoiding in-memory sorts).
  - LRU Caching: Manages an in-memory, fixed-size Least Recently Used (LRU) cache of B-Tree indexes to minimize disk I/O for frequently accessed indexes.
- Full ACID Transactions:
  - Atomicity & Durability: Implements a Write-Ahead Log (WAL, `wal.log`). All changes (inserts, updates, deletes) are written to the log before data files are touched, guaranteeing that the database can recover to a consistent state after a crash. Includes basic recovery logic on startup.
  - Isolation (Multi-Threaded): Provides session-based, isolated transactions using an in-memory `LockManager`. This allows multiple threads within the same application to use the database concurrently without corrupting data or seeing partial changes from other threads. It implements a Read Committed isolation level (transactions only see committed data).
  - Concurrency Control: Uses row-level locking (based on primary key or an internal identifier) for `UPDATE` and `DELETE`, and table-level locking for `INSERT`.
  - Deadlock Detection: The `LockManager` builds a waits-for graph in real time to detect deadlocks between transactions. When a cycle is found, it automatically chooses a "victim" transaction, rolls it back, and throws a `DeadlockException`, allowing other transactions to proceed (see the retry sketch after this list).
- JDBC-Style API:
  - Provides a clean, intuitive public API via `api.TinyH2`, `api.Connection`, and `api.ResultSet`, designed to feel familiar to developers who have used JDBC.
  - The `ResultSet` object includes a built-in `.print()` method for easy console output.
- Command-Line Interface (CLI):
  - Includes an interactive CLI for direct database interaction and testing.
  - Supports executing SQL commands from script files as well.
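The query features above can all be exercised through the `Connection` API described later in this guide. The snippet below is a minimal sketch rather than canonical usage: it assumes a `users(id, name, age)` table like the one in the usage examples, the conventional `CREATE INDEX name ON table(column)` DDL form, and that `COUNT(*)` is accepted as written.

```java
import api.Connection;
import api.ResultSet;
import api.TinyH2;

import java.sql.SQLException;

public class QueryFeaturesDemo {
    public static void main(String[] args) {
        try (Connection conn = TinyH2.getConnection()) {
            // Assumed CREATE INDEX form; adjust if the parser's grammar differs.
            conn.executeUpdate("CREATE INDEX idx_users_age ON users (age)");

            // Complex WHERE clause; equality predicates can be served by a B-Tree index.
            ResultSet adults = conn.executeQuery(
                    "SELECT * FROM users WHERE age >= 30 AND name LIKE 'A%'");
            adults.print();

            // Aggregation with GROUP BY.
            ResultSet byAge = conn.executeQuery(
                    "SELECT age, COUNT(*) FROM users GROUP BY age");
            byAge.print();

            // Sorted retrieval; a single-column ORDER BY can be answered from an index.
            ResultSet sorted = conn.executeQuery(
                    "SELECT * FROM users ORDER BY age DESC");
            sorted.print();
        } catch (SQLException e) {
            System.err.println("Query demo failed: " + e.getMessage());
        }
    }
}
```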
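Because the deadlock victim is rolled back automatically and surfaces as a `DeadlockException`, the natural client-side pattern is to catch it and rerun the whole transaction. The sketch below shows one possible retry loop; it assumes `DeadlockException` lives in the `transaction` package and extends `SQLException` (neither is prescribed here), and the `applyChanges` helper and `MAX_ATTEMPTS` limit are purely illustrative.

```java
import api.Connection;
import api.TinyH2;
import transaction.DeadlockException; // assumed location; assumed to extend SQLException

import java.sql.SQLException;

public class RetryOnDeadlock {

    static final int MAX_ATTEMPTS = 3; // illustrative retry budget

    public static void main(String[] args) throws SQLException {
        try (Connection conn = TinyH2.getConnection()) {
            for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
                try {
                    conn.setAutoCommit(false);
                    applyChanges(conn);   // hypothetical transaction body
                    conn.commit();
                    break;                // success: stop retrying
                } catch (DeadlockException e) {
                    // This transaction was the victim; the engine already rolled it back,
                    // so it is safe to simply try the whole transaction again.
                    System.err.println("Deadlock on attempt " + attempt + ", retrying...");
                } finally {
                    conn.setAutoCommit(true);
                }
            }
        }
    }

    // Hypothetical helper: the statements you want to run atomically.
    static void applyChanges(Connection conn) throws SQLException {
        conn.executeUpdate("UPDATE users SET age = age + 1 WHERE id = 7");
        conn.executeUpdate("UPDATE users SET age = age + 1 WHERE id = 8");
    }
}
```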
TinyH2 follows a well-defined layered architecture:
- API Layer (`api` package): The public interface. Developers interact with `TinyH2.getConnection()`, `Connection`, and `ResultSet`. This layer hides the internal engine complexity.
- Session Layer (`engine.Session`): Manages a single client connection and holds its `TransactionContext`. It acts as an intermediary between the API and the core engine.
- Parsing Layer (`parser` package): The `SQLParser` uses regular expressions for tokenization and a recursive-descent strategy to build an Abstract Syntax Tree (AST) object (like `SelectQuery`, `InsertQuery`, `TransactionQuery`) representing the SQL command.
- Execution Layer (`engine.DatabaseEngine`): The central coordinator. It receives the AST from the `Session`, manages shared resources (like `LockManager`, `LogManager`, and the metadata cache), orchestrates transaction steps, decides query execution plans (e.g., use an index or a full scan), and interacts with lower layers.
- Transaction Layer (`transaction` package):
  - `LockManager`: Manages in-memory locks (`synchronized`, `wait`, `notifyAll`) for rows/tables and implements the waits-for graph for deadlock detection.
  - `LogManager`: Appends all change operations (`INSERT`, `UPDATE_BEFORE`, `UPDATE_AFTER`, `DELETE`) to the `wal.log` for durability.
  - `TransactionContext`: A per-session object that holds the transaction ID, the `inTransaction` status, the in-memory `transactionBuffer` (pending table changes), and the set of currently held locks.
- Indexing Layer (`index` package):
  - `IndexManager`: Manages the LRU cache of B-Trees, loading them from disk (`.btr.idx` files) or the cache, and saving them (see the sketch after this list).
  - `BTree`: The core B-Tree implementation (nodes, keys, values = offsets), including insertion, search, and ordered traversal methods.
- Storage Layer (`storage` package):
  - `TableStorage`: The lowest layer, responsible for the physical reading and writing of schema (`_schema.csv`) and data (`_data.csv`) files using Apache Commons CSV. This layer is only touched when a transaction is committed or when tables are initially loaded.
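As a mental model for the `IndexManager` cache mentioned above, the classic access-ordered `LinkedHashMap` idiom below builds a fixed-size LRU map keyed by index name. This is an illustrative sketch only, not TinyH2's actual implementation; the class name, capacity handling, and the flush hook noted in the comment are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a fixed-size, access-ordered cache keyed by index name.
// V stands in for the in-memory BTree type; eviction drops the least recently used index.
public class LruIndexCache<V> extends LinkedHashMap<String, V> {

    private final int capacity;

    public LruIndexCache(int capacity) {
        // accessOrder = true: get()/put() move an entry to the "most recently used" end.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Returning true evicts the eldest entry once the cache exceeds its capacity.
        // A real IndexManager would also persist a dirty B-Tree to its .btr.idx file here.
        return size() > capacity;
    }
}
```

An `IndexManager`-style lookup would consult such a cache first and deserialize the `.btr.idx` file only on a miss.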
Embedding TinyH2 in your own Java application is designed to be straightforward, mimicking the standard JDBC flow.
Build the project with `mvn clean package`. This creates a self-contained "fat JAR" in the `target/` directory (e.g., `tinyh2-1.0-SNAPSHOT.jar`). Add this JAR to your application's classpath (e.g., put it in a `lib` folder and add it in your IDE or build script).
Always start by getting a `Connection` object using the static `TinyH2.getConnection()` method. Use a try-with-resources statement to ensure the connection is properly closed (which also handles rolling back any uncommitted transactions).
```java
import api.Connection;
import api.TinyH2;

import java.sql.SQLException;

// ...
try (Connection conn = TinyH2.getConnection()) {
    // Your database code goes here...
    System.out.println("Connected!");
} catch (SQLException e) {
    System.err.println("Error: " + e.getMessage());
}
```

By default, the connection is in auto-commit mode. Every statement is its own transaction and is saved immediately.
```java
try (Connection conn = TinyH2.getConnection()) {
    conn.executeUpdate("CREATE TABLE users (id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(50), age INT)");
    conn.executeUpdate("INSERT INTO users (name, age) VALUES ('Alice', 30)");

    ResultSet rs = conn.executeQuery("SELECT * FROM users");
    rs.print(); // Prints a formatted table to the console
} catch (SQLException e) {
    /* ... handle error ... */
}
```

This is the standard pattern for safe, multi-statement operations, demonstrated in `TestTinyH2.java`:
- Disable auto-commit: `conn.setAutoCommit(false);` (this implicitly begins the transaction).
- Execute your sequence of `executeUpdate` calls.
- If all succeed, make them permanent: `conn.commit();`.
- Call `conn.rollback()` in a `catch` block to undo all changes if an error occurs.
- Always re-enable auto-commit in a `finally` block if you turned it off: `conn.setAutoCommit(true);`.
```java
Connection conn = null;
try {
    conn = TinyH2.getConnection();

    // 1. Start manual transaction mode
    conn.setAutoCommit(false);
    System.out.println("Auto-commit disabled. Transaction is active.");

    try {
        // 2. Execute multiple statements. These are only in memory.
        conn.executeUpdate("INSERT INTO users (id, name, age) VALUES (7, 'David', 28)");
        conn.executeUpdate("INSERT INTO users (id, name, age) VALUES (8, 'Eve', 32)");

        System.out.println("\nData within transaction (before commit):");
        ResultSet rs = conn.executeQuery("SELECT * FROM users");
        rs.print();

        // 3. Make the changes permanent
        conn.commit();
        System.out.println("Transaction committed.");
    } catch (SQLException e) {
        // 4. If anything fails, undo everything
        System.err.println("Transaction failed, rolling back: " + e.getMessage());
        if (conn != null) {
            try {
                conn.rollback();
                System.out.println("Transaction rolled back.");
            } catch (SQLException ex) {
                System.err.println("Rollback failed: " + ex.getMessage());
            }
        }
    } finally {
        // 5. Always restore auto-commit
        if (conn != null) {
            try {
                conn.setAutoCommit(true);
                System.out.println("\nAuto-commit re-enabled.");
            } catch (SQLException ex) {
                System.err.println("Failed to restore auto-commit: " + ex.getMessage());
            }
        }
    }
} catch (SQLException e) {
    System.err.println("Initial connection error: " + e.getMessage());
} finally {
    // Close connection if obtained
    if (conn != null) {
        try {
            conn.close();
        } catch (SQLException ex) {
            System.err.println("Failed to close connection: " + ex.getMessage());
        }
    }
}
```

While TinyH2 implements a substantial set of database features, it is still a lightweight engine and has several areas for potential improvement.
- Inter-Process Locking: The current `LockManager` is designed for multi-threading within a single JVM process. It is not safe to have multiple independent applications (separate processes) accessing the same database files simultaneously. Implementing robust inter-process locking (e.g., using OS-level file locks, as sketched below) would be required for this use case.
- WAL Recovery: The `LogManager` includes basic "redo" logic for committed transactions found in the WAL upon startup. However, it lacks robust "undo" logic for transactions that were incomplete or explicitly rolled back, which is necessary for full crash recovery guarantees in all scenarios (especially for `UPDATE`s).
- Checkpointing: The Write-Ahead Log (`wal.log`) currently grows indefinitely. A checkpointing mechanism is needed to periodically flush all dirty data pages (or in this case, commit buffered tables) to the main data files and then safely truncate the log.
- Advanced Query Optimization: The optimizer primarily uses indexes for direct equality lookups (`WHERE col = val`) and single-column `ORDER BY`. It does not support:
  - Index usage for range queries (`>`, `<`, `LIKE`), `JOIN` operations, or `OR` conditions.
  - Cost-based optimization to choose between multiple indexes or index vs. full scan.
  - Composite index optimization.
- Concurrency Levels: Implements a basic Read Committed isolation level with exclusive locking. Does not support more advanced levels (e.g., Serializable) or optimizations like shared/read locks.
- Limited SQL Features: Does not support `JOIN`, subqueries, views, triggers, stored procedures, `ALTER TABLE`, `DROP TABLE`, or `DROP INDEX`.
- Data Type Variety: Supports common types but lacks more specialized types like `TIMESTAMP WITH TIME ZONE` and `INTERVAL`, and complex `BLOB`/`CLOB` handling.
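To illustrate the inter-process locking idea from the first bullet, standard `java.nio` file locks could guard the database directory so that a second process fails fast instead of sharing the files. This is only a sketch of one possible approach, not part of TinyH2; the wrapper class and the lock-file name are assumptions.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: acquire an OS-level lock on a marker file inside the database directory
// before opening the engine, so a second process fails fast instead of corrupting files.
public class DatabaseDirLock implements AutoCloseable {

    private final FileChannel channel;
    private final FileLock lock;

    public DatabaseDirLock(Path databaseDir) throws IOException {
        Path lockFile = databaseDir.resolve("tinyh2.lock"); // assumed lock-file name
        channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        lock = channel.tryLock(); // null if another process already holds the lock
        if (lock == null) {
            channel.close();
            throw new IOException("Database directory is locked by another process: " + databaseDir);
        }
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

The engine would need to acquire such a lock at startup and release it on shutdown for this protection to be effective.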