Merged
34 changes: 34 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,34 @@
name: .NET CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build-and-test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'

      - name: Restore tools
        run: dotnet tool restore

      - name: Check Formatting
        run: dotnet fantomas . --check

      - name: Restore dependencies
        run: dotnet restore

      - name: Build
        run: dotnet build --no-restore -c Release

      - name: Run Unit Tests
        run: dotnet test tests/AVLSet.UnitTests/AVLSet.UnitTests.fsproj --no-build -c Release
115 changes: 114 additions & 1 deletion README.md
@@ -1 +1,114 @@
# AVLSetFSharp
# AVLSetFSharp

![.NET CI](https://github.com/LeoN192/AVLSetFSharp/actions/workflows/ci.yml/badge.svg)
![Formatter](https://img.shields.io/badge/format-Fantomas-blue?logo=fsharp&logoColor=white)
![.NET](https://img.shields.io/badge/.NET-10.0-purple?logo=dotnet&logoColor=white)
![License](https://img.shields.io/badge/license-BSD_3--Clause-blue)

## Overview

This repository contains a high-performance, purely functional **Set** data structure implemented using a self-balancing **AVL Tree** in F#.

Unlike standard library collections, this implementation focuses on efficient set-theoretic operations (Union, Intersection, Difference, Symmetrical Difference) using the **Split/Join** algorithm, providing a solid foundation for both sequential and parallel data processing.

## Technical Details & Architecture

The project is built on the principles of immutability and persistent data structures. Every modification returns a new version of the set, while maximizing node sharing to optimize memory usage.
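
To illustrate what "a new version with shared nodes" means, here is a minimal sketch. It uses a plain, unbalanced binary search tree rather than the library's `AVLTree`, and the names `Tree`, `insert`, and `leftOf` are hypothetical. Only the nodes on the path to the inserted key are rebuilt; every untouched subtree is reused by reference.

```fsharp
// Hypothetical sketch of structural sharing; not the code in Library.fs.
type Tree<'a> =
    | Leaf
    | Node of Tree<'a> * 'a * Tree<'a> // left, value, right

// Returns a new tree; only nodes on the search path are reallocated.
let rec insert x t =
    match t with
    | Leaf -> Node(Leaf, x, Leaf)
    | Node(l, v, r) ->
        if x < v then Node(insert x l, v, r)
        elif x > v then Node(l, v, insert x r)
        else t

let leftOf t =
    match t with
    | Node(l, _, _) -> l
    | Leaf -> Leaf

let v1 = [ 5; 3; 8 ] |> List.fold (fun t x -> insert x t) Leaf
let v2 = insert 9 v1 // v1 itself is not modified

// The insertion went to the right of the root, so both versions
// point at the very same left subtree object.
let sharesLeft = obj.ReferenceEquals(leftOf v1, leftOf v2)
```

In a balanced tree the rebuilt path has length $O(\log n)$, so each update allocates only a logarithmic number of new nodes.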

### Core Algorithms & Complexity
1. **Basic Operations**: `add`, `delete`, `contains` are implemented with $O(\log n)$ time complexity.
2. **Set-theoretic Operations**: `union`, `intersection`, `difference`, `symmetrical difference` utilize the **Split & Join** approach.
   * Efficiency: Each operation runs in $O(m \log (n/m + 1))$ time, where $m \le n$ are the sizes of the smaller and the larger input set.
3. **Parallelism**: Recursive set operations are implemented using `Parallel.Invoke`.
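
The Split & Join idea from the list above can be sketched as follows. This is an illustrative, self-contained version under assumed names (`Tree`, `mk`, `join`, `split`, `union`, `insert`, `toList`); the actual representation and signatures in `Library.fs` may differ. `join` combines two trees and a middle key while restoring the AVL invariant, `split` cuts a tree around a key, and `union` is built from those two.

```fsharp
// Hypothetical sketch of join-based union; not the code in Library.fs.
type Tree<'a> =
    | Leaf
    | Node of Tree<'a> * 'a * Tree<'a> * int // left, value, right, cached height

let height t =
    match t with
    | Leaf -> 0
    | Node(_, _, _, h) -> h

// Builds a node and recomputes its height from the children.
let mk l v r = Node(l, v, r, 1 + max (height l) (height r))

let rotateLeft t =
    match t with
    | Node(l, v, Node(rl, rv, rr, _), _) -> mk (mk l v rl) rv rr
    | _ -> t

let rotateRight t =
    match t with
    | Node(Node(ll, lv, lr, _), v, r, _) -> mk ll lv (mk lr v r)
    | _ -> t

// tl is the taller tree: walk down its right spine, attach, then rebalance.
let rec joinRight tl k tr =
    match tl with
    | Leaf -> mk Leaf k tr
    | Node(l, k', c, _) ->
        if height c <= height tr + 1 then
            let t' = mk c k tr
            if height t' <= height l + 1 then mk l k' t'
            else rotateLeft (mk l k' (rotateRight t'))
        else
            let t' = joinRight c k tr
            let t'' = mk l k' t'
            if height t' <= height l + 1 then t'' else rotateLeft t''

// Mirror image of joinRight for the case where tr is the taller tree.
let rec joinLeft tl k tr =
    match tr with
    | Leaf -> mk tl k Leaf
    | Node(c, k', r, _) ->
        if height c <= height tl + 1 then
            let t' = mk tl k c
            if height t' <= height r + 1 then mk t' k' r
            else rotateRight (mk (rotateLeft t') k' r)
        else
            let t' = joinLeft tl k c
            let t'' = mk t' k' r
            if height t' <= height r + 1 then t'' else rotateRight t''

// Precondition: every element of tl < k < every element of tr.
let join tl k tr =
    if height tl > height tr + 1 then joinRight tl k tr
    elif height tr > height tl + 1 then joinLeft tl k tr
    else mk tl k tr

// Returns (elements < k, whether k was present, elements > k).
let rec split t k =
    match t with
    | Leaf -> (Leaf, false, Leaf)
    | Node(l, m, r, _) ->
        if k = m then (l, true, r)
        elif k < m then
            let (ll, found, lr) = split l k
            (ll, found, join lr m r)
        else
            let (rl, found, rr) = split r k
            (join l m rl, found, rr)

// Split t1 around the root of t2, then solve both halves independently.
let rec union t1 t2 =
    match t1, t2 with
    | Leaf, t
    | t, Leaf -> t
    | _, Node(l2, k2, r2, _) ->
        let (l1, _, r1) = split t1 k2
        join (union l1 l2) k2 (union r1 r2)

// Small helpers for trying the sketch out.
let insert x t =
    let (l, _, r) = split t x
    join l x r

let rec toList t =
    match t with
    | Leaf -> []
    | Node(l, v, r, _) -> toList l @ (v :: toList r)

let a = [ 1..10 ] |> List.fold (fun t x -> insert x t) Leaf
let b = [ 5..15 ] |> List.fold (fun t x -> insert x t) Leaf
let merged = toList (union a b) // the elements 1 through 15, in ascending order
```

The two recursive `union` calls operate on disjoint key ranges and share no mutable state, which is what makes it possible to run them concurrently, for example through `Parallel.Invoke`.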

### Invariants & Balancing
* Balance Factor: For every node, the height difference between left and right subtrees is at most 1.
* Rotations: Four types of rotations (LL, RR, LR, RL) are performed automatically.
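
As a rough illustration of how the four rotation cases restore the invariant after an insertion, consider the sketch below. The names (`Tree`, `mk`, `balanceFactor`, `rebalance`, `insert`, `isBalanced`) are hypothetical and the library's own node layout may differ.

```fsharp
// Hypothetical sketch of AVL rebalancing; not the code in Library.fs.
type Tree<'a> =
    | Leaf
    | Node of Tree<'a> * 'a * Tree<'a> * int // left, value, right, cached height

let height t =
    match t with
    | Leaf -> 0
    | Node(_, _, _, h) -> h

let mk l v r = Node(l, v, r, 1 + max (height l) (height r))

// Positive when the left subtree is taller, negative when the right one is.
let balanceFactor t =
    match t with
    | Leaf -> 0
    | Node(l, _, r, _) -> height l - height r

let rotateLeft t =
    match t with
    | Node(l, v, Node(rl, rv, rr, _), _) -> mk (mk l v rl) rv rr
    | _ -> t

let rotateRight t =
    match t with
    | Node(Node(ll, lv, lr, _), v, r, _) -> mk ll lv (mk lr v r)
    | _ -> t

// Applies one of the four cases when the balance factor leaves [-1, 1].
let rebalance t =
    match t with
    | Node(l, v, r, _) when balanceFactor t > 1 ->
        if balanceFactor l >= 0 then rotateRight t // LL
        else rotateRight (mk (rotateLeft l) v r) // LR
    | Node(l, v, r, _) when balanceFactor t < -1 ->
        if balanceFactor r <= 0 then rotateLeft t // RR
        else rotateLeft (mk l v (rotateRight r)) // RL
    | _ -> t

let rec insert x t =
    match t with
    | Leaf -> mk Leaf x Leaf
    | Node(l, v, r, _) ->
        if x < v then rebalance (mk (insert x l) v r)
        elif x > v then rebalance (mk l v (insert x r))
        else t

// Checks the balance invariant at every node.
let rec isBalanced t =
    match t with
    | Leaf -> true
    | Node(l, _, r, _) -> abs (balanceFactor t) <= 1 && isBalanced l && isBalanced r

// Sorted input is the worst case for an unbalanced tree; here it stays balanced.
let tree = [ 1..7 ] |> List.fold (fun t x -> insert x t) Leaf
let ok = isBalanced tree
```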

---

## Quick Start

### Requirements
* .NET SDK 10.0+
* Fantomas

### 1. Setup & Build
```bash
# Restore local tools (Fantomas)
dotnet tool restore

# Build the entire solution
dotnet build -c Release
```

### 2. Run Tests
```bash
dotnet test
```

### 3. Run Benchmarks
```bash
dotnet run -c Release --project benchmarks/AVLSet.Benchmarks
```

---

## Usage Example

```fsharp
open AVLSet.Library

// 1. Create sets from sequences
let setA = [1..10] |> List.fold (fun s v -> AVLSet.add v s) AVLSet.empty
let setB = [5..15] |> List.fold (fun s v -> AVLSet.add v s) AVLSet.empty

// 2. Perform set operations
let unionSet = AVLSet.union setA setB
let interSet = AVLSet.intersection setA setB

// 3. Check membership
if AVLSet.contains 7 interSet then
    printfn "Intersection contains 7"

// 4. Parallel operations for large data
let opts = System.Threading.Tasks.ParallelOptions(MaxDegreeOfParallelism = 4)
let largeUnion = AVLSet.parallelUnion opts setA setB
```

---

## API Reference

The `AVLSet` module provides a comprehensive interface:

| Function | Signature | Description |
|:---|:---|:---|
| **add** | `'a -> AVLTree<'a> -> AVLTree<'a>` | Adds an element. |
| **delete** | `'a -> AVLTree<'a> -> AVLTree<'a>` | Removes an element. |
| **contains** | `'a -> AVLTree<'a> -> bool` | Checks membership. |
| **union** | `AVLTree<'a> -> AVLTree<'a> -> AVLTree<'a>` | Standard union ($A \cup B$). |
| **intersection** | `AVLTree<'a> -> AVLTree<'a> -> AVLTree<'a>` | Standard intersection ($A \cap B$). |
| **difference** | `AVLTree<'a> -> AVLTree<'a> -> AVLTree<'a>` | Standard difference ($A \setminus B$). |
| **symmDifference** | `AVLTree<'a> -> AVLTree<'a> -> AVLTree<'a>` | Standard symmetrical difference ($A \vartriangle B$). |
| **parallel(Union/Intersection/Difference/SymmDifference)** | `ParallelOptions -> AVLTree<'a> -> AVLTree<'a> -> AVLTree<'a>` | Multi-threaded set-theoretic operations. |
| **(union/intersection/difference/symmDifference)Traversal** | `AVLTree<'a> -> AVLTree<'a> -> AVLTree<'a>` | Set-theoretic operations via tree traversal. |

---

## Project Structure
```text
/src
└── AVLSet.Library
    ├── AVLSet.Library.fsproj
    └── Library.fs
/tests
└── AVLSet.UnitTests
    ├── AVLSet.UnitTests.fsproj
    └── Tests.fs
/benchmarks
└── AVLSet.Benchmarks
    ├── AVLSet.Benchmarks.fsproj
    ├── Program.fs
    └── Benchmarks.fs
```
74 changes: 39 additions & 35 deletions benchmarks/AVLSet.Benchmarks/Benchmarks.fs
@@ -1,6 +1,6 @@
namespace AVLSet.Benchmarks

open System.Threading.Tasks
open System.Threading.Tasks
open BenchmarkDotNet.Diagnosers
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Configs
@@ -10,119 +10,123 @@ open AVLSet.Library
[<CategoriesColumn>]
[<HtmlExporter>]
[<MemoryDiagnoser>]
[<ThreadingDiagnoser>]
[<HardwareCounters(
    HardwareCounter.CacheMisses,
    HardwareCounter.BranchMispredictions)>]
[<ThreadingDiagnoser>]
[<HardwareCounters(HardwareCounter.CacheMisses, HardwareCounter.BranchMispredictions)>]
type SetBenchmarks() =
    let rnd = System.Random(1234561)

    [<Params(100, 10000, 1000000)>]
    [<DefaultValue>]
    val mutable public A : int
    val mutable public A: int

    [<Params(100, 1000, 100000)>]
    [<DefaultValue>]
    val mutable public B : int
    val mutable public B: int

    [<Params("Random", "Sorted")>]
    [<DefaultValue>]
    val mutable public DataTypeA : string
    val mutable public DataTypeA: string

    [<Params(1, 2, 4, 8)>]
    [<DefaultValue>]
    val mutable public Threads : int
    val mutable public Threads: int

    [<DefaultValue>]
    val mutable public rndInt : int
    val mutable public rndInt: int

    [<DefaultValue>]
    val mutable public setA : AVLTree<int>
    val mutable public setA: AVLTree<int>

    [<DefaultValue>]
    val mutable public setB : AVLTree<int>
    val mutable public setB: AVLTree<int>

    [<GlobalSetup>]
    member self.Setup() =
        self.rndInt <- rnd.Next(self.A + 1, self.A + 1000)
        self.rndInt <- rnd.Next(self.A + 1, self.A + 1000)

        let dataA =
        let dataA =
            match self.DataTypeA with
            | "Random" -> Array.init self.A (fun _ -> rnd.Next())
            | _ -> [| 1 .. self.A |]
        let dataB = Array.init self.B (fun _ -> rnd.Next())


        let dataB = Array.init self.B (fun _ -> rnd.Next())

        self.setA <- dataA |> Array.fold (fun s v -> AVLSet.add v s) AVLSet.empty
        self.setB <- dataB |> Array.fold (fun s v -> AVLSet.add v s) AVLSet.empty

    [<Benchmark>]
    [<BenchmarkCategory("Adding")>]
    member self.``Adding one element`` () = AVLSet.add self.rndInt self.setA
    member self.``Adding one element``() = AVLSet.add self.rndInt self.setA

    [<Benchmark>]
    [<BenchmarkCategory("Deleting")>]
    member self.``Deleting one element`` () = AVLSet.delete self.rndInt self.setA
    member self.``Deleting one element``() = AVLSet.delete self.rndInt self.setA

    [<Benchmark(Baseline = true)>]
    [<BenchmarkCategory("Union")>]
    member self.``Sequential union`` () = AVLSet.union self.setA self.setB
    member self.``Sequential union``() = AVLSet.union self.setA self.setB

    [<Benchmark>]
    [<BenchmarkCategory("Union")>]
    member self.``Union via tree traversal`` () = AVLSet.unionTraversal self.setA self.setB
    member self.``Union via tree traversal``() =
        AVLSet.unionTraversal self.setA self.setB

    [<Benchmark>]
    [<BenchmarkCategory("Union")>]
    member self.``Parallel union with threads`` () =
    member self.``Parallel union with threads``() =
        let opts = ParallelOptions()
        opts.MaxDegreeOfParallelism <- self.Threads

        AVLSet.parallelUnion opts self.setA self.setB

    [<Benchmark(Baseline = true)>]
    [<BenchmarkCategory("Intersection")>]
    member self.``Sequential intersection`` () = AVLSet.intersection self.setA self.setB
    member self.``Sequential intersection``() = AVLSet.intersection self.setA self.setB

    [<Benchmark>]
    [<BenchmarkCategory("Intersection")>]
    member self.``Intersection via tree traversal`` () = AVLSet.intersectionTraversal self.setA self.setB
    member self.``Intersection via tree traversal``() =
        AVLSet.intersectionTraversal self.setA self.setB

    [<Benchmark>]
    [<BenchmarkCategory("Intersection")>]
    member self.``Parallel intersection with threads`` () =
    member self.``Parallel intersection with threads``() =
        let opts = ParallelOptions()
        opts.MaxDegreeOfParallelism <- self.Threads

        AVLSet.parallelIntersection opts self.setA self.setB

    [<Benchmark(Baseline = true)>]
    [<BenchmarkCategory("Difference")>]
    member self.``Sequential difference`` () = AVLSet.difference self.setA self.setB
    member self.``Sequential difference``() = AVLSet.difference self.setA self.setB

    [<Benchmark>]
    [<BenchmarkCategory("Difference")>]
    member self.``Difference via tree traversal`` () = AVLSet.differenceTraversal self.setA self.setB
    member self.``Difference via tree traversal``() =
        AVLSet.differenceTraversal self.setA self.setB

    [<Benchmark>]
    [<BenchmarkCategory("Difference")>]
    member self.``Parallel difference with threads`` () =
    member self.``Parallel difference with threads``() =
        let opts = ParallelOptions()
        opts.MaxDegreeOfParallelism <- self.Threads

        AVLSet.parallelDifference opts self.setA self.setB

    [<Benchmark(Baseline = true)>]
    [<BenchmarkCategory("Symmetrical Difference")>]
    member self.``Sequential symmetrical difference`` () = AVLSet.symmDifference self.setA self.setB
    member self.``Sequential symmetrical difference``() =
        AVLSet.symmDifference self.setA self.setB

    [<Benchmark>]
    [<BenchmarkCategory("Symmetrical Difference")>]
    member self.``Symmetrical difference via tree traversal`` () = AVLSet.symmDifferenceTraversal self.setA self.setB
    member self.``Symmetrical difference via tree traversal``() =
        AVLSet.symmDifferenceTraversal self.setA self.setB

    [<Benchmark>]
    [<BenchmarkCategory("Symmetrical Difference")>]
    member self.``Parallel symmetrical difference with threads`` () =
    member self.``Parallel symmetrical difference with threads``() =
        let opts = ParallelOptions()
        opts.MaxDegreeOfParallelism <- self.Threads

        AVLSet.parallelSymmDifference opts self.setA self.setB
2 changes: 1 addition & 1 deletion benchmarks/AVLSet.Benchmarks/Program.fs
@@ -4,4 +4,4 @@ open AVLSet.Benchmarks
[<EntryPoint>]
let main args =
    BenchmarkRunner.Run<SetBenchmarks>() |> ignore
    0
    0
13 changes: 13 additions & 0 deletions dotnet-tools.json
@@ -0,0 +1,13 @@
{
  "version": 1,
  "isRoot": true,
  "tools": {
    "fantomas": {
      "version": "7.0.5",
      "commands": [
        "fantomas"
      ],
      "rollForward": false
    }
  }
}