Commit 3a64deb

Update resources.md
1 parent 6741be7 commit 3a64deb

1 file changed: +13 -59 lines changed

resources.md

Lines changed: 13 additions & 59 deletions
@@ -54,7 +54,7 @@ title: Resources on Large Language Models (LLMs)
 <details id="resourcesContent" markdown="1" hidden aria-hidden="true">
 <summary>Hidden resources source</summary>
 
-### 🎥📘 Introductory Video Material
+## 🎥📘 Introductory Video Material
 
 - **[Dr. Wu-Jun Li's tutorial slides](https://cs.nju.edu.cn/lwj/slides/L2H.pdf)**: These tutorial slides by Dr. Wu-Jun Li offer a comprehensive introduction to learning to hash (L2H) techniques. It's an excellent resource for anyone seeking a deep understanding of hashing from a technical perspective.
 
@@ -64,7 +64,7 @@ title: Resources on Large Language Models (LLMs)
 
 - **[Hashing Algorithms for Large-Scale Machine Learning - 2017 Rice Machine Learning Workshop](https://www.youtube.com/embed/tQ0OJXowLJA)**: This video is a recording of a presentation from the 2017 Rice Machine Learning Workshop. It offers a detailed overview of various hashing algorithms used for large-scale machine learning.
 
-### 🎤🧑‍🔬Conferences and Workshops
+## 🎤🧑‍🔬Conferences and Workshops
 
 - **[IJCNN 2025: Scalable and Deep Graph Learning and Mining](https://www.ijcnn.org/)**: Workshop including hashing methods applied to graph structures for retrieval and similarity.
 
@@ -84,7 +84,7 @@ title: Resources on Large Language Models (LLMs)
 
 - **[SIAM International Conference on Data Mining (SDM)](https://www.siam.org/conferences/cm/conference/sdm22)**: SDM is an important conference for researchers in data mining, focusing on the latest developments in algorithms, data analysis, and big data applications.
 
-### 📄🔬 Survey Papers
+## 📄🔬 Survey Papers
 
 For a deeper dive, these survey papers are excellent resources:
 
@@ -104,7 +104,7 @@ For a deeper dive, these survey papers are excellent resources:
 
 - **[Learning to Hash With Binary Deep Neural Networks: A Survey](https://www.sciencedirect.com/science/article/abs/pii/S016786552030208X)**: This survey focuses on binary deep neural networks and their use in learning to hash. It explores how these networks are trained to produce compact binary codes that can be used for efficient data retrieval in large-scale datasets.
 
-### 🎓📚 Courses
+## 🎓📚 Courses
 
 Some university courses cover topics related to machine learning and efficient computing, with publicly available materials:
 
@@ -116,7 +116,7 @@ Some university courses cover topics related to machine learning and efficient c
 
 - **[CS276: Information Retrieval](https://web.stanford.edu/class/cs276/)** (Stanford University): A comprehensive, foundational course covering algorithms for vector similarity search, ranking, indexing, and hashing.
 
-#### 🧠 DeepLearning.AI Short Courses on Vector Search & ANN
+## 🧠 DeepLearning.AI Short Courses on Vector Search & ANN
 
 - **[Vector Databases: from Embeddings to Applications](https://www.deeplearning.ai/short-courses/vector-databases-embeddings-applications/?utm_source=chatgpt.com)**: Learn how vector databases work (dense vs sparse search, multilingual embeddings, hybrid search) with real-world applications using Weaviate. *(~55 min)*
 
@@ -130,7 +130,7 @@ Some university courses cover topics related to machine learning and efficient c
 
 - **[Prompt Compression and Query Optimization](https://www.deeplearning.ai/short-courses/prompt-compression-and-query-optimization/?utm_source=chatgpt.com)**: Covers retrieval latency reduction via query filtering, projection, re-ranking, and prompt shortening — with examples using MongoDB Atlas Vector Search.
 
-### 📝📰 Blog Posts
+## 📝📰 Blog Posts
 
 Blog posts are a great way to keep up with cutting-edge research. Here are some of our favorites:
 
@@ -154,9 +154,7 @@ Blog posts are a great way to keep up with cutting-edge research. Here are some
 
 - **[What is Locality-Sensitive Hashing?](https://www.quora.com/What-is-locality-sensitive-hashing)**: This Quora discussion explains LSH in simple terms. It covers the core principles of how LSH works and why it is useful for approximate nearest neighbor search.
 
-### 🧩💾 Hashing Software Packages
-
-#### 📦 Hashing Algorithms
+## 📦 Hashing Algorithms
 
 - **[Deep Hashing Toolbox](https://github.com/thulab/DeepHash)**: An open-source implementation designed for learning to hash with deep neural networks. Useful for deep similarity search research.
 
@@ -166,7 +164,7 @@ Blog posts are a great way to keep up with cutting-edge research. Here are some
 
 - **[HashNet](https://github.com/thuml/HashNet)**: Implements HashNet, a deep hashing method that handles imbalanced data distributions and learns binary hash codes end-to-end.
 
-#### 🏗️ Indexing / ANN Libraries
+## 🏗️ Indexing / ANN Libraries
 
 - **[Faiss (Facebook AI Similarity Search)](https://github.com/facebookresearch/faiss)**: A powerful library by Facebook AI Research for efficient similarity search of dense vectors. Supports PQ, IVF, HNSW, and more.
 
@@ -178,67 +176,23 @@ Blog posts are a great way to keep up with cutting-edge research. Here are some
 
 - **[ScaNN (Scalable Nearest Neighbors)](https://github.com/google-research/google-research/tree/master/scann)**: Developed by Google Research, ScaNN is optimized for vector similarity search at production scale using quantization and reordering.
 
-#### 🛠️ Vector Databases
+## 🛠️ Vector Databases
 
 - **[Milvus](https://milvus.io/)**: A production-ready open-source vector database for similarity search. Supports multiple ANN algorithms and distributed deployments.
 
 - **[Weaviate](https://weaviate.io/)**: An open-source vector database with semantic search capabilities, supporting hybrid search, classification, and modules like CLIP and OpenAI.
 
 - **[Qdrant](https://qdrant.tech/)**: A fast and scalable vector database written in Rust. Provides gRPC and REST APIs and supports filtering and payload-based search.
 
-### 🧪📊 Benchmarking Tools and Leaderboards
-
-#### 🧪 ANN-Benchmarks: Comparing Nearest Neighbor Libraries
+## 🧪 ANN-Benchmarks: Comparing Nearest Neighbor Libraries
 
 **[ANN-Benchmarks](https://github.com/erikbern/ann-benchmarks)** is the standard benchmarking framework for evaluating Approximate Nearest Neighbor (ANN) algorithms on a wide range of datasets and distance metrics.
 
-It includes:
-- Dockerized runners for 30+ ANN libraries including FAISS, HNSWlib, NMSLIB, Annoy, ScaNN, Milvus, and more.
-- Scripts to run and visualize benchmarking results.
-- Precomputed datasets in HDF5 format for fair and reproducible evaluation.
-
-📄 Related Paper: [Aumüller et al. (2019)](https://arxiv.org/abs/1807.05614)
-
-#### 🗃️ Evaluated Libraries on ANN-Benchmarks
-Some key evaluated libraries:
-- [FAISS](https://github.com/facebookresearch/faiss)
-- [HNSWlib](https://github.com/nmslib/hnswlib)
-- [Annoy](https://github.com/spotify/annoy)
-- [ScaNN](https://github.com/google-research/google-research/tree/master/scann)
-- [NMSLIB](https://github.com/nmslib/nmslib)
-- [Weaviate](https://github.com/weaviate/weaviate)
-- [Milvus](https://github.com/milvus-io/milvus)
-- [Qdrant](https://github.com/qdrant/qdrant)
-- [Elastiknn](https://github.com/alexklibisz/elastiknn)
-- [SPTAG (Microsoft)](https://github.com/microsoft/SPTAG)
-- [DiskANN (Microsoft)](https://github.com/microsoft/diskann)
-- [PyNNDescent](https://github.com/lmcinnes/pynndescent)
-- [FLANN](https://github.com/flann-lib/flann)
-
-Full list: [github.com/erikbern/ann-benchmarks#evaluated](https://github.com/erikbern/ann-benchmarks#evaluated)
-
-#### 📥 Precomputed Benchmark Datasets
-All datasets are split into train/test sets with ground truth for top-100 neighbors:
-
-| Dataset | Dim | Train/Test | Distance | Download |
-|--------------|-----|------------------|-----------|----------|
-| DEEP1B | 96 | 9.9M / 10k | Angular | [HDF5](http://ann-benchmarks.com/deep-image-96-angular.hdf5) |
-| Fashion-MNIST| 784 | 60k / 10k | Euclidean | [HDF5](http://ann-benchmarks.com/fashion-mnist-784-euclidean.hdf5) |
-| SIFT | 128 | 1M / 10k | Euclidean | [HDF5](http://ann-benchmarks.com/sift-128-euclidean.hdf5) |
-| GIST | 960 | 1M / 1k | Euclidean | [HDF5](http://ann-benchmarks.com/gist-960-euclidean.hdf5) |
-| NYTimes | 256 | 290k / 10k | Angular | [HDF5](http://ann-benchmarks.com/nytimes-256-angular.hdf5) |
-| GloVe (25–200d)|| 1.18M / 10k | Angular | [Link](https://github.com/erikbern/ann-benchmarks#datasets) |
-| Last.fm | 65 | 292k / 50k | Angular | [HDF5](http://ann-benchmarks.com/lastfm-64-dot.hdf5) |
-| COCO-I2I | 512 | 113k / 10k | Angular | [HDF5](https://github.com/fabiocarrara/str-encoders/releases/download/v0.1.3/coco-i2i-512-angular.hdf5) |
-| COCO-T2I | 512 | 113k / 10k | Angular | [HDF5](https://github.com/fabiocarrara/str-encoders/releases/download/v0.1.3/coco-t2i-512-angular.hdf5) |
-
-More: [ann-benchmarks.com](http://ann-benchmarks.com)
-
-#### 🧠 Related Projects
+## 🧠 Related Projects
 
 - **[Billion-Scale ANN Leaderboard](https://big-ann-benchmarks.com/neurips23.html)**: Continuously updated leaderboard comparing the performance of various billion-scale approximate nearest neighbor methods across recall, latency, and memory tradeoffs.
 
-### 📚📖 Books
+## 📚📖 Books
 
 Here are a few recommended books on large-scale machine learning:
 
@@ -254,7 +208,7 @@ Here are a few recommended books on large-scale machine learning:
 
 - **[Deep Learning](https://amzn.to/47updLU)** *(affiliate link)* by Goodfellow, Bengio, and Courville: The definitive book on deep learning. While not specific to hashing, it provides the theoretical backbone for understanding the neural network architectures used in deep supervised hashing models.
 
-### 🗃️📥 Pre-Processed Datasets for Download
+## 🗃️📥 Pre-Processed Datasets for Download
 
 - **[CIFAR-10 Gist Features (.mat)](https://www.dropbox.com/s/875u1rkva9iffpj/Gist512CIFAR10.mat?dl=0)**: This dataset contains GIST features extracted from the CIFAR-10 dataset, a popular image classification benchmark.
 