**[Dr. Wu-Jun Li's tutorial slides](https://cs.nju.edu.cn/lwj/slides/L2H.pdf)**: These tutorial slides offer a comprehensive introduction to learning-to-hash (L2H) techniques and are an excellent resource for anyone seeking a deep technical understanding of hashing.

**[Hashing Algorithms for Large-Scale Machine Learning - 2017 Rice Machine Learning Workshop](https://www.youtube.com/embed/tQ0OJXowLJA)**: A recorded presentation from the 2017 Rice Machine Learning Workshop that gives a detailed overview of hashing algorithms used in large-scale machine learning.

## 🎤🧑‍🔬 Conferences and Workshops

**[IJCNN 2025: Scalable and Deep Graph Learning and Mining](https://www.ijcnn.org/)**: A workshop covering scalable and deep graph learning and mining, including hashing methods applied to graph structures for retrieval and similarity search.

**[SIAM International Conference on Data Mining (SDM)](https://www.siam.org/conferences/cm/conference/sdm22)**: SDM is an important conference for researchers in data mining, focusing on the latest developments in algorithms, data analysis, and big data applications.

## 📄🔬 Survey Papers

For a deeper dive, these survey papers are excellent resources:

**[Learning to Hash With Binary Deep Neural Networks: A Survey](https://www.sciencedirect.com/science/article/abs/pii/S016786552030208X)**: This survey focuses on binary deep neural networks and their use in learning to hash. It explores how these networks are trained to produce compact binary codes that can be used for efficient data retrieval in large-scale datasets.

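To make the retrieval side concrete, here is a minimal sketch (not taken from the survey) of the step most deep hashing models share: real-valued embeddings are binarized with a sign/threshold function and compared by Hamming distance. The embeddings below are random stand-ins for network outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for learned embeddings from a deep hashing network (hypothetical data).
database_embeddings = rng.standard_normal((10_000, 64))
query_embedding = rng.standard_normal(64)

def to_binary_code(x):
    """Binarize a real-valued embedding into a {0, 1} code via thresholding at zero."""
    return (x > 0).astype(np.uint8)

db_codes = to_binary_code(database_embeddings)   # shape (10000, 64)
query_code = to_binary_code(query_embedding)     # shape (64,)

# Hamming distance = number of differing bits; much cheaper than float distances.
hamming = np.count_nonzero(db_codes != query_code, axis=1)

top_k = np.argsort(hamming)[:5]
print("nearest database items by Hamming distance:", top_k)
```
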
## 🎓📚 Courses

Some university courses cover topics related to machine learning and efficient computing, with publicly available materials:

**[CS276: Information Retrieval](https://web.stanford.edu/class/cs276/)** (Stanford University): A comprehensive, foundational course covering algorithms for vector similarity search, ranking, indexing, and hashing.

## 🧠 DeepLearning.AI Short Courses on Vector Search & ANN

**[Vector Databases: from Embeddings to Applications](https://www.deeplearning.ai/short-courses/vector-databases-embeddings-applications/?utm_source=chatgpt.com)**: Learn how vector databases work (dense vs. sparse search, multilingual embeddings, hybrid search) with real-world applications using Weaviate. *(~55 min)*

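As a rough sketch of the hybrid search idea mentioned in the course description (not code from the course), dense and sparse relevance scores are typically normalized and fused with a weighting parameter; all names and numbers below are hypothetical.

```python
import numpy as np

def hybrid_scores(dense_sim, sparse_sim, alpha=0.5):
    """Convex combination of normalized dense and sparse relevance scores.

    alpha=1.0 uses only the dense (embedding) score,
    alpha=0.0 uses only the sparse (keyword/BM25-style) score.
    """
    def normalize(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)

    return alpha * normalize(dense_sim) + (1 - alpha) * normalize(sparse_sim)

# Hypothetical scores for 5 candidate documents against one query.
dense = [0.82, 0.75, 0.40, 0.91, 0.55]   # e.g. cosine similarities
sparse = [3.1, 0.2, 4.5, 1.0, 2.2]       # e.g. BM25-style keyword scores

ranked = np.argsort(-hybrid_scores(dense, sparse, alpha=0.6))
print("documents ranked by hybrid score:", ranked)
```
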
**[Prompt Compression and Query Optimization](https://www.deeplearning.ai/short-courses/prompt-compression-and-query-optimization/?utm_source=chatgpt.com)**: Covers retrieval latency reduction via query filtering, projection, re-ranking, and prompt shortening, with examples using MongoDB Atlas Vector Search.

## 📝📰 Blog Posts

Blog posts are a great way to keep up with cutting-edge research. Here are some of our favorites:

**[What is Locality-Sensitive Hashing?](https://www.quora.com/What-is-locality-sensitive-hashing)**: This Quora discussion explains LSH in simple terms. It covers the core principles of how LSH works and why it is useful for approximate nearest neighbor search.

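To complement the discussion, here is a minimal sketch of one classic LSH family, random-hyperplane hashing (SimHash) for cosine similarity. The dimensions and data are invented, and real systems use multiple hash tables plus exact re-ranking of candidates.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_bits = 128, 16

# Random hyperplanes define the hash: each bit records which side of a plane a vector falls on.
planes = rng.standard_normal((n_bits, dim))

def lsh_signature(v):
    """16-bit signature; vectors with high cosine similarity tend to share most bits."""
    return tuple((planes @ v > 0).astype(int))

# Bucket a toy database by signature, then look up only the query's bucket.
database = rng.standard_normal((1000, dim))
buckets = {}
for idx, vec in enumerate(database):
    buckets.setdefault(lsh_signature(vec), []).append(idx)

# A near-duplicate of item 0 usually (not always: LSH is probabilistic) lands in item 0's bucket.
query = database[0] + 0.05 * rng.standard_normal(dim)
candidates = buckets.get(lsh_signature(query), [])
print("candidate indices in the query's bucket:", candidates[:10])
```
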
## 📦 Hashing Algorithms

**[Deep Hashing Toolbox](https://github.com/thulab/DeepHash)**: An open-source implementation designed for learning to hash with deep neural networks. Useful for deep similarity search research.

**[HashNet](https://github.com/thuml/HashNet)**: Implements HashNet, a deep hashing method that handles imbalanced data distributions and learns binary hash codes end-to-end.

## 🏗️ Indexing / ANN Libraries

**[Faiss (Facebook AI Similarity Search)](https://github.com/facebookresearch/faiss)**: A powerful library by Facebook AI Research for efficient similarity search of dense vectors. Supports PQ, IVF, HNSW, and more.

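For orientation, here is a minimal Faiss sketch using the exact `IndexFlatL2` index; the approximate indexes (IVF, PQ, HNSW) follow the same add/search pattern. The data and dimensions are made up for illustration.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 64  # vector dimensionality
rng = np.random.default_rng(0)
xb = rng.standard_normal((10_000, d)).astype("float32")  # database vectors
xq = rng.standard_normal((5, d)).astype("float32")       # query vectors

index = faiss.IndexFlatL2(d)           # exact L2 search; a baseline for approximate indexes
index.add(xb)                          # store the database vectors
distances, ids = index.search(xq, 5)   # top-5 nearest neighbors per query

print(ids.shape)      # (5, 5): neighbor ids for each of the 5 queries
print(distances[0])   # squared L2 distances for the first query
```
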
**[ScaNN (Scalable Nearest Neighbors)](https://github.com/google-research/google-research/tree/master/scann)**: Developed by Google Research, ScaNN is optimized for vector similarity search at production scale using quantization and reordering.

## 🛠️ Vector Databases

**[Milvus](https://milvus.io/)**: A production-ready open-source vector database for similarity search. Supports multiple ANN algorithms and distributed deployments.

**[Weaviate](https://weaviate.io/)**: An open-source vector database with semantic search capabilities, supporting hybrid search, classification, and modules like CLIP and OpenAI.

**[Qdrant](https://qdrant.tech/)**: A fast and scalable vector database written in Rust. Provides gRPC and REST APIs and supports filtering and payload-based search.

## 📊 Benchmarks

**[ANN-Benchmarks](https://github.com/erikbern/ann-benchmarks)**: The standard benchmarking framework for evaluating approximate nearest neighbor (ANN) algorithms on a wide range of datasets and distance metrics. It includes:

- Dockerized runners for 30+ ANN libraries, including FAISS, HNSWlib, NMSLIB, Annoy, ScaNN, Milvus, and more.
- Scripts to run and visualize benchmarking results.
- Precomputed datasets in HDF5 format for fair and reproducible evaluation.

📄 Related paper: [Aumüller et al. (2019)](https://arxiv.org/abs/1807.05614)

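As a rough illustration of how the precomputed datasets can be consumed, the sketch below loads one of the HDF5 files and scores a brute-force baseline against the stored ground truth. It assumes the usual ann-benchmarks layout with `train`, `test`, and `neighbors` arrays; the file name is only an example.

```python
import h5py
import numpy as np

# Illustrative file name; precomputed ann-benchmarks datasets share the same layout.
with h5py.File("glove-100-angular.hdf5", "r") as f:
    train = f["train"][:]      # database vectors
    test = f["test"][:]        # query vectors
    truth = f["neighbors"][:]  # ground-truth neighbor ids for each query

k = 10
n_queries = 100  # keep the brute-force baseline small; it is only for illustration

# Exact cosine search for the first few queries.
train_n = train / np.linalg.norm(train, axis=1, keepdims=True)
test_n = test[:n_queries] / np.linalg.norm(test[:n_queries], axis=1, keepdims=True)
pred = np.argsort(-(test_n @ train_n.T), axis=1)[:, :k]

# recall@k: fraction of the true top-k neighbors that the method returned.
recall = np.mean([len(set(p) & set(t[:k])) / k
                  for p, t in zip(pred, truth[:n_queries])])
print(f"recall@{k} over {n_queries} queries: {recall:.3f}")
```
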
**[Billion-Scale ANN Leaderboard](https://big-ann-benchmarks.com/neurips23.html)**: Continuously updated leaderboard comparing the performance of various billion-scale approximate nearest neighbor methods across recall, latency, and memory tradeoffs.

## 📚📖 Books

Here are a few recommended books on large-scale machine learning:

**[Deep Learning](https://amzn.to/47updLU)** *(affiliate link)* by Goodfellow, Bengio, and Courville: The definitive book on deep learning. While not specific to hashing, it provides the theoretical backbone for understanding the neural network architectures used in deep supervised hashing models.

## 🗃️📥 Pre-Processed Datasets for Download

**[CIFAR-10 Gist Features (.mat)](https://www.dropbox.com/s/875u1rkva9iffpj/Gist512CIFAR10.mat?dl=0)**: This dataset contains GIST features extracted from the CIFAR-10 dataset, a popular image classification benchmark.