Merged
17 changes: 17 additions & 0 deletions site/content/arangodb/3.12/develop/http-api/indexes/vector.md
@@ -65,6 +65,23 @@ paths:
maxItems: 1
items:
type: string
storedValues:
description: |
Store additional attributes in the index (introduced in v3.12.7).
Unlike with other index types, this is not for covering projections
with the index but for adding attributes that you filter on.
This lets you make the lookup in the vector index more efficient
because it avoids materializing documents twice, once for the
filtering and once for the matches.

The maximum number of attributes that you can use in `storedValues` is 32.
type: array
uniqueItems: true
items:
description: |
An attribute path. The `.` character denotes sub-attributes.
type: string
sparse:
description: |
Whether to create a sparse index that excludes documents with
@@ -74,6 +74,14 @@ centroids and the quality of vector search thus degrades.
Set this option to `true` to keep the collection/shards available for
write operations by not using an exclusive write lock for the duration
of the index creation. Default: `false`.
- **storedValues** (array of strings, introduced in v3.12.7):
Store additional attributes in the index. Unlike with other index types, this
is not for covering projections with the index but for adding attributes that
you filter on. This lets you make the lookup in the vector index more efficient
because it avoids materializing documents twice, once for the filtering and
once for the matches.

The maximum number of attributes that you can use in `storedValues` is 32.
- **params**: The parameters as used by the Faiss library.
- **metric** (string): The measure for calculating the vector similarity:
- `"cosine"`: Angular similarity. Vectors are automatically
@@ -101,6 +101,8 @@ A `replace-entries-with-object-iteration` rule has been added in v3.12.3.

A `use-index-for-collect` and a `use-vector-index` rule have been added in v3.12.4.

A `push-filter-into-enumerate-near` rule has been added in v3.12.7.

The affected endpoints are `POST /_api/cursor`, `POST /_api/explain`, and
`GET /_api/query/rules`.
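
As a rough illustration of where the rule name shows up in these endpoints, the following Python sketch builds a request body for `POST /_api/explain` that switches off `push-filter-into-enumerate-near`, for example to compare execution plans with and without the rule. The helper name `explain_body`, the sample query, and the bind values are hypothetical. The sketch assumes that the `options.optimizer.rules` attribute accepts a `-` prefix to disable this rule, as it does for other optimizer rules.

```python
import json


def explain_body(query, bind_vars, disabled_rules=()):
    """Build a POST /_api/explain body that disables the given optimizer rules."""
    return {
        "query": query,
        "bindVars": dict(bind_vars),
        "options": {
            # A leading "-" disables a rule (assumed to apply to the new rule, too).
            "optimizer": {"rules": [f"-{name}" for name in disabled_rules]}
        },
    }


# Hypothetical query and query vector, for illustration only.
explain_request = explain_body(
    "FOR doc IN coll FILTER doc.val > 3 "
    "SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC LIMIT 3 RETURN doc",
    {"q": [0.1, 0.2, 0.3, 0.4]},
    disabled_rules=["push-filter-into-enumerate-near"],
)
print(json.dumps(explain_request, indent=2))
```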

@@ -1541,6 +1541,8 @@ FOR doc IN coll
RETURN doc
```

The filtering is handled by the `use-vector-index` optimizer rule in v3.12.6.

Vector indexes can now be sparse to exclude documents where the embedding
attribute to index is missing or set to `null`.

@@ -1551,6 +1553,61 @@ The accompanying AQL function is the following:

- `APPROX_NEAR_INNER_PRODUCT()`

---

<small>Introduced in: v3.12.7</small>

Vector indexes now support `storedValues` to store additional attributes in the
index. Unlike with other index types, this is not for covering projections with
the index but for adding attributes that you filter on. This lets you make the
lookup in the vector index more efficient because it avoids materializing
documents twice, once for the filtering and once for the matches.

For example, if you set `storedValues` to `["val"]` in a vector index over
`["vector"]`, then the following query can utilize this index for the
filtering by `val` and the lookup using `vector`, but not for the projection of
`attr` even if you added it to `storedValues` as well:

```aql
FOR doc IN coll
FILTER doc.val > 3
SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC
LIMIT 3
RETURN doc.attr
```

In the query execution plan, the utilization of `storedValues` for filtering is
indicated by `/* covered by storedValues */`:

```aql
Execution plan:
 Id   NodeType                  Par   Est.   Comment
  1   SingletonNode                      1   * ROOT
 10   CalculationNode                    1     - LET #4 = [ ... ]   /* json expression */   /* const assignment */
 11   EnumerateNearVectorNode            3     - FOR doc OF coll IN TOP 3 NEAR #4 DISTANCE INTO #2 FILTER (doc.`val` > 3)   /* early pruning */   /* covered by storedValues */
  7   LimitNode                          3     - LIMIT 0, 3
 12   MaterializeNode                    3     - MATERIALIZE doc INTO #5 /* (projections: `attr`) */   LET #6 = #5.`attr`
  9   ReturnNode                         3     - RETURN #6

Indexes used:
 By   Name   Type     Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
 11   foo    vector   coll         false    false    false           n/a   [ `vector` ]   [ `val` ]       #4
```

The new `push-filter-into-enumerate-near` optimizer rule now handles everything
related to vector index filtering (with and without `storedValues`).
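
The `storedValues` constraints described above (at most 32 attributes, no duplicates) can be checked on the client before creating an index. The following Python sketch builds such an index definition. The helper name `vector_index_definition` is hypothetical, and the `dimension` and `nLists` parameter names and all values are assumptions that are not part of this change.

```python
import json

MAX_STORED_VALUES = 32  # limit stated in the storedValues description


def vector_index_definition(field, stored_values, metric, dimension, n_lists):
    """Build a request body for a vector index that uses storedValues."""
    stored_values = list(stored_values)
    if len(stored_values) > MAX_STORED_VALUES:
        raise ValueError(
            f"storedValues supports at most {MAX_STORED_VALUES} attributes"
        )
    if len(set(stored_values)) != len(stored_values):
        raise ValueError("storedValues must not contain duplicate attribute paths")
    return {
        "type": "vector",
        "fields": [field],  # a vector index covers exactly one attribute
        "storedValues": stored_values,  # attributes you filter on
        "params": {
            "metric": metric,
            "dimension": dimension,  # assumed parameter name and value
            "nLists": n_lists,  # assumed parameter name and value
        },
    }


body = vector_index_definition("vector", ["val"], "cosine", 4, 10)
print(json.dumps(body, indent=2))
```

Only attributes used in `FILTER` conditions benefit from being listed here. Adding `attr` would not let the index cover the projection in the example query.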

The `FOR` operation now supports `indexHint` and `forceIndexHint` for vector
indexes, which make the AQL optimizer prefer or require specific vector
indexes, respectively:

```aql
FOR doc IN c OPTIONS { indexHint: ["vec_idx_1", "vec_idx_2"], forceIndexHint: true }
SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC
LIMIT 3
RETURN doc
```
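
If the index names are only known at runtime, the hinted query can be assembled by the application. The following Python sketch generates a query of the shape shown above together with a request body for `POST /_api/cursor`. The helper name `hinted_vector_query`, the `@@coll` collection bind parameter, and the sample query vector are assumptions made for illustration.

```python
import json


def hinted_vector_query(index_names, force=False, limit=3):
    """Return an AQL query string with index hints for a vector search."""
    hint = json.dumps(list(index_names))  # renders as an AQL array literal
    force_literal = "true" if force else "false"
    return (
        f"FOR doc IN @@coll OPTIONS {{ indexHint: {hint}, "
        f"forceIndexHint: {force_literal} }}\n"
        "  SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC\n"
        f"  LIMIT {int(limit)}\n"
        "  RETURN doc"
    )


query = hinted_vector_query(["vec_idx_1", "vec_idx_2"], force=True)
cursor_request = {
    "query": query,
    # "@coll" binds the @@coll collection parameter; "q" is a made-up query vector.
    "bindVars": {"@coll": "c", "q": [0.1, 0.2, 0.3, 0.4]},
}
print(query)
```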

## Server options

### Effective and available startup options
17 changes: 17 additions & 0 deletions site/content/arangodb/4.0/develop/http-api/indexes/vector.md
@@ -65,6 +65,23 @@ paths:
maxItems: 1
items:
type: string
storedValues:
description: |
Store additional attributes in the index (introduced in v3.12.7).
Unlike with other index types, this is not for covering projections
with the index but for adding attributes that you filter on.
This lets you make the lookup in the vector index more efficient
because it avoids materializing documents twice, once for the
filtering and once for the matches.

The maximum number of attributes that you can use in `storedValues` is 32.
type: array
uniqueItems: true
items:
description: |
An attribute path. The `.` character denotes sub-attributes.
type: string
sparse:
description: |
Whether to create a sparse index that excludes documents with
@@ -74,6 +74,14 @@ centroids and the quality of vector search thus degrades.
Set this option to `true` to keep the collection/shards available for
write operations by not using an exclusive write lock for the duration
of the index creation. Default: `false`.
- **storedValues** (array of strings, introduced in v3.12.7):
Store additional attributes in the index. Unlike with other index types, this
is not for covering projections with the index but for adding attributes that
you filter on. This lets you make the lookup in the vector index more efficient
because it avoids materializing documents twice, once for the filtering and
once for the matches.

The maximum number of attributes that you can use in `storedValues` is 32.
- **params**: The parameters as used by the Faiss library.
- **metric** (string): The measure for calculating the vector similarity:
- `"cosine"`: Angular similarity. Vectors are automatically
@@ -101,6 +101,8 @@ A `replace-entries-with-object-iteration` rule has been added in v3.12.3.

A `use-index-for-collect` and a `use-vector-index` rule have been added in v3.12.4.

A `push-filter-into-enumerate-near` rule has been added in v3.12.7.

The affected endpoints are `POST /_api/cursor`, `POST /_api/explain`, and
`GET /_api/query/rules`.

@@ -1541,6 +1541,8 @@ FOR doc IN coll
RETURN doc
```

The filtering is handled by the `use-vector-index` optimizer rule in v3.12.6.

Vector indexes can now be sparse to exclude documents where the embedding
attribute to index is missing or set to `null`.

@@ -1551,6 +1553,61 @@ The accompanying AQL function is the following:

- `APPROX_NEAR_INNER_PRODUCT()`

---

<small>Introduced in: v3.12.7</small>

Vector indexes now support `storedValues` to store additional attributes in the
index. Unlike with other index types, this is not for covering projections with
the index but for adding attributes that you filter on. This lets you make the
lookup in the vector index more efficient because it avoids materializing
documents twice, once for the filtering and once for the matches.

For example, if you set `storedValues` to `["val"]` in a vector index over
`["vector"]`, then the following query can utilize this index for the
filtering by `val` and the lookup using `vector`, but not for the projection of
`attr` even if you added it to `storedValues` as well:

```aql
FOR doc IN coll
FILTER doc.val > 3
SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC
LIMIT 3
RETURN doc.attr
```

In the query execution plan, the utilization of `storedValues` for filtering is
indicated by `/* covered by storedValues */`:

```aql
Execution plan:
 Id   NodeType                  Par   Est.   Comment
  1   SingletonNode                      1   * ROOT
 10   CalculationNode                    1     - LET #4 = [ ... ]   /* json expression */   /* const assignment */
 11   EnumerateNearVectorNode            3     - FOR doc OF coll IN TOP 3 NEAR #4 DISTANCE INTO #2 FILTER (doc.`val` > 3)   /* early pruning */   /* covered by storedValues */
  7   LimitNode                          3     - LIMIT 0, 3
 12   MaterializeNode                    3     - MATERIALIZE doc INTO #5 /* (projections: `attr`) */   LET #6 = #5.`attr`
  9   ReturnNode                         3     - RETURN #6

Indexes used:
 By   Name   Type     Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
 11   foo    vector   coll         false    false    false           n/a   [ `vector` ]   [ `val` ]       #4
```

The new `push-filter-into-enumerate-near` optimizer rule now handles everything
related to vector index filtering (with and without `storedValues`).

The `FOR` operation now supports `indexHint` and `forceIndexHint` for vector
indexes, which make the AQL optimizer prefer or require specific vector
indexes, respectively:

```aql
FOR doc IN c OPTIONS { indexHint: ["vec_idx_1", "vec_idx_2"], forceIndexHint: true }
SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC
LIMIT 3
RETURN doc
```

## Server options

### Effective and available startup options