
Commit d7c6ad5

MLE-27018 update cosine, cosineDistance and vectorScore to match codegen
The javadoc for cosine now reads "Returns the cosine of the angle between two vectors." This is the same quantity as cosine similarity; the wording changed only to match codegen. The test now enforces the correct range of return values, [-1, 1], to avoid confusion.

The javadoc for cosineDistance now states explicitly "Returns the cosine distance between two vectors," to match codegen. The test asserts that 1 - cosine(v1, v2) == cosineDistance(v1, v2), within a floating-point delta, to document the relationship.

The vectorScore methods rename the similarity parameter to distance (and similarityWeight to distanceWeight) to match codegen. Two new overloads take an additional weight parameter for the ANN portion of the final hybrid score.
1 parent 455048a commit d7c6ad5

3 files changed

Lines changed: 156 additions & 50 deletions


marklogic-client-api/src/main/java/com/marklogic/client/expression/VecExpr.java

Lines changed: 55 additions & 40 deletions
```diff
@@ -57,30 +57,26 @@ public interface VecExpr {
 */
 public ServerExpression base64Encode(ServerExpression vector1);
 
-/**
-* Returns the cosine similarity between two vectors. The vectors must be of the same dimension.
-*
-* <a name="ml-server-type-cosine"></a>
-*
-* <p>
-* Provides a client interface to the <a href="http://docs.marklogic.com/vec:cosine" target="mlserverdoc">vec:cosine</a> server function.
-*
-* @param vector1 The vector from which to calculate the cosine similarity with vector2. (of <a href="{@docRoot}/doc-files/types/vec_vector.html">vec:vector</a>)
-* @param vector2 The vector from which to calculate the cosine similarity with vector1. (of <a href="{@docRoot}/doc-files/types/vec_vector.html">vec:vector</a>)
-* @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a> server data type
-* @since 7.2.0
-*/
-public ServerExpression cosine(ServerExpression vector1, ServerExpression vector2);
+/**
+* Returns the cosine of the angle between two vectors. The vectors must be of the same dimension.
+* <p>
+* Provides a client interface to the <a href="http://docs.marklogic.com/vec:cosine" target="mlserverdoc">vec:cosine</a> server function.
+* @param vector1 The vector from which to calculate the cosine with vector2. (of <a href="{@docRoot}/doc-files/types/vec_vector.html">vec:vector</a>)
+* @param vector2 The vector from which to calculate the cosine with vector1. (of <a href="{@docRoot}/doc-files/types/vec_vector.html">vec:vector</a>)
+* @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a> server data type
+* @since 7.2.0
+*/
+public ServerExpression cosine(ServerExpression vector1, ServerExpression vector2);
 
```
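For reference, here is a plain-Java sketch of the quantity `vec:cosine` documents: the cosine of the angle between two vectors of the same dimension, which always falls in [-1, 1] (the range the updated test enforces). `CosineSketch` is a hypothetical name, and the value is computed client-side purely for illustration; `VecExpr.cosine` itself only builds a server expression that MarkLogic evaluates.

```java
// Hypothetical client-side illustration of the cosine of the angle between
// two vectors; not how the server evaluates vec:cosine.
final class CosineSketch {
    private CosineSketch() {}

    static double cosine(double[] v1, double[] v2) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException("vectors must be of the same dimension");
        }
        double dot = 0.0, norm1 = 0.0, norm2 = 0.0;
        for (int i = 0; i < v1.length; i++) {
            dot += v1[i] * v2[i];
            norm1 += v1[i] * v1[i];
            norm2 += v2[i] * v2[i];
        }
        // Dot product divided by the product of the norms. Mathematically this
        // lies in [-1, 1]; it is undefined (NaN here) if either vector is zero.
        return dot / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }
}
```

Vectors pointing the same way give 1, orthogonal vectors give 0 and opposite vectors give -1.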
```diff
-/**
-* Return the distance between two vectors. The vectors must be of the same dimension.
-*
-* @param vector1 The vector from which to calculate the cosine distance with vector2. (of <a href="{@docRoot}/doc-files/types/vec_vector.html">vec:vector</a>)
-* @param vector2 The vector from which to calculate the cosine distance with vector1. (of <a href="{@docRoot}/doc-files/types/vec_vector.html">vec:vector</a>)
-* @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a> server data type
-* @since 7.2.0
-*/
-public ServerExpression cosineDistance(ServerExpression vector1, ServerExpression vector2);
+/**
+* Returns the cosine distance between two vectors. The vectors must be of the same dimension.
+*
+* @param vector1 The vector from which to calculate the cosine distance with vector2. (of <a href="{@docRoot}/doc-files/types/vec_vector.html">vec:vector</a>)
+* @param vector2 The vector from which to calculate the cosine distance with vector1. (of <a href="{@docRoot}/doc-files/types/vec_vector.html">vec:vector</a>)
+* @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a> server data type
+* @since 7.2.0
+*/
+public ServerExpression cosineDistance(ServerExpression vector1, ServerExpression vector2);
 
 /**
 * Returns the dimension of the vector passed in.
```
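The relationship the updated test documents, 1 - cosine(v1, v2) == cosineDistance(v1, v2) within a floating-point delta, can be illustrated locally in plain Java. `CosineDistanceSketch` and `relationshipHolds` are hypothetical names. To keep the check from being circular, this sketch computes cosine distance by a different route (half the squared Euclidean distance between the unit-normalized vectors); that route is an illustrative choice, not necessarily how the server computes `vec:cosine-distance`.

```java
// Hypothetical sketch of the relationship 1 - cosine == cosineDistance.
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    static double cosine(double[] v1, double[] v2) {
        checkSameDimension(v1, v2);
        double dot = 0.0, norm1 = 0.0, norm2 = 0.0;
        for (int i = 0; i < v1.length; i++) {
            dot += v1[i] * v2[i];
            norm1 += v1[i] * v1[i];
            norm2 += v2[i] * v2[i];
        }
        return dot / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }

    // For unit vectors u and v, |u - v|^2 = 2 - 2cos(u, v), so half the squared
    // Euclidean distance between the normalized vectors equals 1 - cos(u, v).
    static double cosineDistance(double[] v1, double[] v2) {
        checkSameDimension(v1, v2);
        double[] u = unit(v1);
        double[] v = unit(v2);
        double squared = 0.0;
        for (int i = 0; i < u.length; i++) {
            double d = u[i] - v[i];
            squared += d * d;
        }
        return squared / 2.0;
    }

    // Mirrors the shape of the commit's check: equality within a delta.
    static boolean relationshipHolds(double[] v1, double[] v2, double delta) {
        return Math.abs((1.0 - cosine(v1, v2)) - cosineDistance(v1, v2)) <= delta;
    }

    private static double[] unit(double[] v) {
        double norm = 0.0;
        for (double x : v) {
            norm += x * x;
        }
        norm = Math.sqrt(norm);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    private static void checkSameDimension(double[] v1, double[] v2) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException("vectors must be of the same dimension");
        }
    }
}
```

Because cosine lies in [-1, 1], cosine distance lies in [0, 2]: 0 for vectors pointing the same way, 1 for orthogonal vectors and 2 for opposite ones.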
```diff
@@ -246,44 +242,63 @@ public interface VecExpr {
 */
 public ServerExpression vector(ServerExpression values);
 /**
-* A helper function that returns a hybrid score using a cts score and a vector similarity calculation result. You can tune the effect of the vector similarity on the score using the similarityWeight option. The ideal value for similarityWeight depends on your application.
+* A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results.
 * <p>
 * Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
 * @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
-* @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
 * @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_unsignedLong.html">xs:unsignedLong</a> server data type
 */
-public ServerExpression vectorScore(ServerExpression score, double similarity);
+public ServerExpression vectorScore(ServerExpression score, double distance);
 /**
-* A helper function that returns a hybrid score using a cts score and a vector similarity calculation result. You can tune the effect of the vector similarity on the score using the similarityWeight option. The ideal value for similarityWeight depends on your application.
-*
-* <a name="ml-server-type-vector-score"></a>
-
+* A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results.
+* <p>
+* Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
+* @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
+* @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_unsignedLong.html">xs:unsignedLong</a> server data type
+*/
+public ServerExpression vectorScore(ServerExpression score, ServerExpression distance);
```
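Renaming `similarity` to `distance` changes how the argument reads: a smaller value now means a closer match, the opposite of a similarity. The plain-Java sketch below computes the two distances the javadoc gives as examples (cosine distance and Euclidean distance) and orders candidates nearest-first. `DistanceInputsSketch` and `nearestFirst` are hypothetical names, and the values are computed client-side only for illustration; in a real plan the distance would come from a server expression.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical client-side illustration of distance-style inputs.
final class DistanceInputsSketch {
    private DistanceInputsSketch() {}

    // Straight-line (L2) distance: 0 for identical vectors, growing as they diverge.
    static double euclideanDistance(double[] a, double[] b) {
        checkSameDimension(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Cosine distance as 1 - cosine of the angle: 0 for the same direction, 2 for opposite.
    static double cosineDistance(double[] a, double[] b) {
        checkSameDimension(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Candidate indexes sorted by ascending Euclidean distance (nearest first).
    static int[] nearestFirst(double[] query, double[][] candidates) {
        return IntStream.range(0, candidates.length)
                .boxed()
                .sorted(Comparator.comparingDouble(i -> euclideanDistance(query, candidates[i])))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    private static void checkSameDimension(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must be of the same dimension");
        }
    }
}
```

With either distance, a document whose vector is closer to the query gets the smaller value.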
```diff
+/**
+* A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results.
+* <p>
+* Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
+* @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
+* @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param distanceWeight The weight of the vector distance on the annScore. This value is a positive coefficient that scales the distance. A larger distanceWeight produces a lower annScore for the same distance. The default value is 1. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_unsignedLong.html">xs:unsignedLong</a> server data type
+*/
+public ServerExpression vectorScore(ServerExpression score, double distance, double distanceWeight);
+/**
+* A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results.
 * <p>
 * Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
 * @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
-* @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param distanceWeight The weight of the vector distance on the annScore. This value is a positive coefficient that scales the distance. A larger distanceWeight produces a lower annScore for the same distance. The default value is 1. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
 * @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_unsignedLong.html">xs:unsignedLong</a> server data type
 */
-public ServerExpression vectorScore(ServerExpression score, ServerExpression similarity);
+public ServerExpression vectorScore(ServerExpression score, ServerExpression distance, ServerExpression distanceWeight);
```
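The javadoc describes `distanceWeight` only qualitatively: it is a positive coefficient that scales the distance, a larger value lowers the annScore for the same distance, and it defaults to 1. The exact formula is not given, so the sketch below assumes, purely for illustration, annScore = 1 / (1 + distanceWeight * distance). That formula and the name `AnnScoreSketch` are hypothetical; only the default of 1, the positivity requirement and the direction of the effect come from the javadoc.

```java
// Hypothetical illustration only. ASSUMED formula (not taken from the server):
//   annScore = 1 / (1 + distanceWeight * distance)
// It shares the documented property that a larger distanceWeight lowers
// the annScore for the same distance.
final class AnnScoreSketch {
    private AnnScoreSketch() {}

    // Default from the javadoc: distanceWeight defaults to 1.
    static final double DEFAULT_DISTANCE_WEIGHT = 1.0;

    // Mirrors the shorter overload that omits distanceWeight.
    static double annScore(double distance) {
        return annScore(distance, DEFAULT_DISTANCE_WEIGHT);
    }

    static double annScore(double distance, double distanceWeight) {
        if (distance < 0.0) {
            throw new IllegalArgumentException("distance must be non-negative");
        }
        if (!(distanceWeight > 0.0)) {
            throw new IllegalArgumentException("distanceWeight must be positive");
        }
        return 1.0 / (1.0 + distanceWeight * distance);
    }
}
```

Under this assumption a distance of 0 always scores 1, and doubling `distanceWeight` has the same effect as doubling the distance.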
```diff
 /**
-* A helper function that returns a hybrid score using a cts score and a vector similarity calculation result. You can tune the effect of the vector similarity on the score using the similarityWeight option. The ideal value for similarityWeight depends on your application.
+* A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results.
 * <p>
 * Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
 * @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
-* @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
-* @param similarityWeight The weight of the vector similarity on the score. The default value is 0.1. If 0.0 is passed in, vector similarity has no effect. If passed a value less than 0.0 or greater than 1.0, throw VEC-VECTORSCORE. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param distanceWeight The weight of the vector distance on the annScore. This value is a positive coefficient that scales the distance. A larger distanceWeight produces a lower annScore for the same distance. The default value is 1. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param weight The weight of the annScore in the final hybrid score. This value is a coefficient between 0 and 1, where 0 gives full weight to the cts score and 1 gives full weight to the annScore. The default value is 0.5. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
 * @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_unsignedLong.html">xs:unsignedLong</a> server data type
 */
-public ServerExpression vectorScore(ServerExpression score, double similarity, double similarityWeight);
+public ServerExpression vectorScore(ServerExpression score, double distance, double distanceWeight, double weight);
 /**
-* A helper function that returns a hybrid score using a cts score and a vector similarity calculation result. You can tune the effect of the vector similarity on the score using the similarityWeight option. The ideal value for similarityWeight depends on your application.
+* A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results.
 * <p>
 * Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
 * @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
-* @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
-* @param similarityWeight The weight of the vector similarity on the score. The default value is 0.1. If 0.0 is passed in, vector similarity has no effect. If passed a value less than 0.0 or greater than 1.0, throw VEC-VECTORSCORE. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param distanceWeight The weight of the vector distance on the annScore. This value is a positive coefficient that scales the distance. A larger distanceWeight produces a lower annScore for the same distance. The default value is 1. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+* @param weight The weight of the annScore in the final hybrid score. This value is a coefficient between 0 and 1, where 0 gives full weight to the cts score and 1 gives full weight to the annScore. The default value is 0.5. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
 * @return a server expression with the <a href="{@docRoot}/doc-files/types/xs_unsignedLong.html">xs:unsignedLong</a> server data type
 */
-public ServerExpression vectorScore(ServerExpression score, ServerExpression similarity, ServerExpression similarityWeight);
+public ServerExpression vectorScore(ServerExpression score, ServerExpression distance, ServerExpression distanceWeight, ServerExpression weight);
 }
```
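The new four-argument overloads add `weight`, which the javadoc uses in score = weight * annScore + (1 - weight) * ctsScore, with a default of 0.5. Below is a minimal sketch of that blend. It assumes the annScore has already been computed and that both scores are already on comparable scales; how the server reconciles an xs:unsignedInt cts score with the ANN component is not described here. `HybridScoreSketch` is a hypothetical name, and rejecting weights outside [0, 1] is a choice made for this sketch, based on the javadoc's description of weight as a coefficient between 0 and 1.

```java
// Hypothetical sketch of the documented blend:
//   score = weight * annScore + (1 - weight) * ctsScore
// ASSUMES both inputs are already on comparable scales.
final class HybridScoreSketch {
    private HybridScoreSketch() {}

    // Default from the javadoc: weight defaults to 0.5 (both parts count equally).
    static final double DEFAULT_WEIGHT = 0.5;

    static double hybridScore(double ctsScore, double annScore) {
        return hybridScore(ctsScore, annScore, DEFAULT_WEIGHT);
    }

    static double hybridScore(double ctsScore, double annScore, double weight) {
        // Range check chosen for this sketch; the javadoc calls weight a
        // coefficient between 0 and 1.
        if (weight < 0.0 || weight > 1.0) {
            throw new IllegalArgumentException("weight must be between 0 and 1");
        }
        return weight * annScore + (1.0 - weight) * ctsScore;
    }
}
```

With weight = 0 the result is the cts score alone, with weight = 1 it is the annScore alone, and the default of 0.5 averages the two.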
