You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The maximum number of results to return (termed "top" or K) is currently a represented by VectorSearchOptions.Top, which is an optional property that defaults to 3.
Most vector databases (AFAIK) require this value as a mandatory parameter in their API/SDK; this is important as similarity search always returns all elements in the database unless some limit (or filter) is specified (FWIW, it's almost always a good idea to set LIMIT even in relational databases as well). I think our general concept with the abstraction is to expose underlying databases/SDKs, and so we'd ideally avoid making decisions or determining defaults on the user's behalf. If that makes sense, we can move Top to be a mandatory parameter directly on the method signature.
Some vector databases exist which allow not specifying the value (and themselves have some built-in default or similar); for example, Azure Search seems to allow KNearestNeighborsCount to be unset, and defaults to 50 (docs). Our current code overrides that default, setting it to 3.
To support database-side defaults, we could change our Limit to a nullable int without a default, allowing us to not send the value to the database. Providers which do require the limit would throw if it's unset. However, despite Azure Search's default of 50, I personally don't believe that a database-side default makes sense here - each application has the number of results that makes sense for it, and developers should make a decision on that, rather than get e.g. 50 (or 3) by default. I also think Azure Search is rare in allowing the limit to be unset (but this would need to be confirmed).
github-actionsbot
changed the title
.NET MEVD: Consider making Top a mandatory parameter
.Net MEVD: Consider making Top a mandatory parameter
Jan 15, 2025
The maximum number of results to return (termed "top" or K) is currently a represented by VectorSearchOptions.Top, which is an optional property that defaults to 3.
Most vector databases (AFAIK) require this value as a mandatory parameter in their API/SDK; this is important as similarity search always returns all elements in the database unless some limit (or filter) is specified (FWIW, it's almost always a good idea to set LIMIT even in relational databases as well). I think our general concept with the abstraction is to expose underlying databases/SDKs, and so we'd ideally avoid making decisions or determining defaults on the user's behalf. If that makes sense, we can move Top to be a mandatory parameter directly on the method signature.
Some vector databases exist which allow not specifying the value (and themselves have some built-in default or similar); for example, Azure Search seems to allow KNearestNeighborsCount to be unset, and defaults to 50 (docs). Our current code overrides that default, setting it to 3.
To support database-side defaults, we could change our Limit to a nullable int without a default, allowing us to not send the value to the database. Providers which do require the limit would throw if it's unset. However, despite Azure Search's default of 50, I personally don't believe that a database-side default makes sense here - each application has the number of results that makes sense for it, and developers should make a decision on that, rather than get e.g. 50 (or 3) by default. I also think Azure Search is rare in allowing the limit to be unset (but this would need to be confirmed).
/cc @westey-m
The text was updated successfully, but these errors were encountered: