Vespa sample applications - embedding service (WORK IN PROGRESS)

This sample application demonstrates how a Java handler component can be used to process HTTP requests. In this application, a handler is used to implement an embedding service, which takes a string as an input and returns a vector embedding of that string.

Setup for Vespa Cloud deployment

Cloud deployment

Create a new application in Vespa Cloud by following steps 1-4 in the quick start guide
Clone this repository: vespa clone examples/embedding-service embedding-service && cd embedding-service
Download the models:

mkdir -p src/main/application/embedder-models/e5-small-v2
curl -o src/main/application/embedder-models/e5-small-v2/model.onnx \
  https://data.vespa-cloud.com/onnx_models/e5-small-v2/model.onnx
curl -o src/main/application/embedder-models/e5-small-v2/tokenizer.json \
  https://data.vespa-cloud.com/onnx_models/e5-small-v2/tokenizer.json

Add a public certificate: vespa auth cert
Compile and deploy the application: mvn install && vespa deploy --wait 600
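Because a failed download can leave a missing or empty file behind, it may be worth checking the model directory before building. The following is a minimal sketch of such a check; the function name check_model_files is hypothetical and is not part of the sample application or the Vespa CLI.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of the sample application.
# Returns 0 only if both model files exist and are non-empty.
check_model_files() {
    dir="${1:-src/main/application/embedder-models/e5-small-v2}"
    for file in model.onnx tokenizer.json; do
        if [ ! -s "$dir/$file" ]; then
            echo "missing or empty: $dir/$file" >&2
            return 1
        fi
    done
    return 0
}
```

One possible use is to chain it in front of the build, for example: check_model_files && mvn install && vespa deploy --wait 600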

Enabling more embedders

By default, only the e5-small-v2 embedder is enabled for cloud deployments. Additional models are available and can be enabled, but each one increases memory consumption. See services.xml for details.

Setup for local deployment

Set up a Vespa Docker container by following steps 1-5 in the quick start guide
Clone this repository: vespa clone examples/embedding-service embedding-service && cd embedding-service
Download the models:

mkdir -p src/main/application/embedder-models/e5-small-v2
curl -o src/main/application/embedder-models/e5-small-v2/model.onnx \
  https://data.vespa-cloud.com/onnx_models/e5-small-v2/model.onnx
curl -o src/main/application/embedder-models/e5-small-v2/tokenizer.json \
  https://data.vespa-cloud.com/onnx_models/e5-small-v2/tokenizer.json

Compile and deploy the application: mvn install && vespa deploy --wait 300

Adding more local embedders

More embedders from the model hub can be added for local deployments, but each one increases compile and deployment time. To add a model, download its model.onnx and tokenizer.json files into a new subdirectory of src/main/application/embedder-models. Then add the model as a component in services.xml.
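The download step can be wrapped in a small helper so that each new model ends up in its own subdirectory. This is only a sketch: the function names are hypothetical, and it assumes other models are hosted under the same URL layout as e5-small-v2, which may not hold for every model. The services.xml change still has to be made by hand.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the sample application.
# Assumption: every model follows the URL layout used for e5-small-v2.
MODEL_BASE_URL="https://data.vespa-cloud.com/onnx_models"
MODEL_DIR_ROOT="src/main/application/embedder-models"

# Print the local directory for a model name.
model_dir() {
    printf '%s/%s\n' "$MODEL_DIR_ROOT" "$1"
}

# Print the download URL for a model name and file name.
model_url() {
    printf '%s/%s/%s\n' "$MODEL_BASE_URL" "$1" "$2"
}

# Download model.onnx and tokenizer.json for the given model name.
download_model() {
    model_name="$1"
    target_dir="$(model_dir "$model_name")"
    mkdir -p "$target_dir" || return 1
    for model_file in model.onnx tokenizer.json; do
        # --fail makes curl return an error on HTTP failures.
        curl --fail --location -o "$target_dir/$model_file" \
            "$(model_url "$model_name" "$model_file")" || return 1
    done
}
```

For example, download_model e5-small-v2 fetches the same two files as the commands above.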

Calling the embedding service

This sample application is a work in progress and currently has no GUI. To interact with it, send a POST request to the /embedding endpoint with a JSON object that specifies the text to encode and the embedder to use.

If you're using Vespa Cloud, you can use the vespa curl utility:

vespa curl -- -X POST --data-raw \
'{
    "text": "text to embed",
    "embedder": "e5-small-v2"
}' \
/embedding

If you're running the app locally, you can use normal curl:

curl 'http://127.0.0.1:8080/embedding' \
-X POST --data-raw \
'{
    "text": "text to embed",
    "embedder": "e5-small-v2"
}'
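When the text comes from a variable, the JSON body has to be assembled with quoting in mind. The sketch below shows one way to do that for the local case. The function names and the EMBEDDING_URL variable are hypothetical, and the escaping only covers backslashes and double quotes, so text containing control characters such as newlines would need a proper JSON tool.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the sample application.

# Escape backslashes first, then double quotes.
# Control characters (newlines, tabs) are NOT handled.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the request body for a given text ($1) and embedder ($2).
build_payload() {
    printf '{"text": "%s", "embedder": "%s"}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# POST the payload to a locally running instance.
# The embedder defaults to e5-small-v2; EMBEDDING_URL overrides the endpoint.
embed_local() {
    curl --silent --fail -X POST \
        --data-raw "$(build_payload "$1" "${2:-e5-small-v2}")" \
        "${EMBEDDING_URL:-http://127.0.0.1:8080/embedding}"
}
```

With the application running locally, embed_local "text to embed" sends the same request as the curl command above.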

The output should look something like this in both cases:

{
    "embedder":"e5-small-v2",
    "text":"text to embed",
    "embedding":"tensor<float>(x[384]):[-0.5786399, 0.20775521, ...]"
}
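The embedding is returned as a string in tensor literal form, so a client has to pull the numbers out of it before using them. The following sketch does this with sed and tr, assuming the response has the shape shown above with the whole embedding field on a single line. The function name embedding_values is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the sample application.
# Reads a response on stdin and prints one embedding value per line.
# Assumes the "embedding" field looks like "tensor<float>(x[N]):[v1, v2, ...]".
embedding_values() {
    # 1. Capture the text between ":[" and the next "]" in the embedding field.
    # 2. Split on commas.
    # 3. Trim surrounding whitespace from each value.
    sed -n 's/.*"embedding"[[:space:]]*:[[:space:]]*"[^"]*:\[\([^]]*\)].*/\1/p' \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}
```

For example, piping the output of the curl command into embedding_values | wc -l gives the number of values, which can be compared with the dimension in the tensor type.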
