# langserve-rag-api

## Local installation

### Install micromamba

If you already have access to mamba via some other method (for example, you have installed miniforge), you can skip this step.

Otherwise, install micromamba by following the official installation instructions.

For x86_64 Linux, this would be:

```bash
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
```

Once you have access to mamba/micromamba, create an environment with

```bash
bin/micromamba env create -f environment.yml
```

when using micromamba, or with

```bash
mamba env create -f environment.yml
```

when using mamba.

## Setting up OpenAI API keys

This RAG example uses an OpenAI-compatible endpoint. You can either launch your own or use any other compatible one.

You can set these via environment variables or through `secrets/api_keys.env`.

The following keys need to be set:

- `OPENAI_API_KEY`: API key for the OpenAI endpoint
- `OPENAI_BASE_URL`: Base URL for the OpenAI endpoint
- `OPENAI_MODEL`: Model to use from the OpenAI endpoint

If you're using API keys from the Aalto Azure OpenAI endpoint, you need to set `AZURE_AUTH=1` as well.

Example secrets are shown in `secrets/example_api_keys.env`.
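Below is a minimal sketch of how these variables could be picked up and passed to an OpenAI-compatible LangChain chat model. It is only an illustration; the actual model construction lives in `app/llms.py` and may differ.

```python
# Minimal sketch only; the real model construction is done in app/llms.py.
import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model=os.environ["OPENAI_MODEL"],        # model served by the endpoint
    base_url=os.environ["OPENAI_BASE_URL"],  # any OpenAI-compatible endpoint
    api_key=os.environ["OPENAI_API_KEY"],
)
```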

Remember that you should never share your API keys or commit them to a repository.

## Starting up the server

Activate the environment with:

```bash
eval "$(./bin/micromamba shell hook -s posix)"
micromamba activate langserve-rag-api
```

When using mamba, use either

```bash
mamba activate langserve-rag-api
```

or

```bash
source activate langserve-rag-api
```

Now you can launch the server with

```bash
uvicorn app.server:app --host 0.0.0.0 --port 8080 --reload
```

This starts the server in reload mode, where changes to the code base automatically reload the server.

## Testing the different API endpoints

You can go to http://localhost:8080 in your browser to see the available endpoints.

There are four LLM endpoints that you can test out in their respective playgrounds.
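Routes added with LangServe's `add_routes` also expose programmatic endpoints such as `/invoke` and `/stream` alongside the playground. As a rough example, assuming the `/llm` route shown under Code structure below accepts a plain string as input, you can call it from Python:

```python
# Rough example; assumes the server runs locally on port 8080 and that the
# /llm route takes a plain string as input (as in the llm_chain example below).
import requests

response = requests.post(
    "http://localhost:8080/llm/invoke",
    json={"input": "What is retrieval-augmented generation?"},
)
response.raise_for_status()
print(response.json()["output"])
```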

## Code structure

- `app/server.py` - This is the main server. It sets up the endpoints and starts the uvicorn app. Endpoints are set up by calling the `add_routes` function, and they utilize runnable LLM chains from `app.chains`:

  ```python
  add_routes(
      app,
      llm_chain,
      path='/llm'
  )
  ```
- `app/chains.py` - This file contains the chains that can be run. Each chain consists of callables that are called one after the other. The user's question is given to a prompt, which is then passed to an LLM, which then produces output that is given to a parser. In the case of a RAG setup, the user's question is first passed through a retriever that adds context to the prompt from retrieved documents:

  ```python
  salesman_chain = (
      {"context": retriever | combine_docs, "question": RunnablePassthrough()}
      | salesman_prompt
      | llm
      | StrOutputParser()
  )
  ```
- `app/llms.py` - This file specifies how the LLMs are created.
- `app/prompts.py` - This file contains prompts used by the LLM chains. Some prompts take only the question as input, but others also take retrieved context (in RAG situations), and some take additional instructions on how the output should be parsed.
- `app/retriever.py` - This file contains a custom retriever for the salesman RAG. This retriever could be any kind of retriever that takes a string of text (the user's question) and retrieves relevant documents. LangChain has plenty of existing retrievers for most use cases, so creating a custom retriever is only relevant if those retrievers are not applicable to the data in question (see the retriever sketch after this list).
- `app/schema.py` - This file contains a desired answering schema for the questions. This is useful when using parsers, as they can automatically create answering instructions for the LLM based on the schema and then detect the answer in the model's output (see the schema sketch after this list).
- `app/utils.py` - This file contains utils that help with reading secrets.
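To illustrate what a custom retriever such as the one in `app/retriever.py` could look like, here is a hypothetical sketch (the class name and matching logic below are made up for illustration, not the actual implementation). A LangChain retriever only needs to implement `_get_relevant_documents`:

```python
# Hypothetical sketch of a custom retriever; app/retriever.py may differ.
from typing import List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever


class KeywordRetriever(BaseRetriever):
    """Toy retriever that returns documents sharing words with the question."""

    documents: List[Document]

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        words = set(query.lower().split())
        return [
            doc
            for doc in self.documents
            if words & set(doc.page_content.lower().split())
        ]
```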
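Similarly, here is a hedged sketch of an answering schema and how a parser turns it into answering instructions (the field names are hypothetical, not those used in `app/schema.py`):

```python
# Hypothetical schema; the actual fields in app/schema.py are likely different.
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field


class Answer(BaseModel):
    answer: str = Field(description="The answer to the user's question")
    sources: str = Field(description="Where the answer came from")


parser = PydanticOutputParser(pydantic_object=Answer)

# get_format_instructions() produces text that can be appended to a prompt so
# the LLM replies in a format the parser can detect and validate.
print(parser.get_format_instructions())
```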