The goal of this project is to create a scalable, fault-tolerant API that answers questions in the style of ChatGPT, using the Llama 2.0 large language model. The API is a Python FastAPI application that uses the llama-cpp-python library to communicate with the LLM. To scale it, I containerized the application as a Docker image and used Kubernetes for horizontal scaling and container orchestration. I used Terraform to deploy the cluster to AWS EKS.
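
The request flow described above can be sketched without any dependencies. This is a minimal illustration, not the project's actual code: `handle_ask` and `fake_generate` are hypothetical names, and it assumes that in the real application the `generate` callable would wrap the llama-cpp-python model and the handler would sit behind a FastAPI route.

```python
from typing import Callable, Dict


def handle_ask(payload: Dict[str, object], generate: Callable[[str], str]) -> Dict[str, str]:
    """Validate an /ask request body and build the response body.

    `generate` is any function that maps a prompt string to generated text.
    It is injected so this sketch runs without a model file (an assumption
    made for illustration, not the project's confirmed structure).
    """
    question = payload.get("question")
    if not isinstance(question, str) or not question.strip():
        raise ValueError("payload must contain a non-empty 'question' string")
    return {"answer": generate(question)}


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for the LLM call; returns a canned reply."""
    return f"\nA: (model output for: {prompt})"


if __name__ == "__main__":
    request_body = {"question": "Q: What are the names of the days of the week?"}
    print(handle_ask(request_body, fake_generate))
```

Passing the model call in as a parameter keeps the validation logic testable on a machine that cannot load the multi-gigabyte model file.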
- LLAMA 2.0 model: download it from https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/blob/main/llama-2-7b-chat.ggmlv3.q8_0.bin and save it to the `/app` folder
- docker
- kubectl
- aws cli
- terraform
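
Before starting, it can help to confirm that the command-line tools listed above are installed. The following is a small sketch using only the standard library; `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and it assumes the AWS CLI is installed under its usual executable name `aws`.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executable names for the tools listed above (the AWS CLI binary is `aws`).
REQUIRED_TOOLS = ["docker", "kubectl", "aws", "terraform"]


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All prerequisite tools found.")
```

This only checks that each executable is on `PATH`; it does not verify versions or AWS credentials.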
- From the repository root, type `cd terraform` to get into the terraform folder
- Type `terraform init` to initialize the terraform project
- Type `terraform plan` to check which resources will be generated (EKS cluster and nodes for the cluster)
- Type `terraform apply` to create the resources
- Once the resources are created, type `aws eks list-clusters` and you should see 'my-cluster' in the list
- Type `aws eks update-kubeconfig --region us-west-2 --name my-cluster` to change the kubectl context to your cluster
- Type `kubectl get nodes` to check on the status of the newly created node resources
- Type `kubectl create ns fastapi-app` to create the namespace for your application in Kubernetes
- Type `cd ../k8s` to switch folders to get access to the manifests
- Type `kubectl apply -f deployment.yaml` to create the deployment
- Type `kubectl get deployments -n fastapi-app` to check on the status of the deployment
- Type `kubectl apply -f service.yaml` to create the service (your load balancer)
- Type `kubectl get services -n fastapi-app` to check on the status of the service and to get the URL of the load balancer
- Now you can use the URL from the previous step to call the API!
- When you are done, type `terraform destroy` from the terraform folder to delete all resources from AWS (leaving them running is very costly)
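
The deployment commands above can also be driven from a small script. This is a sketch under stated assumptions rather than part of the project: `STEPS` and `run_steps` are hypothetical names, and the script assumes it is run from the repository root with `terraform` and `k8s` as subfolders. It defaults to a dry run that only prints the commands, because `terraform apply` creates billable resources.

```python
import subprocess
from typing import List, Tuple

# (working directory, command) pairs copied from the steps above.
# "." means the directory the script is started from.
STEPS: List[Tuple[str, List[str]]] = [
    ("terraform", ["terraform", "init"]),
    ("terraform", ["terraform", "plan"]),
    ("terraform", ["terraform", "apply"]),
    (".", ["aws", "eks", "list-clusters"]),
    (".", ["aws", "eks", "update-kubeconfig", "--region", "us-west-2", "--name", "my-cluster"]),
    (".", ["kubectl", "get", "nodes"]),
    (".", ["kubectl", "create", "ns", "fastapi-app"]),
    ("k8s", ["kubectl", "apply", "-f", "deployment.yaml"]),
    ("k8s", ["kubectl", "get", "deployments", "-n", "fastapi-app"]),
    ("k8s", ["kubectl", "apply", "-f", "service.yaml"]),
    ("k8s", ["kubectl", "get", "services", "-n", "fastapi-app"]),
]


def run_steps(steps: List[Tuple[str, List[str]]], dry_run: bool = True) -> List[str]:
    """Handle each step in order and return the command lines that were handled.

    With dry_run=True the commands are only printed. With dry_run=False each
    command is executed, and the first failure raises CalledProcessError.
    """
    handled = []
    for cwd, cmd in steps:
        line = f"({cwd}) {' '.join(cmd)}"
        handled.append(line)
        if dry_run:
            print("would run:", line)
        else:
            subprocess.run(cmd, cwd=cwd, check=True)
    return handled


if __name__ == "__main__":
    run_steps(STEPS, dry_run=True)
```

Calling `run_steps(STEPS, dry_run=False)` executes the commands for real; `terraform apply` will still ask for interactive confirmation because no auto-approve flag is passed.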

Example request with curl (replace `localhost` with the load balancer URL):

    curl --location 'localhost/ask' \
    --header 'Content-Type: application/json' \
    --data '{
        "question": "Q: What are the names of the days of the week?"
    }'

The same request in JavaScript, using fetch:

    const myHeaders = new Headers();
    myHeaders.append("Content-Type", "application/json");

    const raw = JSON.stringify({
      question: "Q: What are the names of the days of the week?"
    });

    const requestOptions = {
      method: "POST",
      headers: myHeaders,
      body: raw,
      redirect: "follow"
    };

    // Use an absolute URL with a scheme; a bare "localhost/ask" is not a valid absolute URL.
    fetch("http://localhost/ask", requestOptions)
      .then(response => response.text())
      .then(result => console.log(result))
      .catch(error => console.log("error", error));

The same request in Python, using the requests library:

    import json

    import requests

    # requests needs the scheme; a bare "localhost/ask" raises MissingSchema.
    url = "http://localhost/ask"

    payload = json.dumps({
        "question": "Q: What are the names of the days of the week?"
    })
    headers = {
        "Content-Type": "application/json"
    }

    response = requests.request("POST", url, headers=headers, data=payload)
    print(response.text)

Example response:

    {
        "answer": "\nA: The names of the days of the week, in order, are:\n\n1. Sunday\n2. Monday\n3. Tuesday\n4. Wednesday\n5. Thursday\n6. Friday\n7. Saturday"
    }
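
The `answer` field is free-form model text, so a client that wants structured data has to parse it. The sketch below pulls the numbered items out of the example response; `numbered_items` is a hypothetical helper, and it assumes the model formats its list as `1. item` lines, which is not guaranteed for every answer.

```python
import json
import re
from typing import List

# The example response body, kept as a raw string so the JSON "\n" escapes
# are decoded by json.loads rather than by Python.
RESPONSE = r'''{
    "answer": "\nA: The names of the days of the week, in order, are:\n\n1. Sunday\n2. Monday\n3. Tuesday\n4. Wednesday\n5. Thursday\n6. Friday\n7. Saturday"
}'''


def numbered_items(answer: str) -> List[str]:
    """Return the text of every line shaped like '1. item', in order."""
    return re.findall(r"^[ \t]*\d+\.[ \t]+(.+?)[ \t]*$", answer, flags=re.MULTILINE)


if __name__ == "__main__":
    answer = json.loads(RESPONSE)["answer"]
    print(numbered_items(answer))
```

If the model answers in prose instead of a numbered list, `numbered_items` returns an empty list, so callers should treat that as "no structure found" rather than as an error.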