Set up Vicuna 1.5 inference

In this post we explore how to deploy Vicuna 1.5 with FastChat and expose an OpenAI-compatible API.

Reference

https://github.com/lm-sys/FastChat/blob/main/docs/langchain_integration.md

1. Python dependencies

```bash
# extra dependencies for the OpenAI-compatible API server;
# this assumes FastChat itself is already installed (pip install fschat)
pip install httpx shortuuid tiktoken
```

2. Download the model

```python
from huggingface_hub import snapshot_download

model = "lmsys/vicuna-13b-v1.5-16k"
snapshot_download(repo_id=model)
```
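`snapshot_download` returns the local cache directory of the snapshot; a small sketch, in case you prefer to point the worker at the local path instead of the repo id:

```python
from huggingface_hub import snapshot_download

# The weights are cached locally; the returned path can be passed to
# --model-path in the next step to avoid any hub lookup at serving time.
local_path = snapshot_download(repo_id="lmsys/vicuna-13b-v1.5-16k")
print(local_path)
```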

3. Start the service

Three processes make up the service: the controller, a model worker, and the OpenAI-compatible API server. The `--model-names` flag registers the worker under familiar OpenAI model names, so existing clients work without changes:

```bash
python -m fastchat.serve.controller --host 0.0.0.0 &
python -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --host 0.0.0.0 --num-gpus 4 --model-path lmsys/vicuna-13b-v1.5-16k &
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```
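Before wiring up a client, it is worth confirming the API server actually answers. A minimal sketch using the httpx dependency from step 1, assuming the default host and port from the commands above:

```python
import httpx

# List the models the local server exposes; these should be the
# aliases passed to --model-names above.
resp = httpx.get("http://127.0.0.1:8000/v1/models")
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```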

4. Run a quick test

```bash
pip install "openai<1"  # the snippet below uses the pre-1.0 client API
```
```python
import openai

openai.api_key = "EMPTY"  # FastChat does not check the key, but the client wants one set
openai.api_base = "http://127.0.0.1:8000/v1"  # use the local server

models = openai.Model.list()

# print the first model's id
print(models.data[0].id)

# create a chat completion
chat_completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "if a+b=13 and a*b=42, then what are a and b respectively, do it step by step",
    }],
)
print(chat_completion.choices[0].message.content)
```
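Since the worker was also registered under `text-davinci-003` and `text-embedding-ada-002`, the other OpenAI endpoints route to the same local model. A sketch of the embeddings call, using the same pre-1.0 `openai` client as above:

```python
import openai

openai.api_key = "EMPTY"
openai.api_base = "http://127.0.0.1:8000/v1"

# The embeddings request is served by the local Vicuna worker under
# its "text-embedding-ada-002" alias.
embedding = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="if a+b=13 and a*b=42, then what are a and b?",
)
print(len(embedding["data"][0]["embedding"]))  # vector dimensionality
```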
