Set up Vicuna 1.5 inference

In this post we explore how to deploy Vicuna 1.5 with FastChat and expose an OpenAI-compatible API.

Reference

https://github.com/lm-sys/FastChat/blob/main/docs/langchain_integration.md

1. Python dependencies

```bash
# extra dependencies for the OpenAI-compatible API server;
# this assumes FastChat itself is already installed (pip install fschat)
pip install httpx shortuuid tiktoken
```

2. Download the model

```python
from huggingface_hub import snapshot_download

model = "lmsys/vicuna-13b-v1.5-16k"
snapshot_download(repo_id=model)
```
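`snapshot_download` returns the local cache directory of the snapshot; a small sketch, in case you prefer to point the worker at the local path instead of the repo id:

```python
from huggingface_hub import snapshot_download

# The weights are cached locally; the returned path can be passed to
# --model-path in the next step to avoid any hub lookup at serving time.
local_path = snapshot_download(repo_id="lmsys/vicuna-13b-v1.5-16k")
print(local_path)
```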

3. Start the service

Three processes make up the service: the controller, a model worker, and the OpenAI-compatible API server. The `--model-names` flag registers the worker under familiar OpenAI model names, so existing clients work without changes:

```bash
python -m fastchat.serve.controller --host 0.0.0.0 &
python -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --host 0.0.0.0 --num-gpus 4 --model-path lmsys/vicuna-13b-v1.5-16k &
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```
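Before wiring up a client, it is worth confirming the API server actually answers. A minimal sketch using the httpx dependency from step 1, assuming the default host and port from the commands above:

```python
import httpx

# List the models the local server exposes; these should be the
# aliases passed to --model-names above.
resp = httpx.get("http://127.0.0.1:8000/v1/models")
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```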

4. Run a quick test

```bash
pip install "openai<1"  # the snippet below uses the pre-1.0 client API
```
```python
import openai

openai.api_key = "EMPTY"  # FastChat does not check the key, but the client wants one set
openai.api_base = "http://127.0.0.1:8000/v1"  # use the local server

models = openai.Model.list()

# print the first model's id
print(models.data[0].id)

# create a chat completion
chat_completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "if a+b=13 and a*b=42, then what are a and b respectively, do it step by step",
    }],
)
print(chat_completion.choices[0].message.content)
```
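Since the worker was also registered under `text-davinci-003` and `text-embedding-ada-002`, the other OpenAI endpoints route to the same local model. A sketch of the embeddings call, using the same pre-1.0 `openai` client as above:

```python
import openai

openai.api_key = "EMPTY"
openai.api_base = "http://127.0.0.1:8000/v1"

# The embeddings request is served by the local Vicuna worker under
# its "text-embedding-ada-002" alias.
embedding = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="if a+b=13 and a*b=42, then what are a and b?",
)
print(len(embedding["data"][0]["embedding"]))  # vector dimensionality
```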
