
Understanding temperature, top-p, and top-k in LLMs

When generating text with large language models (LLMs), temperature, top-p (nucleus sampling), and top-k are parameters used to control the randomness and diversity of the generated output. Each of these parameters influences the probability distribution from which the next token (word or subword) is sampled. Here’s a breakdown of how each parameter is implemented internally:

1. Temperature

Temperature is a parameter that adjusts the probability distribution of the next token by scaling the logits (raw scores) output by the model.
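A minimal sketch of this scaling (NumPy, function name is illustrative): dividing the logits by the temperature before the softmax sharpens the distribution when the temperature is below 1 and flattens it when above 1.

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # subtract the max for numerical stability before exponentiating
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]
p_cold = apply_temperature(logits, 0.5)  # low temperature: more peaked
p_hot = apply_temperature(logits, 2.0)   # high temperature: flatter, more random
```

With a low temperature the most likely token dominates; with a high temperature the probability mass spreads out, so sampling becomes more diverse.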

Serving Llama 3 with vLLM

python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
--gpu-memory-utilization 0.80 --dtype bfloat16 \
--model gradientai/Llama-3-8B-Instruct-Gradient-4194k
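With the server running locally, the sampling parameters discussed above can be passed per request through vLLM's OpenAI-compatible API (the prompt and parameter values here are only illustrative):

```shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gradientai/Llama-3-8B-Instruct-Gradient-4194k",
        "prompt": "The capital of France is",
        "max_tokens": 16,
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40
      }'
```

Note that `top_k` is a vLLM extension to the OpenAI completions schema.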

Remote SSH into WSL2

In this post we show the steps to set up SSH access to WSL2.

Reference:

https://www.hanselman.com/blog/how-to-ssh-into-wsl2-on-windows-10-from-an-external-machine

On WSL2

sudo apt install openssh-server
sudo vim /etc/ssh/sshd_config

# uncomment these two lines
Port 22
ListenAddress 0.0.0.0

sudo service ssh start

ip a
# assuming the ip is 192.168.121.141 and username is pi
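From another machine on the same network, the connection would then look like this (IP and username taken from the comment above). Note that the referenced post also forwards the port on the Windows host, since WSL2 sits behind a NAT:

```shell
# on the Windows host (admin prompt), forward port 22 to the WSL2 IP
netsh interface portproxy add v4tov4 listenport=22 listenaddress=0.0.0.0 \
  connectport=22 connectaddress=192.168.121.141

# from the external machine
ssh pi@192.168.121.141
```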

Build a Docker image with CUDA and conda support

1. environment.yaml

name: main
channels:
  - defaults
  - conda-forge
  - nvidia
  - pytorch
dependencies:
  - python=3.10
  - pip
  - numpy
  - pandas
  - pyarrow
  - grpcio
  - grpcio-tools
  - protobuf
  - pip:
    - vllm==0.3.0
    - google-cloud-bigquery==3.17.2
    - google-cloud-storage==2.14.0
    - google-cloud-aiplatform==1.41.0
    - google-auth==2.27.0
    - autoawq
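With this file saved as environment.yaml, the conda env would typically be created (inside the image build or locally) with:

```shell
conda env create -f environment.yaml
conda activate main
```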

Deploy a vLLM-hosted LLM on k8s

In this post we show the steps to deploy an LLM with vLLM on GCP GKE.

1. Build a base docker image with cuda and conda

This Docker image sets up the environment to host an LLM:

  1. cuda:12.1.1
  2. miniconda
  3. a conda env named “main”, with python3.10 and some python libs

1.1 The dockerfile

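A minimal sketch satisfying the three requirements above (the exact base image tag, install paths, and cleanup steps are assumptions, not the author's exact file) could look like:

```dockerfile
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# install Miniconda
RUN apt-get update && apt-get install -y wget && \
    wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
      -O /tmp/miniconda.sh && \
    bash /tmp/miniconda.sh -b -p /opt/conda && rm /tmp/miniconda.sh
ENV PATH=/opt/conda/bin:$PATH

# create the "main" conda env from the environment file above
COPY environment.yaml /tmp/environment.yaml
RUN conda env create -f /tmp/environment.yaml && conda clean -afy

# run subsequent commands inside the "main" env
SHELL ["conda", "run", "-n", "main", "/bin/bash", "-c"]
```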