
Understanding temperature, top-p, and top-k in LLMs

When generating text with large language models (LLMs), temperature, top-p (nucleus sampling), and top-k are parameters used to control the randomness and diversity of the generated output. Each of these parameters influences the probability distribution from which the next token (word or subword) is sampled. Here’s a breakdown of how each parameter is implemented internally:

1. Temperature

Temperature is a parameter that adjusts the probability distribution of the next token by scaling the logits (raw scores) output by the model.
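A minimal sketch of this scaling (NumPy, function name is illustrative): dividing the logits by the temperature before the softmax sharpens the distribution when the temperature is below 1 and flattens it when above 1.

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # subtract the max for numerical stability before exponentiating
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]
p_cold = apply_temperature(logits, 0.5)  # low temperature: more peaked
p_hot = apply_temperature(logits, 2.0)   # high temperature: flatter, more random
```

With a low temperature the most likely token dominates; with a high temperature the probability mass spreads out, so sampling becomes more diverse.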

Serving Llama 3 with vLLM

python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
--gpu-memory-utilization 0.80 --dtype bfloat16 \
--model gradientai/Llama-3-8B-Instruct-Gradient-4194k
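With the server running locally, the sampling parameters discussed above can be passed per request through vLLM's OpenAI-compatible API (the prompt and parameter values here are only illustrative):

```shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gradientai/Llama-3-8B-Instruct-Gradient-4194k",
        "prompt": "The capital of France is",
        "max_tokens": 16,
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40
      }'
```

Note that `top_k` is a vLLM extension to the OpenAI completions schema.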

Remote SSH into WSL2

In this post we show the steps to set up SSH access to WSL2.

Reference:

https://www.hanselman.com/blog/how-to-ssh-into-wsl2-on-windows-10-from-an-external-machine

On WSL2

sudo apt install openssh-server
sudo vim /etc/ssh/sshd_config

# uncomment these two lines
Port 22
ListenAddress 0.0.0.0

sudo service ssh start

ip a
# assuming the ip is 192.168.121.141 and username is pi
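From another machine on the same network, the connection would then look like this (IP and username taken from the comment above). Note that the referenced post also forwards the port on the Windows host, since WSL2 sits behind a NAT:

```shell
# on the Windows host (admin prompt), forward port 22 to the WSL2 IP
netsh interface portproxy add v4tov4 listenport=22 listenaddress=0.0.0.0 \
  connectport=22 connectaddress=192.168.121.141

# from the external machine
ssh pi@192.168.121.141
```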

Build a Docker image with CUDA and conda support

1. environment.yaml

name: main
channels:
  - defaults
  - conda-forge
  - nvidia
  - pytorch
dependencies:
  - python=3.10
  - pip
  - numpy
  - pandas
  - pyarrow
  - grpcio
  - grpcio-tools
  - protobuf
  - pip:
    - vllm==0.3.0
    - google-cloud-bigquery==3.17.2
    - google-cloud-storage==2.14.0
    - google-cloud-aiplatform==1.41.0
    - google-auth==2.27.0
    - autoawq
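With this file saved as environment.yaml, the conda env would typically be created (inside the image build or locally) with:

```shell
conda env create -f environment.yaml
conda activate main
```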

Deploy a vLLM-hosted LLM on k8s

In this post we show the steps to deploy an LLM with vLLM on GCP GKE.

1. Build a base docker image with cuda and conda

This Docker image sets up the environment to host an LLM:

  1. cuda:12.1.1
  2. miniconda
  3. a conda env named “main”, with python3.10 and some python libs

1.1 The dockerfile

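A minimal sketch satisfying the three requirements above (the exact base image tag, install paths, and cleanup steps are assumptions, not the author's exact file) could look like:

```dockerfile
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# install Miniconda
RUN apt-get update && apt-get install -y wget && \
    wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
      -O /tmp/miniconda.sh && \
    bash /tmp/miniconda.sh -b -p /opt/conda && rm /tmp/miniconda.sh
ENV PATH=/opt/conda/bin:$PATH

# create the "main" conda env from the environment file above
COPY environment.yaml /tmp/environment.yaml
RUN conda env create -f /tmp/environment.yaml && conda clean -afy

# run subsequent commands inside the "main" env
SHELL ["conda", "run", "-n", "main", "/bin/bash", "-c"]
```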