Run a LLaMA 30B model with llama.cpp

Model Card

https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML

curl -OL https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GGML/resolve/main/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1.bin

Test run

Q: “how can I get to the Mars, what are the options and analyze the cost for each option”

A: “There are several ways to get to Mars, but currently, none of them are affordable for an individual. The most feasible options are:

    1. Join a space mission: Government space agencies such as NASA or Roscosmos send missions to Mars regularly. You can try to join their crew as a scientist or astronaut. However, the competition is fierce, and you would need to have the right qualifications and experience.
    2. Private space travel: Companies like SpaceX and Blue Origin are working on developing private space travel options. They offer suborbital flights for tourists and plan to launch crewed missions to Mars in the future. However, these trips are currently only available to a select few and cost millions of dollars per person.
    3. Crowdfunding: You can try crowdfunding your trip to Mars. This option requires you to raise funds from multiple donors to cover the costs of the mission. However, this option is unlikely to be successful unless you have a strong social media following or a unique and compelling reason for going to Mars.

The cost of each option varies depending on the specifics of the mission. NASA’s budget for their next Mars mission is estimated at $2.7 billion. Private space travel options may cost millions of dollars per person, making them unaffordable for most people. Crowdfunding campaigns can raise tens or hundreds of thousands of dollars, but it would be difficult to cover the full cost of a Mars mission this way. Overall, getting to Mars is currently only feasible for government agencies and private companies with significant funding, making it unlikely for an individual to afford.”

./main -t 10 -ngl 32 -m models/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1.bin \
    --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 \
    -p "### Instruction:how can I get to the Mars, what are the options and analyze the cost for each option\n### Response:"
main: build = 658 (4de0334)
main: seed  = 1686819337
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090
llama.cpp: loading model from models/Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required  = 11488.86 MB (+ 3124.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 layers to GPU
llama_model_load_internal: total VRAM used: 10719 MB
......................................................
llama_init_from_file: kv self size  = 3120.00 MB

system_info: n_threads = 10 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 0


 ### Instruction:how can I get to the Mars, what are the options and analyze the cost for each option\n### Response: There are several ways to get to Mars, but currently, none of them are affordable for an individual. The most feasible options are:
1. Join a space mission: Government space agencies such as NASA or Roscosmos send missions to Mars regularly. You can try to join their crew as a scientist or astronaut. However, the competition is fierce, and you would need to have the right qualifications and experience.
2. Private space travel: Companies like SpaceX and Blue Origin are working on developing private space travel options. They offer suborbital flights for tourists and plan to launch crewed missions to Mars in the future. However, these trips are currently only available to a select few and cost millions of dollars per person.
3. Crowdfunding: You can try crowdfunding your trip to Mars. This option requires you to raise funds from multiple donors to cover the costs of the mission. However, this option is unlikely to be successful unless you have a strong social media following or a unique and compelling reason for going to Mars.
The cost of each option varies depending on the specifics of the mission. NASA's budget for their next Mars mission is estimated at $2.7 billion. Private space travel options may cost millions of dollars per person, making them unaffordable for most people. Crowdfunding campaigns can raise tens or hundreds of thousands of dollars, but it would be difficult to cover the full cost of a Mars mission this way. Overall, getting to Mars is currently only feasible for government agencies and private companies with significant funding, making it unlikely for an individual to afford. [end of text]

llama_print_timings:        load time =  7572.50 ms
llama_print_timings:      sample time =   104.06 ms /   353 runs   (    0.29 ms per token)
llama_print_timings: prompt eval time =  2807.09 ms /    30 tokens (   93.57 ms per token)
llama_print_timings:        eval time = 141835.98 ms /   352 runs   (  402.94 ms per token)
llama_print_timings:       total time = 149561.21 ms
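
The timings are easier to compare as throughput. Here is a minimal sketch that converts two of the `llama_print_timings` lines above into tokens per second. The regular expression and the function name `tokens_per_second` are illustrative assumptions tied to the log format of this particular build; they are not part of llama.cpp.

```python
import re

# Two timing lines copied verbatim from the run above.
LOG = """\
llama_print_timings: prompt eval time =  2807.09 ms /    30 tokens (   93.57 ms per token)
llama_print_timings:        eval time = 141835.98 ms /   352 runs   (  402.94 ms per token)
"""

# Captures the label, the elapsed milliseconds, and the token/run count.
PATTERN = re.compile(
    r"llama_print_timings:\s+(.+?) time =\s*([\d.]+) ms /\s*(\d+) (?:tokens|runs)"
)

def tokens_per_second(log: str) -> dict[str, float]:
    """Return tokens per second for each timing line that reports a count."""
    rates = {}
    for label, ms, count in PATTERN.findall(log):
        rates[label] = int(count) / (float(ms) / 1000.0)
    return rates

rates = tokens_per_second(LOG)
for label, rate in rates.items():
    print(f"{label}: {rate:.2f} tokens/s")
```

For this run, that works out to roughly 10.7 tokens/s for prompt processing and roughly 2.5 tokens/s for generation, with 32 of the 60 layers offloaded to the GPU.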

Host an API

Offload 32 layers to the GPU:

python -m llama_cpp.server --n_gpu_layers 32 --model models/Wizard-Vicuna-30B-Uncensored.ggmlv3.q2_K.bin
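
Once the server is up, it can be queried over HTTP. The following is a rough client sketch using only the Python standard library. It assumes the server is listening at `http://localhost:8000` and exposes an OpenAI-style `/v1/completions` endpoint; check the address the server prints at startup and adjust `API_URL` if it differs. The helper names `build_request` and `ask`, and the `max_tokens` and `stop` values, are hypothetical choices for illustration.

```python
import json
import urllib.request

# Assumption: default address and an OpenAI-style completions endpoint.
API_URL = "http://localhost:8000/v1/completions"

def build_request(question: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request using the Instruction/Response prompt format from the test run."""
    payload = {
        # A real newline separates the two markers here.
        "prompt": f"### Instruction:{question}\n### Response:",
        "max_tokens": max_tokens,  # illustrative limit
        "temperature": 0.7,        # same temperature as the test run
        "stop": ["### Instruction:"],  # illustrative stop sequence
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question: str) -> str:
    """Send the request and return the generated text (the server must be running)."""
    with urllib.request.urlopen(build_request(question)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `print(ask("how can I get to the Mars, what are the options and analyze the cost for each option"))` should return a completion similar to the test run above.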