# llm-hosting
This is an extended article so that not everything has to live in the main README. This chapter covers hosting LLM models on the server.
## deploy
```sh
kubectl apply -f llm/llama_cpp_hosting.yaml
```
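To verify the rollout, watch the pods created by the manifest until they reach `Running` (plain `kubectl`, no resource names assumed):

```sh
# Watch pod status in the current namespace until Ctrl-C.
kubectl get pods --watch
```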
## development
```sh
```
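The exact commands are still to be filled in. As a sketch of a possible local development loop (the Service name `llama-cpp-hosting` and port `8080` are assumptions, not read from the YAML), one can port-forward the service and query llama.cpp's HTTP API:

```sh
# Forward the in-cluster service to localhost; name and port are assumed.
kubectl port-forward svc/llama-cpp-hosting 8080:8080 &

# llama.cpp's server exposes a /completion endpoint that takes a JSON prompt.
curl -s http://localhost:8080/completion \
  -d '{"prompt": "Hello", "n_predict": 16}'
```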
## links
Two examples of model files that are currently being tried out (a download-and-serve sketch follows the list):
* [https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/resolve/main/Meta-Llama-3-70B-Instruct.IQ1_S.gguf?download=true](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/resolve/main/Meta-Llama-3-70B-Instruct.IQ1_S.gguf?download=true)
  * From [this page](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/tree/main).
* [https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true)
  * From [this page](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main).
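To try one of these locally (a sketch, assuming a built llama.cpp checkout; recent builds name the HTTP server binary `llama-server`, older ones `server`):

```sh
# Download the 8B Q8_0 quantization listed above.
wget -O Meta-Llama-3-8B-Instruct.Q8_0.gguf \
  "https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true"

# Serve the model over HTTP with llama.cpp.
llama-server -m Meta-Llama-3-8B-Instruct.Q8_0.gguf --port 8080
```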