# llm-hosting
This is an extended article so that not everything has to live in the main README. It covers hosting LLM models on the server.
## deploy
```sh
kubectl apply -f llm/llama_cpp_hosting.yaml
```
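For orientation, here is a minimal sketch of what `llm/llama_cpp_hosting.yaml` could look like: a Deployment running the llama.cpp HTTP server plus a Service in front of it. The image, labels, ports, and model path are assumptions for illustration, not the repository's actual manifest.

```yaml
# Sketch only: assumes the upstream llama.cpp server image and a
# hostPath-mounted directory of .gguf model files.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-cpp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-cpp
  template:
    metadata:
      labels:
        app: llama-cpp
    spec:
      containers:
        - name: server
          image: ghcr.io/ggerganov/llama.cpp:server
          args:
            - "-m"
            - "/models/Meta-Llama-3-8B-Instruct.Q8_0.gguf"
            - "--host"
            - "0.0.0.0"
            - "--port"
            - "8080"
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          hostPath:
            path: /srv/models   # assumed location of downloaded model files
---
apiVersion: v1
kind: Service
metadata:
  name: llama-cpp
spec:
  selector:
    app: llama-cpp
  ports:
    - port: 8080
      targetPort: 8080
```

After applying, `kubectl get pods -l app=llama-cpp` shows whether the server pod came up (the label is an assumption from the sketch above).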
## development
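One plausible development loop, assuming the `llama-cpp` Service and port `8080` from the sketch above, is to port-forward the Service and exercise the llama.cpp HTTP API directly:

```sh
# Forward the in-cluster service to localhost (service name is an assumption).
kubectl port-forward svc/llama-cpp 8080:8080

# In a second shell: send a test prompt to llama.cpp's /completion endpoint.
curl -s http://localhost:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello, world.", "n_predict": 32}'
```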
## links
Two example model files currently being tried out:
* [Meta-Llama-3-70B-Instruct.IQ1_S.gguf](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/resolve/main/Meta-Llama-3-70B-Instruct.IQ1_S.gguf?download=true), from [this page](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/tree/main).
* [Meta-Llama-3-8B-Instruct.Q8_0.gguf](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true), from [this page](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main).
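As a sketch, one of the listed files can be fetched straight onto the node with curl; the target directory is an assumption and should match wherever the manifest mounts models:

```sh
# -L follows Hugging Face's download redirect; /srv/models is illustrative.
curl -L -o /srv/models/Meta-Llama-3-8B-Instruct.Q8_0.gguf \
  "https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true"
```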