# llm-hosting
This is an extended article so that not everything has to live in the main README. It covers hosting LLM models on the server.
## deploy
```sh
kubectl apply -f llm/llama_cpp_hosting.yaml
```
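For orientation, here is a minimal sketch of what `llm/llama_cpp_hosting.yaml` could look like: a Deployment running the llama.cpp HTTP server plus a Service in front of it. The image, labels, ports, and model path are assumptions for illustration, not the repository's actual manifest.

```yaml
# Sketch only: assumes the upstream llama.cpp server image and a
# hostPath-mounted directory of .gguf model files.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-cpp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-cpp
  template:
    metadata:
      labels:
        app: llama-cpp
    spec:
      containers:
        - name: server
          image: ghcr.io/ggerganov/llama.cpp:server
          args:
            - "-m"
            - "/models/Meta-Llama-3-8B-Instruct.Q8_0.gguf"
            - "--host"
            - "0.0.0.0"
            - "--port"
            - "8080"
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          hostPath:
            path: /srv/models   # assumed location of downloaded model files
---
apiVersion: v1
kind: Service
metadata:
  name: llama-cpp
spec:
  selector:
    app: llama-cpp
  ports:
    - port: 8080
      targetPort: 8080
```

After applying, `kubectl get pods -l app=llama-cpp` shows whether the server pod came up (the label is an assumption from the sketch above).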
## development
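One plausible development loop, assuming the `llama-cpp` Service and port `8080` from the sketch above, is to port-forward the Service and exercise the llama.cpp HTTP API directly:

```sh
# Forward the in-cluster service to localhost (service name is an assumption).
kubectl port-forward svc/llama-cpp 8080:8080

# In a second shell: send a test prompt to llama.cpp's /completion endpoint.
curl -s http://localhost:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello, world.", "n_predict": 32}'
```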
## links
Two example model files currently being tried out:
* [Meta-Llama-3-70B-Instruct.IQ1_S.gguf](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/resolve/main/Meta-Llama-3-70B-Instruct.IQ1_S.gguf?download=true), from [this page](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/tree/main).
* [Meta-Llama-3-8B-Instruct.Q8_0.gguf](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true), from [this page](https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main).
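As a sketch, one of the listed files can be fetched straight onto the node with curl; the target directory is an assumption and should match wherever the manifest mounts models:

```sh
# -L follows Hugging Face's download redirect; /srv/models is illustrative.
curl -L -o /srv/models/Meta-Llama-3-8B-Instruct.Q8_0.gguf \
  "https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q8_0.gguf?download=true"
```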