
# llm-hosting

This article exists so that the main README stays short. It covers hosting LLM models on the server.

## deploy

```shell
kubectl apply -f llm/llama_cpp_hosting.yaml
```
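For orientation, a minimal sketch of what a llama.cpp hosting manifest like `llm/llama_cpp_hosting.yaml` might contain — the image tag, model path, port, and volume setup below are placeholder assumptions, not the actual contents of that file:

```yaml
# Hypothetical sketch -- image, model path, and volume are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-cpp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-cpp
  template:
    metadata:
      labels:
        app: llama-cpp
    spec:
      containers:
        - name: llama-cpp
          # assumed upstream llama.cpp server image
          image: ghcr.io/ggerganov/llama.cpp:server
          args: ["-m", "/models/model.gguf", "--host", "0.0.0.0", "--port", "8080"]
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          hostPath:
            path: /srv/models  # placeholder; a real setup may use a PVC instead
---
apiVersion: v1
kind: Service
metadata:
  name: llama-cpp
spec:
  selector:
    app: llama-cpp
  ports:
    - port: 8080
      targetPort: 8080
```

A Service in front of the Deployment lets other workloads in the cluster reach the model over a stable name instead of a pod IP.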

## development

Two examples of model files that are currently being tried out: