Commit 7f9c878 ("Update docs")
Parent: 2332202

File tree: 7 files changed (+23 −24 lines)
Lines changed: 1 addition & 0 deletions

```diff
@@ -0,0 +1 @@
+# Sample http route for GKE Gateway to route traffic to sglang InferencePool
```
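The new manifest's comment describes an HTTPRoute that sends GKE Gateway traffic to an SGLang InferencePool. As orientation only, such a route generally looks like the sketch below; the route name, pool name, and the InferencePool API group shown here are assumptions, not contents of this commit (the `inference-gateway` parent name appears later in this diff):

```yaml
# Hypothetical sketch of an HTTPRoute targeting an InferencePool backend.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route-sglang        # assumed name
spec:
  parentRefs:
  - name: inference-gateway     # Gateway created earlier in the guide
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: inference.networking.x-k8s.io   # assumed API group; check your CRD version
      kind: InferencePool
      name: sglang-pool                      # assumed pool name
```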

site-src/_includes/model-server-cpu.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-=== "CPU-Based Model Server"
+=== "CPU-Based vLLM deployment"
 
     ???+ warning
```

site-src/_includes/model-server-gpu.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-=== "GPU-Based Model Server"
+=== "GPU-Based vLLM deployment"
 
     For this setup, you will need 3 GPUs to run the sample model server. Adjust the number of replicas as needed.
     Create a Hugging Face secret to download the model [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
```

site-src/_includes/model-server-sim.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-=== "vLLM Simulator Model Server"
+=== "vLLM Simulator deployment"
 
     This option uses the [vLLM simulator](https://github.com/llm-d/llm-d-inference-sim/tree/main) to simulate a backend model server.
     This setup uses the least amount of compute resources, does not require GPU's, and is ideal for test/dev environments.
```

site-src/_includes/model-server.md

Lines changed: 0 additions & 19 deletions
This file was deleted.

site-src/_includes/sglang-gpu.md

Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+=== "GPU-Based SGLang deployment"
+
+    For this setup, you will need 3 GPUs to run the sample model server. Adjust the number of replicas as needed.
+    Create a Hugging Face secret to download the model [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
+    Ensure that the token grants access to this model.
+
+    Deploy a sample SGLang deployment with the proper protocol to work with the LLM Instance Gateway.
```
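The new include tells the reader to create a Hugging Face secret before deploying. A minimal sketch of such a Secret follows; the name `hf-token` and key `token` are assumptions, so match whatever the sample deployment's `secretKeyRef` actually expects:

```yaml
# Hypothetical sketch: Secret holding the Hugging Face access token.
apiVersion: v1
kind: Secret
metadata:
  name: hf-token          # assumed name; must match the deployment's secretKeyRef
type: Opaque
stringData:
  token: <YOUR_HF_TOKEN>  # must grant access to meta-llama/Llama-3.1-8B-Instruct
```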

site-src/guides/index.md

Lines changed: 12 additions & 2 deletions

````diff
@@ -42,6 +42,12 @@ IGW_LATEST_RELEASE=$(curl -s https://api.github.com/repos/kubernetes-sigs/gatewa
 kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/vllm/sim-deployment.yaml
 ```
 
+--8<-- "site-src/_includes/sglang-gpu.md"
+
+```bash
+kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/sglang/gpu-deployment.yaml
+```
+
 ### Install the Inference Extension CRDs
 
 ```bash
@@ -153,11 +159,15 @@ kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extens
 inference-gateway inference-gateway <MY_ADDRESS> True 22s
 ```
 1. Deploy the HTTPRoute:
-
+
+   For vllm deployment:
 ```bash
 kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/gateway/gke/httproute.yaml
 ```
-
+   For sglang deployment:
+   ```bash
+   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/gateway/gke/httproute-sglang.yaml
+   ```
 1. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
 
 ```bash
````
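The guide's confirmation step checks the HTTPRoute's status conditions. Once a Gateway controller has programmed the route, its status typically contains conditions shaped like the illustrative fragment below (field values and condition ordering will differ per controller):

```yaml
# Illustrative status fragment of an accepted HTTPRoute; not output from this commit.
status:
  parents:
  - parentRef:
      name: inference-gateway
    conditions:
    - type: Accepted
      status: "True"
    - type: ResolvedRefs
      status: "True"
```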
