|
1056 | 1056 | </span> |
1057 | 1057 | </a> |
1058 | 1058 |
|
| 1059 | +</li> |
| 1060 | + |
| 1061 | + <li class="md-nav__item"> |
| 1062 | + <a href="#router" class="md-nav__link"> |
| 1063 | + <span class="md-ellipsis"> |
| 1064 | + |
| 1065 | + <span class="md-typeset"> |
| 1066 | + Router |
| 1067 | + </span> |
| 1068 | + |
| 1069 | + </span> |
| 1070 | + </a> |
| 1071 | + |
1059 | 1072 | </li> |
1060 | 1073 |
|
1061 | 1074 | <li class="md-nav__item"> |
|
3884 | 3897 | </span> |
3885 | 3898 | </a> |
3886 | 3899 |
|
| 3900 | +</li> |
| 3901 | + |
| 3902 | + <li class="md-nav__item"> |
| 3903 | + <a href="#router" class="md-nav__link"> |
| 3904 | + <span class="md-ellipsis"> |
| 3905 | + |
| 3906 | + <span class="md-typeset"> |
| 3907 | + Router |
| 3908 | + </span> |
| 3909 | + |
| 3910 | + </span> |
| 3911 | + </a> |
| 3912 | + |
3887 | 3913 | </li> |
3888 | 3914 |
|
3889 | 3915 | <li class="md-nav__item"> |
|
4045 | 4071 |
|
4046 | 4072 |
|
4047 | 4073 | <h1 id="gateways">Gateways<a class="headerlink" href="#gateways" title="Permanent link">¶</a></h1> |
4048 | | -<p>Gateways manage the ingress traffic of running <a href="../services/">services</a>, |
4049 | | -provide an HTTPS endpoint mapped to your domain, handle auto-scaling and rate limits.</p> |
4050 | | -<blockquote> |
4051 | | -<p>If you're using <a href="https://sky.dstack.ai" target="_blank">dstack Sky <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a>, |
4052 | | -the gateway is already set up for you.</p> |
4053 | | -</blockquote> |
| 4074 | +<p>Gateways manage ingress traffic for running <a href="../services/">services</a>, handle auto-scaling and rate limits, enable HTTPS, and allow you to configure a custom domain. They also support custom routers, such as the <a href="https://docs.sglang.ai/advanced_features/router.html#" target="_blank">SGLang Model Gateway <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a>.</p> |
| 4075 | +<!-- > If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}, |
| 4076 | +> the gateway is already set up for you. --> |
| 4077 | + |
4054 | 4078 | <h2 id="apply-a-configuration">Apply a configuration<a class="headerlink" href="#apply-a-configuration" title="Permanent link">¶</a></h2> |
4055 | 4079 | <p>First, define a gateway configuration as a YAML file in your project folder. |
4056 | 4080 | The filename must end with <code>.dstack.yml</code> (e.g. <code>.dstack.yml</code> or <code>gateway.dstack.yml</code> are both acceptable).</p> |
@@ -4094,6 +4118,42 @@ <h3 id="backend">Backend<a class="headerlink" href="#backend" title="Permanent l |
4094 | 4118 | <p>Gateways in <code>kubernetes</code> backend require an external load balancer. Managed Kubernetes solutions usually include a load balancer. |
4095 | 4119 | For self-hosted Kubernetes, you must provide a load balancer yourself.</p> |
4096 | 4120 | </details> |
| 4121 | +<h3 id="router">Router<a class="headerlink" href="#router" title="Permanent link">¶</a></h3> |
| 4122 | +<p>By default, the gateway uses its own load balancer to route traffic between replicas. However, you can delegate this responsibility to a specific router by setting the <code>router</code> property. Currently, the only supported external router is <code>sglang</code>.</p> |
| 4123 | +<h4 id="sglang">SGLang<a class="headerlink" href="#sglang" title="Permanent link">¶</a></h4> |
| 4124 | +<p>The <code>sglang</code> router delegates routing logic to the <a href="https://docs.sglang.ai/advanced_features/router.html#" target="_blank">SGLang Model Gateway <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a>.</p> |
| 4125 | +<p>To enable it, set the <code>type</code> field under <code>router</code> to <code>sglang</code>:</p> |
| 4126 | +<div editor-title="gateway.dstack.yml"> |
| 4127 | + |
| 4128 | +<div class="highlight"><pre><span></span><code><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">gateway</span> |
| 4129 | +<span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">sglang-gateway</span> |
| 4130 | + |
| 4131 | +<span class="nt">backend</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">aws</span> |
| 4132 | +<span class="nt">region</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">eu-west-1</span> |
| 4133 | + |
| 4134 | +<span class="nt">domain</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">example.com</span> |
| 4135 | + |
| 4136 | +<span class="nt">router</span><span class="p">:</span> |
| 4137 | +<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">sglang</span> |
| 4138 | +<span class="w"> </span><span class="nt">policy</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cache_aware</span> |
| 4139 | +</code></pre></div> |
| 4140 | + |
| 4141 | +</div> |
| 4142 | + |
| 4143 | +<div class="admonition info"> |
| 4144 | +<p class="admonition-title">Policy</p> |
| 4145 | +<p>The <code>router</code> property allows you to configure the routing <code>policy</code>:</p> |
| 4146 | +<ul> |
| 4147 | +<li><code>cache_aware</code> — Default policy; combines cache locality with load balancing, falling back to shortest queue. </li> |
| 4148 | +<li><code>power_of_two</code> — Samples two workers and picks the lighter one. </li> |
| 4149 | +<li><code>random</code> — Uniform random selection. </li> |
| 4150 | +<li><code>round_robin</code> — Cycles through workers in order. </li> |
| 4151 | +</ul> |
| 4152 | +</div> |
| 4153 | +<blockquote> |
| 4154 | +<p>Currently, services using this type of gateway must run standard SGLang workers. See the <a href="../../../examples/inference/sglang/">example</a>.</p> |
| 4155 | +<p>Support for prefill/decode disaggregation and auto-scaling based on inter-token latency is coming soon.</p> |
| 4156 | +</blockquote> |
4097 | 4157 | <h3 id="public-ip">Public IP<a class="headerlink" href="#public-ip" title="Permanent link">¶</a></h3> |
4098 | 4158 | <p>If you don't need/want a public IP for the gateway, you can set the <code>public_ip</code> to <code>false</code> (the default value is <code>true</code>), making the gateway private. |
4099 | 4159 | Private gateways are currently supported in <code>aws</code> and <code>gcp</code> backends.</p> |
|