Commit b8560f0

Deploying to gh-pages from @ dstackai/dstack@40cb4a0 🚀
1 parent 78a3e1e commit b8560f0

File tree

10 files changed

+134
-84
lines changed

assets/images/social/examples.png

369 Bytes

assets/images/social/partners.png

335 Bytes

docs/concepts/gateways/index.html

Lines changed: 66 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -1056,6 +1056,19 @@
10561056
</span>
10571057
</a>
10581058

1059+
</li>
1060+
1061+
<li class="md-nav__item">
1062+
<a href="#router" class="md-nav__link">
1063+
<span class="md-ellipsis">
1064+
1065+
<span class="md-typeset">
1066+
Router
1067+
</span>
1068+
1069+
</span>
1070+
</a>
1071+
10591072
</li>
10601073

10611074
<li class="md-nav__item">
@@ -3884,6 +3897,19 @@
38843897
</span>
38853898
</a>
38863899

3900+
</li>
3901+
3902+
<li class="md-nav__item">
3903+
<a href="#router" class="md-nav__link">
3904+
<span class="md-ellipsis">
3905+
3906+
<span class="md-typeset">
3907+
Router
3908+
</span>
3909+
3910+
</span>
3911+
</a>
3912+
38873913
</li>
38883914

38893915
<li class="md-nav__item">
@@ -4045,12 +4071,10 @@
40454071

40464072

40474073
<h1 id="gateways">Gateways<a class="headerlink" href="#gateways" title="Permanent link">&para;</a></h1>
4048-
<p>Gateways manage the ingress traffic of running <a href="../services/">services</a>,
4049-
provide an HTTPS endpoint mapped to your domain, handle auto-scaling and rate limits.</p>
4050-
<blockquote>
4051-
<p>If you're using <a href="https://sky.dstack.ai" target="_blank">dstack Sky <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a>,
4052-
the gateway is already set up for you.</p>
4053-
</blockquote>
4074+
<p>Gateways manage ingress traffic for running <a href="../services/">services</a>, handle auto-scaling and rate limits, enable HTTPS, and allow you to configure a custom domain. They also support custom routers, such as the <a href="https://docs.sglang.ai/advanced_features/router.html#" target="_blank">SGLang Model Gateway <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a>.</p>
4075+
<!-- > If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
4076+
> the gateway is already set up for you. -->
4077+
40544078
<h2 id="apply-a-configuration">Apply a configuration<a class="headerlink" href="#apply-a-configuration" title="Permanent link">&para;</a></h2>
40554079
<p>First, define a gateway configuration as a YAML file in your project folder.
40564080
The filename must end with <code>.dstack.yml</code> (e.g. <code>.dstack.yml</code> or <code>gateway.dstack.yml</code> are both acceptable).</p>
@@ -4094,6 +4118,42 @@ <h3 id="backend">Backend<a class="headerlink" href="#backend" title="Permanent l
40944118
<p>Gateways in <code>kubernetes</code> backend require an external load balancer. Managed Kubernetes solutions usually include a load balancer.
40954119
For self-hosted Kubernetes, you must provide a load balancer yourself.</p>
40964120
</details>
4121+
<h3 id="router">Router<a class="headerlink" href="#router" title="Permanent link">&para;</a></h3>
4122+
<p>By default, the gateway uses its own load balancer to route traffic between replicas. However, you can delegate this responsibility to a specific router by setting the <code>router</code> property. Currently, the only supported external router is <code>sglang</code>.</p>
4123+
<h4 id="sglang">SGLang<a class="headerlink" href="#sglang" title="Permanent link">&para;</a></h4>
4124+
<p>The <code>sglang</code> router delegates routing logic to the <a href="https://docs.sglang.ai/advanced_features/router.html#" target="_blank">SGLang Model Gateway <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a>.</p>
4125+
<p>To enable it, set the <code>type</code> field under <code>router</code> to <code>sglang</code>:</p>
4126+
<div editor-title="gateway.dstack.yml">
4127+
4128+
<div class="highlight"><pre><span></span><code><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">gateway</span>
4129+
<span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">sglang-gateway</span>
4130+
4131+
<span class="nt">backend</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">aws</span>
4132+
<span class="nt">region</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">eu-west-1</span>
4133+
4134+
<span class="nt">domain</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">example.com</span>
4135+
4136+
<span class="nt">router</span><span class="p">:</span>
4137+
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">sglang</span>
4138+
<span class="w"> </span><span class="nt">policy</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cache_aware</span>
4139+
</code></pre></div>
4140+
4141+
</div>
4142+
4143+
<div class="admonition info">
4144+
<p class="admonition-title">Policy</p>
4145+
<p>The <code>router</code> property allows you to configure the routing <code>policy</code>:</p>
4146+
<ul>
4147+
<li><code>cache_aware</code> &mdash; Default policy; combines cache locality with load balancing, falling back to shortest queue. </li>
4148+
<li><code>power_of_two</code> &mdash; Samples two workers and picks the lighter one. </li>
4149+
<li><code>random</code> &mdash; Uniform random selection. </li>
4150+
<li><code>round_robin</code> &mdash; Cycles through workers in order. </li>
4151+
</ul>
4152+
</div>
4153+
<blockquote>
4154+
<p>Currently, services using this type of gateway must run standard SGLang workers. See the <a href="../../../examples/inference/sglang/">example</a>.</p>
4155+
<p>Support for prefill/decode disaggregation and auto-scaling based on inter-token latency is coming soon.</p>
4156+
</blockquote>
40974157
<h3 id="public-ip">Public IP<a class="headerlink" href="#public-ip" title="Permanent link">&para;</a></h3>
40984158
<p>If you don't need/want a public IP for the gateway, you can set the <code>public_ip</code> to <code>false</code> (the default value is <code>true</code>), making the gateway private.
40994159
Private gateways are currently supported in <code>aws</code> and <code>gcp</code> backends.</p>
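
The Policy admonition added in the Router hunk above describes three of the routing policies in one line each. As a rough illustration of those one-line descriptions only, here is a minimal Python sketch. It is not the SGLang Model Gateway's implementation; the function names, the worker names, and the use of in-flight request counts as "load" are all assumptions made for the example. `cache_aware` is left out because the docs do not give enough detail to sketch it.

```python
import itertools
import random
from typing import Callable, Dict, List

# Hypothetical mapping of worker name -> in-flight request count (illustrative only).
Loads = Dict[str, int]


def pick_random(loads: Loads, rng: random.Random) -> str:
    """`random`: uniform random selection over all workers."""
    return rng.choice(sorted(loads))


def make_round_robin(workers: List[str]) -> Callable[[], str]:
    """`round_robin`: return a picker that cycles through workers in order."""
    cycle = itertools.cycle(workers)
    return lambda: next(cycle)


def pick_power_of_two(loads: Loads, rng: random.Random) -> str:
    """`power_of_two`: sample two distinct workers and pick the lighter one."""
    first, second = rng.sample(sorted(loads), 2)  # needs at least two workers
    return first if loads[first] <= loads[second] else second


if __name__ == "__main__":
    loads = {"worker-a": 5, "worker-b": 1, "worker-c": 9}
    rng = random.Random(0)
    next_worker = make_round_robin(sorted(loads))
    print([next_worker() for _ in range(4)])
    print(pick_power_of_two(loads, rng))
    print(pick_random(loads, rng))
```

In this sketch the most heavily loaded worker can never win a `power_of_two` draw, because whichever worker it is paired with is lighter. That is the intuition behind "samples two workers and picks the lighter one".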

docs/concepts/services/index.html

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4486,12 +4486,13 @@ <h2 id="apply-a-configuration">Apply a configuration<a class="headerlink" href="
44864486
<p>However, you'll need a gateway in the following cases:</p>
44874487
<ul>
44884488
<li>To use auto-scaling or rate limits</li>
4489+
<li>To use a custom router, such as the <a href="https://docs.sglang.ai/advanced_features/router.html#" target="_blank">SGLang Model Gateway <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a></li>
44894490
<li>To enable HTTPS for the endpoint and map it to your domain</li>
44904491
<li>If your service requires WebSockets</li>
44914492
<li>If your service cannot work with a <a href="#path-prefix">path prefix</a></li>
44924493
</ul>
4493-
<p>Note, if you're using <a href="https://sky.dstack.ai" target="_blank">dstack Sky <span class="twemoji external"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"/></svg></span></a>,
4494-
a gateway is already pre-configured for you.</p>
4494+
<!-- Note, if you're using <a href="https://sky.dstack.ai">dstack Sky klzzwxh:0160{ .external }</a>{:target="_blank"},
4495+
a gateway is already pre-configured for you. -->
44954496
<p>If a <a href="../gateways/">gateway</a> is configured, the service endpoint will be accessible at
44964497
<code>https://&lt;run name&gt;.&lt;gateway domain&gt;/</code>.</p>
44974498
<p>If the service defines the <code>model</code> property, the model will be available via the global OpenAI-compatible endpoint

docs/reference/api/rest/openapi.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

docs/reference/cli/dstack/server/index.html

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3862,13 +3862,14 @@ <h2 id="usage">Usage<a class="headerlink" href="#usage" title="Permanent link">&
38623862
<div class="termy">
38633863

38643864
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>dstack<span class="w"> </span>server<span class="w"> </span>--help
3865-
Usage:<span class="w"> </span>dstack<span class="w"> </span>server<span class="w"> </span><span class="o">[</span>-h<span class="o">]</span><span class="w"> </span><span class="o">[</span>--host<span class="w"> </span>HOST<span class="o">]</span><span class="w"> </span><span class="o">[</span>-p<span class="w"> </span>PORT<span class="o">]</span><span class="w"> </span><span class="o">[</span>-l<span class="w"> </span>LOG_LEVEL<span class="o">]</span><span class="w"> </span><span class="o">[</span>-y<span class="o">]</span><span class="w"> </span><span class="o">[</span>-n<span class="o">]</span>
3866-
<span class="w"> </span><span class="o">[</span>--token<span class="w"> </span>TOKEN<span class="o">]</span>
3865+
Usage:<span class="w"> </span>dstack<span class="w"> </span>server<span class="w"> </span><span class="o">[</span>-h<span class="o">]</span><span class="w"> </span><span class="o">[</span>--host<span class="w"> </span>HOST<span class="o">]</span><span class="w"> </span><span class="o">[</span>-p<span class="w"> </span>PORT<span class="o">]</span><span class="w"> </span><span class="o">[</span>-d<span class="w"> </span><span class="p">|</span><span class="w"> </span>-l<span class="w"> </span>LOG_LEVEL<span class="o">]</span><span class="w"> </span><span class="o">[</span>-y<span class="o">]</span>
3866+
<span class="w"> </span><span class="o">[</span>-n<span class="o">]</span><span class="w"> </span><span class="o">[</span>--token<span class="w"> </span>TOKEN<span class="o">]</span>
38673867

38683868
Options:
38693869
<span class="w"> </span>-h,<span class="w"> </span>--help<span class="w"> </span>Show<span class="w"> </span>this<span class="w"> </span><span class="nb">help</span><span class="w"> </span>message<span class="w"> </span>and<span class="w"> </span><span class="nb">exit</span>
38703870
<span class="w"> </span>--host<span class="w"> </span>HOST<span class="w"> </span>Bind<span class="w"> </span>socket<span class="w"> </span>to<span class="w"> </span>this<span class="w"> </span>host.<span class="w"> </span>Defaults<span class="w"> </span>to<span class="w"> </span><span class="m">127</span>.0.0.1
38713871
<span class="w"> </span>-p,<span class="w"> </span>--port<span class="w"> </span>PORT<span class="w"> </span>Bind<span class="w"> </span>socket<span class="w"> </span>to<span class="w"> </span>this<span class="w"> </span>port.<span class="w"> </span>Defaults<span class="w"> </span>to<span class="w"> </span><span class="m">3000</span>.
3872+
<span class="w"> </span>-d,<span class="w"> </span>--debug<span class="w"> </span>Enable<span class="w"> </span>debug<span class="w"> </span>logging<span class="w"> </span>level<span class="w"> </span><span class="o">(</span>same<span class="w"> </span>as<span class="w"> </span>-l<span class="w"> </span>debug<span class="o">)</span>
38723873
<span class="w"> </span>-l,<span class="w"> </span>--log-level<span class="w"> </span>LOG_LEVEL
38733874
<span class="w"> </span>Server<span class="w"> </span>logging<span class="w"> </span>level.<span class="w"> </span>Defaults<span class="w"> </span>to<span class="w"> </span>INFO.
38743875
<span class="w"> </span>-y,<span class="w"> </span>--yes<span class="w"> </span>Don<span class="s1">&#39;t ask for confirmation (e.g. update the config)</span>
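
The new usage line in the hunk above shows `[-d | -l LOG_LEVEL]`, which is the notation for two mutually exclusive options. As a hedged sketch of how such a usage line can arise, the following uses Python's standard `argparse` with a mutually exclusive group. It is not taken from the dstack source; `build_parser` and `resolve_log_level` are hypothetical names, and only the flags and help strings visible in the hunk are reused.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose usage contains `[-d | -l LOG_LEVEL]` (illustrative only)."""
    parser = argparse.ArgumentParser(prog="dstack server")
    # Options in a mutually exclusive group are rendered as `[-d | -l LOG_LEVEL]`.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "-d", "--debug", action="store_true",
        help="Enable debug logging level (same as -l debug)",
    )
    group.add_argument(
        "-l", "--log-level", metavar="LOG_LEVEL", default="INFO",
        help="Server logging level. Defaults to INFO.",
    )
    return parser


def resolve_log_level(args: argparse.Namespace) -> str:
    """Map the parsed flags to one upper-case level name."""
    return "DEBUG" if args.debug else args.log_level.upper()


if __name__ == "__main__":
    parser = build_parser()
    print(resolve_log_level(parser.parse_args(["-d"])))
    print(resolve_log_level(parser.parse_args(["-l", "warning"])))
```

Passing both `-d` and `-l` to this sketch makes `argparse` exit with a usage error, which matches the "one or the other" reading of the bracketed usage.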

docs/reference/dstack.yml/gateway/index.html

Lines changed: 32 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1520,6 +1520,17 @@
15201520
<nav class="md-nav" aria-label="Root reference">
15211521
<ul class="md-nav__list">
15221522

1523+
<li class="md-nav__item">
1524+
<a href="#router" class="md-nav__link">
1525+
<span class="md-ellipsis">
1526+
1527+
router
1528+
1529+
</span>
1530+
</a>
1531+
1532+
</li>
1533+
15231534
<li class="md-nav__item">
15241535
<a href="#certificate" class="md-nav__link">
15251536
<span class="md-ellipsis">
@@ -3776,6 +3787,17 @@
37763787
<nav class="md-nav" aria-label="Root reference">
37773788
<ul class="md-nav__list">
37783789

3790+
<li class="md-nav__item">
3791+
<a href="#router" class="md-nav__link">
3792+
<span class="md-ellipsis">
3793+
3794+
router
3795+
3796+
</span>
3797+
</a>
3798+
3799+
</li>
3800+
37793801
<li class="md-nav__item">
37803802
<a href="#certificate" class="md-nav__link">
37813803
<span class="md-ellipsis">
@@ -3902,8 +3924,17 @@ <h6 class="reference-item" id="domain"><code>domain</code> - (Optional) The gate
39023924
<h6 class="reference-item" id="public_ip"><code>public_ip</code> - (Optional) Allocate public IP for the gateway. Defaults to <code>True</code>.<a class="headerlink" href="#public_ip" title="Permanent link">&para;</a></h6>
39033925
<h6 class="reference-item" id="_certificate"><a href="#certificate"><code>certificate</code></a> - (Optional) The SSL certificate configuration. Defaults to <code>type: lets-encrypt</code>.<a class="headerlink" href="#_certificate" title="Permanent link">&para;</a></h6>
39043926
<h6 class="reference-item" id="tags"><code>tags</code> - (Optional) The custom tags to associate with the gateway. The tags are also propagated to the underlying backend resources. If a custom tag conflicts with a backend-level tag, the backend-level tag is not overridden.<a class="headerlink" href="#tags" title="Permanent link">&para;</a></h6>
3927+
<h3 id="router"><code>router</code><a class="headerlink" href="#router" title="Permanent link">&para;</a></h3>
3928+
<div class="tabbed-set tabbed-alternate" data-tabs="1:1"><input checked="checked" id="sglang-model-gateway" name="__tabbed_1" type="radio" /><div class="tabbed-labels"><label for="sglang-model-gateway">SGLang Model Gateway</label></div>
3929+
<div class="tabbed-content">
3930+
<div class="tabbed-block">
3931+
<h6 class="reference-item" id="type"><code>type</code> - The router type. Must be <code>sglang</code>.<a class="headerlink" href="#type" title="Permanent link">&para;</a></h6>
3932+
<h6 class="reference-item" id="policy"><code>policy</code> - (Optional) The routing policy. Options: <code>random</code>, <code>round_robin</code>, <code>cache_aware</code>, <code>power_of_two</code>. Defaults to <code>cache_aware</code>.<a class="headerlink" href="#policy" title="Permanent link">&para;</a></h6>
3933+
</div>
3934+
</div>
3935+
</div>
39053936
<h3 id="certificate"><code>certificate</code><a class="headerlink" href="#certificate" title="Permanent link">&para;</a></h3>
3906-
<div class="tabbed-set tabbed-alternate" data-tabs="1:2"><input checked="checked" id="lets-encrypt" name="__tabbed_1" type="radio" /><input id="acm" name="__tabbed_1" type="radio" /><div class="tabbed-labels"><label for="lets-encrypt">Let's encrypt</label><label for="acm">ACM</label></div>
3937+
<div class="tabbed-set tabbed-alternate" data-tabs="2:2"><input checked="checked" id="lets-encrypt" name="__tabbed_2" type="radio" /><input id="acm" name="__tabbed_2" type="radio" /><div class="tabbed-labels"><label for="lets-encrypt">Let's encrypt</label><label for="acm">ACM</label></div>
39073938
<div class="tabbed-content">
39083939
<div class="tabbed-block">
39093940
<h6 class="reference-item" id="type"><code>type</code> - Automatic certificates by Let's Encrypt. Must be <code>lets-encrypt</code>.<a class="headerlink" href="#type" title="Permanent link">&para;</a></h6>

docs/reference/plugins/rest_plugin/rest_plugin_openapi.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
