Commit 811643f

feat(logging): add fluent-bit log shipping (#3431)
* feat(logging): add fluent-bit log shipping
  Implements #3430. This PR is partially implemented using Cursor.
* Fix pyright errors by using try/except/else pattern for optional imports
* refactor(fluentbit): cleanup protocol lambdas and address codex comments
* feat(fluentbit): validate next_token format and raise ServerClientError for malformed tokens
* chore(fluentbit): address quick comments
* feat(fluentbit): add tag prefix support to HTTPFluentBitWriter
1 parent a07ef35 commit 811643f

7 files changed, +1120 -4 lines changed

docs/docs/guides/server-deployment.md

Lines changed: 77 additions & 3 deletions
@@ -159,7 +159,7 @@ $ DSTACK_DATABASE_URL=postgresql+asyncpg://user:password@db-host:5432/dstack dst
 
 By default, `dstack` stores workload logs locally in `~/.dstack/server/projects/<project_name>/logs`.
 For multi-replica server deployments, it's required to store logs externally.
-`dstack` supports storing logs using AWS CloudWatch or GCP Logging.
+`dstack` supports storing logs using AWS CloudWatch, GCP Logging, or Fluent-bit with Elasticsearch / Opensearch.
 
 ### AWS CloudWatch
 
@@ -222,6 +222,78 @@ To store logs using GCP Logging, set the `DSTACK_SERVER_GCP_LOGGING_PROJECT` env
 
 </div>
 
+### Fluent-bit
+
+To store logs using Fluent-bit, set the `DSTACK_SERVER_FLUENTBIT_HOST` environment variable.
+Fluent-bit supports two modes depending on how you want to access logs.
+
+=== "Full mode"
+
+    Logs are shipped to Fluent-bit and can be read back through the dstack UI and CLI via Elasticsearch or OpenSearch.
+    Use this mode when you want a complete integration with log viewing in dstack:
+
+    ```shell
+    $ DSTACK_SERVER_FLUENTBIT_HOST=fluentbit.example.com \
+      DSTACK_SERVER_ELASTICSEARCH_HOST=https://elasticsearch.example.com:9200 \
+      dstack server
+    ```
+
+=== "Ship-only mode"
+
+    Logs are forwarded to Fluent-bit but cannot be read through `dstack`.
+    The dstack UI/CLI will show empty logs. Use this mode when:
+
+    - You have an existing logging infrastructure (Kibana, Grafana, Datadog, etc.)
+    - You only need to forward logs without reading them back through dstack
+    - You want to reduce operational complexity by not running Elasticsearch/OpenSearch
+
+    ```shell
+    $ DSTACK_SERVER_FLUENTBIT_HOST=fluentbit.example.com \
+      dstack server
+    ```
+
+??? info "Additional configuration"
+    The following optional environment variables can be used to customize the Fluent-bit integration:
+
+    **Fluent-bit settings:**
+
+    - `DSTACK_SERVER_FLUENTBIT_PORT` – The Fluent-bit port. Defaults to `24224`.
+    - `DSTACK_SERVER_FLUENTBIT_PROTOCOL` – The protocol to use: `forward` or `http`. Defaults to `forward`.
+    - `DSTACK_SERVER_FLUENTBIT_TAG_PREFIX` – The tag prefix for logs. Defaults to `dstack`.
+
+    **Elasticsearch/OpenSearch settings (for full mode only):**
+
+    - `DSTACK_SERVER_ELASTICSEARCH_HOST` – The Elasticsearch/OpenSearch host for reading logs. If not set, runs in ship-only mode.
+    - `DSTACK_SERVER_ELASTICSEARCH_INDEX` – The Elasticsearch/OpenSearch index pattern. Defaults to `dstack-logs`.
+    - `DSTACK_SERVER_ELASTICSEARCH_API_KEY` – The Elasticsearch/OpenSearch API key for authentication.
+
+??? info "Fluent-bit configuration"
+    Configure Fluent-bit to receive logs and forward them to Elasticsearch or OpenSearch. Example configuration:
+
+    ```ini
+    [INPUT]
+        Name forward
+        Listen 0.0.0.0
+        Port 24224
+
+    [OUTPUT]
+        Name es
+        Match dstack.*
+        Host elasticsearch.example.com
+        Port 9200
+        Index dstack-logs
+        Suppress_Type_Name On
+    ```
+
+??? info "Required dependencies"
+    To use Fluent-bit log storage, install the `fluentbit` extras:
+
+    ```shell
+    $ pip install "dstack[all]" -U
+    # or
+    $ pip install "dstack[fluentbit]" -U
+    ```
+
 ## File storage
 
 When using [files](../concepts/dev-environments.md#files) or [repos](../concepts/dev-environments.md#repos), `dstack` uploads local files and diffs to the server so that you can have access to them within runs. By default, the files are stored in the DB and each upload is limited to 2MB. You can configure an object storage to be used for uploads and increase the default limit by setting the `DSTACK_SERVER_CODE_UPLOAD_LIMIT` environment variable
@@ -426,8 +498,10 @@ If a deployment is stuck due to a deadlock when applying DB migrations, try scal
 
 ??? info "Can I run multiple replicas of dstack server?"
 
-    Yes, you can if you configure `dstack` to use [PostgreSQL](#postgresql) and [AWS CloudWatch](#aws-cloudwatch).
+    Yes, you can if you configure `dstack` to use [PostgreSQL](#postgresql) and an external log storage
+    such as [AWS CloudWatch](#aws-cloudwatch), [GCP Logging](#gcp-logging), or [Fluent-bit](#fluent-bit).
 
 ??? info "Does dstack server support blue-green or rolling deployments?"
 
-    Yes, it does if you configure `dstack` to use [PostgreSQL](#postgresql) and [AWS CloudWatch](#aws-cloudwatch).
+    Yes, it does if you configure `dstack` to use [PostgreSQL](#postgresql) and an external log storage
+    such as [AWS CloudWatch](#aws-cloudwatch), [GCP Logging](#gcp-logging), or [Fluent-bit](#fluent-bit).
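
The two modes documented in this file differ only in whether `DSTACK_SERVER_ELASTICSEARCH_HOST` is set alongside `DSTACK_SERVER_FLUENTBIT_HOST`. A minimal Python sketch of that decision is shown here for illustration. The function name `resolve_log_mode` and the returned labels are hypothetical and do not come from `dstack`.

```python
from typing import Mapping, Optional


def resolve_log_mode(env: Mapping[str, str]) -> Optional[str]:
    """Return "full", "ship-only", or None when Fluent-bit is not configured.

    Hypothetical helper for illustration only; it mirrors the documented
    behaviour of the two environment variables, not dstack's internals.
    """
    if not env.get("DSTACK_SERVER_FLUENTBIT_HOST"):
        # Without a Fluent-bit host, the Fluent-bit storage is not enabled.
        return None
    if env.get("DSTACK_SERVER_ELASTICSEARCH_HOST"):
        # Logs are shipped and can be read back via Elasticsearch/OpenSearch.
        return "full"
    # Logs are forwarded only; the dstack UI/CLI shows empty logs.
    return "ship-only"
```

Setting only the Elasticsearch host has no effect in this sketch, which matches the documentation's framing of Elasticsearch/OpenSearch as an add-on to Fluent-bit shipping.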

docs/docs/reference/environment-variables.md

Lines changed: 7 additions & 0 deletions
@@ -113,6 +113,13 @@ For more details on the options below, refer to the [server deployment](../guide
 - `DSTACK_SERVER_CLOUDWATCH_LOG_GROUP`{ #DSTACK_SERVER_CLOUDWATCH_LOG_GROUP } – The CloudWatch Logs group for storing workloads logs. If not set, the default file-based log storage is used.
 - `DSTACK_SERVER_CLOUDWATCH_LOG_REGION`{ #DSTACK_SERVER_CLOUDWATCH_LOG_REGION } – The CloudWatch Logs region. Defaults to `None`.
 - `DSTACK_SERVER_GCP_LOGGING_PROJECT`{ #DSTACK_SERVER_GCP_LOGGING_PROJECT } – The GCP Logging project for storing workloads logs. If not set, the default file-based log storage is used.
+- `DSTACK_SERVER_FLUENTBIT_HOST`{ #DSTACK_SERVER_FLUENTBIT_HOST } – The Fluent-bit host for log forwarding. If set, enables Fluent-bit log storage.
+- `DSTACK_SERVER_FLUENTBIT_PORT`{ #DSTACK_SERVER_FLUENTBIT_PORT } – The Fluent-bit port. Defaults to `24224`.
+- `DSTACK_SERVER_FLUENTBIT_PROTOCOL`{ #DSTACK_SERVER_FLUENTBIT_PROTOCOL } – The protocol to use: `forward` or `http`. Defaults to `forward`.
+- `DSTACK_SERVER_FLUENTBIT_TAG_PREFIX`{ #DSTACK_SERVER_FLUENTBIT_TAG_PREFIX } – The tag prefix for logs. Defaults to `dstack`.
+- `DSTACK_SERVER_ELASTICSEARCH_HOST`{ #DSTACK_SERVER_ELASTICSEARCH_HOST } – The Elasticsearch/OpenSearch host for reading logs back through dstack. Optional; if not set, Fluent-bit runs in ship-only mode (logs are forwarded but not readable through dstack UI/CLI).
+- `DSTACK_SERVER_ELASTICSEARCH_INDEX`{ #DSTACK_SERVER_ELASTICSEARCH_INDEX } – The Elasticsearch/OpenSearch index pattern. Defaults to `dstack-logs`.
+- `DSTACK_SERVER_ELASTICSEARCH_API_KEY`{ #DSTACK_SERVER_ELASTICSEARCH_API_KEY } – The Elasticsearch/OpenSearch API key for authentication.
 - `DSTACK_ENABLE_PROMETHEUS_METRICS`{ #DSTACK_ENABLE_PROMETHEUS_METRICS } — Enables Prometheus metrics collection and export.
 - `DSTACK_DEFAULT_SERVICE_CLIENT_MAX_BODY_SIZE`{ #DSTACK_DEFAULT_SERVICE_CLIENT_MAX_BODY_SIZE } – Request body size limit for services running with a gateway, in bytes. Defaults to 64 MiB.
 - `DSTACK_SERVICE_CLIENT_TIMEOUT`{ #DSTACK_SERVICE_CLIENT_TIMEOUT } – Timeout in seconds for HTTP requests sent from the in-server proxy and gateways to service replicas. Defaults to 60.
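
The documented defaults for the new variables can be summarized in a short Python sketch. This is an assumption-laden illustration, not the server's settings module: the `FluentBitSettings` dataclass and `load_fluentbit_settings` function are hypothetical names, and rejecting unknown protocol values is an assumption (the diff shown here does not say how `dstack` handles them).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# The two protocol values named in the documentation.
ALLOWED_PROTOCOLS = ("forward", "http")


@dataclass(frozen=True)
class FluentBitSettings:
    """Hypothetical container for the documented variables."""

    host: Optional[str]
    port: int
    protocol: str
    tag_prefix: str
    es_host: Optional[str]
    es_index: str
    es_api_key: Optional[str]


def load_fluentbit_settings(env: Mapping[str, str] = os.environ) -> FluentBitSettings:
    """Read the variables from `env`, applying the documented defaults."""
    protocol = env.get("DSTACK_SERVER_FLUENTBIT_PROTOCOL", "forward")
    if protocol not in ALLOWED_PROTOCOLS:
        # Assumption: this sketch rejects unknown protocols up front.
        raise ValueError(f"Unsupported protocol: {protocol!r}")
    return FluentBitSettings(
        host=env.get("DSTACK_SERVER_FLUENTBIT_HOST"),
        port=int(env.get("DSTACK_SERVER_FLUENTBIT_PORT", "24224")),
        protocol=protocol,
        tag_prefix=env.get("DSTACK_SERVER_FLUENTBIT_TAG_PREFIX", "dstack"),
        es_host=env.get("DSTACK_SERVER_ELASTICSEARCH_HOST"),
        es_index=env.get("DSTACK_SERVER_ELASTICSEARCH_INDEX", "dstack-logs"),
        es_api_key=env.get("DSTACK_SERVER_ELASTICSEARCH_API_KEY"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to exercise without touching the real process environment.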

pyproject.toml

Lines changed: 6 additions & 1 deletion
@@ -215,6 +215,11 @@ nebius = [
     "nebius>=0.3.4,<0.4; python_version >= '3.10'",
     "dstack[server]",
 ]
+fluentbit = [
+    "fluent-logger>=0.10.0",
+    "elasticsearch>=8.0.0",
+    "dstack[server]",
+]
 all = [
-    "dstack[gateway,server,aws,azure,gcp,verda,kubernetes,lambda,nebius,oci]",
+    "dstack[gateway,server,aws,azure,gcp,verda,kubernetes,lambda,nebius,oci,fluentbit]",
 ]
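
Because `fluent-logger` and `elasticsearch` are installed only with the `fluentbit` extra, the server has to detect their presence at import time. The commit message mentions a try/except/else pattern for optional imports, and the server code checks a `FLUENTBIT_AVAILABLE` flag. The sketch below shows one way such a flag could be set. The contents of the real `fluentbit` module are not part of this excerpt, so `ELASTICSEARCH_AVAILABLE` and `missing_dependencies` are hypothetical.

```python
from typing import List

# fluent-logger installs the top-level package `fluent`.
try:
    from fluent import sender as fluent_sender  # noqa: F401
except ImportError:
    FLUENTBIT_AVAILABLE = False
else:
    # The else branch runs only when the import succeeded.
    FLUENTBIT_AVAILABLE = True

try:
    from elasticsearch import Elasticsearch  # noqa: F401
except ImportError:
    ELASTICSEARCH_AVAILABLE = False  # hypothetical flag for this sketch
else:
    ELASTICSEARCH_AVAILABLE = True


def missing_dependencies() -> List[str]:
    """List the optional distributions that could not be imported."""
    missing = []
    if not FLUENTBIT_AVAILABLE:
        missing.append("fluent-logger")
    if not ELASTICSEARCH_AVAILABLE:
        missing.append("elasticsearch")
    return missing
```

Setting the flag in the `else` branch rather than inside `try` keeps the guarded region limited to the import itself, so unrelated errors are not mistaken for a missing dependency.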

src/dstack/_internal/server/services/logs/__init__.py

Lines changed: 24 additions & 0 deletions
@@ -8,6 +8,7 @@
 from dstack._internal.server.schemas.logs import PollLogsRequest
 from dstack._internal.server.schemas.runner import LogEvent as RunnerLogEvent
 from dstack._internal.server.services.logs import aws as aws_logs
+from dstack._internal.server.services.logs import fluentbit as fluentbit_logs
 from dstack._internal.server.services.logs import gcp as gcp_logs
 from dstack._internal.server.services.logs.base import (
     LogStorage,
@@ -57,6 +58,29 @@ def get_log_storage() -> LogStorage:
                 logger.debug("Using GCP Logs storage")
         else:
             logger.error("Cannot use GCP Logs storage: GCP deps are not installed")
+    elif settings.SERVER_FLUENTBIT_HOST:
+        if fluentbit_logs.FLUENTBIT_AVAILABLE:
+            try:
+                _log_storage = fluentbit_logs.FluentBitLogStorage(
+                    host=settings.SERVER_FLUENTBIT_HOST,
+                    port=settings.SERVER_FLUENTBIT_PORT,
+                    protocol=settings.SERVER_FLUENTBIT_PROTOCOL,
+                    tag_prefix=settings.SERVER_FLUENTBIT_TAG_PREFIX,
+                    es_host=settings.SERVER_ELASTICSEARCH_HOST,
+                    es_index=settings.SERVER_ELASTICSEARCH_INDEX,
+                    es_api_key=settings.SERVER_ELASTICSEARCH_API_KEY,
+                )
+            except LogStorageError as e:
+                logger.error("Failed to initialize Fluent-bit Logs storage: %s", e)
+            except Exception:
+                logger.exception("Got exception when initializing Fluent-bit Logs storage")
+            else:
+                if settings.SERVER_ELASTICSEARCH_HOST:
+                    logger.debug("Using Fluent-bit Logs storage with Elasticsearch/OpenSearch")
+                else:
+                    logger.debug("Using Fluent-bit Logs storage in ship-only mode")
+        else:
+            logger.error("Cannot use Fluent-bit Logs storage: fluent-logger is not installed")
     if _log_storage is None:
         _log_storage = FileLogStorage()
         logger.debug("Using file-based storage")
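
The control flow in this hunk has one consequence worth spelling out: when the Fluent-bit dependency is missing or the storage fails to initialize, the server logs an error and falls back to file-based storage. The self-contained sketch below models only that fallback. `LogStorageError` and `FileLogStorage` are stand-ins for the `dstack` classes of the same name, and `select_log_storage` with its `make_fluentbit_storage` factory argument is a hypothetical simplification of `get_log_storage()`.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class LogStorageError(Exception):
    """Stand-in for dstack's LogStorageError."""


class FileLogStorage:
    """Stand-in for the default file-based storage."""


def select_log_storage(
    fluentbit_host: Optional[str],
    fluentbit_available: bool,
    make_fluentbit_storage: Callable[[], object],
) -> object:
    """Simplified model of the Fluent-bit branch and the final fallback."""
    storage: Optional[object] = None
    if fluentbit_host:
        if fluentbit_available:
            try:
                storage = make_fluentbit_storage()
            except LogStorageError as e:
                logger.error("Failed to initialize Fluent-bit Logs storage: %s", e)
            except Exception:
                logger.exception("Got exception when initializing Fluent-bit Logs storage")
        else:
            logger.error("Cannot use Fluent-bit Logs storage: fluent-logger is not installed")
    if storage is None:
        # Any failure above ends here, mirroring the diff's final fallback.
        storage = FileLogStorage()
    return storage
```

In a multi-replica deployment this fallback may be easy to miss, since the server still starts, so checking the startup logs for the error messages above seems prudent.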
