4 changes: 4 additions & 0 deletions docs/ai-integration/ai-tasks-list-view.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,10 @@ import LanguageContent from "@site/src/components/LanguageContent";
* In the **AI Tasks - List view**, you can manage RavenDB's AI tasks -
create new tasks, edit existing ones, or delete them as needed.

* To inspect errors raised by AI tasks and how those errors affect each task's health,
use the [AI Task Errors view](../monitoring/task-errors/studio-views.mdx#ai-task-errors-view).
See the [Task errors overview](../monitoring/task-errors/overview.mdx) for an introduction.

* In this article:
* [AI Tasks - list view](../ai-integration/ai-tasks-list-view.mdx#ai-tasks---list-view)

Expand Down
19 changes: 19 additions & 0 deletions docs/ai-integration/gen-ai-integration/overview.mdx
Expand Up @@ -33,6 +33,7 @@ import LanguageContent from "@site/src/components/LanguageContent";
* [How to create and run a GenAI task](../../ai-integration/gen-ai-integration/overview.mdx#how-to-create-and-run-a-genai-task)
* [Runtime](../../ai-integration/gen-ai-integration/overview.mdx#runtime)
* [Tracking of processed document parts](../../ai-integration/gen-ai-integration/overview.mdx#tracking-of-processed-document-parts)
* [Monitoring the tasks](../../ai-integration/gen-ai-integration/overview.mdx#monitoring-the-tasks)
* [Licensing](../../ai-integration/gen-ai-integration/overview.mdx#licensing)
* [Supported services](../../ai-integration/gen-ai-integration/overview.mdx#supported-services)
* [Common use cases](../../ai-integration/gen-ai-integration/overview.mdx#common-use-cases)
Expand Down Expand Up @@ -223,6 +224,24 @@ added or modified.

<hr />

## Monitoring the tasks

* The status and state of each GenAI task are visible in the
[AI Tasks - list view](../../ai-integration/ai-tasks-list-view.mdx).

* Task performance and activity over time can be analyzed in the _AI Tasks Stats_ view.
Learn more about the stats view in the
[Ongoing Tasks Stats](../../studio/database/stats/ongoing-tasks-stats/overview.mdx) article.

* Errors raised by GenAI tasks, and how those errors affect each task's health, are tracked
in the [Task Errors view](../../monitoring/task-errors/studio-views.mdx#task-errors-view).
The [AI Task Errors view](../../monitoring/task-errors/studio-views.mdx#ai-task-errors-view),
opened from the `AI Hub`, shows the same errors pre-filtered to AI tasks only.
For an introduction to task error monitoring, see the
[Task errors overview](../../monitoring/task-errors/overview.mdx).

<hr />

## Licensing

For RavenDB to support the GenAI Integration feature, you need a `RavenDB AI` license type.
Expand Down
Expand Up @@ -130,6 +130,13 @@ import Panel from '@site/src/components/Panel';
* [5.1.11.25](../../../server/administration/snmp/snmp-overview.mdx#511125) – Total number of enabled embeddings generation tasks.
* [5.1.11.26](../../../server/administration/snmp/snmp-overview.mdx#511126) – Total number of active embeddings generation tasks.

* Errors raised by embeddings generation tasks, and how those errors affect each task's
health, are tracked in the [Task Errors view](../../../monitoring/task-errors/studio-views.mdx#task-errors-view).
The [AI Task Errors view](../../../monitoring/task-errors/studio-views.mdx#ai-task-errors-view),
opened from the `AI Hub`, shows the same errors pre-filtered to AI tasks only.
For an introduction to task error monitoring, see the
[Task errors overview](../../../monitoring/task-errors/overview.mdx).

</Panel>

<Panel heading="Get embeddings generation task details">
Expand Down
Expand Up @@ -130,6 +130,13 @@ import Panel from '@site/src/components/Panel';
* [5.1.11.25](../../../server/administration/snmp/snmp-overview.mdx#511125) – Total number of enabled embeddings generation tasks.
* [5.1.11.26](../../../server/administration/snmp/snmp-overview.mdx#511126) – Total number of active embeddings generation tasks.

* Errors raised by embeddings generation tasks, and how those errors affect each task's
health, are tracked in the [Task Errors view](../../../monitoring/task-errors/studio-views.mdx#task-errors-view).
The [AI Task Errors view](../../../monitoring/task-errors/studio-views.mdx#ai-task-errors-view),
opened from the `AI Hub`, shows the same errors pre-filtered to AI tasks only.
For an introduction to task error monitoring, see the
[Task errors overview](../../../monitoring/task-errors/overview.mdx).

</Panel>

<Panel heading="Get embeddings generation task details">
Expand Down
Expand Up @@ -130,6 +130,13 @@ import Panel from '@site/src/components/Panel';
* [5.1.11.25](../../../server/administration/snmp/snmp-overview.mdx#511125) – Total number of enabled embeddings generation tasks.
* [5.1.11.26](../../../server/administration/snmp/snmp-overview.mdx#511126) – Total number of active embeddings generation tasks.

* Errors raised by embeddings generation tasks, and how those errors affect each task's
health, are tracked in the [Task Errors view](../../../monitoring/task-errors/studio-views.mdx#task-errors-view).
The [AI Task Errors view](../../../monitoring/task-errors/studio-views.mdx#ai-task-errors-view),
opened from the `AI Hub`, shows the same errors pre-filtered to AI tasks only.
For an introduction to task error monitoring, see the
[Task errors overview](../../../monitoring/task-errors/overview.mdx).

</Panel>

<Panel heading="Get embeddings generation task details">
Expand Down
4 changes: 4 additions & 0 deletions docs/monitoring/_category_.json
@@ -0,0 +1,4 @@
{
"position": 1,
"label": "Monitoring"
}
4 changes: 4 additions & 0 deletions docs/monitoring/task-errors/_category_.json
@@ -0,0 +1,4 @@
{
"position": 1,
"label": "Task Errors"
}
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
140 changes: 140 additions & 0 deletions docs/monitoring/task-errors/configuration.mdx
@@ -0,0 +1,140 @@
---
title: "Task errors: Configuration"
sidebar_label: "Configuration options"
description: "Configuration keys for task error monitoring."
sidebar_position: 3
---

import Admonition from '@theme/Admonition';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
import LanguageSwitcher from "@site/src/components/LanguageSwitcher";
import LanguageContent from "@site/src/components/LanguageContent";
import Panel from "@site/src/components/Panel";
import ContentFrame from "@site/src/components/ContentFrame";

# Task errors: Configuration

<Admonition type="note" title="">

* This page covers the configuration keys that control task error monitoring.

* To learn how to apply these keys (where to set them, scope, syntax), see the
[Configuration Overview](../../server/configuration/configuration-options.mdx).

* To learn about task errors and how task health is determined, see the
[Task errors overview](../../monitoring/task-errors/overview.mdx).

* In this article:
* [Task health thresholds](../../monitoring/task-errors/configuration.mdx#task-health-thresholds)
* [ETL.ProcessHealthStatusImpairedThreshold](../../monitoring/task-errors/configuration.mdx#etlprocesshealthstatusimpairedthreshold)
* [ETL.ProcessHealthStatusFailedThreshold](../../monitoring/task-errors/configuration.mdx#etlprocesshealthstatusfailedthreshold)
* [Tuning the thresholds](../../monitoring/task-errors/configuration.mdx#tuning-the-thresholds)
* [Validation rules](../../monitoring/task-errors/configuration.mdx#validation-rules)

</Admonition>

<Panel heading="Task health thresholds">

Two configuration keys define the boundaries between the three task health states
(`Healthy`, `Impaired`, and `Failed`). Each task is classified by its error ratio
(described on the
[Task errors overview](../../monitoring/task-errors/overview.mdx#how-health-is-computed)):
`Healthy` while the ratio does not exceed the Impaired threshold, `Impaired` once it
exceeds the Impaired threshold but not the Failed threshold, and `Failed` once it exceeds
the Failed threshold. A task moves between states as the ratio crosses each threshold.

Both keys can be set server-wide or per database, and both apply to AI tasks
(Embeddings Generation, GenAI) as well as ETL tasks despite their `ETL.` prefix.
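
The classification described above can be sketched in a few lines of Python. This is an
illustration only, not RavenDB code: the function name `classify_health` and its parameters
are hypothetical, and the strict `>` comparisons are an assumption based on the "exceeds"
wording used in the key descriptions.

```python
def classify_health(error_ratio, impaired_threshold=0.1, failed_threshold=0.9):
    """Map an error ratio in [0, 1] to a health state (illustrative sketch only)."""
    # Assumption: a ratio exactly equal to a threshold does not cross it.
    if error_ratio > failed_threshold:
        return "Failed"
    if error_ratio > impaired_threshold:
        return "Impaired"
    return "Healthy"


print(classify_health(0.05))  # Healthy
print(classify_health(0.5))   # Impaired
print(classify_health(0.95))  # Failed
```

The default arguments mirror the documented defaults of the two keys (`0.1` and `0.9`).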

<ContentFrame>

### ETL.ProcessHealthStatusImpairedThreshold

* Error-rate threshold above which a task's health is classified as `Impaired`.
* A task whose recent error rate exceeds this value transitions from `Healthy` to `Impaired`.

- **Type**: `float`
- **Default**: `0.1`
- **Range**: `[0, 1]`
- **Scope**: Server-wide or per database

</ContentFrame>

---

<ContentFrame>

### ETL.ProcessHealthStatusFailedThreshold

* Error-rate threshold above which a task's health is classified as `Failed`.
* A task whose recent error rate exceeds this value transitions from `Impaired` to `Failed`.

- **Type**: `float`
- **Default**: `0.9`
- **Range**: `[0, 1]`
- **Scope**: Server-wide or per database

</ContentFrame>

---

<ContentFrame>

### Tuning the thresholds

The defaults are tuned for typical workloads where most tasks should run cleanly and any
sustained error rate is meaningful. Two situations commonly call for adjusting them:
workloads that legitimately accept a high item-failure rate, and operational environments
that need earlier escalation.

A per-database setting always overrides the server-wide setting, so different workloads on
the same server can use different sensitivity.
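
The override rule can be pictured as a simple lookup. The sketch below is a hypothetical
Python helper (the name `effective_threshold` is not part of RavenDB) that returns the
per-database value when one is set and falls back to the server-wide value otherwise.

```python
from typing import Optional


def effective_threshold(server_wide: float, per_database: Optional[float] = None) -> float:
    """Return the per-database value when set, otherwise the server-wide value."""
    # Compare against None so that an explicit per-database value of 0 still wins.
    return server_wide if per_database is None else per_database


server_wide_impaired = 0.1
print(effective_threshold(server_wide_impaired))       # 0.1 (no database override)
print(effective_threshold(server_wide_impaired, 0.3))  # 0.3 (database override wins)
```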

#### Tuning the Impaired threshold

The default of `0.1` is conservative. Even a small ratio of recent failures flips a task to
`Impaired`, which makes sense when failures are expected to be rare and the goal is to flag
a task as soon as it starts misbehaving.

* Raise the threshold (for example to `0.2` or `0.3`) when the workload routinely produces
item errors that you do not want to escalate. A typical case is an ETL or AI task
processing user-generated data that often fails validation; the task is doing its job,
the failures are not actionable, and flipping to `Impaired` on every batch is noisy.

* Lower the threshold (for example to `0.05`) when you want earlier alerting on tasks that
are starting to slip. The cost is more frequent `Impaired` classifications and the alerts
that ride on them.

#### Tuning the Failed threshold

The default of `0.9` is permissive. A task flips to `Failed` only when its recent error
rate is overwhelming, that is, when nearly all of its recent batches have failed.

* Raise the threshold (for example to `0.95`) when you want `Failed` to mean "essentially
broken" and tolerate substantial impairment without escalating. Useful when `Failed`
triggers automated responses that should be reserved for genuinely catastrophic states.

* Lower the threshold (for example to `0.7`) when you want stronger and earlier escalation
on degraded tasks. The cost is more frequent `Failed` classifications and the automated
responses that ride on them.
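
To get a feel for the trade-off between sensitivity and noise, the following Python sketch
counts how many observations would cross a threshold at different settings. The error
ratios are made-up sample data and `count_exceeding` is a hypothetical helper; how and when
RavenDB actually samples the ratio is not modeled here.

```python
# Hypothetical error ratios observed for one task over successive checks.
observed_ratios = [0.02, 0.08, 0.12, 0.25, 0.04, 0.18, 0.07]


def count_exceeding(ratios, threshold):
    """Count the observations whose ratio is strictly above the threshold."""
    return sum(1 for ratio in ratios if ratio > threshold)


for threshold in (0.05, 0.1, 0.3):
    print(threshold, count_exceeding(observed_ratios, threshold))
# 0.05 5
# 0.1 3
# 0.3 0
```

With this sample, lowering the Impaired threshold to `0.05` flags five of the seven
observations, while raising it to `0.3` flags none.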

</ContentFrame>

---

<ContentFrame>

### Validation rules

RavenDB validates both keys at server startup and refuses to start if either of the
following rules is violated:

* Each threshold value must be between `0` and `1`, inclusive.
* `ETL.ProcessHealthStatusFailedThreshold` must be strictly greater than
`ETL.ProcessHealthStatusImpairedThreshold`. Equal values are rejected.
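
The two rules can be checked before deploying a configuration change. The Python sketch
below is a hypothetical pre-flight check, not the server's own validation code; the
function name `validate_thresholds` and the error messages are invented for illustration.

```python
def validate_thresholds(impaired: float, failed: float) -> list:
    """Return a list of rule violations; an empty list means both rules pass."""
    errors = []
    # Rule 1: each threshold must lie in the inclusive range [0, 1].
    for name, value in (("Impaired", impaired), ("Failed", failed)):
        if not 0 <= value <= 1:
            errors.append(f"{name} threshold {value} is outside [0, 1]")
    # Rule 2: the Failed threshold must be strictly greater than the Impaired one.
    if failed <= impaired:
        errors.append("Failed threshold must be strictly greater than Impaired threshold")
    return errors


print(validate_thresholds(0.1, 0.9))  # []
print(validate_thresholds(0.5, 0.5))  # one violation: equal values are rejected
```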

</ContentFrame>

</Panel>