56 changes: 32 additions & 24 deletions docs/user/ppl/cmd/ad.md
Original file line number Diff line number Diff line change
@@ -1,34 +1,38 @@
# ad (deprecated by ml command)
# ad (deprecated by ml command)

## Description

The `ad` command applies Random Cut Forest (RCF) algorithm in the ml-commons plugin on the search result returned by a PPL command. Based on the input, the command uses two types of RCF algorithms: fixed-in-time RCF for processing time-series data, batch RCF for processing non-time-series data.
## Syntax
The `ad` command applies the Random Cut Forest (RCF) algorithm in the ml-commons plugin to the search results returned by a PPL command. Based on the input, the command uses one of two types of RCF algorithms: fixed-in-time RCF for processing time-series data, or batch RCF for processing non-time-series data.

## Fixed In Time RCF For Time-series Data
## Syntax

ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] \<time_field\> [date_format] [time_zone] [category_field]
* number_of_trees: optional. Number of trees in the forest. **Default:** 30.
* shingle_size: optional. A shingle is a consecutive sequence of the most recent records. **Default:** 8.
* sample_size: optional. The sample size used by stream samplers in this forest. **Default:** 256.
* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
* time_decay: optional. The decay factor used by stream samplers in this forest. **Default:** 0.0001.
* anomaly_rate: optional. The anomaly rate. **Default:** 0.005.
* time_field: mandatory. Specifies the time field for RCF to use as time-series data.
* date_format: optional. Used for formatting time_field. **Default:** "yyyy-MM-dd HH:mm:ss".
* time_zone: optional. Used for setting time zone for time_field. **Default:** "UTC".
* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
The following sections describe the syntax for each RCF algorithm type.

## Fixed in time RCF for time-series data

`ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] <time_field> [date_format] [time_zone] [category_field]`
* `number_of_trees`: optional. Number of trees in the forest. **Default:** 30.
* `shingle_size`: optional. A shingle is a consecutive sequence of the most recent records. **Default:** 8.
* `sample_size`: optional. The sample size used by stream samplers in this forest. **Default:** 256.
* `output_after`: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
* `time_decay`: optional. The decay factor used by stream samplers in this forest. **Default:** 0.0001.
* `anomaly_rate`: optional. The anomaly rate. **Default:** 0.005.
* `time_field`: mandatory. Specifies the time field for RCF to use as time-series data.
* `date_format`: optional. Used for formatting time_field. **Default:** "yyyy-MM-dd HH:mm:ss".
* `time_zone`: optional. Used for setting time zone for time_field. **Default:** "UTC".
* `category_field`: optional. Specifies the category field used to group inputs. Each category will be independently predicted.

## Batch RCF For Non-time-series Data

ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field]
* number_of_trees: optional. Number of trees in the forest. **Default:** 30.
* sample_size: optional. Number of random samples given to each tree from the training data set. **Default:** 256.
* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
* training_data_size: optional. **Default:** size of your training data set.
* anomaly_score_threshold: optional. The threshold of anomaly score. **Default:** 1.0.
* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
## Batch RCF for non-time-series data

`ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field]`
* `number_of_trees`: optional. Number of trees in the forest. **Default:** 30.
* `sample_size`: optional. Number of random samples given to each tree from the training dataset. **Default:** 256.
* `output_after`: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
* `training_data_size`: optional. **Default:** size of your training dataset.
* `anomaly_score_threshold`: optional. The threshold of anomaly score. **Default:** 1.0.
* `category_field`: optional. Specifies the category field used to group inputs. Each category will be independently predicted.


## Example 1: Detecting events in New York City from taxi ridership data with time-series data

This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data.
@@ -51,6 +55,7 @@ fetched rows / total rows = 1/1
+---------+---------------------+-------+---------------+
```


## Example 2: Detecting events in New York City from taxi ridership data with time-series data independently with each category

This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values.
@@ -74,6 +79,7 @@ fetched rows / total rows = 2/2
+----------+---------+---------------------+-------+---------------+
```


## Example 3: Detecting events in New York City from taxi ridership data with non-time-series data

This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data.
@@ -96,6 +102,7 @@ fetched rows / total rows = 1/1
+---------+-------+-----------+
```


## Example 4: Detecting events in New York City from taxi ridership data with non-time-series data independently with each category

This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values.
@@ -119,6 +126,7 @@ fetched rows / total rows = 2/2
+----------+---------+-------+-----------+
```


## Limitations

The `ad` command works only when `plugins.calcite.enabled=false`.
19 changes: 10 additions & 9 deletions docs/user/ppl/cmd/addcoltotals.md
@@ -1,21 +1,22 @@
# AddColTotals
# addcoltotals


# Description

The `addcoltotals` command computes the sum of each column and add a summary event at the end to show the total of each column. This command works the same way `addtotals` command works with row=false and col=true option. This is useful for creating summary reports with subtotals or grand totals. The `addcoltotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields in the field list are ignored even if its specified in field-list or in the case of no field-list specified.
The `addcoltotals` command computes the sum of each column and adds a summary event at the end to show the total of each column. It works the same way as the `addtotals` command with the `row=false` and `col=true` options. This is useful for creating summary reports with subtotals or grand totals. The `addcoltotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields are ignored, whether they are named in `field-list` or no `field-list` is specified.

# Syntax
## Syntax

Use the following syntax:

`addcoltotals [field-list] [label=<string>] [labelfield=<field>]`

- `field-list`: Optional. Comma-separated list of numeric fields to sum. If not specified, all numeric fields are summed.
- `labelfield=<field>`: Optional. Name of the field in which to place the label. If the field does not exist, it is added, and the label is shown in that field in the summary event row.
- `label=<string>`: Optional. Custom text placed in `labelfield` for the totals row. Default is "Total".
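
The behavior these parameters describe can be sketched in plain Python. This only illustrates the documented semantics and is not how the plugin implements `addcoltotals`; the function name `add_col_totals`, the rows-as-dicts representation, and the sample rows are assumptions made for the example.

```python
from numbers import Number


def add_col_totals(rows, field_list=None, label="Total", labelfield=None):
    """Return a copy of rows plus one summary row holding per-column sums."""
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    # With no field list, every column is a candidate for summing.
    candidates = field_list if field_list is not None else columns

    def is_numeric(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, Number) and not isinstance(value, bool)

    summary = {name: None for name in columns}
    for name in candidates:
        values = [r.get(name) for r in rows if is_numeric(r.get(name))]
        if values:  # non-numeric columns are ignored
            summary[name] = sum(values)

    out = [dict(r) for r in rows]
    if labelfield is not None:
        if labelfield not in columns:
            # A labelfield that does not exist yet becomes a new, empty column.
            for r in out:
                r[labelfield] = None
        summary[labelfield] = label
    out.append(summary)
    return out


# Hypothetical sample rows, not taken from the accounts index.
rows = [
    {"firstname": "Amber", "balance": 39225},
    {"firstname": "Hattie", "balance": 5686},
]
print(add_col_totals(rows, labelfield="firstname")[-1])
```

Passing a `labelfield` that is not an existing column (for example, `labelfield="Total"` with `label="Sum"`) adds that column to every row and fills it only in the summary row.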

# Example 1: Basic Example
# Example 1: Basic example

The example shows placing the label in an existing field.
The following example PPL query shows how to use `addcoltotals` to place the label in an existing field.

```ppl
source=accounts
@@ -38,9 +39,9 @@ fetched rows / total rows = 4/4
+-----------+---------+
```

# Example 2: Adding column totals and adding a summary event with label specified.
# Example 2: Adding column totals and adding a summary event with label specified

The example shows adding totals after a stats command where final summary event label is \'Sum\' and row=true value was used by default when not specified. It also added new field specified by labelfield as it did not match existing field.
The following example PPL query shows how to use `addcoltotals` to add totals after a `stats` command, where the final summary event label is 'Sum' and the default `row=true` value is used because it is not specified. It also adds the new field specified by `labelfield` because it does not match an existing field.

```ppl
source=accounts
@@ -63,7 +64,7 @@ fetched rows / total rows = 3/3

# Example 3: With all options

The example shows using addcoltotals with all options set.
The following example PPL query shows how to use `addcoltotals` with all options set.

```ppl
source=accounts
19 changes: 10 additions & 9 deletions docs/user/ppl/cmd/addtotals.md
@@ -1,12 +1,13 @@
# AddTotals
# addtotals


## Description

The `addtotals` command computes the sum of numeric fields and appends a row with the totals to the result. The command can also add row totals and add a field to store row totals. This is useful for creating summary reports with subtotals or grand totals. The `addtotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields in the field list are ignored even if it\'s specified in field-list or in the case of no field-list specified.
The `addtotals` command computes the sum of numeric fields and appends a row with the totals to the result. The command can also add row totals and add a field to store row totals. This is useful for creating summary reports with subtotals or grand totals. The `addtotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields are ignored, whether they are named in `field-list` or no `field-list` is specified.

## Syntax

Use the following syntax:

`addtotals [field-list] [label=<string>] [labelfield=<field>] [row=<boolean>] [col=<boolean>] [fieldname=<field>]`

- `field-list`: Optional. Comma-separated list of numeric fields to sum. If not specified, all numeric fields are summed.
@@ -16,9 +17,9 @@ The `addtotals` command computes the sum of numeric fields and appends a row wit
- `label=<string>`: Optional. Custom text placed in `labelfield` for the totals row. Default is "Total". Applies when `col=true`. Has no effect when the `labelfield` and `fieldname` parameters have the same value.
- `fieldname=<field>`: Optional. Calculates the total of each row and adds a new field to store it. Applies when `row=true`.
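
The row-total behavior of `fieldname` can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's implementation: the function name `add_row_totals`, the rows-as-dicts representation, the sample row, and the default value `"Total"` for `fieldname` are all assumptions made for the example.

```python
from numbers import Number


def add_row_totals(rows, field_list=None, fieldname="Total"):
    """Return copies of rows, each with its numeric fields summed into fieldname."""
    out = []
    for row in rows:
        # With no field list, every field in the row is a candidate.
        names = field_list if field_list is not None else list(row)
        total = 0
        for name in names:
            value = row.get(name)
            # Only numeric values contribute; bool is excluded on purpose.
            if isinstance(value, Number) and not isinstance(value, bool):
                total += value
        out.append({**row, fieldname: total})
    return out


# Hypothetical sample row, not taken from the accounts index.
rows = [{"firstname": "Amber", "age": 32, "balance": 39225}]
print(add_row_totals(rows))
print(add_row_totals(rows, field_list=["balance"], fieldname="RowSum"))
```

The sketch covers row totals only; the column totals appended when `col=true` are a separate step.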

## Example 1: Basic Example
## Example 1: Basic example

The example shows placing the label in an existing field.
The following example PPL query shows how to use `addtotals` to place the label in an existing field.

```ppl
source=accounts
@@ -41,9 +42,9 @@ fetched rows / total rows = 4/4
+-----------+---------+-------+
```

## Example 2: Adding column totals and adding a summary event with label specified.
## Example 2: Adding column totals and adding a summary event with label specified

The example shows adding totals after a stats command where final summary event label is \'Sum\'. It also added new field specified by labelfield as it did not match existing field.
The following example PPL query shows how to use `addtotals` to add totals after a `stats` command, where the final summary event label is 'Sum'. It also adds the new field specified by `labelfield` because it does not match an existing field.

```ppl
source=accounts
@@ -66,7 +67,7 @@ fetched rows / total rows = 5/5
+----------------+-----------+---------+-----+-------+
```

if row=true in above example, there will be conflict between column added for column totals and column added for row totals being same field \'Total\', in that case the output will have final event row label null instead of \'Sum\' because the column is number type and it cannot output String in number type column.
If `row=true` is set in the preceding example, the column added for column totals and the column added for row totals conflict because both use the same field, 'Total'. In that case, the label in the final event row is null instead of 'Sum', because the column is a number type and cannot hold a string value.

```ppl
source=accounts
@@ -91,7 +92,7 @@ fetched rows / total rows = 5/5

## Example 3: With all options

The example shows using addtotals with all options set.
The following example PPL query shows how to use `addtotals` with all options set.

```ppl
source=accounts
26 changes: 16 additions & 10 deletions docs/user/ppl/cmd/append.md
Original file line number Diff line number Diff line change
@@ -1,21 +1,26 @@
# append
# append

## Description

The `append` command appends the result of a sub-search and attaches it as additional rows to the bottom of the input search results (The main search).
The `append` command appends the result of a sub-search and attaches it as additional rows to the bottom of the input search results (the main search).

The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows.
## Syntax

append \<sub-search\>
* sub-search: mandatory. Executes PPL commands as a secondary search.
## Syntax

Use the following syntax:

`append <sub-search>`
* `sub-search`: mandatory. Executes PPL commands as a secondary search.


## Limitations

* **Schema Compatibility**: When fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with `eval` or using `fields` to select non-conflicting columns).
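
The column alignment and the schema limitation can be sketched together in plain Python. This is a simplified illustration, not the plugin's implementation: the function name `append_rows` and the sample rows are assumptions, and Python value types stand in for PPL field types.

```python
def append_rows(main, sub):
    """Append sub-search rows below main-search rows, aligning columns by name."""
    columns = []
    types = {}
    for row in main + sub:
        for name, value in row.items():
            if name not in columns:
                columns.append(name)
            if value is None:
                continue
            # Python types stand in for field types in this sketch.
            seen = types.setdefault(name, type(value))
            if seen is not type(value):
                raise TypeError(
                    f"field {name!r} has incompatible types: "
                    f"{seen.__name__} and {type(value).__name__}"
                )
    # Fields missing from a row are filled with None (NULL).
    return [{name: row.get(name) for name in columns} for row in main + sub]


# Hypothetical sample rows, not taken from the accounts index.
main = [{"sum(age)": 36, "gender": "M", "state": "TN"}]
sub = [{"count(age)": 3, "gender": "M"}]
for row in append_rows(main, sub):
    print(row)
```

Calling `append_rows([{"sum": 36}], [{"sum": "36"}])` raises `TypeError` in this sketch, which mirrors the failure described above for same-named fields with incompatible types.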

## Example 1: Append rows from a count aggregation to existing search result

This example appends rows from "count by gender" to "sum by gender, state".
## Example 1: Append rows from a count aggregation to existing search results

The following example appends rows from "count by gender" to "sum by gender, state".

```ppl
source=accounts | stats sum(age) by gender, state | sort -`sum(age)` | head 5 | append [ source=accounts | stats count(age) by gender ]
@@ -37,9 +42,10 @@ fetched rows / total rows = 6/6
+----------+--------+-------+------------+
```

## Example 2: Append rows with merged column names

This example appends rows from "sum by gender" to "sum by gender, state" with merged column of same field name and type.
## Example 2: Append rows with merged column names

The following example appends rows from "sum by gender" to "sum by gender, state" with merged column of same field name and type.

```ppl
source=accounts | stats sum(age) as sum by gender, state | sort -sum | head 5 | append [ source=accounts | stats sum(age) as sum by gender ]
26 changes: 16 additions & 10 deletions docs/user/ppl/cmd/appendcol.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,18 @@
# appendcol
# appendcol

## Description

The `appendcol` command appends the result of a sub-search and attaches it alongside with the input search results (The main search).
## Syntax
The `appendcol` command appends the result of a sub-search and attaches it alongside the input search results (the main search).

appendcol [override=\<boolean\>] \<sub-search\>
## Syntax

Use the following syntax:

`appendcol [override=<boolean>] <sub-search>`
* `override=<boolean>`: optional. Specifies whether results from the main search are overwritten when column names conflict. **Default:** false.
* sub-search: mandatory. Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search results as its input.
* `sub-search`: mandatory. Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search results as its input.

## Example 1: Append a count aggregation to existing search result

## Example 1: Append a count aggregation to existing search results

This example appends "count by gender" to "sum by gender, state".

@@ -40,7 +43,8 @@ fetched rows / total rows = 10/10
+--------+-------+----------+------------+
```

## Example 2: Append a count aggregation to existing search result with override option

## Example 2: Append a count aggregation to existing search results with override option

This example appends "count by gender" to "sum by gender, state" with override option.

@@ -71,9 +75,10 @@ fetched rows / total rows = 10/10
+--------+-------+----------+------------+
```


## Example 3: Append multiple sub-search results

This example shows how to chain multiple appendcol commands to add columns from different sub-searches.
The following example PPL query shows how to chain multiple `appendcol` commands to add columns from different sub-searches.

```ppl
source=employees
@@ -101,9 +106,10 @@ fetched rows / total rows = 9/9
+------+-------------+-----+------------------+---------+
```


## Example 4: Override case of column name conflict

This example demonstrates the override option when column names conflict between main search and sub-search.
The following example PPL query demonstrates how to use `appendcol` with the override option when column names conflict between main search and sub-search.

```ppl
source=employees
19 changes: 12 additions & 7 deletions docs/user/ppl/cmd/appendpipe.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,18 @@
# appendpipe
# appendpipe

## Description

The `appendpipe` command appends the result of the subpipeline to the search results. Unlike a subsearch, the subpipeline is not run first.The subpipeline is run when the search reaches the appendpipe command.
The `appendpipe` command appends the result of the subpipeline to the search results. Unlike a subsearch, the subpipeline is not run first. The subpipeline is run when the search reaches the appendpipe command.
The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows.
## Syntax

appendpipe [\<subpipeline\>]
* subpipeline: mandatory. A list of commands that are applied to the search results from the commands that occur in the search before the `appendpipe` command.
## Syntax

Use the following syntax:

`appendpipe [<subpipeline>]`
* `subpipeline`: mandatory. A list of commands that are applied to the search results from the commands that occur in the search before the `appendpipe` command.
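
The difference from a sub-search, namely that the subpipeline runs on the results produced so far, can be sketched in plain Python. This is an illustration only, not the plugin's implementation: the names `append_pipe` and `total_by_gender` and the sample rows are assumptions made for the example.

```python
def append_pipe(rows, subpipeline):
    """Run subpipeline on the rows produced so far and append its output."""
    # The subpipeline sees the current results, not the original source.
    produced = subpipeline([dict(r) for r in rows])

    # Align columns by name; missing fields become None (NULL).
    columns = []
    for row in rows + produced:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in rows + produced]


def total_by_gender(rows):
    """Hypothetical subpipeline: sum the `sum` field per gender."""
    totals = {}
    for row in rows:
        totals[row["gender"]] = totals.get(row["gender"], 0) + row["sum"]
    return [{"gender": gender, "total": total} for gender, total in totals.items()]


# Hypothetical sample rows standing in for "sum by gender, state" results.
rows = [
    {"sum": 36, "gender": "M", "state": "TN"},
    {"sum": 33, "gender": "M", "state": "MD"},
    {"sum": 28, "gender": "F", "state": "VA"},
]
for row in append_pipe(rows, total_by_gender):
    print(row)
```

The three input rows come back unchanged apart from an empty `total` field, followed by one appended row per gender.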

## Example 1: Append rows from a total count to existing search result

## Example 1: Append rows from a total count to existing search results

This example appends rows from "total by gender" to "sum by gender, state" with merged column of same field name and type.

@@ -37,6 +40,7 @@ fetched rows / total rows = 6/6
+------+--------+-------+-------+
```


## Example 2: Append rows with merged column names

This example appends rows from "count by gender" to "sum by gender, state".
@@ -65,6 +69,7 @@ fetched rows / total rows = 6/6
+----------+--------+-------+
```


## Limitations

* **Schema Compatibility**: As with the `append` command, when fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with `eval` or using `fields` to select non-conflicting columns).