Merged
2 changes: 1 addition & 1 deletion README.md
@@ -14,7 +14,7 @@ Find the docs at https://disyinformationssysteme.github.io/cadenza-analytics-pyt
* Flask
* Pandas
* requests-toolbelt

+* chardet

## Example:
Example extensions can be found in [examples](https://github.com/DisyInformationssysteme/cadenza-analytics-python/tree/main/examples).
81 changes: 47 additions & 34 deletions docs/intro.md
@@ -1,4 +1,6 @@
-<pre>
+This is `cadenzaanalytics` version {{version}}.
+
+<pre>
<b>!! This module is currently in beta status !!</b>

It can be used for testing, but there may be breaking changes before a full release.
@@ -8,13 +10,13 @@

# disy Cadenza Analytics Extensions

An Analytics Extension extends the functional spectrum of [disy Cadenza](https://www.disy.net/en/products/disy-cadenza/) with an analysis function or a visualisation type.
An Analytics Extension is a web service that exchanges structured data with disy Cadenza via the Cadenza API.
A user can integrate an analysis extension into disy Cadenza via the Management Center and manage it there (if they have the appropriate rights).

As of disy Cadenza Autumn 2023 (9.3), the following types and capabilities of analysis extensions are officially supported:

- **Visualization**
The Analytics Extension type `visualization` provides a new visualization type for displaying a bitmap image (PNG).

- **Data enrichment**
@@ -27,24 +29,25 @@ As of disy Cadenza Autumn 2023 (9.3), the following types and capabilities of an

An Analytics Extension defines one endpoint that, depending on the HTTP method of the request, is used to supply the Extension's configuration to disy Cadenza, or exchange data and results with Cadenza respectively.

<!--- Beware: when building documentation locally, path to image must not be relative to this document, but relative to the one that includes this md file!
(in this case: src/cadenzaanalytics/__init__.py -> <img src="../../docs/communication.png"... )
-but when building per github action, the path must be relative to root
+but when building via github action, the path must be relative to root
--->
<img src="communication.png" alt="(Image: Communication between disy Cadenza and Analytics Extension)" width="800">

When receiving an `HTTP(S) GET` request, the endpoint returns a JSON representation of the extension's configuration.
This step is executed once when registering the Analytics Extension from the disy Cadenza Management Center GUI and does not need to be repeated unless the extension's configuration changes.

By sending an `HTTP(S) POST` request to the same endpoint and including the data, metadata and parameters as specified in the extension's configuration as payload, the extension is executed.
This step is executed each time that the Analytics Extension is invoked from the disy Cadenza GUI and Cadenza takes care of properly formatting the payload.

The `cadenzaanalytics` module provides the functionality to abstract the required communication and easily configure the Analytics Extension's responses to the above requests.
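
The request flow described above can be illustrated without any web framework. The following is only a minimal sketch: `handle_request`, `EXTENSION_CONFIG` and its keys are hypothetical names chosen for illustration and are not part of `cadenzaanalytics`, which performs this dispatch internally.

```python
import json
from typing import Tuple

# Hypothetical configuration; the real JSON structure is defined by the Cadenza API.
EXTENSION_CONFIG = {"printName": "My extension", "extensionType": "calculation"}


def handle_request(method: str, payload: bytes = b"") -> Tuple[int, str]:
    """Dispatch on the HTTP method, the way a single extension endpoint would."""
    if method == "GET":
        # Registration step: Cadenza asks for the extension's configuration.
        return 200, json.dumps(EXTENSION_CONFIG)
    if method == "POST":
        # Execution step: Cadenza sends data, metadata and parameters as payload.
        return 200, f"received {len(payload)} bytes"
    # Any other method is not part of the described protocol.
    return 405, "method not allowed"
```

Calling `handle_request("GET")` corresponds to the one-time registration, while `handle_request("POST", payload)` corresponds to each invocation from the disy Cadenza GUI.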


# Installation

-As long as this package is in beta, it is only available on GitHub, and an installation via source is necessary. In the near future this package will also be made available via the Python Package Index (PyPI).
+As long as this package is in beta, it is only available on GitHub, and an installation via source is necessary.
+In the near future this package will also be made available via the Python Package Index (PyPI).

Furthermore, a corresponding version will be packaged as source code with each release of disy Cadenza.

@@ -56,34 +59,32 @@ The `cadenzaanalytics` package has the following dependencies:
* [Flask](https://flask.palletsprojects.com/en/3.0.x/)
* [Pandas](https://pandas.pydata.org/)
* requests-toolbelt
+* chardet

For each disy Cadenza version, the correct corresponding library version needs to be used.
The disy Cadenza main version is reflected in the corresponding major and minor version of `cadenzaanalytics` (e.g. 10.4.0 for Cadenza 10.4), while the last version segment is increased for both bugfixes and functional changes.

For Cadenza 10.2 and earlier versions, `cadenzaanalytics` used a semantic versioning scheme.
The first version of disy Cadenza that supported Analytics Extensions is disy Cadenza Autumn 2023 (9.3).
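
The version mapping described above can be checked programmatically. The sketch below is an illustration only: the helper names `cadenza_version_of` and `is_compatible` are hypothetical and not part of `cadenzaanalytics`, and the comparison only makes sense for package versions that follow the Cadenza-aligned scheme (not the earlier semantic versions).

```python
import re


def cadenza_version_of(package_version: str) -> str:
    """Return the 'major.minor' prefix of a cadenzaanalytics version string."""
    match = re.match(r"(\d+)\.(\d+)", package_version)
    if match is None:
        raise ValueError(f"unexpected version string: {package_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def is_compatible(package_version: str, cadenza_version: str) -> bool:
    """True if the package's major and minor segments equal the Cadenza main version."""
    return cadenza_version_of(package_version) == cadenza_version
```

For example, `is_compatible("10.4.0", "10.4")` holds, while the last version segment and any pre-release suffix are ignored by the comparison.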

<!--
## Installation via PyPI

-The simplest way to install `cadenzaanalytics` is from the [Python Package Index (PyPI)](https://pypi.org/project/cadenzaanalytics/) using the package installer [`pip`](https://pypi.org/project/pip/). To install the most recent version, simply execute
+The simplest way to install `cadenzaanalytics` is from the [Python Package Index (PyPI)](https://pypi.org/project/cadenzaanalytics/) using the package installer [`pip`](https://pypi.org/project/pip/).
+To install the most recent version, simply execute
```console
pip install cadenzaanalytics
```

In order to install a specific version of `cadenzaanalytics`, e.g. to develop an Analytics Extension for an older version of disy Cadenza, specify the version in the `pip` call:

```console
-pip install cadenzaanalytics==0.1.21
+pip install cadenzaanalytics==10.3.0
```
-->


## Installation from Source
The source of the package can be obtained from the project's public [GitHub repository](https://github.com/DisyInformationssysteme/cadenza-analytics-python).
-Alternatively with each release of disy Cadenza, the offline source code of the matching version of `cadenzaanalytics` is packaged in the distributions `developer.zip`.

Once the repository is locally available, the package can be installed using the package installer [`pip`](https://pypi.org/project/pip/).
To install the package from source, navigate to the root folder of the project and run:

```console
@@ -112,7 +113,7 @@ We specify what data can be passed from disy Cadenza to the Analytics Extension
my_attribute_group = ca.AttributeGroup(
    name='my_data',
    print_name='Any numeric attribute',
    data_types=[ca.DataType.INT64,
                ca.DataType.FLOAT64],
    min_attributes=1,
    max_attributes=1
@@ -142,7 +143,7 @@ my_param = ca.Parameter(
```
This object again requires a `name` and a `print_name`, as well as a [`ParameterType`](cadenzaanalytics/data/parameter_type.html).
Optionally, we can specify whether a parameter is mandatory and/or a default value for it.
Multiple parameters can be defined.

As an alternative to requesting input of a parameter in one of the standard data types, a list from which a user selects a value can be defined via the `SELECT` type:

@@ -175,15 +176,15 @@ my_extension = ca.CadenzaAnalyticsExtension(
```

The `relative_path` defines the endpoint, i.e. the subdirectory of the URL under which the extension will be available after deployment.
Further parameters include the `print_name` shown in Cadenza, and the attribute groups and parameters defined above.
Additionally, the appropriate [`ExtensionType`](cadenzaanalytics/data/extension_type.html) (visualization, enrichment, or calculation) must be specified.

The `analytics_function` is the name of the Python method that should be invoked (see next section).

## Including Custom Analytics Code

The analysis function `my_analytics_function` (or whatever you choose to name it) is the method that contains the specific functionality for the extension.
It implements what the extension should be doing when being invoked from disy Cadenza.
This method takes two arguments, `metadata` and `data`, which both will be passed to it automatically when the extension is invoked from Cadenza.

```python
@@ -192,14 +193,14 @@ def my_analytics_function (metadata: ca.RequestMetadata, data: pd.DataFrame):
    return #something
```

The actual content and return type of this function will depend both on the extension type (visualization, enrichment, or calculation) and naturally the actual analytics code that the extension should execute.

### Reading Data, Metadata and Parameters

Accessing the data that is transferred from Cadenza is simple.
Within the defined analytics function, a [Pandas DataFrame](https://pandas.pydata.org/) `data` is automatically available, which holds all the data passed from Cadenza.

Same as the `data` object, a [`RequestMetadata`](cadenzaanalytics/request/request_metadata.html) object is also automatically available in the analysis function as `metadata`.

The `metadata` object contains information on the columns in the `data` DataFrame, such as their print name and type in disy Cadenza, their column name in the pandas DataFrame, or additional information like a `geometry_type`, where applicable.

@@ -213,7 +214,8 @@ if 'my_data' in columns_by_attribute_group:
    my_data = data[column.name]
```

-While it is also possible to directly access the columns of `data` by name or by index, this is less robust, since the actual column names of the dataframe depend on their configuration in disy Cadenza and changing them there might lead to the extension not functioning properly anymore. However it is possible to get the metadata for a specific column of the `data` DataFrame.
+While it is also possible to directly access the columns of `data` by name or by index, this is less robust, since the actual column names of the dataframe depend on their configuration in disy Cadenza and changing them there might lead to the extension not functioning properly anymore.
+However, it is possible to get the metadata for a specific column of the `data` DataFrame.

```python
for column_name, column_data in data.items():
Expand All @@ -231,7 +233,7 @@ The table shows the mapping to Pyton data types:
| Number (Long) | pandas.Int64Dtype | `1` | |
| Floating point number (Double) | pandas.Float64Dtype | `1.23` | |
| Date | string | `"2022-11-12T12:34:56+13:45[Pacific/Chatham]"` | A date is represented as an [ISO string with time zone offset from UTC](https://en.wikipedia.org/wiki/ISO_8601#Coordinated_Universal_Time_(UTC)) (UTC) and additional time zone identifier in brackets. |
| Geometry | string | `"POINT(8.41594949941623 49.0048124984033)"` | A geometry is represented as a [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) string.<br><br>*Note:* By default, coordinates use the WGS84 projection. |
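
Since dates and geometries arrive as plain strings, they usually need to be parsed before use. The following is a minimal sketch using only the standard library; `parse_cadenza_date` and `parse_wkt_point` are hypothetical helper names, not functions of `cadenzaanalytics`, and the point parser only handles the simple two-dimensional `POINT` form shown in the table.

```python
import re
from datetime import datetime
from typing import Optional, Tuple


def parse_cadenza_date(value: str) -> Tuple[datetime, Optional[str]]:
    """Split off the bracketed time zone identifier and parse the ISO part."""
    iso_part, _, zone_part = value.partition("[")
    zone_id = zone_part.rstrip("]") or None
    # The ISO part carries the UTC offset, so the result is timezone-aware.
    return datetime.fromisoformat(iso_part), zone_id


def parse_wkt_point(value: str) -> Tuple[float, float]:
    """Extract the two coordinates of a WKT POINT string, in the order given."""
    number = r"([-+.\deE]+)"
    match = re.fullmatch(rf"POINT\s*\(\s*{number}\s+{number}\s*\)", value.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {value!r}")
    return float(match.group(1)), float(match.group(2))
```

For other geometry types, a dedicated WKT parser would be more appropriate than a regular expression.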


Parameters are stored in `metadata` as well.
@@ -251,6 +253,7 @@ A [`CsvResponse`](cadenzaanalytics/response/csv_response.html) is used for calcu
The response must include the data and the proper metadata.

The following minimal example echoes the data received from disy Cadenza as part of an `AttributeGroup` named `'any_data'` back to it without modification.
+Therefore, it just forwards the original metadata as the metadata of the response.

```python
def echo_analytics_function(metadata: ca.RequestMetadata, data: pd.DataFrame):
@@ -283,7 +286,7 @@ response_columns = [
### Data Enrichment

A [`CsvResponse`](cadenzaanalytics/response/csv_response.html) is used for enrichments as well.
The response must be in the format of a text, a CSV file or a DataFrame so that it fits.

TODO

@@ -307,7 +310,7 @@ return ca.ImageResponse(image)


### Returning an Error
-In order to abort the execution of the function with an error and pass an according message to disy Cadenza, a [`ErrorResponse`](cadenzaanalytics/response/error_response.html) can be returned.
+In order to abort the execution of the function with an error and pass an according message to disy Cadenza, an [`ErrorResponse`](cadenzaanalytics/response/error_response.html) can be returned.

```python
if my_data is None:
@@ -316,20 +319,26 @@ if my_data is None:

## Registering the Extension

-TBD
+Finally, the extension needs to be registered with a [`CadenzaAnalyticsExtensionService`](cadenzaanalytics/cadenza_analytics_extension_service.html).
+This makes the service available at the configured endpoint.


```python
analytics_service = ca.CadenzaAnalyticsExtensionService()
analytics_service.add_analytics_extension(my_extension)
```

-TODO "directory" service multiple extensions
+<!-- TODO "directory" service multiple extensions -->

# Deployment
+Since `cadenzaanalytics` is built on the [Flask framework](https://flask.palletsprojects.com/en/stable), the deployment options for a Cadenza Analytics Extension are basically the same as for any Flask application.
+Below, we present a few options; a more comprehensive overview can be found in the [Deploying to Production](https://flask.palletsprojects.com/en/stable/deploying/index.html) section of the official Flask documentation.

-Since `cadenzaanalytics` is built on the [Flask framework](https://flask.palletsprojects.com/en/3.0.x/), ...
+## Local Execution (development only)
+For development purposes, using the built-in development server, debugger, and reloader is the most convenient.
+However, it should not be used in production, as it has not been designed for security, stability, or efficiency.

-## Local Execution
+The development server can either be invoked from within the python code...

```python
if __name__ == '__main__':
@@ -338,3 +347,7 @@ if __name__ == '__main__':
```

## WSGI Deployment

## Docker

<!-- TODO: mention proxyfix? -->
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -10,7 +10,7 @@ authors = [
"Daniel Dittmar <daniel.dittmar@disy.net>",
"Matthias Budde <matthias.budde@disy.net>"
]
-version="10.4.0a0.dev"
+version="10.3.0a3.dev"
description = "Official Python Package for creation of disy Cadenza analytics extensions"
readme = "README.md"
license = "Apache-2.0"
@@ -30,6 +30,7 @@ Werkzeug = "3.0.4"
Flask-Cors = "3.0.10"
requests-toolbelt = "1.0.0"
pandas = " ^2.0.2"
+chardet = "5.2.0"

[project]
name = "cadenzaanalytics"
2 changes: 0 additions & 2 deletions src/cadenzaanalytics/__init__.py
@@ -5,8 +5,6 @@

The purpose of this module is to encapsulate the communication via the Cadenza API.

-This is `cadenzaanalytics` version {{version}}.

.. include:: ../../docs/intro.md
"""
from cadenzaanalytics.cadenza_analytics_extension import CadenzaAnalyticsExtension