LIZARD – Fraud Pattern Visualizer

IPCEI Next Generation Cloud Infrastructure and Services

The content of this repository comprises the work which is being developed in the scope of the RESCUE (RESilient Cloud for EUropE) project, a part of the IPCEI-CIS (IPCEI Next Generation Cloud Infrastructure and Services), the key digital policy project aimed at strengthening Europe's digital and technological sovereignty.

LIZARD – Fraud Pattern Visualizer

Visualized Indicators for Zonal Anomaly Risk DETECTION.

Features • Quick Start • Dashboard Panels • Cloud Integration • API Docs • Getting Started

✨ Features

🔍 Investigation Dashboard

Interactive visualization panels for comprehensive fraud analysis:

Panel	Description	Key Features
Timeline	Temporal event analysis	Anomaly detection, bucket aggregation, drill-down
Globe/Map	Geographic visualization	Hex aggregation, travel routes, clustering
Pie Chart	Categorical distribution	Multi-field grouping, drill-down, anomaly coloring
Bar Chart	Comparative analysis	Horizontal/vertical, sorting, drill-down
Scatter Plot	Correlation analysis	X/Y field selection, regression line, outlier detection
Link Graph	Entity relationships	Network visualization, community detection
Job Progress	Scheduler monitoring	Live job status, run/pause controls (Cloud mode)

🤖 Analytics Engine

Simple Mode – Fast statistical detection (Z-Score + EWMA + MAD)
Advanced Mode – ML-based detection (Isolation Forest)
Configurable Sensitivity – High/Balanced/Low presets
Real-time Scoring – Event-level anomaly scores with explanations

📊 Data Management

AI-Assisted Mapping – Automatic schema detection and field mapping
Data Workbench – Query builder, views, and data exploration
Multi-source Support – CSV upload with validation
Custom Fields – Extend schema with domain-specific fields

☁️ Cloud Integration

LIZARD can operate in two execution modes — Local and Cloud — switchable at runtime via the UI header toggle or the Settings page.

Capability	Local Mode	Cloud Mode
Data storage	SQLite / local files	Azure Blob Storage, ADLS Gen2, DBFS
Compute	In-process pandas	Databricks Spark clusters & SQL Warehouses
Connectivity	Direct	Direct or via Application Gateways
Authentication	N/A	Service Principal (OAuth), PAT, Username/Password
Scheduler	Inactive	Background health checks, config sync, stale detection
Export	Local download	Export to Azure Blob or DBFS

🚀 Quick Start

For full installation instructions, environment configuration, Docker deployment, and development commands, see the Getting Started Guide.

TL;DR:

git clone https://github.com/AmadeusITGroup/lizard.git
cd lizard
python -m venv .venv && source .venv/bin/activate
make install-all && make ui-install && make data
# Terminal 1: make api   → http://localhost:8000/docs
# Terminal 2: make ui    → http://localhost:5173

📊 Dashboard Panels

Timeline Panel

Visualize events over time with anomaly detection.

Features:

Configurable time buckets (30s to 1 day)
Metrics: Count, Sum, Average, Max, Min
Simple (Z-Score) and Advanced (Isolation Forest) analytics
Click on anomalous buckets to see detailed events
Data source filtering

Globe/Map Panel

Geographic visualization of events with travel route analysis.

Features:

2D Map and 3D Globe views
Hex aggregation for dense data
User travel routes with arc visualization
Color by anomaly score or category
Click on locations for event details

Pie Chart Panel

Analyze categorical distribution of events.

Features:

Multi-field grouping (composite keys)
Unlimited categories (3-200 slider)
"Others" aggregation toggle
Click-to-drill-down with breadcrumb navigation
Anomaly severity coloring
Detail drawer with anomaly events table

Bar Chart Panel

Compare event counts/metrics across categories.

Features:

Vertical and horizontal orientation
Multi-field grouping
Sorting by value, label, or anomaly score
Click-to-drill-down with breadcrumb navigation
Detail drawer with anomaly events table
Show/hide value labels

Scatter Plot Panel

Correlation and outlier analysis.

Features:

Select any fields for X/Y axes
Color by category or anomaly score
Size by numeric field
Optional regression line
Log scale toggle for axes
Click points for detail drawer

Job Progress Panel

Live scheduler monitoring dashboard panel (Cloud mode only).

Features:

Real-time job status (running/stopped, OK/error counts)
Per-job enable/disable toggle
Manual "Run Now" trigger for any job
Run count, error count, and last-run duration display
Start/stop scheduler from the panel

☁️ Cloud Integration

LIZARD's cloud integration is layered across six phases, each building on the last. All cloud features are opt-in — the application runs fully locally by default.

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│  React UI                                                       │
│  ┌──────────────┐ ┌───────────────┐ ┌────────────────────────┐  │
│  │ Cloud Toggle │ │ Cloud Browser │ │ Settings Page           │  │
│  │ (AppBar)     │ │ (Storage/DBFS)│ │ (Gateways/Connections) │  │
│  └──────┬───────┘ └───────┬───────┘ └───────────┬────────────┘  │
│         │                 │                      │               │
│  ┌──────┴─────────────────┴──────────────────────┴────────────┐  │
│  │  CloudContext (mode, config, scheduler, API helpers)        │  │
│  └────────────────────────────┬────────────────────────────────┘  │
└───────────────────────────────┼──────────────────────────────────┘
                                │ REST API
┌───────────────────────────────┼──────────────────────────────────┐
│  FastAPI Backend              │                                   │
│  ┌────────────────────────────┴───────────────────────────────┐  │
│  │  cloud_api.py (config, test, browse, engine, export,       │  │
│  │               analytics, cluster, scheduler, health)       │  │
│  └────────────────────────────┬───────────────────────────────┘  │
│                               │                                   │
│  ┌────────────────────────────┴───────────────────────────────┐  │
│  │  cloud/ package                                             │  │
│  │  ├── config.py          YAML config + runtime state         │  │
│  │  ├── connectivity.py    Gateway registry + endpoint resolve │  │
│  │  ├── diagnostics.py     Error types + connection testing    │  │
│  │  ├── audit.py           Ring-buffer audit log               │  │
│  │  ├── cluster_manager.py Databricks cluster/warehouse ops    │  │
│  │  ├── output_storage.py  Export to Blob / DBFS               │  │
│  │  ├── analytics_engine.py  Anomaly + clustering wrappers     │  │
│  │  ├── scheduler.py       Background job scheduler            │  │
│  │  └── execution/                                             │  │
│  │      ├── base.py        ExecutionEngine ABC + ExecutionResult│  │
│  │      ├── local_engine.py  Pandas engine (wraps existing)    │  │
│  │      ├── spark_engine.py  Databricks Spark SQL engine       │  │
│  │      └── engine_factory.py  Engine selection + caching      │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                               │                                   │
│              ┌────────────────┼────────────────┐                  │
│              ▼                ▼                ▼                  │
│    Azure Blob / ADLS    Databricks         DBFS                  │
│    (via azure-sdk)      (via databricks-sdk)                     │
└──────────────────────────────────────────────────────────────────┘

Execution Modes

Local Mode (default) — All computation runs in-process with pandas. Data lives in the local SQLite database. No cloud credentials required.

Cloud Mode — Data is read from Azure Blob Storage, ADLS Gen2, or Databricks DBFS. Pipelines are translated to Spark SQL and submitted to a Databricks cluster or SQL Warehouse. Results are collected back as pandas DataFrames for the API response.

Switch modes at any time using the toggle in the app bar, or via POST /cloud/mode.

Configuration

Cloud configuration is stored in lizard-cloud.yaml (auto-created on first use) and can be edited via the Settings UI or directly in YAML.

# lizard-cloud.yaml
mode: local  # or "cloud"

gateways:
  - name: tst-gateway
    fqdn: gateway-tst.corp.com
    environment: TST
    exposed_workspaces: ["123456789"]
    exposed_storage_accounts: ["myaccount"]

databricks_connections:
  - name: dev-workspace
    workspace_id: "123456789"
    connectivity: gateway        # "direct" or "gateway"
    gateway_name: tst-gateway    # required when connectivity=gateway
    auth:
      type: service_principal    # or "developer_token", "username_password"
      tenant_id: ${AZURE_TENANT_ID}
      client_id: ${AZURE_CLIENT_ID}
      client_secret: ${AZURE_CLIENT_SECRET}
    cluster:
      cluster_id: 0123-456789-abc
      cluster_name: analytics-cluster

storage_connections:
  - name: fraud-lake
    account_name: myaccount
    container: raw-data
    endpoint_type: dfs           # "blob" or "dfs" (ADLS Gen2)
    connectivity: gateway
    gateway_name: tst-gateway
    auth:
      type: service_principal
      tenant_id: ${AZURE_TENANT_ID}
      client_id: ${AZURE_CLIENT_ID}
      client_secret: ${AZURE_CLIENT_SECRET}

Note: Environment variable references (${VAR_NAME}) are resolved at load time, so secrets never need to appear in the YAML file.

Application Gateways

When direct network access to Azure resources is blocked (common in enterprise environments), LIZARD routes traffic through Application Gateways. The connectivity system:

Looks up the gateway by name from the config
Validates that the target resource (workspace or storage account) is exposed through that gateway
Rewrites the endpoint URL to route through the gateway FQDN

Resource	Direct URL	Gateway URL
Databricks	`https://adb-{workspace_id}.azuredatabricks.net`	`https://{gateway_fqdn}/adb-{workspace_id}`
Blob Storage	`https://{account}.blob.core.windows.net`	`https://{gateway_fqdn}/{account}.blob.core.windows.net`
ADLS Gen2	`https://{account}.dfs.core.windows.net`	`https://{gateway_fqdn}/{account}.dfs.core.windows.net`

Connection Testing

The Settings page provides a multi-step connection test that validates:

Config validation — Connection name exists, required fields present
Endpoint resolution — URL is correctly built (direct or gateway)
Network reachability — HTTP(S) connectivity to the endpoint
Authentication — Token exchange succeeds with the configured credentials
Resource access — Can list clusters, containers, or files

Each step returns a status (ok, warning, error, skipped) with actionable remediation hints.

Execution Engine

The execution engine abstracts over pandas (local) and Spark SQL (cloud):

Engine	Class	When Used
Local	`LocalPandasEngine`	Default — wraps the existing `PipelineExecutor`
Spark	`SparkDatabricksEngine`	When a Databricks connection with a cluster or warehouse is configured

The engine_factory module selects the appropriate engine based on the current config and caches it. Pipelines are translated to SQL for the Spark engine:

# Pipeline step types → SQL
source   → FROM {table}
filter   → WHERE {conditions}
aggregate → GROUP BY ... HAVING ...
sort     → ORDER BY ...
select   → SELECT {columns}
join     → JOIN ... ON ...
distinct → SELECT DISTINCT ...

Cluster Management

In cloud mode, LIZARD can manage Databricks compute resources:

List clusters — View all clusters in a workspace with their state
Start / Stop clusters — On-demand cluster lifecycle management
Get cluster status — Detailed info including autoscale, Spark version, workers
List SQL Warehouses — View serverless/pro warehouses

Export & Output Storage

Pipeline results can be exported directly to cloud storage:

Azure Blob Storage — Upload as CSV or Parquet to any configured container
Databricks DBFS — Write to the Databricks file system

POST /cloud/export { "pipeline": [...], "connection_type": "storage", # or "dbfs" "connection_name": "fraud-lake", "format": "parquet", "container": "exports", "path_prefix": "/lizard-results" }

Cloud Analytics

The cloud analytics engine wraps the existing anomaly detection and clustering algorithms to work through the execution engine:

Anomaly Detection (POST /cloud/analytics/anomaly) — Simple (Z-Score + EWMA + MAD) or Advanced (Isolation Forest) detection on pipeline results
Geo-temporal Clustering (POST /cloud/analytics/clustering) — DBSCAN clustering on latitude/longitude/time features

In local mode, these run in-process with pandas. In cloud mode, data is fetched via Spark SQL and then analyzed locally (Spark-native analytics planned for a future release).

Background Scheduler

When cloud mode is active, a lightweight background scheduler runs periodic maintenance tasks:

Job	Interval	Description
`health_check`	5 min	Pings all configured connections, logs results to audit
`config_sync`	10 min	Re-reads `lizard-cloud.yaml` to pick up external changes
`audit_trim`	1 hour	Trims the audit ring buffer (prevents unbounded growth)
`stale_detection`	30 min	Warns about connections that haven't been tested recently

The scheduler is thread-based (no external dependencies like Celery). Jobs can be enabled/disabled, triggered manually, or monitored from the Job Progress dashboard panel.

Audit Log

All cloud operations are recorded in an in-memory ring-buffer audit log:

Connection tests, health checks, exports, analytics runs
Categorized by system, config, connection, analytics, export
Queryable via GET /cloud/health or the audit API
Automatically trimmed by the scheduler

Cloud Data Browser

The Cloud Browser page (available only in cloud mode) lets you interactively browse:

Azure Storage — List containers → list blobs → preview datasets
DBFS — Navigate the Databricks file system with breadcrumb navigation
Dataset Preview — Read CSV/Parquet/JSON files and display the first N rows in a table

📁 Project Structure

Lizard/
├── app/                        # FastAPI backend
│   ├── main.py                 # Main API routes & viz endpoints
│   ├── mapping_api.py          # Data mapping endpoints
│   ├── rules_api.py            # Rules engine endpoints
│   ├── workbench_api.py        # Data workbench endpoints
│   └── cloud_api.py            # Cloud mode API (config, browse, engine,
│                                #   cluster, export, analytics, scheduler, health)
├── analytics/                  # Analytics modules
│   ├── simple_anomaly.py       # Z-Score + EWMA detection
│   ├── advanced_anomaly.py     # Isolation Forest detection
│   ├── rules_engine.py         # Rules evaluation engine
│   └── clustering.py           # Geo-temporal clustering
├── cloud/                      # Cloud integration package
│   ├── __init__.py             # Feature flag (CLOUD_DEPS_AVAILABLE)
│   ├── config.py               # YAML config: load, save, get_config singleton
│   ├── constants.py            # URL templates, endpoint types, defaults
│   ├── connectivity.py         # GatewayRegistry + EndpointResolver
│   ├── diagnostics.py          # ConfigurationError, ConnectivityError, test helpers
│   ├── audit.py                # Ring-buffer audit log (record, get_entries, get_stats)
│   ├── cluster_manager.py      # Databricks cluster/warehouse operations
│   ├── output_storage.py       # Export to Azure Blob / DBFS
│   ├── analytics_engine.py     # Anomaly + clustering wrappers
│   ├── scheduler.py            # Background job scheduler
│   └── execution/              # Execution engine abstraction
│       ├── __init__.py
│       ├── base.py             # ExecutionEngine ABC + ExecutionResult
│       ├── local_engine.py     # LocalPandasEngine (wraps PipelineExecutor)
│       ├── spark_engine.py     # SparkDatabricksEngine (Spark SQL)
│       └── engine_factory.py   # Engine selection, caching, reset
├── domain/                     # Domain models
│   ├── models.py               # SQLAlchemy ORM models
│   └── schemas.py              # Pydantic schemas
├── mapping/                    # Data mapping
│   ├── ai_mapper.py            # AI-assisted field mapping
│   ├── validation.py           # Data validation rules
│   └── expr.py                 # Expression evaluation
├── connectors/                 # Data connectors
│   └── csv/                    # CSV loader
├── ui-react/                   # React frontend (Vite + MUI)
│   ├── src/
│   │   ├── api.ts              # API client (viz, cloud, scheduler)
│   │   ├── components/
│   │   │   ├── AppLayout.tsx           # App shell with nav + cloud toggle
│   │   │   ├── CloudModeToggle.tsx     # Cloud/Local switch in header
│   │   │   ├── JobProgressPanel.tsx    # Scheduler dashboard panel
│   │   │   ├── AddVisualizationDialog.tsx # Panel picker (incl. jobs)
│   │   │   ├── AnomalyDetailDrawer.tsx
│   │   │   ├── GlobeDeck.tsx
│   │   │   ├── MappingManager.tsx
│   │   │   └── RulesManager.tsx
│   │   ├── sections/           # Dashboard visualization panels
│   │   │   ├── TimelinePanel.tsx
│   │   │   ├── GlobePanel.tsx
│   │   │   ├── MapPanel.tsx
│   │   │   ├── PieChartPanel.tsx
│   │   │   ├── BarChartPanel.tsx
│   │   │   ├── ScatterPlotPanel.tsx
│   │   │   └── LinkGraphPanel.tsx
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx           # Drag-and-drop panel dashboard
│   │   │   ├── MappingPage.tsx
│   │   │   ├── RulesPage.tsx
│   │   │   ├── WorkbenchPage.tsx
│   │   │   ├── DataManagerPage.tsx
│   │   │   ├── CloudBrowserPage.tsx    # Azure Storage / DBFS browser
│   │   │   └── SettingsPage.tsx        # Cloud config, gateways, connections
│   │   └── context/
│   │       ├── FiltersContext.tsx       # Global date/source filters
│   │       └── CloudContext.tsx         # Cloud mode, config, scheduler state
│   └── package.json
├── scripts/                    # Utility scripts
│   └── generate_synthetic.py
├── infra/                      # Docker & deployment
│   ├── Dockerfile.api
│   ├── Dockerfile.ui
│   ├── docker-compose.yml
│   └── nginx.conf
├── tests/                      # Test suite
│   ├── test_execution.py       # Execution engine tests
│   ├── test_cluster_manager.py # Cluster manager tests
│   ├── test_output_storage.py  # Export / output storage tests
│   ├── test_analytics_engine.py # Cloud analytics tests
│   ├── test_scheduler.py       # Scheduler tests
│   ├── test_api_ingest.py      # Ingest API tests
│   └── ...
├── Makefile                    # Development commands
├── pyproject.toml              # Python dependencies
├── lizard-cloud.yaml           # Cloud configuration (auto-created)
└── README.md

📖 API Documentation

Once the API is running, access:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Core Endpoints

Endpoint	Method	Description
`/health`	GET	Health check
`/upload/csv`	POST	Upload CSV with mapping
`/viz/grid`	GET	Query events with analytics
`/viz/globe`	GET	Globe visualization data
`/viz/schema`	GET	Get field schema
`/viz/distinct/{field}`	GET	Get distinct values
`/viz/top-users`	GET	Top users by event count
`/viz/data-sources`	GET	List data sources
`/rules/`	GET	List detection rules
`/rules/evaluate`	POST	Evaluate rules on data
`/mapping/templates`	GET	List mapping templates
`/mapping/suggest`	POST	AI-suggest field mapping
`/workbench/query`	POST	Execute workbench query

Cloud Endpoints

Endpoint	Method	Description
`/cloud/config`	GET	Get cloud configuration (secrets redacted)
`/cloud/config`	PUT	Update and persist cloud configuration
`/cloud/mode`	POST	Switch between local and cloud mode
`/cloud/test-connection`	POST	Multi-step connection test
`/cloud/health`	GET	Comprehensive cloud health check
`/cloud/status`	GET	Lightweight cloud status summary
`/cloud/providers`	GET	List configured data-source providers

Cloud Data Browsing

Endpoint	Method	Description
`/cloud/browse/storage/{conn}/containers`	GET	List Azure Storage containers
`/cloud/browse/storage/{conn}/blobs`	GET	List blobs in a container
`/cloud/browse/dbfs/{conn}`	GET	List DBFS directory contents
`/cloud/browse/preview`	POST	Preview a cloud dataset

Cloud Compute & Execution

Endpoint	Method	Description
`/cloud/clusters/{conn}`	GET	List Databricks clusters
`/cloud/clusters/{conn}/{id}`	GET	Get cluster status
`/cloud/clusters/{conn}/{id}/start`	POST	Start a cluster
`/cloud/clusters/{conn}/{id}/stop`	POST	Stop a cluster
`/cloud/warehouses/{conn}`	GET	List SQL Warehouses
`/cloud/engine/status`	GET	Current execution engine status
`/cloud/engine/set-compute`	POST	Set cluster/warehouse for Spark
`/cloud/engine/reset`	POST	Reset cached engines

Cloud Analytics & Export

Endpoint	Method	Description
`/cloud/analytics/anomaly`	POST	Run anomaly detection on pipeline
`/cloud/analytics/clustering`	POST	Run geo-temporal clustering on pipeline
`/cloud/export`	POST	Export pipeline results to Blob/DBFS

Scheduler

Endpoint	Method	Description
`/cloud/scheduler`	GET	Scheduler status and registered jobs
`/cloud/scheduler/start`	POST	Start the background scheduler
`/cloud/scheduler/stop`	POST	Stop the background scheduler
`/cloud/scheduler/jobs/{name}/run`	POST	Trigger a job immediately
`/cloud/scheduler/jobs/{name}/enable`	POST	Enable a scheduled job
`/cloud/scheduler/jobs/{name}/disable`	POST	Disable a scheduled job

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github		.github
analytics		analytics
app		app
assets		assets
cloud		cloud
connectors		connectors
docs		docs
domain		domain
infra		infra
mapping		mapping
scripts		scripts
tests		tests
ui-react		ui-react
.env.sample		.env.sample
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
GETTING_STARTED.md		GETTING_STARTED.md
LICENSE		LICENSE
MAINTAINERS.md		MAINTAINERS.md
Makefile		Makefile
README.md		README.md
lizard-cloud.yaml.example		lizard-cloud.yaml.example
pyproject.toml		pyproject.toml
rescue_logo.png		rescue_logo.png
ruff.toml		ruff.toml
tox.ini		tox.ini

Folders and files

Latest commit

History

Repository files navigation

LIZARD – Fraud Pattern Visualizer

✨ Features

🔍 Investigation Dashboard

🤖 Analytics Engine

📊 Data Management

☁️ Cloud Integration

🚀 Quick Start

📊 Dashboard Panels

Timeline Panel

Globe/Map Panel

Pie Chart Panel

Bar Chart Panel

Scatter Plot Panel

Job Progress Panel

☁️ Cloud Integration

Architecture Overview

Execution Modes

Configuration

Application Gateways

Connection Testing

Execution Engine

Cluster Management

Export & Output Storage

Cloud Analytics

Background Scheduler

Audit Log

Cloud Data Browser

📁 Project Structure

📖 API Documentation

Core Endpoints

Cloud Endpoints

Cloud Data Browsing

Cloud Compute & Execution

Cloud Analytics & Export

Scheduler

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages