Commit

Update BI documentation (#897)
* Update BI documentation

* Update Studio

* Update engine

* Update Alerting

* Updates

* Updates

* Updates
mikhail-vl authored Jan 14, 2025
1 parent 5fff20f commit d60c0a8
Showing 22 changed files with 191 additions and 273 deletions.
23 changes: 12 additions & 11 deletions big/alerting/index.mdx
@@ -8,9 +8,11 @@ import Youtube from "@theme/Youtube";

# Business Alerting

**Alerting** is a system to observe how your data changes and act when a change occurs.
Business Alerting is a module of Business Engine configured using Business Studio. The [Business Engine](../engine/) is responsible for executing the configuration you specified in the Business Studio.

The three main alerting components:
## Alert components

**Alerting** is a system to observe how your data changes and act when a change occurs. The three main alerting components:

1. **An alert rule**. It is an instruction to evaluate the observable data.
Most alert rules have parameters like time frame to check, how often, query to run (SQL and PromQL, etc.), and thresholds.
@@ -25,13 +27,18 @@ The three main alerting components:
width="50%"
/>

To summarize the schema from above, you describe WHAT to observe and specify the rules of HOW exactly. Then every time the rule is broken, a detailed record with specifics is created.
To summarize the schema from above:

Following the created alert records, alert actions are initiated.
- Describe WHAT to observe.
- Specify the rules of HOW exactly.
- Every time the rule is broken, a detailed record with specifics is created.
- Following the created alert records, alert actions are initiated.

## Similarities and differences with Grafana Alerting

The schema below depicts the Alerting as it is side-by-side with the Business Alerting, so you can see the similarities and differences. Each of the main alerting components (rule, record, and action) has a corresponding software module.
The schema below shows Grafana Alerting side by side with Business Alerting, so you can see the similarities and differences.

Each of the main alerting components (rule, record, and action) has a corresponding software module.

<Image
title="Grafana and Business Alerting comparison."
@@ -47,9 +54,3 @@ The alerting records are created by the Alert Manager. Every time a rule is brok
For the alert actions, Grafana has an extensive notification alerting channel system. Based on the number of questions we have received and come across, it has a steep learning curve. It lets you configure channels for sending text messages, Slack messages, emails, and OnCall notifications.

Webhooks, which call 3rd-party APIs, can also be triggered by an alert rule record. However, even though the possibility exists, the implementation might be unclear to many.

## Business Alerting

In the Business Alerting, use Business Studio to manage alert rules and alert actions.

The Business Engine is responsible for executing the configuration you specified in the Business Studio.
49 changes: 30 additions & 19 deletions big/alerting/manage-alert-rules.mdx
@@ -10,18 +10,8 @@ import Image from "@theme/Image";
Use the **Alert rules** tab to manage alert rules. Here you can:

1. Add a new alert rule.
2. Review the alert rule running states for each existing alert rule. There are two possible states:

- **Active**,
- **Paused**.

3. Review the alert rule alerting state for each existing alert rule. There are four possible statuses:

- **Scheduled**,
- **OK**,
- **Alerting**,
- **Error**.

2. Review the alert rule running states for each existing alert rule.
3. Review the alert rule alerting state for each existing alert rule.
4. Pause/Start the alert rule.
5. Delete the alert rule.

@@ -30,20 +20,21 @@ Use the **Alert rules** tab to manage alert rules. Here you can:
src="/img/blog/2024-12-31-big-2.2.0/tab-alert-rules.png"
/>

:::note Future releases
## Alert rules states

In the future, the **Alert Rule** tab will provide an interface to work with hundreds of alert rules by allowing grouping, filtering, etc., to ensure easy navigation and control.
:::
To temporarily pause alert execution, click the **Pause** icon for the specific alert rule. The alert can be started at any time to resume execution.

## Alert rules statuses

Right after an alert is created it becomes **Active** and **Scheduled**. Any active alert could be paused. After any modification, the alert status changes to **Scheduled**.
Right after an alert is created it becomes **Active** and **Scheduled**. After modification, the alert status changes to **Scheduled** if it requires reexecution with a new set of parameters.

- **Scheduled**. The alert is scheduled, but never run yet. This status is assigned right after the alert is created or modified by the user and API.
- **Scheduled**. The alert is scheduled but has not yet run with the current set of parameters. This status is assigned right after the alert is created or modified by the user or the API.
- **OK**. The alert has been run and the thresholds are NOT breached and the Regex pattern is NOT found.
- **Alerting**. The alert has been run and the thresholds are being breached or the Regex pattern is found.
- **Error**. Something went wrong, possibly in the query, annotation, or action.
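
The status definitions above can be read as a small decision function. The sketch below is only an illustration: `evaluate_status` and its parameters are hypothetical names, not part of the Business Engine, and it assumes that "breached" means a value above a single threshold, whereas real thresholds come from the panel options.

```python
import re
from enum import Enum
from typing import Optional, Sequence


class AlertStatus(Enum):
    SCHEDULED = "Scheduled"
    OK = "OK"
    ALERTING = "Alerting"
    ERROR = "Error"


def evaluate_status(
    values: Optional[Sequence[object]],
    threshold: Optional[float] = None,
    pattern: Optional[str] = None,
    failed: bool = False,
) -> AlertStatus:
    """Map one evaluation result to a status (hypothetical helper)."""
    if failed:
        # The query, annotation, or action went wrong.
        return AlertStatus.ERROR
    if values is None:
        # Created or modified, but not yet run with the current parameters.
        return AlertStatus.SCHEDULED
    if threshold is not None and any(
        isinstance(v, (int, float)) and v > threshold for v in values
    ):
        # Assumption: a breach is any numeric value above the threshold.
        return AlertStatus.ALERTING
    if pattern is not None and any(re.search(pattern, str(v)) for v in values):
        # The Regex pattern was found in at least one value.
        return AlertStatus.ALERTING
    return AlertStatus.OK
```

For example, `evaluate_status([10, 90.5], threshold=30)` yields `AlertStatus.ALERTING`, while `evaluate_status(None)` yields `AlertStatus.SCHEDULED`.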

### Flow schema

Reference the flow schema to get a better understanding of how statuses change in the Business Engine.

<Image
@@ -64,8 +55,10 @@ The **Add a new rule** window looks as follows:
The new alert rule/edit window has the following configuration elements to specify:

- **Title** is an alert name.
- **Schedule** is a frequency of how often the rule should run. With CRON expressions your schedule can be as complex as needed.
- **Target Dashboard** and **Target Panel** are drop-downs to select from the existing ones. The alert rule will take queries and thresholds from there automatically.
- **Schedule** is a frequency of how often the rule should run.
- With CRON expressions your schedule can be as complex as needed.
- **Target Dashboard** and **Target Panel** are drop-downs to select from the existing ones.
- The alert rule will take queries and thresholds from there automatically.
- **Time Range** can be either taken from the dashboard or set to a custom value.
- The alert **evaluation** could be set to **Thresholds** or **Regex Pattern**.
- For the **Thresholds**, the alert examines the data against thresholds set in the panel options.
@@ -75,3 +68,21 @@ The new alert rule/edit window has the following configuration elements to speci
- specify **Panel** to create and attach an annotation to a panel,
- specify **Dashboard** to create and attach an annotation to a dashboard (i.e. all panels of this dashboard),
- specify **Disabled** to disable the creation of any annotation following the alert rule breach.

## Multi-frames

The Business Engine supports multi-frame data sets. That means that the alert rule assigned to a dashboard and panel will be applied to all data frames fetched from the connected data source.

<Image
title="Dashboard and panel are mandatory parameters for an alert rule."
src="/img/big/business-engine/multi-frames.png"
/>
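
As a rough illustration of what applying one rule to every frame means, the sketch below checks each data frame against the same threshold. It is not the Business Engine implementation: `frames_breaching`, the dictionary layout, and the second device name are assumptions made for the example.

```python
from typing import Dict, List


def frames_breaching(frames: Dict[str, List[float]], threshold: float) -> List[str]:
    """Return the names of frames with at least one value above the threshold."""
    return [
        name
        for name, values in frames.items()
        if any(value > threshold for value in values)
    ]


# Hypothetical result of one panel query that returned two data frames.
frames = {
    "Tampa South 124": [25.0, 31.0, 90.5],
    "Example Device 2": [21.0, 22.5, 24.0],
}
print(frames_breaching(frames, threshold=30))
```

Every frame goes through the same check, so a single rule covers all series returned by the panel query.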

## Transformations

:::note BETA

Support for Grafana transformations is in beta because of the large number of existing transformations.
:::

Grafana transformations are supported. However, please note that Grafana offers a large number of transformations, so some of them might not work correctly.
12 changes: 9 additions & 3 deletions big/alerting/thresholds.mdx
@@ -29,7 +29,7 @@ The alert rule configuration in the Business Studio looks as follows.
src="/img/big/business-alerting/thresh-conf.png"
/>

All devices, except the Tampa South 124, have all ranges of metrics allowed. The Tampa South 124 device generates an annotation when a metric value is above 30.
All devices, except the `Tampa South 124`, have all ranges of metrics allowed. The `Tampa South 124` device generates an annotation when a metric value is above `30`.

## Annotations

@@ -41,5 +41,11 @@ They highlight all time ranges when the data values (here, it is temperature) we

Please note that no values (no registered temperature data) exist between the highest peak and the lowest dip on the graph. The temperature changes abruptly from max to min.

- The temperature for all four alerts goes above 30,
- For the second alert, it goes up to 90.5 degrees (illustrated with a tooltip).
- The temperature for all four alerts goes above `30`.
- For the second alert, it goes up to `90.5` degrees (illustrated with a tooltip).
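
To make the idea of highlighted time ranges concrete, here is a minimal sketch that groups consecutive points above a threshold into ranges. The function name `breach_ranges`, the integer timestamps, and the sample values are assumptions for illustration. The Business Engine may compute annotation ranges differently.

```python
from typing import List, Optional, Sequence, Tuple


def breach_ranges(
    points: Sequence[Tuple[int, float]], threshold: float
) -> List[Tuple[int, int]]:
    """Group consecutive points above the threshold into (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last: Optional[int] = None
    for timestamp, value in points:
        if value > threshold:
            if start is None:
                start = timestamp  # a new range begins here
            last = timestamp
        elif start is not None:
            ranges.append((start, last))  # the value dropped back, close the range
            start = None
    if start is not None:
        ranges.append((start, last))  # the series ended while still above
    return ranges


# Hypothetical (timestamp, temperature) samples.
samples = [(0, 25.0), (1, 31.0), (2, 90.5), (3, 28.0), (4, 35.0)]
print(breach_ranges(samples, threshold=30))  # [(1, 2), (4, 4)]
```

A range whose start equals its end, like `(4, 4)` here, corresponds to a single point above the threshold.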

## Annotations ranges

Depending on the **Apply to** evaluation parameter, annotations are created as a range or a single line.

Evaluating with **All value** or **Percentage** creates an annotation range.
12 changes: 11 additions & 1 deletion big/alerting/variables.mdx
@@ -26,7 +26,11 @@ Here is an example of the Time Series visualization where one data series trigge
src="/img/big/business-alerting/var-1.png"
/>

To access the alert rule configuration in the Business Studio, select the Business Engines from the main screen, then switch to the **Alert rules** tab. From there, either use the existing alert rule from the list to reconfigure or create a new rule by clicking on the **+ Add** button.
To access the alert rule configuration in the Business Studio, select the Business Engines from the main screen, then switch to the **Alert rules** tab.

### Configuration

From there, either select an existing alert rule from the list to reconfigure it or create a new rule by clicking the **+ Add** button.

<Image
title="How to open an alert rule to review/modification or create a new alert rule."
@@ -39,3 +43,9 @@ Below is what the configuration of the alert rule using dashboard variables migh
title="With variables, the same alert rule can have multiple thresholds."
src="/img/big/business-alerting/var-3.png"
/>

### Custom variable values

Grafana supports a wide range of data sources. From the beginning, we ensured the compatibility between the Business Engine and SQL and Prometheus data sources. We are actively working on ensuring many other data sources are compatible.

However, if variables for your particular data source are not yet supported, you can specify variable values directly.
4 changes: 2 additions & 2 deletions big/engine/api.mdx
@@ -13,7 +13,7 @@ Business Engine API is available to use in 3rd-party applications to integrate w

All environment variables are displayed on the **Environment** tab in Business Studio. In the future, we plan to allow them to be modified.

The environment info is taken by using the [GET /environment endpoint](/big/api/).
The environment info is retrieved using the [GET `/environment` endpoint](/big/api/).

<Image
title="Environment variables for the Charlie Engine 1."
@@ -24,7 +24,7 @@ The environment info is taken by using the [GET /environment endpoint](/big/api/

The timeline of alerting states is a subset of all alert rule records and includes only the instances where the alerting state changes.
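
The relationship between the full record list and the timeline can be sketched as a simple filter that keeps a record only when its state differs from the previous one. The record fields `time` and `state` and the function name `state_timeline` are hypothetical. The actual API payload may look different.

```python
from typing import Dict, List, Optional


def state_timeline(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records whose state differs from the preceding record."""
    timeline: List[Dict[str, str]] = []
    previous: Optional[str] = None
    for record in records:
        if record["state"] != previous:
            timeline.append(record)
            previous = record["state"]
    return timeline


# Hypothetical alert rule records, ordered by time.
records = [
    {"time": "10:00", "state": "OK"},
    {"time": "10:05", "state": "OK"},
    {"time": "10:10", "state": "Alerting"},
    {"time": "10:15", "state": "Alerting"},
    {"time": "10:20", "state": "OK"},
]
print([r["time"] for r in state_timeline(records)])  # ['10:00', '10:10', '10:20']
```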

We added Business Engine API endpoint to [get timeline of alerting states](/big/api/).
We added the [GET `/alerts/history/timeline` endpoint](/big/api/) to get the timeline of alerting states.

For instance, below is the complete list of alert rule states.

26 changes: 7 additions & 19 deletions big/engine/configuration.mdx
@@ -5,16 +5,16 @@ tags:

# Configuration

Business Engine is configured using environment variables.
Business Engine supports environment variables to override the default configuration.

:::note Future releases

In a future release, the **Environment** tab in Business Studio will allow configuring them in real time.
In a future release, the **Environment** tab in Business Studio will allow updating the configuration in real time.
:::

## Timescale database

Timescale (PostgreSQL) database is used to store Business Engine database and required for start-up. We selected Timescale database because it can be used to store Grafana configuration for High Availability setup.
The Timescale (PostgreSQL) database stores the Business Engine database and is required for start-up. We selected Timescale because it can also store the Grafana configuration for a [High Availability](../high-availability.mdx) setup.

```shell
########################### Database Configuration #############################################
@@ -64,7 +64,7 @@ GRAFANA_TOKEN=SERVICE-ACCOUNT-TOKEN

### Health Check

Depending on the configuration, Grafana might take longer to respond to requests from the Business Engine. This scenario may lead to a failure of the startup health check. Simply put, if the Business Engine can't verify that Grafana is responding, it shuts itself down.
Depending on the configuration, Grafana might take longer to respond to requests from the Business Engine. This scenario may lead to a failure of the startup health check.

To avoid this false health check failure, we added timing parameters that determine how long the Business Engine waits before checking whether Grafana is active, using the Grafana URL and token to connect.

@@ -85,7 +85,7 @@ GRAFANA_HEALTH_CHECK_RETRY=2

### GRAFANA_REQUEST_TIMEOUT

Data Source requests timeout can be increased for slow requests. By default, the waiting is 10 seconds.
The data source request timeout can be increased for slow requests. By default, the wait time is 10 seconds.

```shell
##
@@ -96,7 +96,7 @@ GRAFANA_REQUEST_TIMEOUT=10000

## Business Engine

The API Server and Scheduler start on ports 3001 and 3002 by default. These ports can be changed for `Host` mode; otherwise, use port mapping to assign different ports.
The API Server and Scheduler start on ports 3001 and 3002 by default. These ports can be changed for the `Host` network mode; otherwise, use port mapping to assign different ports.

```shell
##
@@ -110,18 +110,6 @@ ENGINE_SERVER_PORT=3001
ENGINE_SCHEDULER_PORT=3002
```

### ENGINE_NODE_ID

Node Id is required for high availability and load balancing cluster configuration. Should be unique for each Engine.

```Shell
##
## Unique Node Id for distributed alert scheduling
## Should be unique for each Engine
##
ENGINE_NODE_ID=1
```

### ENGINE_API_DOCUMENTATION

Swagger UI can be enabled to experiment with the Business Engine API.
@@ -136,7 +124,7 @@ ENGINE_API_DOCUMENTATION=false

### ENGINE_ALERT_BATCH_SIZE

You can increase batch size to increase performance for alerts with multiple variable executions. It may increase load on the data source for heavy queries and limited resources.
You can increase batch size to improve performance for alerts with multiple variable executions. It may create additional load on the data source for heavy queries and limited resources.
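
The trade-off can be pictured with a small batching sketch. It assumes, purely for illustration, that the engine splits the variable executions of one alert into groups of the configured size and processes one group at a time. The helper `batches` and the device names are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical variable values, one alert execution per value.
devices = ["device-1", "device-2", "device-3", "device-4", "device-5"]
for group in batches(devices, size=2):
    print(group)
```

With five executions and a batch size of 2 there are three rounds; a batch size of 5 needs one round but sends all five queries to the data source at once.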

```shell
##
23 changes: 4 additions & 19 deletions big/engine/index.mdx
@@ -7,32 +7,17 @@ import Image from "@theme/Image";

# Business Engine

We reimagined the Alert Manager and came up with the Business Engine:
We reimagined the native Alerting in Grafana and came up with the Business Engine with an Alerting module:

- It uses dashboards as configuration which means it retrieves dashboard queries and thresholds and uses them as alert rule parameters. That eliminates the duplicative work when users have to enter the same specifics twice.
- It uses dashboards as configuration, which means it retrieves dashboard queries and thresholds and uses them as alert rule parameters. That eliminates the duplicate work of entering and updating the same specifics twice.
- It is installed as a separate container which makes the system architecture flexible.

<Image
title="Conceptual workflow from the user POV."
src="/img/big/business-engine/workflow.png"
/>

## Multi-frames
:::note Grafana 11

As you know, an alert rule is created for a particular panel on a particular dashboard.

<Image
title="Dashboard and panel are mandatory parameters for an alert rule."
src="/img/big/business-engine/multi-frames.png"
/>

The Business Engine supports multi-frame data sets. That means that the alert rule assigned to a dashboard and panel will be applied to all data frames fetched from the connected data source.

## Transformations

:::note BETA

Support of Grafana transformation is in beta state. This is due to the number of existing transformations.
The Business Engine requires Grafana 11. The Business Intelligence platform will always be compatible with the most current Grafana version.
:::

Grafana transformations are supported. However, please, note, that Grafana offers a large number of transformations, hence, there is a chance that some of them will not work correctly.
11 changes: 10 additions & 1 deletion big/engine/prometheus.mdx
@@ -8,13 +8,22 @@ import Image from "@theme/Image";

# Prometheus

Business Engine provides Prometheus metrics for performance monitoring using provisioned `Business Engine` dashboard.
Business Engine provides Prometheus metrics for performance monitoring and a ready-to-use Business Engine dashboard with all the metrics needed to start monitoring.

Having Prometheus monitoring built into the Business Engine allows you to monitor the Business Intelligence platform itself.

<Image
title="Grafana dashboard displays Business Engine metrics stored in Prometheus."
src="/img/big/business-engine/prometheus.png"
/>

## Endpoints

The Business Engine exposes two endpoints that provide metrics:

- API Server - `engine:3001/metrics`
- Scheduler - `engine:3002/metrics`

## Configuration

Sample configuration to collect metrics from Server API and Scheduler processes.
8 changes: 5 additions & 3 deletions big/getting-started.mdx
@@ -6,19 +6,21 @@ import Youtube from "@theme/Youtube";

The Business Intelligence platform utilizes Docker containers to be modular and scalable. The getting-started Community configuration can be downloaded from the [GitHub repository](https://github.com/volkovlabs/business-intelligence).

## Requirements
:::note Grafana 11

- Business Intelligence 2.X supports **Grafana 11**.
The Business Engine requires Grafana 11. The Business Intelligence platform will always be compatible with the most current Grafana version.
:::

## Docker containers

The `docker-compose.yml` file consists of the following containers and can be used to start the platform:

- **Grafana** includes the provisioned dashboards and datasources.
- **Grafana** includes the provisioned dashboards and data sources.
- **Timescale** is required to store Business Engine configuration.
- **Business Engine** has a service account key to access Grafana HTTP APIs. It evaluates alert rules and calls actions when alert statuses change.
- **Prometheus** collects and stores performance metrics from Business Engine.
- **JSON server** is an action example based on NodeJS, which accepts the alert payload and saves it to files for testing purposes.
- **Data Emulator** uses NodeJS scripts to populate the Timescale database with data and demonstrate the variables functionality.

<Code
url="https://github.com/VolkovLabs/business-intelligence/blob/main/docker-compose.yml"
5 changes: 3 additions & 2 deletions big/high-availability.mdx
@@ -16,12 +16,13 @@ All Business Intelligence components could exist in clusters, where a cluster is
- Business Engine Cluster:
- Requests to Server API are distributed behind the Load Balancer.
- Schedulers distribute execution of alert rules automatically.
- Grafana Cluster visualizes data and provides Grafana API for Business Engine to retrieve configuration and data.
- Grafana Cluster visualizes data and provides Grafana API for Business Engine.
- Prometheus Cluster stores Business Engine performance data.
- May store Production metrics.
- PostgreSQL (Timescale) Cluster stores:
- Business Engine database.
- Grafana configuration database.
- Production data.
- May store Production data.

The picture below illustrates the current High Availability setup.
