(cluster-automation)=
# Automation

Automation in CrateDB Cloud allows users to streamline and manage routine
database operations efficiently. The two primary automation features are the
SQL Scheduler and Table Policies, both of which facilitate the maintenance
and optimization of routine database tasks.

:::{important}
- Automation is available for all newly deployed clusters.
- For existing clusters, the feature can be enabled on demand. (Contact
  [support](https://support.crate.io/) for activation.)

Automation utilizes a dedicated database user `gc_admin` with full cluster
privileges to execute scheduled tasks and persists data in the `gc` schema.
:::

## SQL Scheduler

The SQL Scheduler automates routine database tasks by scheduling SQL queries
to run at specific times, in UTC. Each job pairs a valid
[cron pattern](https://www.ibm.com/docs/en/db2oc?topic=task-unix-cron-format)
with a SQL statement, enabling a wide range of tasks. Users can manage these
jobs through the Cloud UI: adding, removing, editing, activating, and
deactivating them as needed.
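
A cron pattern has five fields: minute, hour, day of month, month, and day of
week, all evaluated in UTC. As a minimal sketch (the table name is
hypothetical), a job defined with the schedule `0 3 * * 1` and the following
statement would refresh the table every Monday at 03:00 UTC:

```sql
-- Runs on schedule '0 3 * * 1': minute 0, hour 3, any day of month,
-- any month, Mondays only (all times in UTC).
REFRESH TABLE doc.sample_data;
```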

### Use Cases

- Regularly updating or aggregating table data.
- Automating export and import of data.
- Deleting old/redundant data to maintain database efficiency.

### Accessing and Using the SQL Scheduler

The SQL Scheduler can be found in the "Automation" tab in the left-hand
navigation menu. Two tabs are relevant to the SQL Scheduler:

**SQL Scheduler** shows a list of your existing jobs. In the list, you can
activate or deactivate each job with the toggle in the "Active" column. You
can also edit and delete jobs with the buttons on the right side of the list.

![SQL Scheduler overview](../_assets/img/cluster-sql-scheduler-overview.png)

**Logs** shows a list of *scheduled* job runs, whether they failed or
succeeded, their execution time, and their run time, plus the error in case a
run was unsuccessful. In case of an error, more details can be viewed, showing
the executed query and a stack trace. You can filter the logs by status or by
a specific job.

![SQL Scheduler logs](../_assets/img/cluster-sql-scheduler-logs.png)

### Examples

#### Cleanup of Old Data

Cleanup tasks are a common use case for these kinds of automated jobs. This
example deletes records older than 30 days from a specified table, once a
day:

```sql
DELETE FROM "sample_data"
WHERE
    "timestamp_column" < NOW() - INTERVAL '30 days';
```

How often you run it is up to you, but once a day is common for cleanup.
This expression runs every day at 2:30 PM UTC:

Schedule: `30 14 * * *`

![SQL Scheduler cleanup example](../_assets/img/cluster-sql-scheduler-example-cleanup.png)

#### Copying Logs into a Persistent Table

Another useful example is copying data to another table for archival
purposes. This job copies rows from the system logs table (`sys.jobs_log`)
into a table of our own:

```sql
CREATE TABLE IF NOT EXISTS "logs"."persistent_jobs_log" (
    "classification" OBJECT (DYNAMIC),
    "ended" TIMESTAMP WITH TIME ZONE,
    "error" TEXT,
    "id" TEXT,
    "node" OBJECT (DYNAMIC),
    "started" TIMESTAMP WITH TIME ZONE,
    "stmt" TEXT,
    "username" TEXT,
    PRIMARY KEY (id)
) CLUSTERED INTO 1 SHARDS;

INSERT INTO "logs"."persistent_jobs_log"
SELECT *
FROM sys.jobs_log
ON CONFLICT ("id") DO NOTHING;
```

In this example, we schedule the job to run every hour:

Schedule: `0 * * * *`

![SQL Scheduler copying example](../_assets/img/cluster-sql-scheduler-example-copying.png)

:::{note}
Limitations and known issues:
* Only one job can run at a time; subsequent jobs will be queued until the
  current one completes.
* Long-running jobs may block the execution of queued jobs, leading to
  potential delays.
:::

## Table Policies

Table policies automate maintenance operations for **partitioned tables**.
Automated actions are executed daily, based on a pre-configured ruleset.

![Table policy list](../_assets/img/cluster-table-policy.png)

### Overview

The table policy overview can be found in the left-hand navigation menu under
"Automation". From the list of policies, you can create, delete, edit, or
(de)activate them. Logs of executed policies can be found in the "Logs" tab.

![Table policy logs](../_assets/img/cluster-table-policy-logs.png)

A new policy can be created with the "Add New Policy" button.

![Table policy creation](../_assets/img/cluster-table-policy-create.png)

After naming the policy and selecting the tables/schemas to be affected, you
must specify the time column. This column, which should be the timestamp used
for partitioning, determines the data affected by the policy. The time column
must be consistently present across all targeted tables/schemas. You can
apply a policy to tables that lack the specified time column, but it will not
be executed for those tables. If your tables use different timestamp columns,
consider setting up a separate policy for each to ensure accuracy.

:::{note}
The "Time Column" must be of type `TIMESTAMP`.
:::

Next, a condition determines the affected partitions. The system is
time-based: a partition is eligible for an action if the value in its
partition column is smaller (`<`), or smaller than or equal to (`<=`),
the current date minus `n` days, months, or years.
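
For intuition, a condition of "older than 60 days" using `<` matches roughly
the partitions selected by the following query (a sketch, assuming the
hypothetical partitioned table `data_table` with partition column `ts_day`
from the example below):

```sql
-- Partitions whose ts_day value lies more than 60 days in the past
-- would be eligible for the policy's action.
SELECT ts_day, COUNT(*) AS rows_affected
FROM data_table
WHERE ts_day < NOW() - INTERVAL '60 days'
GROUP BY ts_day
ORDER BY ts_day;
```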

### Actions

The following actions are supported (a sketch of roughly equivalent SQL
follows the list):

* **Delete:** Deletes eligible partitions along with their data.
* **Set replicas:** Changes the replication factor of eligible partitions.
* **Force merge:** Merges segments on eligible partitions down to a specified
  number.
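
The sketch below shows roughly equivalent manual statements for a single
partition, again assuming the hypothetical table `data_table` with partition
column `ts_day`; the exact statements issued by the policy engine may differ:

```sql
-- Delete: remove an eligible partition along with its data.
DELETE FROM data_table WHERE ts_day = '2024-01-01';

-- Set replicas: change the replication factor of one partition.
ALTER TABLE data_table PARTITION (ts_day = '2024-01-01')
    SET (number_of_replicas = 0);

-- Force merge: merge the partition's segments down to a target count.
OPTIMIZE TABLE data_table PARTITION (ts_day = '2024-01-01')
    WITH (max_num_segments = 1);
```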

After filling in these details, you can see which schemas/tables would be
affected, and how many partitions, if the policy were executed at this very
moment.

### Examples

Consider a scenario where you have a table and wish to optimize storage space
on your cluster. Older data may already be covered by snapshots, so it can be
sufficient for it to exist just once in the cluster, without replication. In
that case, high availability is not a priority, and you plan to retain the
data for only 60 days.

Assume the following table schema:

```sql
CREATE TABLE data_table (
    ts TIMESTAMP,
    ts_day GENERATED ALWAYS AS date_trunc('day', ts),
    val DOUBLE
) PARTITIONED BY (ts_day);
```

For the outlined scenario, the policies would be as follows:

**Policy 1 - Saving replica space:**
* **Time Column:** `ts_day`
* **Condition:** `older than 30 days`
* **Actions:** `Set replicas to 0`

**Policy 2 - Data removal:**
* **Time Column:** `ts_day`
* **Condition:** `older than 60 days`
* **Actions:** `Delete eligible partition(s)`
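
To verify what such policies have done, you could inspect the partitions of
the table; a sketch, assuming the `data_table` schema above:

```sql
-- Partitions older than 30 days should show 0 replicas once Policy 1 has
-- run; partitions older than 60 days should disappear once Policy 2 has run.
SELECT partition_ident, values['ts_day'] AS day, number_of_replicas
FROM information_schema.table_partitions
WHERE table_name = 'data_table'
ORDER BY day;
```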

(cluster-console)=
# Console

The Console in CrateDB Cloud allows users to execute SQL queries seamlessly
against their cluster. Users with the "Organization Admin" role can access it
from the left-hand navigation menu within a cluster.

Key features include:

- **Table and Schema Tree View:** Easily navigate through your database
  structure.
- **Client-Side Query Validation:** Ensure your SQL queries are correct before
  execution.
- **Multiple Query Execution:** Run several queries in sequence.
- **Query History:** Access and manage your past queries.

:::{important}
- The Console is available for all newly deployed clusters.
- For older clusters, this feature can be enabled on demand. Contact
  [support](https://support.crate.io/) for activation.

The Console currently utilizes a dedicated database user `gc_admin` with full
cluster privileges.
:::

:::{note}
**Multi-Query Execution:**
When running multiple queries at once, the Console executes them sequentially,
not within a single session or transaction. If one query fails, the subsequent
queries will not be executed. Currently, session settings are not persisted
between queries.
:::
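
A minimal illustration of that behavior, using hypothetical statements:

```sql
CREATE TABLE IF NOT EXISTS doc.demo (x INT);  -- runs first
INSERT INTO doc.demo VALUES ('not a number'); -- fails: cannot cast to INT
SELECT COUNT(*) FROM doc.demo;                -- never runs, because the
                                              -- previous statement failed
```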

(cluster-export)=
# Export

The "Export" section allows users to download specific tables/views. When you
first visit the Export tab, you can specify the name of a table/view, the
format (CSV, JSON, or Parquet), and whether the data should be gzip
compressed (recommended for CSV and JSON files).

:::{important}
- The size limit for exports is 1 GiB.
- Exports are held for 3 days, then automatically deleted.
:::

:::{note}
**Limitations with Parquet:**
Parquet is a highly compressed data format for very efficient storage of
tabular data. Please note that OBJECT and ARRAY columns in CrateDB are JSON
encoded when saving to Parquet (effectively saving them as strings). This is
due to the complexity of encoding structs and lists in the Parquet format,
where determining the exact schema might not be possible. When re-importing
such a Parquet file, make sure you pre-create the table with the correct
schema.
:::
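
For instance, if an exported table had OBJECT or ARRAY columns, you might
pre-create the target table like this before importing (a sketch; the table
and column names are hypothetical):

```sql
CREATE TABLE IF NOT EXISTS "reimported_data" (
    "id" TEXT PRIMARY KEY,
    "payload" OBJECT (DYNAMIC), -- JSON encoded as a string in the Parquet file
    "tags" ARRAY(TEXT)          -- likewise JSON encoded in the Parquet file
);
```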