title |
---|
Fragment |
A fragment folder is called `<timestamped_name>` and is located here:
    my_array                                    # array folder
       |  ...
       |_ __fragments                           # array fragments folder
             |_ <timestamped_name>              # fragment folder
             |      |_ __fragment_metadata.tdb  # fragment metadata
             |      |_ a0.tdb                   # fixed-sized attribute
             |      |_ a1.tdb                   # var-sized attribute (offsets)
             |      |_ a1_var.tdb               # var-sized attribute (values)
             |      |_ a2.tdb                   # fixed-sized nullable attribute
             |      |_ a2_validity.tdb          # fixed-sized nullable attribute (validities)
             |      |_ ...
             |      |_ d0.tdb                   # fixed-sized dimension
             |      |_ d1.tdb                   # var-sized dimension (offsets)
             |      |_ d1_var.tdb               # var-sized dimension (values)
             |      |_ ...
             |      |_ t.tdb                    # timestamp attribute
             |      |_ ...
             |      |_ dt.tdb                   # delete timestamp attribute
             |      |_ ...
             |      |_ dci.tdb                  # delete condition index attribute
             |      |_ ...
             |      |_ __coords.tdb             # legacy coordinates
             |_ ...
There can be any number of fragments in an array. The fragment folder contains:
- A single fragment metadata file named `__fragment_metadata.tdb`.
- Any number of data files.
  - For each fixed-sized attribute or dimension, there is a single data file `a0.tdb` (`d0.tdb`) containing the cell values of the attribute (dimension).
  - For each var-sized attribute or dimension, there are two data files: `a1_var.tdb` (`d1_var.tdb`) containing the cell values of the attribute (dimension) and `a1.tdb` (`d1.tdb`) containing the starting 64-bit offsets of the values of each cell.
  - For each nullable attribute, there is an additional file `a2_validity.tdb` that contains its validity vector (a sequence of bytes where zero indicates that a cell is null).
  - The names of the data files do not depend on the names of the attributes/dimensions. The file names are determined by the order of the attributes and dimensions in the array schema.
  - New in version 14 The timestamp fixed attribute (`t.tdb`) is, for fragments consolidated with timestamps, the time at which a cell was added.
  - New in version 15 The delete timestamp fixed attribute (`dt.tdb`) is, for fragments consolidated with delete conditions, the time at which a cell was deleted.
  - New in version 15 The delete condition index fixed attribute (`dci.tdb`) is, for fragments consolidated with delete conditions, the index of the delete condition (inside of Tile Processed Conditions) that deleted the cell.
Data files containing cell values are filtered with the filters specified in the Filters field of the corresponding attribute or dimension.
Data files containing cell offsets are filtered with the filters specified in the Offsets filters field of the array schema.
Data files containing cell validity vectors are filtered with the filters specified in the Validity filters field of the array schema.
Timestamp, delete timestamp, and delete condition index attributes are filtered with the filters specified in the Coords filters field of the array schema.
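The naming rules for data files can be illustrated with a small sketch. The function below is hypothetical and not part of any TileDB API; it assumes format version 9 or later, that indices are zero-based and counted separately for attributes and dimensions (as the folder listing suggests), and it leaves out the timestamp, delete and legacy coordinate files.

```python
def fragment_data_files(attributes, dimensions):
    """Return the data file names expected in a fragment folder.

    Each field is a hypothetical (var_sized, nullable) pair of booleans,
    listed in array schema order. Dimensions should pass nullable=False.
    """
    names = []
    for prefix, fields in (("a", attributes), ("d", dimensions)):
        for index, (var_sized, nullable) in enumerate(fields):
            # Cell values for fixed-sized fields, cell offsets for var-sized ones.
            names.append(f"{prefix}{index}.tdb")
            if var_sized:
                # Cell values of a var-sized field.
                names.append(f"{prefix}{index}_var.tdb")
            if nullable:
                # Validity vector of a nullable attribute.
                names.append(f"{prefix}{index}_validity.tdb")
    return names


# Three attributes (fixed, var-sized, fixed nullable) and two dimensions
# (fixed, var-sized), matching the folder listing shown earlier.
files = fragment_data_files(
    [(False, False), (True, False), (False, True)],
    [(False, False), (True, False)],
)
```

Because names depend only on position, renaming an attribute in this model would not change any file name, which is the property the text describes.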
Note
Prior to version 9, data files were named after their corresponding attributes or dimensions.
In version 8 only, certain characters of the data files' names were percent-encoded. These characters are `!#$%&'()*+,/:;=?@[]`, as specified in RFC 3986, as well as `"<>\|`, which are not allowed in Windows file names.
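As a rough illustration of the version 8 naming scheme, the sketch below percent-encodes exactly the characters listed in the note. The function name is hypothetical, and the use of two uppercase hexadecimal digits and of single-byte ASCII characters are assumptions that the text does not confirm.

```python
# Characters listed in the note: RFC 3986 reserved characters plus
# characters that are not allowed in Windows file names.
RFC3986_RESERVED = "!#$%&'()*+,/:;=?@[]"
WINDOWS_FORBIDDEN = '"<>\\|'
ENCODED_CHARS = set(RFC3986_RESERVED + WINDOWS_FORBIDDEN)


def encode_v8_name(name):
    """Percent-encode the listed characters (assumed uppercase hex)."""
    return "".join(
        f"%{ord(ch):02X}" if ch in ENCODED_CHARS else ch for ch in name
    )


encoded = encode_v8_name("temp[0]")
```

Characters outside the two lists, including spaces, are left untouched in this sketch.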
The fragment metadata file has the following on-disk format:
Field | Type | Description |
---|---|---|
R-Tree | R-Tree | The serialized R-Tree |
Tile offsets for attribute/dimension 1 | Tile Offsets | The serialized on-disk tile offsets for attribute/dimension 1 |
… | … | … |
Tile offsets for attribute/dimension N | Tile Offsets | The serialized on-disk tile offsets for attribute/dimension N |
Variable tile offsets for attribute/dimension 1 | Tile Offsets | The serialized on-disk variable tile offsets for attribute/dimension 1 |
… | … | … |
Variable tile offsets for attribute/dimension N | Tile Offsets | The serialized on-disk variable tile offsets for attribute/dimension N |
Variable tile sizes for attribute/dimension 1 | Tile Sizes | The serialized in-memory variable tile sizes for attribute/dimension 1 |
… | … | … |
Variable tile sizes for attribute/dimension N | Tile Sizes | The serialized in-memory variable tile sizes for attribute/dimension N |
Validity tile offsets for attribute/dimension 1 | Tile Offsets | New in version 7 The serialized on-disk validity tile offsets for attribute/dimension 1 |
… | … | … |
Validity tile offsets for attribute/dimension N | Tile Offsets | New in version 7 The serialized on-disk validity tile offsets for attribute/dimension N |
Tile mins for attribute/dimension 1 | Tile Mins/Maxes | New in version 11 The serialized mins for attribute/dimension 1 |
… | … | … |
Tile mins for attribute/dimension N | Tile Mins/Maxes | New in version 11 The serialized mins for attribute/dimension N |
Tile maxes for attribute/dimension 1 | Tile Mins/Maxes | New in version 11 The serialized maxes for attribute/dimension 1 |
… | … | … |
Tile maxes for attribute/dimension N | Tile Mins/Maxes | New in version 11 The serialized maxes for attribute/dimension N |
Tile sums for attribute/dimension 1 | Tile Sums | New in version 11 The serialized sums for attribute/dimension 1 |
… | … | … |
Tile sums for attribute/dimension N | Tile Sums | New in version 11 The serialized sums for attribute/dimension N |
Tile null counts for attribute/dimension 1 | Tile Null Count | New in version 11 The serialized null counts for attribute/dimension 1 |
… | … | … |
Tile null counts for attribute/dimension N | Tile Null Count | New in version 11 The serialized null counts for attribute/dimension N |
Fragment min, max, sum, null count | Tile Fragment Min Max Sum Null Count | New in version 11 The serialized fragment min max sum null count |
Processed conditions | Tile Processed Conditions | New in version 16 The serialized processed conditions |
Metadata footer | Footer | Basic metadata gathered in the footer |
Note
Prior to version 3, fragment metadata are stored with a different structure.
The R-Tree is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Dimension number | uint32_t | Removed in version 5 Number of dimensions. Can also be obtained from the array schema. |
Fanout | uint32_t | The tree fanout |
Datatype | uint8_t | Removed in version 5 The domain's datatype. Dimensions are no longer guaranteed to have the same datatype since version 5. |
Num levels | uint32_t | The number of levels in the tree |
Num MBRs at level 1 | uint64_t | The number of MBRs at level 1 |
MBR 1 at level 1 | MBR | First MBR at level 1 |
… | … | … |
MBR N at level 1 | MBR | N-th MBR at level 1 |
… | … | … |
Num MBRs at level L | uint64_t | The number of MBRs at level L |
MBR 1 at level L | MBR | First MBR at level L |
… | … | … |
MBR N at level L | MBR | N-th MBR at level L |
Each MBR entry has format:
Field | Type | Description |
---|---|---|
1D range for dimension 1 | 1DRange | The 1-dimensional range for dimension 1 |
… | … | … |
1D range for dimension D | 1DRange | The 1-dimensional range for dimension D |
For fixed-sized dimensions, the `1DRange` format is:
Field | Type | Description |
---|---|---|
Range minimum | uint8_t | The minimum value with the same datatype as the dimension |
Range maximum | uint8_t | The maximum value with the same datatype as the dimension |
For var-sized dimensions, the `1DRange` format is:
Field | Type | Description |
---|---|---|
Range length | uint64_t | The number of bytes of the 1D range |
Minimum value length | uint64_t | The number of bytes of the minimum value |
Range minimum | uint8_t | The minimum (var-sized) value with the same datatype as the dimension |
Range maximum | uint8_t | The maximum (var-sized) value with the same datatype as the dimension |
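The var-sized `1DRange` layout could be read along the following lines. This is a sketch under stated assumptions rather than a reference implementation: integers are taken to be little-endian, "Range length" is taken to be the combined byte length of the minimum and maximum values, and the function name is hypothetical.

```python
import struct


def parse_var_1d_range(buf, offset=0):
    """Parse one var-sized 1DRange and return (minimum, maximum, next_offset).

    Assumes little-endian integers and that the range length covers the
    minimum and maximum values together.
    """
    range_len, min_len = struct.unpack_from("<QQ", buf, offset)
    start = offset + 16  # skip the two uint64_t length fields
    minimum = bytes(buf[start : start + min_len])
    maximum = bytes(buf[start + min_len : start + range_len])
    return minimum, maximum, start + range_len


# A range from "ab" to "cde": 5 bytes in total, 2 of them for the minimum.
sample = struct.pack("<QQ", 5, 2) + b"abcde"
minimum, maximum, next_offset = parse_var_1d_range(sample)
```

Returning the next offset lets a caller walk the consecutive ranges of an MBR, one per dimension.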
Tile offsets refer to each on-disk data tile's starting byte offset.
Tile offsets is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num tile offsets | uint64_t | Number of tile offsets |
Tile offset 1 | uint64_t | Offset 1 |
… | … | … |
Tile offset N | uint64_t | Offset N |
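The counted list of 64-bit values above can be decoded with a few lines of code. The sketch below assumes the generic tile has already been unfiltered into a plain byte payload and that integers are little-endian; both function names are hypothetical. Deriving each tile's on-disk size from consecutive offsets and the data file size is an inference from the definition of a tile offset, not something the text states explicitly.

```python
import struct


def parse_tile_offsets(payload):
    """Decode 'Num tile offsets' followed by that many uint64_t offsets."""
    (count,) = struct.unpack_from("<Q", payload, 0)
    return list(struct.unpack_from(f"<{count}Q", payload, 8))


def on_disk_tile_sizes(offsets, file_size):
    """Size of each on-disk tile: the gap to the next offset or to the file end."""
    ends = offsets[1:] + [file_size]
    return [end - start for start, end in zip(offsets, ends)]


# Three tiles starting at bytes 0, 100 and 250 of a 400-byte data file.
payload = struct.pack("<Q3Q", 3, 0, 100, 250)
offsets = parse_tile_offsets(payload)
sizes = on_disk_tile_sizes(offsets, 400)
```

The file size used here would come from the File sizes field of the footer.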
The tile size refers to the in-memory size.
It is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num tile sizes | uint64_t | Number of tile sizes |
Tile size 1 | uint64_t | Size 1 |
… | … | … |
Tile size N | uint64_t | Size N |
The tile mins maxes is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num values | uint64_t | Number of values |
Value 1 | type | Value 1 or Offset 1 |
… | … | … |
Value N | type | Value N or Offset N |
Var buffer size | uint64_t | Var buffer size |
Var buffer | uint8_t | Var buffer |
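For var-sized fields, the values in this table are offsets into the var buffer. The sketch below reads the table literally and rests on several assumptions that the text does not confirm: "Num values" is an element count, offsets are little-endian `uint64_t`, and each value runs up to the next offset, with the last one running to the end of the var buffer. The function name is hypothetical.

```python
import struct


def parse_var_tile_values(payload):
    """Decode var-sized tile mins or maxes into a list of byte strings."""
    (count,) = struct.unpack_from("<Q", payload, 0)
    offsets = struct.unpack_from(f"<{count}Q", payload, 8)
    pos = 8 + 8 * count
    (var_size,) = struct.unpack_from("<Q", payload, pos)
    var_buffer = payload[pos + 8 : pos + 8 + var_size]
    # Assumed: value i ends where value i + 1 starts; the last ends the buffer.
    ends = offsets[1:] + (var_size,)
    return [bytes(var_buffer[s:e]) for s, e in zip(offsets, ends)]


# Two tiles whose minimum values are "abc" and "de".
payload = struct.pack("<Q2QQ", 2, 0, 3, 5) + b"abcde"
values = parse_var_tile_values(payload)
```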
The tile sums is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num values | uint64_t | Number of values |
Value 1 | uint64_t | Sum 1 |
… | … | … |
Value N | uint64_t | Sum N |
The tile null count is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num values | uint64_t | Number of values |
Value 1 | uint64_t | Count 1 |
… | … | … |
Value N | uint64_t | Count N |
The fragment min max sum null count is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Min size | uint64_t | Size of the min value for attribute/dimension 1 |
Min value | uint8_t | Buffer for min value for attribute/dimension 1 |
Max size | uint64_t | Size of the max value for attribute/dimension 1 |
Max value | uint8_t | Buffer for max value for attribute/dimension 1 |
Sum | uint64_t | Sum value for attribute/dimension 1 |
Null count | uint64_t | Null count value for attribute/dimension 1 |
… | … | … |
Min size | uint64_t | Size of the min value for attribute/dimension N |
Min value | uint8_t | Buffer for min value for attribute/dimension N |
Max size | uint64_t | Size of the max value for attribute/dimension N |
Max value | uint8_t | Buffer for max value for attribute/dimension N |
Sum | uint64_t | Sum value for attribute/dimension N |
Null count | uint64_t | Null count value for attribute/dimension N |
Tile and fragment mins, maxes, sums and null counts are colloquially referred to as "tile metadata".
Note
Prior to version 21, tile metadata for nullable fixed-size strings on dense arrays might be incorrect and implementations must not rely on them.
The processed conditions field is a generic tile holding the list of delete/update conditions that have already been applied to this fragment and do not need to be applied again, sorted by filename. It has the following internal format:
Field | Type | Description |
---|---|---|
Num | uint64_t | Number of processed conditions |
Condition size | uint64_t | Condition size 1 |
Condition | uint8_t | Condition marker filename 1 |
… | … | … |
Condition size | uint64_t | Condition size N |
Condition | uint8_t | Condition marker filename N |
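The processed conditions layout is a count followed by length-prefixed filenames, which could be decoded as sketched below. The sketch assumes an already unfiltered payload, little-endian integers, and UTF-8 filenames; the function name and the sample filenames are hypothetical.

```python
import struct


def parse_processed_conditions(payload):
    """Decode the list of condition marker filenames."""
    (count,) = struct.unpack_from("<Q", payload, 0)
    pos = 8
    names = []
    for _ in range(count):
        (size,) = struct.unpack_from("<Q", payload, pos)
        pos += 8
        # Assumed: filenames are stored as UTF-8 bytes without a terminator.
        names.append(bytes(payload[pos : pos + size]).decode("utf-8"))
        pos += size
    return names


# Two made-up marker filenames, each prefixed with its byte length.
payload = (
    struct.pack("<Q", 2)
    + struct.pack("<Q", 5) + b"a.del"
    + struct.pack("<Q", 6) + b"bb.del"
)
conditions = parse_processed_conditions(payload)
```

Under this reading, the value stored in `dci.tdb` for a deleted cell would index into the returned list.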
The footer is a simple blob (i.e., not a generic tile) with the following internal format:
Field | Type | Description |
---|---|---|
Version number | uint32_t | Format version number of the fragment |
Array schema name size | uint64_t | New in version 10 Size of the array schema name |
Array schema name | string | New in version 10 Array schema name |
Dense | uint8_t | Whether the array is dense (1) or not (0) |
Null non-empty domain | uint8_t | Indicates whether the non-empty domain is null (1) or not (0) |
Non-empty domain | MBR | An MBR denoting the non-empty domain |
Number of sparse tiles | uint64_t | Number of sparse tiles |
Last tile cell num | uint64_t | For sparse arrays, the number of cells in the last tile in the fragment |
Includes timestamps | uint8_t | New in version 14 Whether the fragment includes timestamps (1) or not (0) |
Includes delete metadata | uint8_t | New in version 15 Whether the fragment includes delete metadata (1) or not (0) |
File sizes | uint64_t[] | The size in bytes of each attribute/dimension file in the fragment. For var-length attributes/dimensions, this is the size of the offsets file. |
File var sizes | uint64_t[] | The size in bytes of each var-length attribute/dimension file in the fragment. |
File validity sizes | uint64_t[] | The size in bytes of each attribute/dimension validity vector file in the fragment. |
R-Tree offset | uint64_t | The offset to the generic tile storing the R-Tree in the metadata file. |
Tile offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile offsets for attribute/dimension 1. |
… | … | … |
Tile offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile offsets for attribute/dimension N |
Tile var offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the variable tile offsets for attribute/dimension 1. |
… | … | … |
Tile var offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the variable tile offsets for attribute/dimension N. |
Tile var sizes offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the variable tile sizes for attribute/dimension 1. |
… | … | … |
Tile var sizes offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the variable tile sizes for attribute/dimension N. |
Tile validity offset for attribute/dimension 1 | uint64_t | New in version 7 The offset to the generic tile storing the tile validity offsets for attribute/dimension 1. |
… | … | … |
Tile validity offset for attribute/dimension N | uint64_t | New in version 7 The offset to the generic tile storing the tile validity offsets for attribute/dimension N |
Tile mins offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile mins for attribute/dimension 1. |
… | … | … |
Tile mins offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile mins for attribute/dimension N |
Tile maxes offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile maxes for attribute/dimension 1. |
… | … | … |
Tile maxes offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile maxes for attribute/dimension N |
Tile sums offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile sums for attribute/dimension 1. |
… | … | … |
Tile sums offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile sums for attribute/dimension N |
Tile null counts offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile null counts for attribute/dimension 1. |
… | … | … |
Tile null counts offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile null counts for attribute/dimension N |
Fragment min max sum null count offset | uint64_t | The offset to the generic tile storing the fragment min max sum null count data. |
Processed conditions offset | uint64_t | New in version 16 The offset to the generic tile storing the processed conditions. |
Array schema name size | uint64_t | The total number of characters of the array schema name. |
Array schema name | uint8_t[] | The array schema name. |
Footer length | uint64_t | Sum of bytes of the above fields. |
Note
Prior to version 10, the Footer length field was present only when the array had at least one variable-sized dimension. Implementations had to obtain the format version from the fragment folder's timestamped name.
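Since the footer length is the last field of the footer, a reader could locate the footer by working backwards from the end of the metadata file. The sketch below assumes format version 10 or later (so the Footer length field is always present), that this field occupies the final 8 bytes of the file, and that integers are little-endian. The function name is hypothetical.

```python
import struct


def locate_footer(blob):
    """Return (footer_start, version) for the bytes of a fragment metadata file."""
    (footer_len,) = struct.unpack_from("<Q", blob, len(blob) - 8)
    # The footer length counts the fields before it, so step back over both.
    start = len(blob) - 8 - footer_len
    if start < 0:
        raise ValueError("footer length exceeds file size")
    # The first footer field is the uint32_t format version number.
    (version,) = struct.unpack_from("<I", blob, start)
    return start, version


# A made-up file: 20 bytes of tiles, a 14-byte footer body, then its length.
footer_body = struct.pack("<I", 16) + b"\x00" * 10
blob = b"X" * 20 + footer_body + struct.pack("<Q", len(footer_body))
start, version = locate_footer(blob)
```

From the footer start, the remaining fields would be read in table order; the offsets stored there point back to the generic tiles earlier in the file.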
The on-disk format of each data file is:
Field | Type | Description |
---|---|---|
Tile 1 | Tile | The data of tile 1 |
… | … | … |
Tile N | Tile | The data of tile N |
Prior to version 5, dimension data for sparse cells are combined in a single tile that is stored in the `__coords.tdb` file. The tile is filtered with the filters specified in the Coords filters field of the array schema.
Coordinates of a multi-dimensional array are placed in either zipped or unzipped order. In zipped order, the coordinates of a cell are placed next to each other and ordered by the cell index, while in unzipped order, all coordinate values of a dimension are placed next to each other and ordered by the dimension index.
- Since version 2, coordinates are always stored unzipped.
- In version 1, coordinates are stored unzipped if a compression filter exists in the filter list. Otherwise, they are stored zipped.
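The difference between the two orders is easiest to see on a small example. The sketch below works on already decoded coordinate values rather than raw bytes, and both function names are hypothetical.

```python
def unzip_coords(zipped, dim_num):
    """Zipped [x0, y0, x1, y1, ...] -> unzipped [[x0, x1, ...], [y0, y1, ...]]."""
    return [zipped[d::dim_num] for d in range(dim_num)]


def zip_coords(unzipped):
    """Unzipped per-dimension lists -> zipped cell-by-cell order."""
    return [coord for cell in zip(*unzipped) for coord in cell]


# Three 2D cells: (1, 10), (2, 20), (3, 30).
zipped = [1, 10, 2, 20, 3, 30]
unzipped = unzip_coords(zipped, 2)
```

In unzipped order, the values of one dimension sit next to each other, so similar values are adjacent in the byte stream.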
Prior to version 3, fragment metadata is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Version number | uint32_t | Format version number of the fragment |
Non-empty domain size | uint64_t | Size of non-empty domain |
Non-empty domain | uint8_t[] | Byte array of coordinate pairs storing the non-empty domain |
Num MBRs | uint64_t | Number of MBRs in fragment |
MBR 1 | uint8_t[] | Byte array of coordinate pairs storing MBR 1 |
… | … | … |
MBR N | uint8_t[] | Byte array of coordinate pairs storing MBR N |
Num bounding coords | uint64_t | Number of bounding coordinates |
Bounding coords | uint8_t[] | Byte array of coordinate pairs storing the first/last coordinates in the fragment |
Tile offsets | Legacy Tile Offsets | The offsets of each tile in the attribute files |
Tile var offsets | Legacy Tile Offsets | The offsets of each variable tile in the attribute files |
Variable tile sizes | Legacy Tile Sizes | The sizes of each variable tile in the attribute files |
Last tile cell num | uint64_t | For sparse arrays, the number of cells in the last tile in the fragment. Ignored on dense arrays. |
File sizes | uint64_t[] | The size in bytes of each attribute/dimension file in the fragment. For var-length attributes/dimensions, this is the size of the offsets file. |
File var sizes | uint64_t[] | The size in bytes of each var-length attribute/dimension file in the fragment. |
Legacy tile offsets and sizes is a simple blob (i.e., not a generic tile) with the following internal format:
Field | Type | Description |
---|---|---|
Num tile offsets/sizes, attribute 1 | uint64_t | Number of tile offsets/sizes for attribute 1 |
Tile offset/size 1, attribute 1 | uint64_t | Offset/Size 1 for attribute 1 |
… | … | … |
Tile offset/size N, attribute 1 | uint64_t | Offset/Size N for attribute 1 |
… | … | … |
Num tile offsets/sizes, attribute N | uint64_t | Number of tile offsets/sizes for attribute N |
Tile offset/size 1, attribute N | uint64_t | Offset/Size 1 for attribute N |
… | … | … |
Tile offset/size N, attribute N | uint64_t | Offset/Size N for attribute N |