When connecting to the database, both schemas (working and production) are connected to and reflected. This means that, by default, every query searches both schemas and combines the results.
For registering and modifying there is still only a single "active" schema per `DataRegistry()` (i.e., `DbConnection()`) instance. If the database was connected to with `production_mode=False` (the default), registered datasets go into the working schema; if `production_mode=True`, they go into the production schema. The same logic applies when modifying registry entries.
Update delete functionality
- The `delete()` function now takes `name`, `version_string`, `owner` and `owner_type` as arguments, rather than simply the `dataset_id`.
- One can still delete by the `dataset_id` using the CLI, which now includes a confirmation step.
Make more options for querying
- There is now a `~=` query operator that uses the `.ilike` filter to allow case-insensitive filtering with wildcards (i.e., the `%` character).
- `dregs ls` can now filter on the dataset name, including `%` wildcards, using the `--name` option.
- `dregs ls` can return arbitrary columns using the `--return_cols` option.
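The `~=` operator maps onto SQL `ILIKE`, which in SQLAlchemy is the `column.ilike(pattern)` operator. To show what a `%` pattern matches, here is a small pure-Python sketch of the same case-insensitive matching. It is an illustration only (the function name and sample dataset names are hypothetical), and it handles just the `%` wildcard, not `_` or escape characters.

```python
import re


def ilike_match(value: str, pattern: str) -> bool:
    """Return True if `value` matches the LIKE-style `pattern`, ignoring case.

    Hypothetical helper: only the `%` wildcard (any run of characters,
    including an empty one) is supported in this sketch.
    """
    # Escape regex metacharacters in each literal chunk, then join the
    # chunks with ".*" so every `%` becomes "match anything".
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("%"))
    # fullmatch mirrors LIKE semantics: the whole value must match.
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


# Illustrative dataset names, not real registry entries.
names = ["DESC:dc2:Run2.2i", "desc:dc2:calexp", "other_dataset"]
print([n for n in names if ilike_match(n, "desc:dc2:%")])
```

Because the literal chunks are escaped, a `.` in the pattern only matches a literal dot, as it would in SQL.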
Some changes to the way the `relative_path` is automatically generated from the `name` and `version`.
- All automatically generated `relative_path`s are placed in a top-level `.gen_paths` directory, e.g., `<root_dir>/<schema>/<owner_type>/<owner>/.gen_paths`. This is to prevent clashes with user-specified `relative_path`s.
- Single files no longer have their filename changed when the `relative_path` is automatically generated. Instead, a directory containing the `name` and `version` is created, and the file is copied there. This preserves the filename suffix in the relative path.
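A minimal sketch of the path layout described above, assuming the generated directory is named `<name>_<version>`. That naming, and both function names, are hypothetical; the notes only say the directory "contains" the name and version.

```python
from pathlib import PurePosixPath


def generated_relative_path(name: str, version: str, old_location: str, is_file: bool) -> str:
    """Build an auto-generated relative_path (hypothetical naming scheme)."""
    # Everything generated lives under ".gen_paths" so it cannot clash with
    # user-specified relative paths.
    entry_dir = PurePosixPath(".gen_paths") / f"{name}_{version}"
    if is_file:
        # Keep the original filename (and therefore its suffix) inside the
        # new per-dataset directory instead of renaming the file.
        return str(entry_dir / PurePosixPath(old_location).name)
    return str(entry_dir)


def full_path(root_dir: str, schema: str, owner_type: str, owner: str, relative_path: str) -> str:
    """Join the relative_path onto <root_dir>/<schema>/<owner_type>/<owner>."""
    return str(PurePosixPath(root_dir, schema, owner_type, owner, relative_path))


rel = generated_relative_path("my_catalog", "1.0.0", "/scratch/cat.parquet", is_file=True)
print(full_path("/registry", "working", "user", "jdoe", rel))
```

Passing `is_file` explicitly keeps the sketch free of filesystem access; a real implementation would inspect `old_location` itself.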
Update documentation for release, new installation instructions etc.
- Update default NERSC site to `/global/cfs/cdirs/lsst/utilities/desc-data-registry`.
- Update default schema names (now stored in `src/dataregistry/schema/default_schema_names.yaml`).
- There is now a `reg_admin` account, which is the only account that can create the initial schemas. The schema creation script has been updated to give the correct `reg_writer` and `reg_reader` privileges.
- Remove `version_suffix`.
- Update `dregs ls` to be a bit cleaner. It also has a `dregs ls --extended` option to return more quantities, and can now query on a keyword using `dregs ls --keyword <keyword>`.
- Added `modify` to the CLI to update datasets from the command line.
There cannot be a unique constraint in the database on `owner`, `owner_type` and `relative_path`, as multiple entries can share those values; however, we require that at any one time only one dataset has its data at that location. Added a check during registration to ensure the `relative_path` is available.
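The availability check can be sketched over an in-memory list of entries. This assumes a simple hypothetical `is_valid` flag standing in for "this entry currently owns its data"; the real check queries the database and its status handling may differ.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    owner: str
    owner_type: str
    relative_path: str
    is_valid: bool  # hypothetical stand-in for "currently owns its data"


def relative_path_available(entries, owner, owner_type, relative_path):
    """Return True if no currently valid entry already uses this location."""
    # Several rows may legitimately share (owner, owner_type, relative_path),
    # e.g. old deleted or replaced entries, so a database unique constraint
    # cannot be used. Only a *valid* entry blocks the location.
    return not any(
        e.is_valid
        and e.owner == owner
        and e.owner_type == owner_type
        and e.relative_path == relative_path
        for e in entries
    )


entries = [Entry("jdoe", "user", "catalogs/v1", is_valid=False)]
print(relative_path_available(entries, "jdoe", "user", "catalogs/v1"))
```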
- Bump database version to 3.3.0, removed the `is_overwritten`, `replace_date` and `replace_uid` columns.
- Added `replaced` bit to the `valid` bitmask.
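A sketch of how a bitmask like this can be manipulated with `enum.IntFlag`. Bits 0 to 2 follow the 0.4.0 notes (valid, deleted, archived); placing `replaced` at bit 3, and clearing the valid bit when a dataset is replaced, are assumptions made for illustration only.

```python
from enum import IntFlag


class Status(IntFlag):
    VALID = 1 << 0     # bit 0
    DELETED = 1 << 1   # bit 1
    ARCHIVED = 1 << 2  # bit 2
    REPLACED = 1 << 3  # assumed position; not stated in these notes


def has_bit(status: int, flag: Status) -> bool:
    """Check whether `flag` is set in an integer status value."""
    return bool(int(status) & int(flag))


def mark_replaced(status: int) -> int:
    """Set the replaced bit (assumption: a replaced dataset is no longer valid)."""
    return (int(status) & ~int(Status.VALID)) | int(Status.REPLACED)


status = mark_replaced(int(Status.VALID | Status.ARCHIVED))
print(status, has_bit(status, Status.REPLACED), has_bit(status, Status.VALID))
```

Working with plain `int` values inside the helpers avoids version-dependent behaviour of `~` on `IntFlag` members.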
The `tables_required` list, when doing a query, was built only from the return column list. This meant that if a filter used a table not in the return column list, the proper join would not be made. This has been corrected.
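The fix amounts to collecting table names from both the return columns and the filters. A minimal sketch, assuming columns are written as `"table.column"` strings and filters as hypothetical `(column, operator, value)` tuples (the real filter objects may differ):

```python
def tables_required(return_cols, filters):
    """Collect every table a query touches, from return columns AND filters."""
    tables = set()
    for col in return_cols:
        tables.add(col.split(".", 1)[0])
    # Previously only the loop above existed, so a table referenced only in a
    # filter (e.g. "keyword" below) was never joined.
    for column, _operator, _value in filters:
        tables.add(column.split(".", 1)[0])
    return tables


print(sorted(tables_required(["dataset.name"], [("keyword.keyword", "==", "simulation")])))
```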
- Added `replace()` function for datasets. This is functionally very similar to `register()`, but it allows users to overwrite previous datasets while keeping the same name/version/suffix/owner/owner_type combination. Documentation updated.
- Datasets now have a `replace_iteration` counter and a `replace_id` value, which points to the dataset that replaced them. To reflect this, the unique constraints now include the `replace_iteration` column.
- Database version bumped to 3.2.0.
- Tests now use the `property_dict` return type and first check that the correct number of results was found before checking the results.
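To see why `replace_iteration` has to be part of the unique constraint, here is an in-memory sketch. The `Dataset` class and `replace` helper are hypothetical and omit the version suffix for brevity; they only mirror the bookkeeping described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    dataset_id: int
    name: str
    version: str
    owner: str
    owner_type: str
    replace_iteration: int = 0
    replace_id: Optional[int] = None  # id of the dataset that replaced this one


def unique_key(d: Dataset):
    # Without replace_iteration, the original and its replacement would
    # collide on (owner, owner_type, name, version).
    return (d.owner, d.owner_type, d.name, d.version, d.replace_iteration)


def replace(datasets, old: Dataset, new_id: int) -> Dataset:
    new = Dataset(new_id, old.name, old.version, old.owner, old.owner_type,
                  replace_iteration=old.replace_iteration + 1)
    if unique_key(new) in {unique_key(d) for d in datasets}:
        raise ValueError("unique constraint violated")
    old.replace_id = new.dataset_id  # old entry points to its replacement
    datasets.append(new)
    return new


datasets = [Dataset(1, "cat", "1.0.0", "jdoe", "user")]
new = replace(datasets, datasets[0], new_id=2)
print(new.replace_iteration, datasets[0].replace_id)
```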
- Update the `schema.yaml` file to include unique constraints and table indexes.
- Update the unique constraints for the dataset table to be `owner`, `owner_type`, `name`, `version`, `version_suffix`.
When registering a dataset that overwrites a previous dataset, don't tag the previous datasets as `valid=False` until any data copying is successful.
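The ordering matters because a failed copy should leave the previous datasets usable. A minimal sketch with hypothetical names (`register_overwrite`, `copy_data`) and entries modelled as plain dicts:

```python
def register_overwrite(previous_entries, copy_data):
    """Copy the new data first; invalidate old entries only on success."""
    # If copy_data() raises, the exception propagates before any entry is
    # touched, so the previous datasets stay valid.
    copy_data()
    for entry in previous_entries:
        entry["valid"] = False


previous = [{"name": "cat", "valid": True}]
register_overwrite(previous, copy_data=lambda: None)
print(previous)
```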
Add ability to tag datasets with keywords/labels to make them easier to categorize.
- Can tag keywords when registering datasets through the Python API or CLI. Can add keywords after registration using the `add_keywords()` method in the Python API.
- Database version bumped to 3.0.0.
- New table `keyword` that stores both the system and user keywords.
- New table `dataset_keyword` that links keywords to datasets.
- System keywords are stored in `src/dataregistry/schema/keywords.yaml`, which is used to populate the `keywords` table during database creation.
- Added `datareg.Registrar.dataset.get_keywords()` function to return the list of currently registered keywords.
- When the keyword table is queried, an automatic join is made with the dataset-keyword association table, so the user can, for example, query for all datasets with a given keyword.
- Added keywords information to the documentation.
- Can run `dregs show keywords` from the CLI to display all pre-registered keywords.
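The `keyword` / `dataset_keyword` design is a standard many-to-many association. The sketch below reproduces the automatic join in an in-memory SQLite database; the column names (including the `system` flag) and sample rows are hypothetical, not the registry's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (dataset_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE keyword (keyword_id INTEGER PRIMARY KEY, keyword TEXT UNIQUE, system INTEGER);
-- Association table: one row per (dataset, keyword) pair
CREATE TABLE dataset_keyword (
    dataset_id INTEGER REFERENCES dataset(dataset_id),
    keyword_id INTEGER REFERENCES keyword(keyword_id),
    PRIMARY KEY (dataset_id, keyword_id)
);
INSERT INTO dataset VALUES (1, 'dc2_catalog'), (2, 'test_run');
INSERT INTO keyword VALUES (1, 'simulation', 1), (2, 'observation', 1);
INSERT INTO dataset_keyword VALUES (1, 1), (2, 1), (2, 2);
""")


def datasets_with_keyword(conn, keyword):
    """Join through the association table to find datasets tagged `keyword`."""
    rows = conn.execute(
        """SELECT d.name FROM dataset d
           JOIN dataset_keyword dk ON dk.dataset_id = d.dataset_id
           JOIN keyword k ON k.keyword_id = dk.keyword_id
           WHERE k.keyword = ? ORDER BY d.name""",
        (keyword,),
    ).fetchall()
    return [row[0] for row in rows]


print(datasets_with_keyword(conn, "simulation"))
```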
Separate out creation of production schema and non-production schema since, under normal circumstances, there will be a single "real" production schema (owner type == production only) but possibly multiple non-production schemas to keep track of entries for the other owner types. Add a field to the provenance table so a schema can discover the name of its associated production schema and form foreign key constraints correctly.
Bumped database version to 2.3.0. This code requires database version >= 2.3.0.
- Add check during dataset registration to raise an exception if the `root_dir` does not exist.
- Add check before copying any data (i.e., `old_location != None`) that the user has write permission to the `root_dir` folder.
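Both checks map onto standard-library calls. A minimal sketch with a hypothetical function name; note that `os.access` reports the permissions of the real user ID, which is usually what is wanted for a command-line tool.

```python
import os


def check_root_dir(root_dir, old_location=None):
    """Validate root_dir before registration (hypothetical helper)."""
    if not os.path.isdir(root_dir):
        raise FileNotFoundError(f"root_dir {root_dir} does not exist")
    # Write access only matters when data will actually be copied in.
    if old_location is not None and not os.access(root_dir, os.W_OK):
        raise PermissionError(f"No write permission for root_dir {root_dir}")


check_root_dir(".")  # passes: the current directory exists, nothing is copied
```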
Add ability to register "external" datasets, for example datasets that are not physically managed by the registry or are offsite, for which only a database entry is created.
- Database version bumped to 2.2.0.
- Added `location_type` column to `dataset` table (can be either "onsite", "external" or "dummy").
- Added `contact_email` and `url` columns to `dataset` table. One of these is required when registering a `location_type="external"` dataset.
- Removed `is_external_link` column from `dataset` table as it is redundant.
- Renamed `execution.locale` to `execution.site` in the `execution` table.
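The registration rules above can be expressed as a small validation step. This is a sketch only; the function name and error messages are hypothetical.

```python
VALID_LOCATION_TYPES = {"onsite", "external", "dummy"}


def validate_location(location_type, contact_email=None, url=None):
    """Enforce the location_type rules for a new dataset entry."""
    if location_type not in VALID_LOCATION_TYPES:
        raise ValueError(f"Unknown location_type: {location_type}")
    # External data is not managed by the registry, so someone must be
    # reachable (email) or the data must be findable (url).
    if location_type == "external" and contact_email is None and url is None:
        raise ValueError("External datasets require a contact_email or url")


validate_location("external", url="https://example.org/data")
```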
Version 0.4.0 focuses on being able to manipulate data already within the dataregistry, i.e., adding the ability to delete and modify previous datasets.
- `Registrar` now has a class for each table. These inherit from a `BaseTable` class, which means that shared functions, like deleting entries, are available for all tables. (#92)
- Working with tables via the Python interface has slightly different syntax (see user changelog below). (#92)
- `is_valid` is removed as a `dataset` property. It has been replaced with `status`, which is a bitmask (bit 0 = "valid", bit 1 = "deleted" and bit 2 = "archived"), so datasets can now have a combination of multiple states. (#93)
- `archive_date`, `archive_path`, `delete_date`, `delete_uid` and `move_date` have been added as new `dataset` fields. (#93)
- Database version bumped to `2.0.1`. (#93)
- `dataset` entries can be deleted (see below). (#94)
- The CI for the CLI is now pure Python (i.e., there is no more bash script to ingest dummy entries into the registry for testing).
- Can no longer "bump" a dataset that has a version suffix (trying to do so will raise an error). If a user wants to make a new version of a dataset with a suffix, they can still do so by manually specifying the version and suffix (#97).
- Dataset entries can be modified (see below, #100).
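The per-table design can be pictured as a small class hierarchy. The sketch below is a hypothetical, in-memory illustration of the pattern (shared methods on a base class, one attribute per table on the registrar); it does not reproduce the library's real signatures. Deletion only flags the row, in line with entries remaining in the database after deletion.

```python
class BaseTable:
    """Shared behaviour for every table (hypothetical, in-memory)."""

    table_name = "base"

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def register(self, **fields):
        row_id = self._next_id
        self._rows[row_id] = {**fields, "deleted": False}
        self._next_id += 1
        return row_id

    def delete(self, row_id):
        # Defined once here, so every subclass inherits it. The row is kept
        # and only flagged, mirroring "the entry will always remain".
        self._rows[row_id]["deleted"] = True


class DatasetTable(BaseTable):
    table_name = "dataset"


class ExecutionTable(BaseTable):
    table_name = "execution"


class Registrar:
    """Exposes one table object per attribute, e.g. registrar.dataset."""

    def __init__(self):
        self.dataset = DatasetTable()
        self.execution = ExecutionTable()


reg = Registrar()
ds_id = reg.dataset.register(name="my_catalog")
reg.dataset.delete(ds_id)
print(reg.dataset._rows[ds_id])
```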
- All database tables (`dataset`, `execution`, etc.) have a more universal syntax. The functionality is still accessed via the `Registrar` class, but now, for example, to register a dataset it's `Registrar.dataset.register()`, and similarly for an execution `Registrar.execution.register()` (#92). The docs and tutorials have been updated (#95).
- `dataset` entries can now be deleted using the `Registrar.dataset.delete(dataset_id=...)` function. This will also delete the raw data within the `root_dir`. Note that the entry in the database will always remain (with an updated `status` field to indicate it has been deleted). (#94)
- Documentation has been updated to make things a bit clearer. It is now split into more focused tutorials (#95).
- Certain dataset quantities can be modified after registration (#100). Documentation has been updated with examples.