Releases: man-group/ArcticDB
v4.0.1
v1.6.2
v4.0.0
⚠️ API changes
For Library.get_description_batch, Library.read_metadata_batch and Library.write_batch, a DataError object will now be returned in the position in the returned list corresponding to the symbol/version pair that could not be read or written. Note this may require code changes to support the new error handling behaviour - as a result it is being considered a breaking change.
- get description batch method: method rationalisation (#814)
- read metadata batch method: method rationalisation (#814)
- Write batch method: method rationalisation (#814)
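Callers of these batch methods now have to separate successful results from DataError entries instead of catching an exception. The pattern can be sketched roughly as follows. To keep the sketch runnable without a storage backend, `DataError` here is a hypothetical stand-in dataclass (the real class is provided by arcticdb and its attribute names may differ), and `split_batch_results` is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class DataError:
    """Hypothetical stand-in for arcticdb's DataError; field names are assumptions."""
    symbol: str
    exception_string: str


def split_batch_results(results: List[Any]) -> Tuple[List[Any], List[DataError]]:
    """Partition a batch result list into successes and DataError entries."""
    successes: List[Any] = []
    failures: List[DataError] = []
    for item in results:
        if isinstance(item, DataError):
            failures.append(item)
        else:
            successes.append(item)
    return successes, failures


# Simulated return value of a batch call: one entry per requested symbol,
# with a DataError in the slot of the symbol that failed.
results = ["meta_for_sym_a", DataError("sym_b", "version not found"), "meta_for_sym_c"]
ok, failed = split_batch_results(results)
for err in failed:
    print(f"{err.symbol}: {err.exception_string}")
```

Because errors are returned positionally, the result list stays aligned with the request list even when some symbols fail.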
🚀 Features
- Pandas 2.0 support (#343) (#540) (#804) (#846)
  - Modifications have been made to the normalisation and denormalisation processes for pandas.Series and pandas.DataFrame to match the new defaults in pandas 2.0.
  - Handling of 0-row DataFrames for improved correctness and usability.
  - Empty columns are now properly handled, especially regarding the change of defaults for empty collections between Pandas 1.X and Pandas 2.X.
  - Extended the tests to reflect changes in behaviour due to pandas 2.0's new defaults.
  - Please note, PyArrow remains unsupported in this integration.
- conda-build: Bring support for Azure Blob Storage (#840) (#854) (#853) (#857)
- Add uri support for mongodb (#761)
- Code coverage analysis and report workflow (#783) (#784)
- Add documentation with doxygen (#736)
🐛 Fixes
- Update support status: Pandas DataFrame and Series backed by PyArrow are not supported (#882)
- Added pymongo to the list of installation dependencies (#891)
- Resolved dependency issues for the mergeability check step (#822)
- Fixed issue where AWS authentication wasn't used, even though the option was enabled (#843)
- Resolved issue of early read termination in 'has_symbol' (#836)
- Test: Ensured that QueryBuilder is pickleable with all possible clauses (#861)
- Fixed issue with the 'latest_only' option for the 'list_versions' method (#839)
- Added the ability for users to specify LMDB map size in the Arctic URI (#811)
- Fixed issue #767: segfault in batch write with string columns (#827) (#874)
- Renamed ArcticNativeNotYetImplemented in a way that maintains backward compatibility, to fix issue #774 (#821)
- Modified Azure SDK to favour winhttp over libcurl on Windows for improved SSL verification (#851)
- Updated the maximum batch size for Azure operations (#878)
Uncategorized
- Maintenance: Added a minimal Security Policy (#823)
- Fixed documentation following an exception renaming (#824)
- Resolved issues in the publish step (#825)
- Added documentation for setting LMDB map size (#826)
- Incorporated notebooks into the documentation (#844)
- Maintenance: Removed unused definitions from protocol buffers (#856)
- Enhanced error handling to fail document build on Sphinx errors (#883)
- Maintenance: Replaced deprecated ZSTD_getDecompressedSize function (#855)
- Refactored non-functional library manager, addressing Issue #812 (#828)
- Made minor improvements to the documentation (#841)
- Improved handling of the deprecated S3 option "force_uri_lib_config" (#833)
- Corrected the release date of version 3.0.0 in README.md (#858)
The wheels are on Pypi. Below are for debugging:
v3.0.0
🔒 Security + Forwards Incompatible Change
- S3 and Azure: Do not save sensitive or ephemeral config in the config library (#803)
This fixes a security issue with ArcticDB where creds were kept in storage for:
- Azure
- AWS if the access keys are supplied in the URI instead of aws_auth=True.
These instructions explain how to upgrade your storage to remove the credentials. See also issue #802.
Compatibility matrix
| Storage | Library created with < v3. Library accessed with >= v3. | Library created with or upgraded to >= v3. Library accessed with < v3. |
|---|---|---|
| S3 with aws_auth=True | Continues to work | Raises InternalException: E_INVALID_ARGUMENT S3 Endpoint must be specified. Will work again if access=_RBAC_&secret=_RBAC_&force_uri_lib_config=true is in the URI passed to Arctic() |
| S3 with access and secret | Will now use the creds passed in. A future release might print a warning with instructions to upgrade. | Raises InternalException: E_INVALID_ARGUMENT S3 Endpoint must be specified. Will work if force_uri_lib_config=true is in the URI passed to Arctic() |
| Azure | Operations on the library will fail with various internal error messages |
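For clients older than v3 that need to open an S3 library created with or upgraded to >= v3, the table's workaround is to add options to the URI passed to Arctic(). A minimal sketch of building such a URI is shown below. The helper name `with_legacy_s3_compat`, the endpoint and the bucket are all placeholders invented for illustration, and the `s3://<endpoint>:<bucket>?<options>` layout is assumed; only the option names themselves come from the table.

```python
def with_legacy_s3_compat(uri: str, rbac_placeholders: bool = False) -> str:
    """Append the compatibility options from the table to an S3 URI string.

    Illustrative helper only; it does plain string handling and does not
    contact any storage.
    """
    options = []
    if rbac_placeholders:
        # Row "S3 with aws_auth=True": placeholder credentials are required too.
        options += ["access=_RBAC_", "secret=_RBAC_"]
    options.append("force_uri_lib_config=true")
    separator = "&" if "?" in uri else "?"
    return uri + separator + "&".join(options)


# Placeholder endpoint and bucket; substitute your own.
aws_auth_uri = with_legacy_s3_compat(
    "s3://s3.example.com:my-bucket?aws_auth=true", rbac_placeholders=True
)
key_uri = with_legacy_s3_compat("s3://s3.example.com:my-bucket?access=KEY&secret=SECRET")
print(aws_auth_uri)
print(key_uri)
```

Upgrading the older clients remains the simpler route where it is possible, since the Azure row has no equivalent workaround.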
Full details:
What's happened?
Whilst reviewing our codebase we discovered a way that access-keys for ArcticDB storage backends could be saved into the storage in clear text.
This behavior was by design, but there is a chance that this has happened for some third-party users without being obvious.
This depends on the backend used and how you connect to the storage.
What is the exact scope of the issue?
If you created an ArcticDB library, either with an S3 bucket and passed the access-keys as part of the URI, or with Azure Blob Storage with the access-keys as part of the connection-string, then the credentials were saved into the storage account as part of the ArcticDB library config.
If you then shared that storage account with others using different roles or access-keys, then those users would in theory have been able to access the credentials used to create the library.
What have you done to address this?
We've updated ArcticDB so that all new libraries do not do this, even if the credentials are passed in with the URI/connection-string.
We've prepared a storage-update script which you can run to see if the credentials are there, and then remove them if they are.
What is the impact if I am affected?
If you have shared that storage account with anyone else using different roles/credentials, then your original credentials have also been accessible to those users.
It's possible those users recorded the credentials, and because those credentials must have had write-access to create the library, they could have made changes to the data or otherwise used those credentials.
What can I do to check if I'm affected?
See these instructions.
If needed you can check on previous versions of ArcticDB using the code referenced on github:
#802 (comment)
What should I do if I am affected?
Follow these instructions.
This change is not forwards compatible, so users on earlier clients may need to upgrade:
- S3 libraries created with 3.0.0 will not be readable by earlier ArcticDB versions unless force_uri_lib_config=True is set in their connection string.
- Azure libraries created with 3.0.0 will not be readable by earlier ArcticDB versions.
Then,
- Rotate your credentials.
- If you've shared access to that storage account then please also check the integrity of your data and anything else accessible via those credentials.
What was the cause?
Previous use cases of ArcticDB had split storage accounts. One account was used to configure libraries and other accounts held the data for those libraries. Credentials to read those data-libraries were then stored into the configuration account and passed to users as needed for access to the data. This code was not caught during our review, and so was not disabled or removed when we made ArcticDB available to others. When we added Azure Blob storage support subsequently, the side-effect of saving anything in the connection-string to storage was not anticipated.
Having reviewed the codebase again we are confident that this was the only way that credentials could be saved into storage using our public API.
We plan to continue supporting our split storage solution for some users, but it should always be very clear when access-keys are being stored and what the risks are for that.
🚀 Features
- Conda-forge build now supports Azure Blob Storage
- Enhancement/728/make iclause responsible for processing structure (#752)
- Add more info in the CI readme; Prepare var for real storage tests (#663)
- Enhancement 702: Add option to create library if it does not exist when calling get_library (#775)
- Enhancement 714: Expose library methods to list symbols with staged data, and to delete staged data (#778)
- Enhancement 737: Support empty-type columns in QueryBuilder operations (#794)
- conda-build: Adapt C++ test suite for Linux (#713)
🐛 Fixes
- conda-build: Use default compilers for macOS (#662)
- Bugfix/nativeversionstore write metadata batch should never return dataerror objects (#782)
- Add handling of unspecified ca path in azure uri (#771)
- Add dep. on packaging (#795)
- Fix get_num_rows for NativeVersionStore (#800)
Uncategorized
- First version of AWS S3 setup guide (#708)
- fix(docs): central docs URL from API docs homepage (#755)
- Add none type (#646)
- Azure getting started guide (#749)
- Docs fixes (#762)
- Decouple storage headers from implementations & storage.hpp (#763)
- Bugfix 554: Remove unused argument from write_batch (#769)
- Partially revert #763 for consistency (#766)
- Make it clear to not commit directly to ArcticDB feedstock but use PRs instead (#741)
- maint: pandas 2.0 forward compatible changes (#540)
- test: Test the absence of in-place modification on datetime64 normalization for pandas 2.0 (#801)
- Update README.md (#799)
- test: Remove test for fallback to pickle (#805)
- Docs - update release number (#816)
- conda-build: Pin cmake (#815)
- Update releasing.md (#817)
- ArcticDB 3.0.0 update BSL table (#820)
v2.0.0
This version contains breaking changes to the ArcticDB API. As per the SemVer versioning scheme, we have bumped the major version.
⚠️ API changes
- Write batch metadata method: method rationalisation (#476)
- Append batch metadata method: method rationalisation (#548)
For Library.write_metadata_batch and Library.append_batch, a DataError object will now be returned in the position in the list returned corresponding to the symbol/version pair there was an issue writing. Note this may require code changes to support the new error handling behaviour - as a result it is being considered a breaking change as described above.
See the docs for read_batch, which uses the same exception return mechanism.
- (Minor) The internal protobuf field arcticc.pb2.descriptors_pb2.TypeDescriptor.MICROS_UTC has changed name to NANOSECONDS_UTC. This is only visible via the Arctic API as a string via get_description & get_description_batch on dtype attributes, so external users will only be affected by this if they are parsing these strings.
🚀 Features
- Projections, group-by, and aggregations added to the processing framework (#712)
- Reduce memory footprint of head and tail methods (#583)
- Per symbol parallelisation for write batch metadata method (#476)
  - This can result in significant performance improvements when using this method over many symbols.
- Per symbol parallelisation for append batch metadata method (#548)
  - This can result in significant performance improvements when using this method over many symbols.
🐛 Fixes
- Ensure content hash is copied during restore version + fixing timestamp-uniqueness-related flaky tests (#600)
- Restrict supported string types to type equality rather than isinstance (#704)
- Incorrect initialisation of LoadParameter::load_from_time_ (#697)
- Ensure compact_incomplete and recursive normalization obey the library setting for pruning (#705)
Uncategorized
- Update release process to detail the process for pre-releases (#688)
- Unify release and pre-release hotfixing (#725)
- Skip test_diff_long_stream_descriptor_mismatch on MacOS (#693)
- maint: Remove VariantStorage (#695)
- maint: Rename datetime64[ns]-related fields and datatypes (#592)
- run C++ tests for conda build / ci (#486)
v1.6.1
🐛 Fixes
- Add a more strict check for chars in the symbol names (#627)
- Fix as_of with timestamp reading entire version chain rather than just reading up to the required version (#596)
  - as_of=<timestamp> reads will be significantly faster for symbols with many versions
- Fix to ensure batch prune previous methods clean up index and data keys as well as version keys (#623)
- Only log ErrorCategory::INTERNAL errors (#676)
- Enable importing DataError from arcticdb (#657)
- Refactor underlying segment write scheduling (#532)
  - This can result in a significant performance improvement for large writes
v1.6.0
⚠️ API Changes
- Modify read_batch to return a DataError object rather than raising an exception (#629)
For Library.read_batch, a DataError object will now be returned in the position in the list returned corresponding to the symbol/version pair there was an issue retrieving. This contains the symbol and version requested, and the exception string thrown. For two well-defined categories of error, the error category and specific error code are also included:
ErrorCategory.MISSING_DATA - ErrorCode.E_NO_SUCH_VERSION: The version requested by the user does not exist
ErrorCategory.STORAGE - ErrorCode.E_KEY_NOT_FOUND: At least one of the keys required to read the specified version does not exist in the storage.
Otherwise these fields of the DataError object are left as None.
See also the API docs.
- Accessing a Library that does not exist now throws an arcticdb.exceptions.LibraryNotFound rather than an internal exception.
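The two well-defined error categories returned by read_batch lend themselves to branching on the error code, with a fallback for entries whose category and code are None. The sketch below illustrates that idea only: the enums and the `DataError` dataclass are local stand-ins that mirror the names in these notes, the attribute names are assumptions, and `describe` is a hypothetical helper. The real definitions live in arcticdb.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    """Stand-in for the categories named in the release notes."""
    MISSING_DATA = "MISSING_DATA"
    STORAGE = "STORAGE"


class ErrorCode(Enum):
    """Stand-in for the error codes named in the release notes."""
    E_NO_SUCH_VERSION = "E_NO_SUCH_VERSION"
    E_KEY_NOT_FOUND = "E_KEY_NOT_FOUND"


@dataclass
class DataError:
    """Hypothetical stand-in; attribute names are assumptions."""
    symbol: str
    exception_string: str
    error_category: Optional[ErrorCategory] = None
    error_code: Optional[ErrorCode] = None


def describe(err: DataError) -> str:
    """Turn a DataError into a short message, branching on the error code."""
    if err.error_code is ErrorCode.E_NO_SUCH_VERSION:
        return f"{err.symbol}: requested version does not exist"
    if err.error_code is ErrorCode.E_KEY_NOT_FOUND:
        return f"{err.symbol}: a key needed to read this version is missing from storage"
    # Category and code are None for every other kind of failure.
    return f"{err.symbol}: uncategorised error ({err.exception_string})"


missing = DataError("sym_a", "no such version", ErrorCategory.MISSING_DATA, ErrorCode.E_NO_SUCH_VERSION)
other = DataError("sym_b", "unexpected failure")
print(describe(missing))
print(describe(other))
```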
🚀 Features
- Support negative integers in as_of (#589)
This PR enables negative indexing to select "the last-but-nth version". This is equivalent to negative indexing in Python. See API docs.
For example:
read(...as_of=0...) will select the first version
read(...as_of=-1...) will select the most recent version
read(...as_of=-2...) will select the version prior to the most recent version
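The mapping from an integer as_of to a version can be modelled with an ordinary Python list, which is where the comparison with negative indexing comes from. This is a toy model, not ArcticDB code: `resolve_as_of` is a hypothetical function, and it assumes the symbol's existing versions are available as a list sorted oldest first. How the library treats deleted versions is not covered here.

```python
from typing import Sequence


def resolve_as_of(existing_versions: Sequence[int], as_of: int) -> int:
    """Resolve an integer as_of to a version number (toy model).

    Non-negative values name a version directly; negative values count
    back from the most recent version, like Python list indexing.
    """
    if as_of < 0:
        if -as_of > len(existing_versions):
            raise ValueError(f"as_of={as_of} reaches before the first version")
        return existing_versions[as_of]
    if as_of not in existing_versions:
        raise ValueError(f"version {as_of} does not exist")
    return as_of


versions = [0, 1, 2, 3]
print(resolve_as_of(versions, 0))   # first version
print(resolve_as_of(versions, -1))  # most recent version
print(resolve_as_of(versions, -2))  # version prior to the most recent
```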
🐛 Fixes
Various fixes to make our use of LMDB more correct. These only affect LMDB-backed Arctic instances (not S3). These should resolve the segfaults users have been experiencing with LMDB.
- Only open library once in the interpreter lifetime (#585)
- Fix for LMDB DBI Handling (#597)
- Delete relevant part of LMDB tree upon library delete (#601)
Also,
- Fix for write batch with dedup (#595)
- Fix for read batch when as of is TimeStamp (#617)
- Fix numeric isin filtering for some cases with a mix of signed and unsigned integers (#604)
Uncategorized
- Update demo notebook (#570)
- fix: Remove unneeded FMT_COMPILE (#578)
- build: Use Pandas 2.0 forward compatible API (#582)
- Make batch tests much faster (#586)
- Remove tests skips in test_storage_lock.cpp for Windows (#550)
- Update python line in issue-template so it works on windows (#608)
- Update faq.md (#616)
- Windows is Beta (#610)
- Rename lib to be consistent - lib vs library (#622)
- Update docs to point how to use AWS_PROFILE (#619)
- maint: Remove VariantStorageFactory and its implementations (#625)
- docs: Add development guidelines for testing combinations (#644)
- conda-build: Pin pybind11 to < 2.11 (#647)
- Add checklist to pull request template (#643)
v1.5.0
🚀 Features
- ☁️ ArcticDB now supports Azure Blob Storage! (#427 #464) ☁️
  - Note: This functionality is not yet available in the Conda release of ArcticDB - it is only available in the PyPI release.
- Performance improvements for:
  - write_batch (#467)
- Added optional library encoding option (#401)
- Add environment variable controlling whether AWS S3 should verify SSL certificates (S3Storage.VerifySSL). This defaults to 1 - set to 0 to disable. (#553)
- Specify compilation optimizations for Windows build (#543)
- Improve documentation for Arctic 1.0 -> ArcticDB migration (#546)
- Add test for filtering down the string pool with Nones and NaNs (#533)
- ArcticDB now supports logging to a file for log messages (#573)
🐛 Fixes
- Rename write_batch_pickle to write_pickle_batch. Note that this is a breaking API change. (#516)
- Fix compilation errors issued by clang-16. (#542)
- Fix for read batch when filtering by column. Previously this would raise an error - it now works. (#567)
- Suppress Pandas warning about nanoseconds being discarded (#571)
v1.4.1
🚀 Features
- Significant performance improvements for get_description_batch, exploiting per-symbol parallelism. Benchmarking suggests an order of magnitude improvement for 1000 symbols.
🐛 Fixes
- get_description_batch datetimes now include UTC tzinfo (fixing the batch equivalent of issue #197).
v1.4.0
📣 Notices 📣
1.4.0 has changed the default value of prune_previous_versions.
Prior to 1.4.0, if you did not pass a specific value into prune_previous_versions, prior versions would have been removed after successful completion of the write, update or append operation. By changing the default value, previous versions will now be kept by default.
To maintain the behaviour of previous releases, pass prune_previous_versions=True or manually call prune_previous_versions.
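The effect of the new default can be pictured with a small in-memory model. This is a toy illustration and not ArcticDB code: `ToyVersionedStore` and its methods are hypothetical, and only the parameter name prune_previous_versions is taken from the notice above.

```python
from typing import Dict, List


class ToyVersionedStore:
    """In-memory toy model of versioned writes; not the ArcticDB implementation."""

    def __init__(self) -> None:
        self._history: Dict[str, List[object]] = {}

    def write(self, symbol: str, data: object, prune_previous_versions: bool = False) -> None:
        """Store a new version; optionally drop all earlier versions afterwards."""
        versions = self._history.setdefault(symbol, [])
        versions.append(data)
        if prune_previous_versions:
            # Keep only the version that was just written.
            del versions[:-1]

    def version_count(self, symbol: str) -> int:
        return len(self._history.get(symbol, []))


keep_all = ToyVersionedStore()
pruned = ToyVersionedStore()
for payload in ("v0", "v1", "v2"):
    keep_all.write("sym", payload)  # default from 1.4.0: keep history
    pruned.write("sym", payload, prune_previous_versions=True)  # pre-1.4.0 behaviour

print(keep_all.version_count("sym"), pruned.version_count("sym"))
```

Code that relied on the old default to bound storage growth would need to pass the flag explicitly, as the notice describes.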
🚀 Features
- Support an option to allow Arctic to override storage endpoint and credentials. Useful if replicating a bucket containing existing ArcticDB libraries to another region. (#502)
Other Changes:
- If list_symbols is expected to be slow (no recent cache), a warning will be printed. (#489)
- Major refactor to the analytical engine of ArcticDB (#471)
- prune_previous_versions has been set to False by default (#485)
- Fix logging levels not being configurable from env var (#490)
- Add UTC timezone info to dates returned by SymbolDescription::get_description (#480)
- Use RFC 3986 URL encoding for S3 interactions (#503)
Uncategorized
- Demo notebook (#477)
- conda-build: Use fmt < 10 (#479)
- Add guide on how to release ArcticDB to PyPi and conda-forge (#478)
- Try removing enum-compat (#491)
- Address post merging comments from prune_previous_versions set defaul… (#487)
- Add requirements file (#501)
- Remove reference to Arcticc and fix file config (#508)
Full Changelog: v1.3.0...v1.4.0