Home
Storage.Net defines two different storage types:
- Blob Storage is used to store arbitrary, unstructured files of any size. The data is essentially a binary file. Examples of blob storage are Azure Blob Storage, Amazon S3, a local folder, etc.
- Messaging is an asynchronous mechanism to send and receive messages between disconnected systems, for instance MSMQ, Azure Service Bus, Amazon Simple Queue Service (SQS), etc.
Some effort has been made to document the supported storage options. You are welcome to contribute to the documentation, or browse the following sections:
Blob Storage stores files. A file has only two properties - ID and raw data. If you build an analogy with a disk filesystem, the file ID is the file name.
Blob Storage is a really simple abstraction - you read or write file data by its ID, nothing else.
The entry point to a blob storage is the IBlobStorage interface. This interface is small but contains all the methods needed to work with blobs, such as uploading and downloading data, listing storage contents, deleting files etc. The interface is kept small so that new storage providers can be added easily, without implementing a plethora of interface methods.
In addition to this interface, there are plenty of extension methods which enrich the functionality, so you will see more methods than the interface actually declares. For instance, the IBlobStorage upload functionality only works with streams, but the extension methods allow you to upload text, a stream, a file or even a class as a blob. Extension methods are also provider agnostic, so all the rich functionality just works and doesn't have to be reimplemented in the underlying data provider.
All the storage implementations can be created either directly or using factory methods available in the Storage.Net.StorageFactory.Blobs class. More methods appear in that class as you reference a NuGet package containing specific implementations, and a few built-in implementations are available out of the box as well. After referencing an appropriate package from NuGet you can call the storage factory to create the respective storage implementation.
You can also use connection strings to create blob storage instances. Connection strings are often useful if you want to completely abstract yourself from the underlying implementation. See the documentation of the specific implementation for its connection string format. For instance, to create an instance of the Azure Blob Storage provider you could write:
IBlobStorage storage = StorageFactory.Blobs.FromConnectionString("azure.blobs://...parameters...");
In this example we create a blob storage implementation which happens to be Microsoft Azure Blob Storage. The project references the appropriate NuGet package. As blob storage methods promote streaming, we create a MemoryStream over a string for simplicity's sake. In your case the actual stream can come from a variety of sources.
using Storage.Net;
using Storage.Net.Blobs;
using System.IO;
using System.Text;
using System.Threading.Tasks;

namespace Scenario
{
    public class DocumentationScenarios
    {
        public async Task RunAsync()
        {
            //create the storage using a factory method
            IBlobStorage storage = StorageFactory.Blobs.AzureBlobStorageWithSharedKey(
                "storage name",
                "storage key");

            //upload it
            string content = "test content";
            using (var s = new MemoryStream(Encoding.UTF8.GetBytes(content)))
            {
                await storage.WriteAsync("mycontainer/someid", s);
            }

            //read back
            using (var s = new MemoryStream())
            {
                using (Stream ss = await storage.OpenReadAsync("mycontainer/someid"))
                {
                    await ss.CopyToAsync(s);
                    content = Encoding.UTF8.GetString(s.ToArray());
                    //content is now "test content"
                }
            }
        }
    }
}
This is really simple, right? However, the code looks long and boring. If I just need to save and read a string, why do I need to dance around with streams? That was exactly my point when trying to use external SDKs. Why do we need to work in an ugly way if all we want to do is something simple? With Storage.Net you can reduce the upload and the download to just two lines of code:
public async Task BlobStorage_sample2()
{
    IBlobStorage storage = StorageFactory.Blobs.AzureBlobStorageWithSharedKey(
        "storage name",
        "storage key");

    //upload it
    await storage.WriteTextAsync("mycontainer/someid", "test content");

    //read back
    string content = await storage.ReadTextAsync("mycontainer/someid");
}
You can find the list of supported blob storage implementations here.
The basic architecture of blobs is depicted in the following diagram:
All of the core methods are defined in the IBlobStorage interface. Implementing this interface is all that is required to add a new storage provider.
However, some providers support more than just basic operations. For instance, Azure Blob Storage supports blob leasing, shared access signatures etc., so it actually implements the IAzureBlobStorage interface, which in turn extends IBlobStorage and adds that functionality. The same goes for AWS S3 and others.
However, when you are browsing the IBlobStorage interface, IntelliSense will show you a plethora of methods that are not declared there. This is because there are plenty of extension methods defined for it. Extension methods add extra useful stuff, such as the ability to write/read strings, JSON objects and so on, but they in turn use only methods from IBlobStorage. Those methods were split out into extension methods because they are logical functionality that does not depend on any underlying implementation. It also makes implementing new storage providers much easier, as you only have to implement a subset of methods.
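The split between a small core interface and provider-agnostic extension methods can be sketched as follows. This is a minimal illustration only: IMiniBlobStore, InMemoryBlobStore and MiniBlobStoreExtensions are hypothetical names made up for this example, not the real IBlobStorage or its extensions, and the in-memory dictionary simply stands in for a real provider.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical two-method core interface (NOT the real IBlobStorage).
public interface IMiniBlobStore
{
    Task WriteAsync(string id, Stream data);
    Task<Stream> OpenReadAsync(string id);
}

// Toy provider: the only thing a new "provider" has to implement.
public sealed class InMemoryBlobStore : IMiniBlobStore
{
    private readonly Dictionary<string, byte[]> _blobs = new Dictionary<string, byte[]>();

    public async Task WriteAsync(string id, Stream data)
    {
        using (var buffer = new MemoryStream())
        {
            await data.CopyToAsync(buffer);
            _blobs[id] = buffer.ToArray();
        }
    }

    public Task<Stream> OpenReadAsync(string id)
    {
        Stream s = new MemoryStream(_blobs[id], writable: false);
        return Task.FromResult(s);
    }
}

// Helpers written once against the interface, so every provider gets them for free.
public static class MiniBlobStoreExtensions
{
    public static async Task WriteTextAsync(this IMiniBlobStore store, string id, string text)
    {
        using (var s = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        {
            await store.WriteAsync(id, s);
        }
    }

    public static async Task<string> ReadTextAsync(this IMiniBlobStore store, string id)
    {
        using (Stream s = await store.OpenReadAsync(id))
        using (var reader = new StreamReader(s, Encoding.UTF8))
        {
            return await reader.ReadToEndAsync();
        }
    }
}
```

Because the text helpers call only WriteAsync and OpenReadAsync, a second toy provider would pick them up without any extra code, which is the point of keeping the core interface small.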
Transform sinks are another awesome feature of Storage.Net that works across all the storage providers. A transform sink transforms the underlying stream of data on both upload and download. Examples of transform sinks would be gzipping data transparently, encrypting it, and so on.
Let's say you would like to gzip all of the files that you upload to or download from a storage. You can do that in the following way:
IBlobStorage myGzippedStorage = StorageFactory.Blobs
.AzureBlobStorageWithSharedKey("name", "key")
.WithGzipCompression();
Then use the storage as you would before - all the data is compressed as you write it (with any WriteXXX method) and decompressed as you read it (with any ReadXXX method).
For more details on sinks available and how to implement your own sink, read this page.
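To make the idea of a transform concrete, here is a minimal sketch of what a gzip transform does to a stream on the way in and on the way out. It uses only System.IO.Compression from the .NET base library. GzipTransformSketch and its methods are hypothetical names for illustration, and this is not a description of how Storage.Net implements WithGzipCompression internally.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipTransformSketch
{
    // Write side: compress everything from `source` into a new in-memory stream.
    public static MemoryStream Compress(Stream source)
    {
        var target = new MemoryStream();
        // leaveOpen keeps `target` usable after the gzip wrapper is disposed;
        // disposing the wrapper flushes the final gzip block.
        using (var gzip = new GZipStream(target, CompressionMode.Compress, leaveOpen: true))
        {
            source.CopyTo(gzip);
        }
        target.Position = 0;
        return target;
    }

    // Read side: wrap the stored stream so the caller reads the original bytes.
    public static Stream Decompress(Stream stored)
    {
        return new GZipStream(stored, CompressionMode.Decompress);
    }
}
```

The caller never sees compressed bytes: what goes into Compress comes back unchanged from Decompress, while the stored representation in between is smaller for compressible data.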
Due to the nature of transforms, they can change both the underlying data and the stream size. This is a problem for storage providers that need to know the size of the blob beforehand. The matter becomes more complicated when an implementation needs to calculate other statistics of the data before uploading, e.g. a hash, CRC and so on. Therefore the only reliable way to stream transformed data is to actually perform all of the transforms first, and then upload the result. In this implementation, Storage.Net performs the transforms in memory, but does it extremely efficiently by using the Microsoft.IO.RecyclableMemoryStream package, which performs memory pooling and reclaiming for you so that you don't need to worry about software slowdowns. You can read more about this technique here.
This also means that today a transform sink can upload a stream only as large as the amount of RAM available on your machine. I am, however, thinking of ways to go further than that, and there are some beta implementations available that might see the light soon.
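The "transform first, then upload" approach can be sketched like this. The example is an assumption-laden illustration: BufferedUploadSketch and TransformThenMeasure are hypothetical names, gzip stands in for an arbitrary transform, SHA-256 stands in for whatever statistic a provider might require, and a plain MemoryStream is used where Storage.Net uses a pooled stream from Microsoft.IO.RecyclableMemoryStream.

```csharp
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public static class BufferedUploadSketch
{
    // Runs the transform to completion in memory, then reports the facts a
    // provider may need before the upload starts: final length and a hash.
    public static (long Length, byte[] Hash) TransformThenMeasure(Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                source.CopyTo(gzip);
            }

            // Only now is the size of the transformed data known.
            buffer.Position = 0;
            using (SHA256 sha = SHA256.Create())
            {
                byte[] hash = sha.ComputeHash(buffer);
                return (buffer.Length, hash);
            }
        }
    }
}
```

The whole transformed payload sits in the buffer before anything is sent, which is why the amount of available RAM bounds the size of a blob that can pass through a transform.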
Messaging is intended for message passing between one or more systems in a disconnected fashion. You can send a message somewhere, and the current or a remote system picks it up for processing later when required. This paradigm fits well with CQRS and message-passing architectures.
To name a few examples, Apache Kafka, RabbitMQ and Azure Service Bus all fall into this category - essentially they are designed to pass messages. Some systems are more advanced than others of course, but most often it doesn't really matter.
Storage.Net supports many messaging providers out of the box, including Azure Service Bus Topics and Queues, Azure Event Hubs and others.
There are two abstractions available - message publisher and message receiver. As the names suggest, one publishes messages and the other receives them on the other end.
To publish messages you will usually construct an instance of IMessagePublisher with an appropriate implementation. All the available implementations can be created using factory methods in the Storage.Net.StorageFactory.Messages class. More methods appear in that class as you reference an assembly containing specific implementations.
Similarly, to receive messages you can use factory methods to create receivers, which all implement the IMessageReceiver interface.
The primary method of this interface,
Task StartMessagePumpAsync(
    Func<IEnumerable<QueueMessage>, Task> onMessageAsync,
    int maxBatchSize = 1,
    CancellationToken cancellationToken = default);
starts a message pump that listens for incoming queue messages and calls onMessageAsync as a callback to pass those messages to your code.
maxBatchSize is a number specifying how many messages you are ready to handle at once in your callback. Choose this number carefully: a value that is too low will result in slower message processing, whereas a value that is too large will increase the RAM requirements of your software.
cancellationToken is used to signal the message pump to stop. If you don't pass it, the message pump will never stop. See the example below in Use Cases for a pattern on how to use this parameter.
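The interplay of maxBatchSize and cancellationToken can be sketched with a toy pump. Everything here is hypothetical: ToyReceiver is not an IMessageReceiver, it uses plain strings instead of QueueMessage, and it only mirrors the shape of StartMessagePumpAsync so the stop pattern can be shown without a real queue. In particular, the assumption that the returned task completes when the pump stops applies to this toy only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory receiver mirroring the method signature above.
public sealed class ToyReceiver
{
    private readonly ConcurrentQueue<string> _queue = new ConcurrentQueue<string>();

    public void Enqueue(string message) => _queue.Enqueue(message);

    public async Task StartMessagePumpAsync(
        Func<IEnumerable<string>, Task> onMessageAsync,
        int maxBatchSize = 1,
        CancellationToken cancellationToken = default)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            // Hand over at most maxBatchSize messages per callback invocation.
            var batch = new List<string>();
            while (batch.Count < maxBatchSize && _queue.TryDequeue(out string message))
            {
                batch.Add(message);
            }

            if (batch.Count > 0)
            {
                await onMessageAsync(batch);
            }
            else
            {
                // Nothing to do: wait a little, but wake up early on cancellation.
                try { await Task.Delay(10, cancellationToken); }
                catch (OperationCanceledException) { break; }
            }
        }
    }
}

public static class PumpDemo
{
    // Starts the pump, handles five messages in batches of up to two, then stops it.
    public static async Task<int> RunAsync()
    {
        var receiver = new ToyReceiver();
        for (int i = 0; i < 5; i++) receiver.Enqueue("message " + i);

        int handled = 0;
        using (var cts = new CancellationTokenSource())
        {
            await receiver.StartMessagePumpAsync(
                batch =>
                {
                    foreach (string message in batch) handled++;
                    if (handled == 5) cts.Cancel(); // signal the pump to stop
                    return Task.CompletedTask;
                },
                maxBatchSize: 2,
                cancellationToken: cts.Token);
        }
        return handled;
    }
}
```

In a real application the call to Cancel would more likely come from a shutdown hook than from inside the callback; it is placed there only to keep the sketch deterministic.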
You can find the list of supported messaging implementations here.
Storage.Net provides built-in capability to handle large message content by allowing you to offload message content over a certain threshold to an external blob storage. It works in the following way:
- Check whether the message content is larger than the threshold value.
- If not, do the usual processing.
- If it is, upload the message content as a blob to external storage, clear the message content and add a custom header x-sn-large that points to the blob containing the message content.
When receiving messages, it checks whether the x-sn-large header is present and, if so, downloads the blob, sets its content as the message content, and returns the message to the receiver.
The blob is deleted from the blob storage only when the message is confirmed by the receiver.
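The offloading steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not Storage.Net's implementation: ToyMessage and LargeContentSketch are hypothetical types, a dictionary stands in for the external blob storage, and using a GUID as the blob ID is an arbitrary choice. Only the header name x-sn-large comes from the description above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message type: a payload plus string headers.
public sealed class ToyMessage
{
    public byte[] Content { get; set; }
    public Dictionary<string, string> Properties { get; } = new Dictionary<string, string>();
}

public sealed class LargeContentSketch
{
    private const string LargeHeader = "x-sn-large";
    private readonly Dictionary<string, byte[]> _blobs; // stands in for blob storage
    private readonly int _threshold;

    public LargeContentSketch(Dictionary<string, byte[]> blobs, int threshold)
    {
        _blobs = blobs;
        _threshold = threshold;
    }

    // Publisher side: offload content over the threshold and leave a pointer header.
    public void BeforeSend(ToyMessage message)
    {
        if (message.Content == null || message.Content.Length <= _threshold) return;

        string blobId = Guid.NewGuid().ToString(); // assumed ID scheme
        _blobs[blobId] = message.Content;
        message.Content = null;
        message.Properties[LargeHeader] = blobId;
    }

    // Receiver side: if the header is present, restore the content from the blob.
    public void AfterReceive(ToyMessage message)
    {
        if (message.Properties.TryGetValue(LargeHeader, out string blobId))
        {
            message.Content = _blobs[blobId];
        }
    }

    // Called once the receiver confirms the message: only now is the blob removed.
    public void OnConfirmed(ToyMessage message)
    {
        if (message.Properties.TryGetValue(LargeHeader, out string blobId))
        {
            _blobs.Remove(blobId);
        }
    }
}
```

Deleting the blob only on confirmation means a message that is received but not confirmed can be delivered again and still find its content.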
Large message handling works on any supported queue implementation because it's implemented in the core library itself, outside of any specific queue implementation. To enable it, call .HandleLargeContent on both publisher and receiver:
IBlobStorage offloadStorage = ...; // your blob storage for offloading content
IMessagePublisher publisher = StorageFactory.Messages
.XXXPublisher(...)
.HandleLargeContent(offloadStorage, thresholdValue);
IMessageReceiver receiver = StorageFactory.Messages
.XXXReceiver(...)
.HandleLargeContent(offloadStorage);
The QueueMessage class itself is not serialisable with JSON or the built-in .NET binary serialisation, because it is a functionally rich structure. However, you might sometimes want to transfer a whole QueueMessage across the wire. For this purpose you can use the built-in binary methods:
var qm = new QueueMessage("id", "content");
qm.DequeueCount = 4;
qm.Properties.Add("key", "value");
byte[] wireData = qm.ToByteArray();
//transfer the bytes
QueueMessage receivedMessage = QueueMessage.FromByteArray(wireData);
These methods make sure that all of the message data is preserved, and they remain backward compatible across changes to this class.