
Epic: Akka.Remote Quic-based Transport (Artery) #7466

Open
2 tasks
Aaronontheweb opened this issue Jan 14, 2025 · 3 comments
@Aaronontheweb
Member

Abstract

For years we have had a number of open issues related to porting the Artery transport from Scala.

One of my big hang-ups around doing this was the state of Aeron, the reliable UDP-based transport that Artery uses for multiplexing. Yes, Aeron.NET eventually no longer required Java, which eliminated a lot of the complaints, but I was still really not thrilled at the prospect of getting DotNetty'd on the transport layer again.

Enter MsQuic - Microsoft's implementation of the Quic protocol, which is available in the .NET runtime as a stable feature since .NET 9: https://learn.microsoft.com/en-us/dotnet/fundamentals/networking/quic/quic-overview

Quic solves most of the same problems that Aeron does, with the benefit of being an IETF standard (i.e. it's the transport layer underneath HTTP/3).

Approach

Now as for the subject of Artery - I don't super-duper care about following the original implementation to the letter, but there are 100% some features we need to incorporate into our V2 transport:

  1. Multiplexing - this is where all of the significant performance gains come from. In Quic these are "streams" and you can see an example of me working with them here: https://github.com/Aaronontheweb/QuicFun
  2. Security - TLS v1.3 is required as part of Quic.
  3. IActorRef compression - Artery can represent actors as long integers in order to reduce delivery overhead. Great idea! We should do that.
  4. Message headers - I have no idea if Artery supports this; don't care either. It's a good idea that will make things like version negotiation and distributed tracing much easier to implement.
  5. Dedicated channel for large messages - large messages get moved into their own separate "fat boy" Stream and chunked in order to avoid head-of-line blocking. This will eliminate the need for tools like https://github.com/petabridge/Akka.Cluster.Chunking/
  6. Dedicated channel for exchanging messages even while quarantined - this should make things like our Split Brain Resolvers work a lot better in partially connected scenarios. When we finally add multi-DC clustering, this will be an important tool.
  7. Configurable number of channels - each channel correlates to a Quic stream. More channels = more parallelism with potentially higher latency.
  8. Akka.Streams-based backpressure support - to stop the transports from getting flooded when the buffers are full, we can use Akka.Streams for this.
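To make point 3 concrete, here is a minimal, language-agnostic sketch (illustrative only, not Akka.NET code) of a per-association compression table that maps full actor paths to small integer ids, so that repeated sends to the same actor only carry the integer on the wire:

```python
# Illustrative sketch: a compression table mapping actor paths to small
# integer ids. The class name and shape are hypothetical, not Artery's API.

class ActorRefCompressionTable:
    def __init__(self):
        self._path_to_id = {}   # outbound side: path -> id
        self._id_to_path = []   # inbound side: id -> path

    def compress(self, actor_path: str) -> int:
        """Return the integer id for a path, allocating one on first use."""
        if actor_path not in self._path_to_id:
            self._path_to_id[actor_path] = len(self._id_to_path)
            self._id_to_path.append(actor_path)
        return self._path_to_id[actor_path]

    def decompress(self, ref_id: int) -> str:
        """Recover the full actor path from its integer id."""
        return self._id_to_path[ref_id]

table = ActorRefCompressionTable()
a = table.compress("akka://sys@host:2552/user/worker-1")
b = table.compress("akka://sys@host:2552/user/worker-1")  # same id again
assert a == b == 0
assert table.decompress(a) == "akka://sys@host:2552/user/worker-1"
```

In a real transport the two sides would also need to advertise and version these tables so sender and receiver agree on the mapping; that handshake is omitted here.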

Requirements

  1. The classic Akka.Remote TCP-based transports should still be workable, and they should also work with the new serialization layer we've proposed in v1.6: Epic: Source-Generated Serialization #7465
  2. The Quic-based transports might require .NET 8 or 9. That might influence the version of .NET we target - this feature probably won't be workable on .NET Framework.
  3. Performance target should be 2-3 million protobuf-serialized messages per second between two nodes. Number of channels and size of message to be determined later.

My one worry with going down this route is that we're going to find some major performance issues with Akka.Streams and its materializer, which might drag out this project. Hopefully not but that's something we'll have to be prepared to deal with if it comes up.

Describe alternatives you've considered

There is a TCP-based Artery implementation, but I think it has all the same problems the classic remoting system does, aside from some of the performance improvements in the serialization and compression layers. We have an open issue tracking that, but I'm not keen to do it for v1.6: #4007 - we should prioritize the Quic implementation.

Implementation

  • Choose supported runtime versions for Akka.NET v1.6
  • Create protocol write-up (how is this thing going to form associations?)

This task list is going to expand after we have a write-up for the new remoting protocol - no point in doing it preemptively.

@Aaronontheweb Aaronontheweb added this to the 1.6.0 milestone Jan 14, 2025
@Aaronontheweb Aaronontheweb moved this to Backlog in Akka.NET v1.6 Jan 14, 2025
@Aaronontheweb Aaronontheweb changed the title Epic: AKka.Remote Quic-based Transport (Artery) Epic: Akka.Remote Quic-based Transport (Artery) Jan 14, 2025
@to11mtm
Member

to11mtm commented Jan 16, 2025

My draft got eaten.

Second try...

Configurable number of channels - each channel correlates to a Quic stream. More channels = more parallelism with potentially higher latency.

Dedicated channel for large messages - large messages get moved into their own separate "fat boy" Stream and chunked in order to avoid head-of-line blocking.

I'll start by noting that my general understanding of how Artery worked was that 'Dedicated channel for large messages' was based on paths (IIRC wildcards were allowed). Here are the notes from the Pekko reference.conf.

Architecturally, that is for one big reason: it still guarantees actor->actor ordering (without requiring a lot of magic/overhead elsewhere).

The more subtle reason, for better or worse, is that users often write code that, for lack of better verbiage, 'works the way Erlang does with remoting'. That is, Erlang, classic Akka/Pekko remoting, and Akka/Pekko Artery (without large-message actor paths set) have, to my understanding, a node->node ordering guarantee: you know that things queued to the transport will arrive in that order. But how many people do know and realize that a formerly-implicit guarantee might no longer be there? (I've seen code inadvertently depend on this more than once in a similar context.)

ALL OF THAT SAID, I think it would be great to allow many channels, either 'configurable' (i.e. just take the large-message channel concept and make it N channels instead of 1)... or, if we can make an API to safely recombine, that would be awesome. (Or other ideas along these lines.)
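One way to get N channels while still preserving the per-pair ordering described above is to deterministically map each (sender, receiver) pair to a fixed stream index, so all messages between the same pair of actors always travel on the same stream. A hedged, language-agnostic sketch (the function name and scheme are hypothetical, not from Artery):

```python
# Illustrative sketch: pick one of N QUIC streams per (sender, receiver)
# pair, so ordering between any given pair of actors is preserved while
# unrelated pairs get parallelism across streams.

import zlib

def stream_for_pair(sender: str, receiver: str, num_streams: int) -> int:
    """Deterministically map an actor pair to a stream index in [0, num_streams)."""
    key = f"{sender}->{receiver}".encode("utf-8")
    # crc32 is stable across processes, unlike Python's builtin hash()
    return zlib.crc32(key) % num_streams

# The same pair always lands on the same stream...
s1 = stream_for_pair("/user/a", "/user/b", 4)
s2 = stream_for_pair("/user/a", "/user/b", 4)
assert s1 == s2
# ...and every result is a valid stream index.
assert all(
    0 <= stream_for_pair(f"/user/{i}", "/user/sink", 4) < 4
    for i in range(100)
)
```

The trade-off is the one noted above: code that implicitly relied on a node->node ordering guarantee would still observe reordering between different actor pairs.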

Message headers - have no idea of Artery supports this; don't care either. It's a good idea that will make things like version negotiation and distributed tracing much easier to implement.

Last I knew it didn't... But I think it's a great idea.

(At my own peril, I will suggest MessagePack or even CBOR over protobuf, mostly because MessagePack is somewhat easier to read on some level, and AFAIK the minor compactness you can get with proto you lose on varint parsing as well as the custom converters for stuff not represented in .NET. I'll also note MessagePack v3 has full sourcegen support now...)

There is a TCP-based Artery implementation but I think it has all the same problems the classic remoting system does,

Not to my knowledge - or if it did, the serialization/compression changes were the big wins. When Aeron Artery was doing 750k msg/sec, TCP Artery managed 650k msg/sec.

That said, I would suggest that being able to plug in TCP (or, say, WebSockets, if only as an example of another simple abstraction) would be a good benchmark for the overall API. I.e., if users are on a network where Quic just won't work because of firewalls/etc. (I can't remember how robust it is compared to, say, SignalR when it comes to renegotiation), we want to make sure things are reasonably 'pluggable' for those cases.

Also, FWIW, lots of the high-perf protocols I've seen of late seem to prefer System.IO.Pipelines given its availability. I think it simplifies things, but I must note I've tried some simple cases 'both ways' (i.e. IO.Pipelines vs Akka.Streams) and the perf difference was... mostly tradeoffs. OTOH, IO.Pipelines has some potentially better use cases long term (i.e. maybe easier shimming of paradigms for transport)... I think community input would be nice here...

Security - TLS v1.3 is required as part of Quic.

Does the existing API set for Quic support 'rotation'? My understanding is that Artery had some capability to use multiple PEMs via SSLEngine to handle rotating SSL keys (i.e. a rolling deploy for a key change). Definitely not a launch requirement, but it should likely be considered in the design for the future.

@Aaronontheweb
Member Author

That said I would suggest ensuring that being able to plug in TCP (or, say, WebSockets, if only as an example of another simple abstraction) should be a good benchmark for the overall API. i.e. if users are in a network where Quic just won't work because of firewalls/etc (I can't remember how robust it is compared to, say, SignalR when it comes to renegotiation,) we want to make sure things are reasonably 'pluggable' for those cases.

QUIC is just UDP, so I don't think it'll be an issue in live environments. Also worth bearing in mind that HTTP/3 will be totally inoperable if QUIC is blocked so I think we can piggy-back off of that too.

@Aaronontheweb
Member Author

But you raise great points though @to11mtm
