Add support for sendfile() based DATA frames. #236
Comments
Another option is to have
While we're here, we should note the caveats of
This somewhat increases the overhead and decreases some of the utility of
Agreed. But I think the ability to send large resources from disk without buffering them in Python would be a good thing in and of itself, even if there were no speed/performance improvements.
The other concern about
Ugh... that is an excellent point. The danger of doing my development work in plaintext is that I sometimes forget that browsers don't speak it... So, for now, such a feature would only generally be useful for people who have non-browser clients and don't want/need TLS. Still a use case that exists, but much more niche than I was originally imagining.
Yup, agreed. I think it's still worth doing, especially as there's the possibility of AF_ALG-based sockets being used in the future which would be compatible with sendfile.
I have to admit that h11's sendfile support is more of a cute trick than anything carefully considered. I am somewhat dubious about whether one can get all the other Python overhead low enough for sendfile to actually make any difference (cf. Amdahl's law), but would be very interested to hear if anyone tries it. In the meantime h11 mostly just supports sendfile because I realized it would be trivial to do :-)
The main attraction for sendfile is zero-copy overhead (which can be gotten other ways), but yeah, any crypto will imply at least one read of the memory. I'm extremely dubious that directly supporting sendfile makes sense: I'd much rather make sure we can deliver a zero-spurious-copies guarantee, that is, that we won't do anything daft like take slices of the input data, and instead make sure we use memoryviews and the like right up until it's handed off to TLS.
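To make the "memoryviews rather than slices" point concrete, here is a minimal sketch of chunking a payload without copying it. The helper name `iter_chunks` is made up for illustration and is not part of h2 or h11; it assumes the payload is a one-dimensional bytes-like object.

```python
def iter_chunks(data, chunk_size):
    """Yield successive views of `data`, each at most `chunk_size` bytes.

    Hypothetical helper: slicing a memoryview produces another view onto
    the same underlying buffer, whereas slicing bytes or bytearray would
    allocate a new object and copy the data.
    """
    view = memoryview(data)
    for start in range(0, len(view), chunk_size):
        yield view[start:start + chunk_size]


if __name__ == "__main__":
    payload = bytearray(b"abcdefghij")
    for chunk in iter_chunks(payload, 4):
        # Each chunk can be passed straight to socket.send() or
        # SSLSocket.send(), both of which accept bytes-like objects.
        print(bytes(chunk))
```

The views stay tied to the original buffer, so the caller has to keep that buffer alive and unmodified until the data has actually been written out.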
So in a sense, this is exactly how h11's sendfile support works. Normally for convenience, Then we also make the guarantee that the only thing we do with payload buffers is call The problem for h2 is that the
We used to be very careful about this sort of thing in Twisted, and it was one of the major regressions in Python 3 that
(Calling it a "regression" because the
So
I'm postponing this to 4.0.0.
Right now there is no support for HTTP/2 using sendfile to send DATA frames. This is potentially inefficient for implementations that are able to send really sizeable DATA frames. Given that sendfile allows users to send fixed length data, it would be nice to provide some way to say "send these bytes, then do a sendfile with this fobj and this length".
@njsmith's h11 library has this capacity, but this relies on the fact that h11 has no internal buffer: each "event" call returns the bytes required for that event directly without writing into a buffer. This allows for the subtle changing of return value in comparison to send, which is not so naturally achievable with h2.
Coming up with a good design here is a bit tricky. It may be that this should be an optional switch on the H2Connection class that affects the return value of data_to_send(), changing it to be an iterable of bytes and sentinel objects where each sentinel object is . Alternatively, we could go further and say that not only does the optional switch need to be enabled but there is also a separate "get the data I need" function that conforms to this new API.
Another possible option that leaps out to me is to have a subclass (eww) or some other type that implements this support as a wrapper around the base H2Connection object.
The final possible option is a dramatic API change. That changes the signature of receive_data to return two values: the events iterable and any bytes that may need to be sent. If we do that, we can then remove hyper-h2's internal buffer and then delegate entirely to the calling code by having all do_event() type functions simply return the data they generate rather than storing it internally. That's a very large API break, but it allows supporting this use-case more cleanly by simply emulating what h11 does.
I'd like opinions from @python-hyper/contributors. Any thoughts on API choices? Is this worth supporting at all?
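As a rough illustration of the "iterable of bytes and sentinel objects" idea, the sketch below shows how calling code might consume such an iterable. Everything named here (`FileSlice`, `data_frame_header`, `write_chunks`) is hypothetical and not part of h2's API; the only real library calls are `socket.sendall()` and `socket.sendfile()`. It assumes a blocking, plaintext stream socket and ignores flow control and the maximum frame size.

```python
import socket
import struct
from typing import BinaryIO, Iterable, NamedTuple, Union


class FileSlice(NamedTuple):
    """Hypothetical sentinel: send `count` bytes of `fobj` from `offset`."""
    fobj: BinaryIO
    offset: int
    count: int


def data_frame_header(length: int, stream_id: int, end_stream: bool = False) -> bytes:
    """Build the 9-byte HTTP/2 frame header for a DATA frame (type 0x0)."""
    flags = 0x1 if end_stream else 0x0  # 0x1 is END_STREAM
    # 24-bit length, 8-bit type, 8-bit flags, reserved bit + 31-bit stream id.
    return (
        struct.pack(">I", length)[1:]
        + bytes([0x0, flags])
        + struct.pack(">I", stream_id & 0x7FFFFFFF)
    )


def write_chunks(sock: socket.socket, chunks: Iterable[Union[bytes, FileSlice]]) -> None:
    """Write plain byte chunks with sendall() and file slices with sendfile()."""
    for chunk in chunks:
        if isinstance(chunk, FileSlice):
            # socket.sendfile() uses os.sendfile() where it can and falls
            # back to plain send() otherwise; it needs a blocking socket.
            sock.sendfile(chunk.fobj, offset=chunk.offset, count=chunk.count)
        else:
            sock.sendall(chunk)
```

A caller would yield the frame header as ordinary bytes followed by a `FileSlice` for the payload, which is essentially "send these bytes, then do a sendfile with this fobj and this length".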