
Add a command to get response body #856

Open · wants to merge 1 commit into main from orkon/get-response-body
Conversation

@OrKoN (Contributor) commented Jan 13, 2025

Closes #747



@OrKoN force-pushed the orkon/get-response-body branch from 24e2448 to 1ea00c4 on January 13, 2025 14:30
@OrKoN force-pushed the orkon/get-response-body branch from 1ea00c4 to 9129bae on January 13, 2025 14:37
@jgraham (Member) commented Jan 13, 2025

CC @juliandescottes who was also going to look at this. Very briefly, some high level things I think we should try to look at:

  • What's the lifecycle? How long should bodies be stored? I think having the lifecycle be implementation-defined is bad, because we'll inevitably get interop problems where one browser keeps bodies for longer than another.
  • How does this work with request interception? We'd eventually like to be able to rewrite bodies as part of the interception API. As with network events, it would be good to have a consistent model here rather than two unrelated sets of commands.

A question is whether requiring an interception is acceptable. If it is, one could add a returnBody: "none" / "string" / "handle" parameter to network.continueRequest or network.continueResponse. If you provide it, you get an extra network.bodyReady event, which in the case of a string occurs when the full body is known, and in the case of a handle is immediate (or maybe it becomes a property on some existing event). For strings you get read-once semantics, i.e. the implementation is expected to cache the body until it's read or the page is navigated, but to expire it after a read.
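
A rough sketch of the shapes this proposal implies; every name here (returnBody, network.bodyReady, the payload fields) is hypothetical, taken from the comment above rather than from the spec:

```ts
// Hypothetical protocol shapes for the returnBody proposal; none of
// these types exist in the WebDriver BiDi spec today.
type ReturnBody = "none" | "string" | "handle";

interface ContinueResponseParams {
  request: string;         // network request id
  returnBody?: ReturnBody; // opt in to receiving the body
}

// Emitted when the full body is known (for "string"), or immediately
// (for "handle"). With "string" the body is read-once: the browser
// caches it until it is read or the page navigates, then expires it.
interface BodyReadyEvent {
  method: "network.bodyReady";
  params: {
    request: string;
    body?: string;   // present when returnBody was "string"
    handle?: string; // present when returnBody was "handle"
  };
}
```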

@OrKoN (Contributor, author) commented Jan 13, 2025

requiring an interception is acceptable

Puppeteer allows getting bodies without interception, so I do not think requiring an interception would be acceptable.

What's the lifecycle? How long should bodies be stored?

Chrome allows configuring limits (https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-enable), so we could have something similar too. Probably clearing on a new-document navigation would make sense, but I will need to do some testing.
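
For reference, the CDP limits mentioned here are set when enabling the network domain; a minimal Puppeteer sketch (the buffer-size parameters are experimental in CDP):

```ts
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const page = await browser.newPage();

// Raw CDP session; Network.enable accepts experimental buffer limits
// that bound how much response data Chrome retains for later
// Network.getResponseBody calls.
const client = await page.createCDPSession();
await client.send('Network.enable', {
  maxTotalBufferSize: 100_000_000,   // total body cache across requests
  maxResourceBufferSize: 10_000_000, // per-resource cap
});

// Later, a body can be pulled lazily while it is still buffered:
// const { body, base64Encoded } =
//   await client.send('Network.getResponseBody', { requestId });
```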

@OrKoN (Contributor, author) commented Jan 13, 2025

How does this work with request interception?

I think we will be able to use the same command, but if the request is paused at the responseStarted phase, we have different steps to fetch the body.

@OrKoN (Contributor, author) commented Jan 13, 2025

I think having the lifecycle be implementation-defined is bad because we'll inevitably get interop problems where one browser keeps bodies for longer than another.

I think an implementation-defined lifecycle would be a good starting point; we could resolve the arising interop issues later, as long as they do not require structural changes. There could be many edge cases and differences in the network stack that might not be easily unifiable (unless we unconditionally store response bodies even when they are never requested, which is not very efficient). The following questions need to be considered for the lifecycle:

  • is the worker/worklet that served the response still alive to provide blob data?
  • is the process hosting the response data still alive?
  • was the response body evicted for other reasons (memory limits)?

@OrKoN requested a review from sadym-chromium on January 13, 2025 16:33
@jgraham (Member) commented Jan 14, 2025

there could be many edge cases and differences on the network stack that might not be easily unifiable

These are almost all resolvable, depending on the model. For example, instead of having a "getResponseBody" command, we could adopt a model like network request interception, where you subscribe to get the body for some/all responses and instead receive a "responseBodyReady" event for matching requests. Indeed, you could probably rather directly reuse the existing infrastructure and make it another network interception phase (although we'd need to work out how to specify which kind of body you want in the case where we support both strings and stream handles). Puppeteer in that case would need to subscribe to all bodies and manage the lifecycle itself, which isn't ideal in the short term, but you could probably move to a more efficient client-side API that made storing the bodies opt-in.
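
A minimal sketch of the client-side lifecycle management this model implies; the event name and payload are hypothetical (from the comment above), and `session` stands in for some BiDi client connection:

```ts
// Client keeps its own cache of every body it receives; all protocol
// names here are hypothetical.
interface BidiEvents {
  on(event: string, handler: (params: any) => void): void;
}
declare const session: BidiEvents; // some BiDi client connection

const bodies = new Map<string, string>(); // request id -> body

session.on('network.responseBodyReady',
  (params: { request: string; body: string }) => {
    bodies.set(params.request, params.body);
  });

// A more efficient client API could instead make storing bodies
// opt-in, caching only requests the user registered interest in.
function getCachedBody(requestId: string): string | undefined {
  return bodies.get(requestId);
}
```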

Anyway, I'm not specifically advocating that solution, just saying that there clearly are options that avoid making the lifecycle totally implementation-defined, and I think we should explore those, because the other option is that we offer an unreliable user experience and/or end up needing to do significant platform work to align on the most permissive implementation.

@OrKoN (Contributor, author) commented Jan 14, 2025

Puppeteer in that case would need to subscribe to all bodies and manage the lifecycle itself, which isn't ideal in the short term, but you could probably move to a more efficient client side API that made storing the bodies opt-in.

I do not think opting in to all bodies would work for Puppeteer or Playwright. Interception has overhead, and in many cases, like HAR generation (firefox-devtools/bidi-har-export#22), interception is not on. Although an ability to opt in to all bodies would work for HAR, I think an ability to lazily fetch the body after the fact, without interception, is an important feature.

@juliandescottes (Contributor) commented

A question is whether requiring an interception is acceptable.

As @OrKoN mentioned, for generating HAR files during performance tests, requiring interception would probably impact network performance (unless we could define interception rules that are automatically applied without having to send a command to resume the request), so it sounds difficult to combine the two approaches. An interception-like feature which effectively blocks requests can't be the only way to retrieve response bodies.

If we make this a command - as in this PR - then consumers can decide to get only the responses they are interested in, but they need to request them before the content becomes unavailable (which brings questions about the lifecycle). For the example of HAR generation for perf tests, it also means the client (e.g. browsertime) keeps track of all network events collected and, at the end, sends commands to get all response bodies. But that already seems to be what they are doing for Chrome on browsertime's side, so it's probably fine as a pattern.

Alternatively, we could make it a separate event. Then there is no issue with the lifecycle, but the only granularity for users is whether or not they want to receive response bodies. That might be slightly more consistent with an interception API. We could imagine two events, network.responseBodyStreamOpened / network.responseBodyStreamClosed, where the first one also corresponds to a new interception phase; the second event would not map to an interception phase, but it would contain the response body. Again, the issue with this is the granularity of the response bodies you will receive... unless we have a way to restrict it to specific patterns? (Which I know looks like interception, but it seems wrong to me to tie it to interception when we know we can't block all requests just to get the body.)
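
Sketching the shapes of these two hypothetical events (neither exists in the spec; the payloads are assumptions):

```ts
// First event: corresponds to a new interception phase; the request is
// not blocked unless an intercept also matches.
interface ResponseBodyStreamOpened {
  method: 'network.responseBodyStreamOpened';
  params: { request: string };
}

// Second event: does not map to an interception phase, but carries the
// response body itself.
interface ResponseBodyStreamClosed {
  method: 'network.responseBodyStreamClosed';
  params: { request: string; body: string };
}
```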

@jgraham (Member) commented Jan 14, 2025

Yes, sorry, in the previous comment I didn't mean to imply that you'd have to explicitly continue the request, just that there would be a way to enable an extra non-blocking lifecycle event containing the response body for certain URLs.

Functionally this is identical to network.addIntercept, but it indeed might be confusing to reuse the command given that the semantics would be different.

I agree that perf monitoring is a case where adding additional blocking is unacceptable.

So basically the design I was imagining is similar to @juliandescottes' final paragraph.

@jgraham (Member) commented Jan 14, 2025

I think an ability to lazily fetch the body after the fact without interception is an important feature.

Then I think you have to define the lifecycle over which bodies are expected to be available. "Don't expose GC behaviour" is a fundamental design principle for the web platform, and whilst I often think that we can have slightly different constraints in automation, this is a case where I think "don't expose the memory management strategy of the implementation" is a principle we should work from, for basically the same reasons.

I also think that forcing the implementation to cache responses for a long or unbounded amount of time (e.g. until navigation) is very problematic; it seems likely that this will show up in tests as unexpected memory growth that doesn't replicate in non-automation scenarios.

@OrKoN (Contributor, author) commented Jan 14, 2025

I am not sure I fully understand the proposal, but it sounds like it would be similar to calling the command to get the body at the responseStarted phase (with its response being the responseBodyStreamClosed event)? From the lifecycle perspective, if the request is not blocked, there is still no guarantee that the body would not be cleaned up rather than saved by the implementation. Or do you propose that the client should know ahead of time what the URLs of the requests it needs bodies for look like? Should that also include the max body size to be emitted by events? I think it still does not really help with the lazy body-fetching situation, e.g., what if you want to inspect the body of the slowest request?

@jgraham (Member) commented Jan 14, 2025

The proposal would be something like a network.enableResponseBodies command, with parameters like:

```
{
  ? contexts: [+browsingContext.BrowsingContext],
  ? urlPatterns: [*network.UrlPattern],
  ? type: "string" .default "string", // Extensibility point for allowing a handle later on
  ? maxSize: js-int .default -1,      // Allow opting out of large bodies
}
```

If a specific response matches a response body filter added in this way, then there would be an additional network.responseBody event with a string containing the response data (possibly base64-encoded), once it's ready. Alternatively, we could just add the body to network.responseCompleted in this case.
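
Put together, a client using this push-based design might look like the following; the command and event names come from the proposal above, not the spec, and the base64 decoding and URL-pattern shape are assumptions:

```ts
// Hypothetical BiDi client connection.
declare const bidi: {
  send(method: string, params: unknown): Promise<unknown>;
  on(event: string, handler: (params: any) => void): void;
};

// Opt in to bodies for matching responses only.
await bidi.send('network.enableResponseBodies', {
  urlPatterns: [{ type: 'pattern', pathname: '/api/*' }],
  maxSize: 1_000_000, // skip bodies larger than ~1 MB
});

// Receive each matching body once it is ready.
bidi.on('network.responseBody', ({ request, body }) => {
  console.log(request, Buffer.from(body, 'base64').toString('utf8'));
});
```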

For interception, I think we'd just add a parameter to the existing network.addIntercept command that would make a handle to a stream containing the body available in network.responseStarted. This would unfortunately be a slightly different mechanism, but we already have an event that's emitted at the right time, and if you're going to modify the stream then the other concerns around overhead disappear.

what if you want to inspect the body of the slowest request?

In this design you have to get everything and the client gets to decide which responses to keep.

In your design you can't reliably do what you're asking for, because it depends on whether the implementation decided to hold on to the body until you requested it. In practice this means that everyone has to agree on what the lifecycle should be via the inefficient mechanism of getting bug reports from users until the behaviours are sufficiently similar in enough cases that people don't notice the differences any more.

In a design where we don't transmit the body until requested, I think you still want something like network.enableResponseBodies, but instead of adding an extra event, it means that the implementation has to cache those bodies until someone sends a network.getResponseBody() command (assuming we don't want the data held on both the client and browser sides) with a response id (which we'd need to add), or we reach some "natural" endpoint for the cache (e.g. navigation), or the client sends a network.clearResponseBodyCache command.

There's an additional question here about how to handle the case where multiple network.enableResponseBodies filters match the same request/response: would explicitly clearing the cache clear it for all rules, or would it be better to explicitly specify which part of the cache should be cleared using a token, similar to the way event [un]subscriptions now work? Given the experience with events, probably the latter; but then getResponseBody should also know which filter the command corresponds to, so that each "subscriber" (i.e. matching filter) has a chance to retrieve the data.
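
A sketch of this pull-based variant with per-filter tokens; every name here (enableResponseBodies, getResponseBody, clearResponseBodyCache, the filter token, the response id) is from the discussion above or invented for illustration, not from the spec:

```ts
declare const bidi: { send(method: string, params: unknown): Promise<any> };
declare const someResponseId: string; // a response id, which the spec would need to add

// Ask the browser to retain matching bodies; the returned token
// identifies this filter, mirroring how event subscriptions now work.
const { filter } = await bidi.send('network.enableResponseBodies', {
  urlPatterns: [{ type: 'pattern', pathname: '/api/*' }],
});

// Lazily pull a body; the token tells the browser which "subscriber"
// is reading, so each matching filter gets a chance to see the data.
const { body } = await bidi.send('network.getResponseBody', {
  response: someResponseId,
  filter,
});

// Release the browser-side cache for this filter only.
await bidi.send('network.clearResponseBodyCache', { filter });
```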

@OrKoN (Contributor, author) commented Jan 14, 2025

In a design where we don't transmit the body until requested, I think you still want something like network.enableResponseBodies, but instead of adding an extra event, it means that the implementation has to cache those bodies until someone sends a network.getResponseBody() command (assuming we don't want the data held on both the client and browser sides) with a response id (which we'd need to add), or we reach some "natural" endpoint for the cache (e.g. navigation), or the client sends a network.clearResponseBodyCache command.

I think navigation is not a sufficient condition for clearing the cache, because loading one site can cause multiple navigations in multiple navigables, so we would still need to define it (and it is mostly implementation-defined). From the experience of dealing with bug reports in Puppeteer, the users who inspect response bodies generally want the response body to be indefinitely available (unless they are concerned with memory usage), and this conflicts with browser engine implementations, where we cannot arbitrarily move data to cache storage and keep it indefinitely. Basically, we cannot guarantee that the data is there without incurring the overhead of always reading the data out of the process and backing it up elsewhere (e.g., in a different process).

@jgraham (Member) commented Jan 15, 2025

Just to summarize where I think we are, we've considered four possible areas of the design space:

  1. Clients have to request a body at a point during the lifecycle when it is guaranteed to still be available (i.e. after the request has been initiated, but before the response is complete)
  2. Clients subscribe upfront to response bodies for requests matching certain URLs in certain contexts and are sent them in an event. Keeping responses available is a client side concern.
  3. Clients subscribe upfront for the browser to retain response bodies for requests matching certain URLs in certain contexts. If they later decide to actually use the body, they send a separate command to retrieve it, and there is presumably a way to clear the browser-side cache.
  4. Clients send a command to retrieve a response body at any point after the response starts. Whether or not it's available is entirely at the discretion of the browser.

Of these options, 1 imposes unacceptable overhead, since it requires one round trip per request that might be intercepted. 4 is closest to the current model in CDP (and hence Puppeteer), and is good enough for devtools, where there are no interoperability requirements; but for it to work cross-browser we would need to converge on similar models for how long bodies should be retained, and assuming that will happen by convergent evolution and reverse engineering is missing the point of standardisation, and seems likely to incur significant engineering costs later if users run into differences. 2 adds a lot of protocol overhead in transferring (possibly large) bodies that may not be used, plus it requires clients to implement the lifecycle management themselves. 3 reduces the protocol traffic, but requires browsers to store some bodies that may not be used (and likely requires additional IPC to do so), rather than giving them the option to throw away bodies that are difficult to persist.

@jgraham (Member) commented Jan 15, 2025

In terms of use cases, 1 is fine for any request interception use case, and 2 is fine for HAR file generation. However, the flexibility of existing clients suggests use cases for which control similar to 1 is required at overhead similar to 2, but no one has precisely set out what those use cases are (ideally with links to real examples, rather than hypothetical possibilities, which are of course easy to construct).

Successfully merging this pull request may close these issues: Support getting response body for network responses (#747)