Reuse session for 60% speedup #5
Thank you very much, good job :) I really did not expect it to make that much of a difference. Am I correct to assume that if you terminate the worker after a result has been generated, the time difference will diminish?
I just noticed that you're also holding the model in memory, was that on purpose?
Yes. For my use case I want to do ttv progressively, essentially one sentence at a time, and start playing it as soon as the first chunk is available. Holding the model in memory for the duration of the multi-step session is ideal. I tried the transformers.js ttv, but it's far too slow, even when using multiple workers. Your packaging of Sherpa/Onnx/Piper is perfect, and this change makes real-time ttv possible in the browser. It will need some more polish, particularly around disposing of the session after use.
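A minimal sketch of the sentence-at-a-time flow described above, assuming a synthesize callback that wraps whatever session is being held in memory. The names splitSentences, synthesizeProgressively and Synthesize are made up for illustration and are not vits-web exports, and the sentence splitter is deliberately naive.

```typescript
// Hypothetical callback type: turns one sentence into audio samples.
type Synthesize = (sentence: string) => Promise<Float32Array>;

// Naive splitter: a chunk ends after a run of '.', '!' or '?'.
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Yields audio one sentence at a time, so the caller can start
// playback as soon as the first chunk is ready.
async function* synthesizeProgressively(
  text: string,
  synthesize: Synthesize,
): AsyncGenerator<Float32Array> {
  for (const sentence of splitSentences(text)) {
    yield await synthesize(sentence);
  }
}
```

A caller would consume this with for await and hand each chunk to its playback queue while the next sentence is still being synthesized.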
Makes sense. It kind of depends on the use case whether to keep the memory footprint as small as possible (my goal) or to minimize runtime. Effectively, the part that is missing from the library is environment variables. I'm going to copy this strategy from onnx, then you can just set something like
Absolutely, and I think vits-web can cover both use cases well. On the minimum runtime side, it's important to be able to start the runtime+model before you need it too, not just keep it around for the next call. In my refactor the original
Let me know if you are happy with my approach, and if so I can tidy it up and add tests.
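To illustrate the "start the runtime+model before you need it" point, here is a rough sketch of one common way to do it: cache the load promise so an early warm-up call and later predictions share a single load. Everything here (LoadedModel, loadModelSlowly, warmUp, predictWhenReady) is a hypothetical stand-in, not the refactor in this PR, and the load is simulated with a timer.

```typescript
// Hypothetical stand-in for a loaded runtime + voice model.
interface LoadedModel {
  synthesize(text: string): Float32Array;
}

let loads = 0; // counts how often the expensive load actually runs

// Placeholder for the slow part (fetching weights, creating the inference session).
async function loadModelSlowly(): Promise<LoadedModel> {
  loads += 1;
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated latency
  // Fake synthesis: one sample per character.
  return { synthesize: (text) => new Float32Array(text.length) };
}

let pending: Promise<LoadedModel> | null = null;

// Starts loading if nobody has yet; every caller shares the same promise.
function warmUp(): Promise<LoadedModel> {
  if (pending === null) pending = loadModelSlowly();
  return pending;
}

// Works whether or not warmUp() was called earlier.
async function predictWhenReady(text: string): Promise<Float32Array> {
  const model = await warmUp();
  return model.synthesize(text);
}
```

Calling warmUp() on page load, without awaiting it, means the first predictWhenReady() only waits for whatever part of the load is still outstanding.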
Hey guys, sorry to revive this thread. Am I right to assume that the recent commit bdf7f36 addresses this issue?
I think it addresses part of the issue, but it will still create a new session every This is very promising for me - I would like to create a session, run many predictions against the session, then close the session when it makes sense for me.
The fastest is to use
Hey @k9p5,
vits-web is really awesome!
I've started trying to speed it up a little. Currently predict() sets up a whole new ort session and loads the models each time. If you instead split the process in two, so that you create a "vits-web session" first, you get a 60% speedup on the basic example. This is particularly useful for repeated calls: set up the model once, then call it repeatedly with text chunks for more audio (this is what I want to use it for).
As you can see in the video from my machine, it cuts the call time from ~2.7s to ~1.1s, with the remaining ~1.7s moved into the initialization of the session. Repeat calls then only take ~1s.
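Roughly, the difference between the two shapes looks like the sketch below. It is an illustration only: Model, TtsSession, loadModel, predictOneShot and createSession are hypothetical names, the "model" is a fake that returns one sample per character, and everything is synchronous for brevity, whereas the real loading is asynchronous.

```typescript
// Hypothetical shapes; none of these names are vits-web's actual API.
interface Model {
  synthesize(text: string): Float32Array;
}

interface TtsSession {
  predict(text: string): Float32Array;
  dispose(): void;
}

// Stand-in for the expensive step (creating the ort session, loading the models).
let loadCount = 0;
function loadModel(): Model {
  loadCount += 1;
  // Fake synthesis: one sample per character, so the output is easy to check.
  return { synthesize: (text) => new Float32Array(text.length) };
}

// One-shot style: pays the load cost on every call.
function predictOneShot(text: string): Float32Array {
  return loadModel().synthesize(text);
}

// Session style: pays the load cost once, then reuses the model.
function createSession(): TtsSession {
  let model: Model | null = loadModel();
  return {
    predict(text) {
      if (model === null) throw new Error("session has been disposed");
      return model.synthesize(text);
    },
    dispose() {
      model = null; // drop the reference so the model can be garbage collected
    },
  };
}
```

With three text chunks, the one-shot style runs the load three times while the session style runs it once, which is where the saving on repeat calls comes from.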
I've not added/modified any tests yet as it makes more sense to run this past you first.
Screen.Recording.2024-07-10.at.19.47.50.mov