[Unsafe][WIP] Using memmap'd data directly for CPU tensor storage, instead of copying. #15
base: main
Conversation
// # Safety
// TODO
let vec = unsafe { Vec::from_raw_parts(ptr, numel, numel) };
let loaded = device.tensor_from_vec(vec, *shape);
Is this safe?
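For context, the hazard being asked about: Vec assumes its buffer came from the global allocator, while ptr here points into an mmap'd region. A minimal sketch of the failure mode and one escape hatch (hypothetical, not the PR's code):

    use std::mem::ManuallyDrop;

    /// HYPOTHETICAL sketch. `ptr` must point at `numel` initialized
    /// `f32`s inside a still-mapped region.
    unsafe fn view_mapped(ptr: *mut f32, numel: usize) {
        // Vec's Drop impl hands its buffer back to the global
        // allocator; for an mmap'd pointer that is undefined behavior
        // (typically a crash or heap corruption).
        let vec = Vec::from_raw_parts(ptr, numel, numel);
        // Wrapping it in ManuallyDrop suppresses that free; the pages
        // are then released only when the owning Mmap is dropped.
        let _view = ManuallyDrop::new(vec);
    }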
This is a cool feature; lazy loading via mmap() is how I'd default. There are some situations in which mmap() does not work (e.g. when the storage is exposed via a virtual block or virtual file system). I am certain that mmap() does not work with vhost devices that use virtiofsd.

It may also be that doing the work of loading once may produce better overall cache performance (inference would be faster, but initial model load would be slower). Definitely worth benchmarking.
Perhaps make this optional?
Cheers,
-steve
Yeah, I'm thinking of making this opt-in, because you're exactly right that you'd be paying for disk reads on every forward call. Perhaps we can add something like …
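(A purely hypothetical illustration of such an opt-in switch; these names are invented and not part of the crate's API:)

    // HYPOTHETICAL: an explicit strategy the caller picks at load time.
    pub enum LoadStrategy {
        /// Copy all tensor data into RAM up front (the current behavior).
        CopyToRam,
        /// Back CPU tensors with the mmap'd file: near-instant to open,
        /// but cold pages cost a disk read on first touch.
        MemoryMap,
    }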
@coreylowman eventually the forward pass would warm up the cache as needed. I am trying to think of a use case where …

The advantage, in my opinion, is to mmap() and then not lazily load, but force-load the stuff that matters. That way you get instant memory mapping of the model, but can load the important stuff as needed, reducing what the user sees as "load-screen" lag.
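A sketch of that "map instantly, force-load what matters" idea, using the memmap2 crate; the file name and byte extent below are made up:

    use memmap2::Mmap;
    use std::fs::File;

    fn main() -> std::io::Result<()> {
        let file = File::open("model.bin")?; // hypothetical path
        // Mapping is near-instant: no tensor bytes are read yet.
        let map = unsafe { Mmap::map(&file)? };
        // Force-load the region the first forward pass needs by touching
        // one byte per 4 KiB page (the 1 MiB extent is made up):
        let hot = &map[..1 << 20];
        let warm: u64 = hot.iter().step_by(4096).map(|&b| u64::from(b)).sum();
        std::hint::black_box(warm); // keep the reads from being optimized out
        // Everything else faults in lazily as the model reads it.
        Ok(())
    }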
@@ -13,24 +19,35 @@ pub enum LazyTensor<S: Shape, E: Unit> {
        shape: S,
        move_to_ram: bool,
    },
    Cpu(Tensor<S, E, Cpu>),
    MemoryMapped(Option<MemoryMappedTensor<S, E>>),
Maybe use ManuallyDrop instead?
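A sketch of what the ManuallyDrop suggestion could look like here; this is an assumption about MemoryMappedTensor, whose actual definition isn't shown in this excerpt:

    use memmap2::Mmap;
    use std::mem::ManuallyDrop;

    // ASSUMPTION: not the PR's real definition. `data` is a view over
    // the mapped bytes, so it must never be freed by the allocator;
    // ManuallyDrop guarantees its Drop never runs.
    pub struct MemoryMappedTensor<S, E> {
        shape: S,
        data: ManuallyDrop<Vec<E>>,
        // Keeps the region mapped for as long as `data` is alive;
        // dropping the Mmap is what actually unmaps the pages.
        _map: Mmap,
    }

The Option in the enum variant would then, presumably, let the contents be taken by value, e.g. when a tensor is later promoted to a plain in-RAM copy.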
Blocking questions: