
[Unsafe][WIP] Using memmap'd data directly for CPU tensor storage, instead of copying. #15

Open · wants to merge 1 commit into main
Conversation

coreylowman (Owner)

Blocking questions:

  1. Is it safe to std::mem::forget the tensor? Is that all we need to do? What about the other fields of the tensor? (See the sketch after this list.)
  2. Is the Vec::from_raw_parts usage safe?

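A minimal sketch of the hazard behind question 1, assuming `ptr` and `numel` describe the mmap'd region as in the snippet quoted below: `Vec`'s destructor hands its buffer back to the global allocator, which never allocated these pages, so the destructor must be suppressed. Forgetting the whole tensor additionally leaks its other fields, which is wasteful but not itself unsound.

```rust
use std::mem::ManuallyDrop;

// Sketch: why a Vec over mmap'd memory must never be dropped. A normal
// drop returns the buffer to the global allocator, which never allocated
// these pages: undefined behavior. ManuallyDrop (or std::mem::forget)
// suppresses the destructor, but any growing operation on the Vec
// (push, reserve, ...) would still reallocate and is equally unsound.
unsafe fn vec_over_mapping(ptr: *mut f32, numel: usize) -> ManuallyDrop<Vec<f32>> {
    ManuallyDrop::new(Vec::from_raw_parts(ptr, numel, numel))
}
```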
Comment on lines +115 to +118
```rust
// # Safety
// TODO
let vec = unsafe { Vec::from_raw_parts(ptr, numel, numel) };
let loaded = device.tensor_from_vec(vec, *shape);
```
coreylowman (Owner Author)

Is this safe?
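For context, here is a sketch of the conditions the call above relies on, plus a borrowed-slice variant that sidesteps the allocator question entirely (`ptr` and `numel` as in the snippet above, assuming E = f32 for concreteness). Note that `tensor_from_vec` wants owned storage, so the borrowed form would need a different construction path.

```rust
// The Vec::from_raw_parts above is sound only if:
//  - `ptr` is non-null and aligned for f32 (mmap returns page-aligned
//    memory, but interior offsets must keep 4-byte alignment),
//  - the mapping covers at least `numel` f32s and stays alive and
//    unmodified for as long as the Vec is reachable,
//  - the Vec is never grown, shrunk, or dropped, since its buffer did
//    not come from the global allocator.
//
// Borrowing instead of owning avoids the allocator entirely, at the
// cost of a lifetime tied to the mapping:
let slice: &[f32] = unsafe { std::slice::from_raw_parts(ptr as *const f32, numel) };
```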

@sdake left a comment

This is a cool feature; lazy loading via mmap() is how I'd default. There are some situations in which mmap() does not work (e.g. when the storage is exposed via a virtual block device or virtual file system). I am certain that mmap() does not work with vhost devices that use virtiofsd.

It may also be that doing the loading work once produces better overall cache performance (inference would be faster, but the initial model load would be slower). Definitely worth benchmarking.

Perhaps make this optional?

Cheers,
-steve

@coreylowman (Owner Author)

Yeah, I'm thinking of making this opt-in, because you're exactly right that you'd be paying for disk reads on every forward call. Perhaps we can add something like --mmap-limit-for-model, which defaults to 0.
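A sketch of what that opt-in could look like as a config struct rather than a CLI flag; the names (`LoadOptions`, `mmap_limit`) are illustrative only, not anything in this PR:

```rust
/// Hypothetical knob: total bytes allowed to stay memory-mapped; tensors
/// beyond the budget are copied into RAM up front. 0 disables mmap
/// entirely, matching the proposed default for --mmap-limit-for-model.
#[derive(Clone, Copy, Debug)]
pub struct LoadOptions {
    pub mmap_limit: usize,
}

impl Default for LoadOptions {
    fn default() -> Self {
        Self { mmap_limit: 0 }
    }
}
```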

@sdake commented on May 15, 2023

@coreylowman Eventually the forward pass would warm up the cache as needed. I am trying to think of a use case where mmap() is superior here. Whether or not to use mmap() depends on the holistic system. Either way, after the cache is warmed up, mmap() won't have much impact.

The advantage, in my opinion, is to mmap() and then not lazily load, but force-load the parts that matter. That way you get instant memory mapping of the model but can load the important pieces as needed, reducing what the user sees as "load-screen" lag.
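A sketch of that map-then-force-load idea, assuming the memmap2 crate on Unix; `Advice::WillNeed` asks the kernel to start prefaulting pages so the first forward pass doesn't stall on disk reads:

```rust
use memmap2::{Advice, Mmap};
use std::{fs::File, io, path::Path};

// Map the whole model instantly, then ask the kernel to page in the
// parts that matter. In a real loader the advise call would target just
// the byte ranges of the hot tensors, not the entire file.
fn map_and_prefault(path: &Path) -> io::Result<Mmap> {
    let file = File::open(path)?;
    let map = unsafe { Mmap::map(&file)? };
    map.advise(Advice::WillNeed)?;
    Ok(map)
}
```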

```
@@ -13,24 +19,35 @@ pub enum LazyTensor<S: Shape, E: Unit> {
        shape: S,
        move_to_ram: bool,
    },
    Cpu(Tensor<S, E, Cpu>),
    MemoryMapped(Option<MemoryMappedTensor<S, E>>),
```
coreylowman (Owner Author)

Maybe use ManuallyDrop instead?
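A sketch of that alternative, using the crate's own types from the diff above (Disk fields other than `shape` and `move_to_ram` are elided, since the hunk does not show them): `ManuallyDrop` states in the type that the mapped tensor's destructor must never run, because its buffer belongs to the mapping rather than the allocator, whereas `Option` only enforces that through a runtime `take()`.

```rust
use std::mem::ManuallyDrop;

pub enum LazyTensor<S: Shape, E: Unit> {
    Disk {
        // ...other fields as in the diff above...
        shape: S,
        move_to_ram: bool,
    },
    Cpu(Tensor<S, E, Cpu>),
    // ManuallyDrop instead of Option: dropping this variant leaks the
    // tensor's metadata but never hands the mmap'd buffer back to the
    // global allocator.
    MemoryMapped(ManuallyDrop<MemoryMappedTensor<S, E>>),
}
```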
