[Unsafe][WIP] Using memmap'd data directly for CPU tensor storage, instead of copying. #15
base: main
Conversation
// # Safety
// TODO
let vec = unsafe { Vec::from_raw_parts(ptr, numel, numel) };
let loaded = device.tensor_from_vec(vec, *shape);
Is this safe?
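For context, the hazard being asked about: Vec assumes its buffer came from the global allocator, while ptr here points into an mmap'd region. A minimal sketch of the failure mode and one escape hatch (hypothetical, not the PR's code):

    use std::mem::ManuallyDrop;

    /// HYPOTHETICAL sketch. `ptr` must point at `numel` initialized
    /// `f32`s inside a still-mapped region.
    unsafe fn view_mapped(ptr: *mut f32, numel: usize) {
        // Vec's Drop impl hands its buffer back to the global
        // allocator; for an mmap'd pointer that is undefined behavior
        // (typically a crash or heap corruption).
        let vec = Vec::from_raw_parts(ptr, numel, numel);
        // Wrapping it in ManuallyDrop suppresses that free; the pages
        // are then released only when the owning Mmap is dropped.
        let _view = ManuallyDrop::new(vec);
    }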
This is a cool feature; lazy loading via mmap() is how I'd default. There are some situations in which mmap() does not work (e.g. when the storage is exposed via a virtual block or virtual file system). I am certain that mmap() does not work with vhost devices that use virtiofsd.

It may also be that doing the work of loading once may produce better overall cache performance (inference would be faster, but initial model load would be slower). Definitely worth benchmarking.
Perhaps make this optional?
Cheers,
-steve
Yeah, I'm thinking of making this opt-in, because you're exactly right that you'd be paying for disk reads on every forward call. Perhaps we can add something like …
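(A purely hypothetical illustration of such an opt-in switch; these names are invented and not part of the crate's API:)

    // HYPOTHETICAL: an explicit strategy the caller picks at load time.
    pub enum LoadStrategy {
        /// Copy all tensor data into RAM up front (the current behavior).
        CopyToRam,
        /// Back CPU tensors with the mmap'd file: near-instant to open,
        /// but cold pages cost a disk read on first touch.
        MemoryMap,
    }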
@coreylowman eventually the forward pass would warm up the cache as needed. I am trying to think of a use case where …

The advantage, in my opinion, is to mmap() and then not lazily load, but force-load the stuff that matters. That way you get instant memory mapping of the model, but can load the important stuff as needed, reducing what the user sees as "load-screen" lag.
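A sketch of that "map instantly, force-load what matters" idea, using the memmap2 crate; the file name and byte extent below are made up:

    use memmap2::Mmap;
    use std::fs::File;

    fn main() -> std::io::Result<()> {
        let file = File::open("model.bin")?; // hypothetical path
        // Mapping is near-instant: no tensor bytes are read yet.
        let map = unsafe { Mmap::map(&file)? };
        // Force-load the region the first forward pass needs by touching
        // one byte per 4 KiB page (the 1 MiB extent is made up):
        let hot = &map[..1 << 20];
        let warm: u64 = hot.iter().step_by(4096).map(|&b| u64::from(b)).sum();
        std::hint::black_box(warm); // keep the reads from being optimized out
        // Everything else faults in lazily as the model reads it.
        Ok(())
    }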
@@ -13,24 +19,35 @@ pub enum LazyTensor<S: Shape, E: Unit> {
        shape: S,
        move_to_ram: bool,
    },
    Cpu(Tensor<S, E, Cpu>),
    MemoryMapped(Option<MemoryMappedTensor<S, E>>),
Maybe use ManuallyDrop instead?
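A sketch of what the ManuallyDrop suggestion could look like here; this is an assumption about MemoryMappedTensor, whose actual definition isn't shown in this excerpt:

    use memmap2::Mmap;
    use std::mem::ManuallyDrop;

    // ASSUMPTION: not the PR's real definition. `data` is a view over
    // the mapped bytes, so it must never be freed by the allocator;
    // ManuallyDrop guarantees its Drop never runs.
    pub struct MemoryMappedTensor<S, E> {
        shape: S,
        data: ManuallyDrop<Vec<E>>,
        // Keeps the region mapped for as long as `data` is alive;
        // dropping the Mmap is what actually unmaps the pages.
        _map: Mmap,
    }

The Option in the enum variant would then, presumably, let the contents be taken by value, e.g. when a tensor is later promoted to a plain in-RAM copy.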
Blocking questions: