
Android devices Support #133

Open
qtyandhasee opened this issue Oct 16, 2024 · 5 comments

@qtyandhasee

Hello! Thank you very much for your excellent work, which enables distributed running of large models on heterogeneous devices! I was wondering whether this project supports Android devices. I am currently using two Redmi phones and plan to cross-compile the project with the Android NDK, then transfer the executable and the model weights to the devices to run it. I have already tested TCP/socket communication between the two phones. What should I pay attention to in this process? I look forward to your answer. Thank you!

@qtyandhasee
Author

Hello! I have successfully completed an inference run on an Android device, but I noticed the message "🚧 Cannot allocate 294912 bytes directly in RAM" on the Android phone serving as the worker node. May I ask what causes this?

@qtyandhasee
Author

The command I used:
"""
./dllama inference \
  --model /data/local/tmp/dllama-test/Llama-3_2-1B-Q40-Instruct-Distributed-Llama/dllama_model_llama3.2-1b-instruct_q40.m \
  --tokenizer /data/local/tmp/dllama-test/Llama-3_2-1B-Q40-Instruct-Distributed-Llama/dllama_tokenizer_llama3_2.t \
  --buffer-float-type q80 \
  --prompt "Hello world" \
  --steps 64 \
  --nthreads 4 \
  --workers 192.168.123.161:9998
"""
The root node is a Redmi phone and the worker node is an Oppo phone.

@b4rtaz
Owner

b4rtaz commented Oct 17, 2024

Wooow! This is so cool! Could you share the specifications of the phones? What performance do you get on a single phone vs. two phones? Have you tried the 3B model?

BTW: you could add the --max-seq-len 8192 argument to the ./dllama inference ... command. Llama 3.2 supports a very long context, so it uses a lot of memory; this flag reduces the memory usage.

> I noticed that the prompt "🚧 Cannot allocate 294912 bytes directly in RAM" appeared on the Android phone serving as the worker node. May I ask what caused this?

You can ignore this warning. It's related to the memory assigned to the application: Distributed Llama tries to lock its memory in RAM, and if it cannot, the application still works, but probably a bit slower. However, I'm not entirely sure about this.

@qtyandhasee
Author

> Wooow! This is so cool! Could you share specifications of phones? What a performance you get on a single phone vs 2 phones? Have you tried 3B model?
>
> BTW: you could add --max-seq-len 8192 argument to the /dllama inference ... command. Llama 3.2 has a very long context, so it uses a lot of memory. By this you can reduce the memory usage.
>
> You can ignore this warning. It's related to the memory assigned to the application. Distributed Llama tries to lock memory. If it cannot lock it, the application will still work, but probably a bit slower. However, I'm not entirely sure about this.

Thank you for your reply! I will try limiting the generated length. I ran into an insufficient-memory problem when testing on an Android phone: whenever I ran it, the phone shut down automatically. I suspect this is because max-seq-len was not limited. The two phones I use are an Oppo Find X7 Ultra and a Redmi Note 12 Pro. I'll test the acceleration using both devices after I've eliminated the above issues!

@b4rtaz
Owner

b4rtaz commented Oct 21, 2024

> I'll test the acceleration using both devices after I've eliminated the above issues!

That would be great! If you want, you can also share your results here.
