
Android devices Support #133

Open
qtyandhasee opened this issue Oct 16, 2024 · 5 comments

@qtyandhasee

Hello! Thank you very much for your excellent work, which enables distributed running of large models on heterogeneous devices! I was wondering whether this project supports Android devices. I am currently using two Redmi phones and plan to cross-compile the project with the Android NDK, then transfer the executable and the model weights to the devices to run it. I have already tested TCP/socket communication between the two phones. What should I pay attention to in this process? I look forward to your answer. Thank you!

@qtyandhasee
Author

Hello! I have successfully completed an inference run on an Android device, but I noticed the message "🚧 Cannot allocate 294912 bytes directly in RAM" on the Android phone serving as the worker node. May I ask what causes this?

@qtyandhasee
Author

The command I used:
"""
./dllama inference \
  --model /data/local/tmp/dllama-test/Llama-3_2-1B-Q40-Instruct-Distributed-Llama/dllama_model_llama3.2-1b-instruct_q40.m \
  --tokenizer /data/local/tmp/dllama-test/Llama-3_2-1B-Q40-Instruct-Distributed-Llama/dllama_tokenizer_llama3_2.t \
  --buffer-float-type q80 \
  --prompt "Hello world" \
  --steps 64 \
  --nthreads 4 \
  --workers 192.168.123.161:9998
"""
The root node is a Redmi phone and the worker node is an Oppo phone.

@b4rtaz
Owner

b4rtaz commented Oct 17, 2024

Wooow! This is so cool! Could you share the specifications of the phones? What performance do you get on a single phone vs. two phones? Have you tried the 3B model?

BTW: you could add the --max-seq-len 8192 argument to the ./dllama inference ... command. Llama 3.2 supports a very long context, so it uses a lot of memory; this flag reduces the memory usage.

> I noticed that the prompt "🚧 Cannot allocate 294912 bytes directly in RAM" appeared on the Android phone serving as the worker node. May I ask what caused this?

You can ignore this warning. It's related to the memory assigned to the application: Distributed Llama tries to lock its memory in RAM, and if it cannot, the application still works, but probably a bit slower. However, I'm not entirely sure about this.

@qtyandhasee
Author

> Wooow! This is so cool! Could you share specifications of phones? What a performance you get on a single phone vs 2 phones? Have you tried 3B model?
>
> BTW: you could add --max-seq-len 8192 argument to the /dllama inference ... command. Llama 3.2 has a very long context, so it uses a lot of memory. By this you can reduce the memory usage.
>
> You can ignore this warning. It's related to the memory assigned to the application. Distributed Llama tries to lock memory. If it cannot lock it, the application will still work, but probably a bit slower. However, I'm not entirely sure about this.

Thank you for your reply! I will try limiting the generated length. I ran into an insufficient-memory problem when testing on an Android phone: whenever I ran it, the phone shut down automatically. I suspect this is because max-seq-len was not limited. The two phones I use are an Oppo Find X7 Ultra and a Redmi Note 12 Pro. I'll test the acceleration using both devices after I've eliminated the above issues!

@b4rtaz
Owner

b4rtaz commented Oct 21, 2024

> I'll test the acceleration using both devices after I've eliminated the above issues!

That would be great! If you want, you can also share your results here.
