
Multithreading not working when using OpenBLAS #107

Open
ma-sadeghi opened this issue Oct 22, 2023 · 9 comments

@ma-sadeghi
I haven't done a rigorous profile, but eyeballing the CPU usage, it seems that the _solve step is single-threaded. Do you plan on adding multithreading? Thanks!

@ma-sadeghi
Author

ma-sadeghi commented Oct 22, 2023

Update 1: After installing MKL, multithreading seems to be working. Maybe this is related: JuliaLang/LinearAlgebra.jl#1000?
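For reference, a minimal sketch of switching the BLAS backend to MKL and checking the thread count (assuming the MKL.jl package is installed; the functions below are from the stock LinearAlgebra.BLAS API):

using LinearAlgebra
using MKL                  # loading MKL.jl swaps the default BLAS backend to MKL

BLAS.get_config()          # shows which BLAS/LAPACK library is currently loaded
BLAS.get_num_threads()     # number of BLAS threads currently in use
# BLAS.set_num_threads(8)  # optionally pin the BLAS thread count explicitly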

@ma-sadeghi ma-sadeghi changed the title Support multithreading? Multithreading not working when using OpenBLAS Oct 22, 2023
@ma-sadeghi
Author

Update 2: On closer inspection, even with MKL, the multithreaded part seems to account for only ~5% of the total run time. Here's a screenshot of CPU usage during the _solve phase:

[Screenshot: CPU usage during the _solve phase]

@termi-official
Contributor

On master, nothing in AlgebraicMultigrid.jl is explicitly threaded; only the coarse solver can be expected to possibly run multithreaded. I intend to provide threaded smoothers, RAP, and setup algorithms in the future, but right now I don't have the time or hands to implement them. However, if you (or anyone else reading this) want to go ahead, feel free to pick this up and ping me in the PR or here.

@ma-sadeghi
Author

Thanks! Quick question: Any intuition on which part might be the bottleneck?

@termi-official
Contributor

In the default setup it is likely the coarse solver (pinv). In general, it depends on which smoothers, cycle, and coarse solver you choose.

@ma-sadeghi
Author

I'm trying to solve the Laplace equation on a very large system of equations, ~1 billion unknowns. I'm willing to spend some time on a PR if there are significant gains to be had, though I know it's difficult to tell in advance without properly profiling the code.

@termi-official
Contributor

If you want to stay in a shared-memory environment, I would suggest that you first set up a Laplace problem with fewer dofs and measure the setup, smoother, RAP, and coarse-solve timings with e.g. https://github.com/KristofferC/TimerOutputs.jl. Also, if possible, use a different coarse solver. I think I forgot to add it to the docs, so here is an example of how to swap the coarse solver:

using AlgebraicMultigrid, LinearSolve
ml = ruge_stuben(A, coarse_solver=AlgebraicMultigrid.LinearSolveWrapper(UMFPACKFactorization()))

The amount of speedup you can achieve depends on the specific system at hand (e.g. number of cores, cache sizes, memory bandwidth, ...).
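As a rough sketch of the suggested measurement (assuming a sparse matrix A and right-hand side b are already assembled; the labels and the call to the internal _solve are illustrative, not prescriptive):

using AlgebraicMultigrid, TimerOutputs

const to = TimerOutput()

@timeit to "setup (ruge_stuben)" ml = ruge_stuben(A)
@timeit to "solve (V-cycles)" x = AlgebraicMultigrid._solve(ml, b)

show(to)   # prints a breakdown of where the time went

Finer-grained timings of the smoother, RAP, and coarse solve would need @timeit annotations inside the package itself.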

@termi-official
Contributor

Speeding up RAP should also be straightforward. We can try something in the direction of https://github.com/BacAmorim/ThreadedSparseCSR.jl/blob/main/src/batch_matmul.jl#L13-L34.
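A rough sketch in the spirit of the linked batch_matmul, threading a sparse matrix-vector product over rows (this is not AlgebraicMultigrid.jl code; the helper name is hypothetical, and row access goes through the transpose because Julia's SparseMatrixCSC is column-major):

using SparseArrays, Base.Threads

# y = A * x, parallelised over the rows of A. `At` must hold transpose(A)
# materialised as a SparseMatrixCSC, so that each of its columns is a row of A.
function threaded_mul!(y::AbstractVector, At::SparseMatrixCSC, x::AbstractVector)
    vals = nonzeros(At)
    idx  = rowvals(At)
    @threads for row in 1:size(At, 2)
        acc = zero(eltype(y))
        for k in nzrange(At, row)
            acc += vals[k] * x[idx[k]]
        end
        y[row] = acc
    end
    return y
end

Usage would be something like At = sparse(transpose(A)); threaded_mul!(y, At, x). A full threaded RAP would additionally need the same treatment for the sparse matrix-matrix products.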

@termi-official
Contributor

cc @tirtho109 @fredrikekre: is there an update regarding the distributed-memory implementation?
