GPU and CPU Acceleration
CPU Multi-Threading
The kernels and solvers implemented in FLOWPanel are threaded on the CPU by default. However, the threading only takes effect when Julia is launched with multi-threading enabled. For instance, to launch Julia with 4 threads:
$ julia --threads 4
You can then verify that the 4 threads are available:
julia> Threads.nthreads()
4
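Beyond checking the thread count, it can help to confirm that a threaded loop actually runs and gives the same answer as a serial one. The following is a minimal sketch using only Julia's standard `Base.Threads` module; the function `work` is a hypothetical stand-in for an expensive per-element computation and is not part of FLOWPanel.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical stand-in for an expensive per-element computation (not FLOWPanel code)
work(i) = sum(sin(i * k) for k in 1:1000)

# Evaluate `work` for i = 1..n, splitting the iterations across the available threads
function threaded_map(n)
    out = Vector{Float64}(undef, n)
    @threads for i in 1:n
        out[i] = work(i)   # each iteration writes only to its own slot, so there is no data race
    end
    return out
end

serial = [work(i) for i in 1:64]
threaded = threaded_map(64)

println("threads available: ", nthreads())
println("threaded result matches serial result: ", threaded == serial)
```

If Julia was launched without the `--threads` flag, the loop still runs correctly, only on a single thread.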
Porting to GPU
The solver can be ported to GPU by specifying the type of array to be used internally. The Julia GPU interface is the same across GPU hardware and platforms (NVIDIA CUDA, AMD ROCm, and Apple Metal); however, we have only tested NVIDIA GPUs.
For an NVIDIA GPU, import the CUDA package before running the code of the previous section,
julia> import CUDA
check that the GPU hardware is ready to be used,
julia> CUDA.functional()
true
and instead of letting the solver default its internal arrays to CPU, change the solver call from
julia> pnl.solve(body, Uinfs, Das, Dbs)
to
julia> pnl.solve(body, Uinfs, Das, Dbs; GPUArray=CUDA.CuArray{Float32})
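The call above requests `Float32` as the element type of the GPU array. As a rough illustration of what single precision implies compared to Julia's default `Float64`, the sketch below uses only standard Julia functions; it is independent of FLOWPanel and makes no claim about how the solver handles precision internally.

```julia
# Machine epsilon: spacing between 1.0 and the next representable number
println("eps(Float32) = ", eps(Float32))
println("eps(Float64) = ", eps(Float64))

# Round-off introduced when a Float64 value is stored as Float32
x64 = 1 / 3
x32 = Float32(x64)
roundoff = abs(Float64(x32) - x64)
println("round-off from storing 1/3 as Float32: ", roundoff)
```

Single precision carries roughly 7 significant decimal digits, which is worth keeping in mind when comparing GPU results against a `Float64` CPU run.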
For an AMD GPU:
julia> import AMDGPU
julia> AMDGPU.functional()
true
julia> AMDGPU.functional(:MIOpen)
true
julia> pnl.solve(body, Uinfs, Das, Dbs; GPUArray=AMDGPU.ROCArray{Float32})
For an Apple Metal GPU:
julia> import Metal
julia> Metal.functional()
true
julia> pnl.solve(body, Uinfs, Das, Dbs; GPUArray=Metal.MtlArray{Float32})
Note: We have only tested NVIDIA GPUs. The AMD and Metal calls above use the same interface but are untested.