GPU and CPU Acceleration

CPU Multi-Threading

The kernels and solvers implemented in FLOWPanel are multi-threaded on the CPU by default. To benefit from this, however, Julia must be launched with multi-threading enabled. For instance, to launch Julia with 4 threads:

$ julia --threads 4

You can then verify that the 4 threads are available:

julia> Threads.nthreads()
4
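
Beyond checking the thread count, it can help to confirm that a threaded loop is actually spread over several threads. The following is a minimal sketch using only Julia's standard `Base.Threads` module; it is not part of FLOWPanel, and the loop size `n` is an arbitrary choice for illustration.

```julia
using Base.Threads

# Record which thread executes each iteration of a threaded loop.
n = 16                      # arbitrary loop size, for illustration only
ids = zeros(Int, n)
@threads for i in 1:n
    ids[i] = threadid()
end

println("threads available:   ", nthreads())
println("distinct threads used: ", length(unique(ids)))
```

If Julia was launched without the `--threads` option, both numbers are expected to be 1, which indicates that the solver will also run single-threaded.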

Porting to GPU

The solver can be ported to GPU by specifying the type of array to be used internally. The Julia GPU interface is the same across GPU hardware and platforms (NVIDIA CUDA, AMD ROCm, and Apple Metal); however, we have only tested NVIDIA GPUs.

For an NVIDIA GPU, first import the CUDA package before running the code of the previous section,

julia> import CUDA

check that the GPU hardware is ready to be used,

julia> CUDA.functional()
true

and instead of letting the solver default its internal arrays to CPU, change the solver call from

julia> pnl.solve(body, Uinfs, Das, Dbs)

to

julia> pnl.solve(body, Uinfs, Das, Dbs; GPUArray=CUDA.CuArray{Float32})
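
The array type passed in the call above carries the element type `Float32`, so the GPU arrays hold single-precision values. The sketch below only illustrates the scale of single-precision round-off using plain CPU arrays; it needs neither a GPU nor FLOWPanel, and the sample values are arbitrary. Whether this precision is adequate for a given case is something to check on your own problem.

```julia
# Machine epsilon of single vs. double precision.
println("eps(Float32) = ", eps(Float32))
println("eps(Float64) = ", eps(Float64))

# Convert double-precision data to Float32 (arbitrary sample values).
x64 = [0.1, 0.2, 0.3]
x32 = Float32.(x64)

# Round-off introduced by the conversion alone.
err = maximum(abs.(Float64.(x32) .- x64))
println("max conversion error = ", err)
```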

For an AMD GPU:

julia> import AMDGPU
julia> AMDGPU.functional()
true
julia> AMDGPU.functional(:MIOpen)
true
julia> pnl.solve(body, Uinfs, Das, Dbs; GPUArray=AMDGPU.ROCArray{Float32})

For an Apple GPU (Metal):

julia> import Metal
julia> Metal.functional()
true
julia> pnl.solve(body, Uinfs, Das, Dbs; GPUArray=Metal.MtlArray{Float32})
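
The solver calls above differ only in the `GPUArray` keyword, so one script can serve both CPU and GPU machines by passing that keyword only when a GPU is usable. The following is a minimal sketch of that pattern under stated assumptions: `solve_stub` and `gpu_kwargs` are hypothetical names introduced here (they are not FLOWPanel functions), and a plain `Array{Float32}` stands in for the GPU array type so the example runs without any GPU package.

```julia
# Hypothetical stand-in for pnl.solve so the sketch runs without FLOWPanel;
# it only reports which array type it was asked to use.
solve_stub(args...; GPUArray=nothing) =
    GPUArray === nothing ? "solver default" : string(GPUArray)

# Hypothetical helper: return the GPUArray keyword only when a GPU is usable,
# otherwise return no keywords so the solver keeps its own default.
gpu_kwargs(gpu_ok::Bool, array_type) =
    gpu_ok ? (; GPUArray=array_type) : NamedTuple()

# With CUDA loaded, this would read:
#   kw = gpu_kwargs(CUDA.functional(), CUDA.CuArray{Float32})
kw_cpu = gpu_kwargs(false, Array{Float32})
kw_gpu = gpu_kwargs(true, Array{Float32})   # Array stands in for a GPU array type

println(solve_stub(1, 2; kw_cpu...))
println(solve_stub(1, 2; kw_gpu...))
```

In a real script the final call would be `pnl.solve(body, Uinfs, Das, Dbs; kw...)`, with `kw` built from the `functional()` check of whichever GPU package is loaded.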

Note: We have only tested NVIDIA GPUs.