High-Level Programming Interface
With APPy’s high-level programming interface, all you need to do to parallelize a loop on a GPU
is replace range with appy.prange and decorate the enclosing function with @appy.jit. NumPy arrays
and scalar values can be used directly inside the prange region. Here’s an example:
import numpy as np
from appy import jit, prange

@jit
def add_one(a):
    for i in prange(a.shape[0]):
        a[i] += 1

a = np.zeros(10)
add_one(a)
# a is now [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
In this example, a is a NumPy array and is used directly inside the prange region.
The compiler will handle the data movement between the CPU and the GPU automatically under the hood.
Note
Only code inside the prange region is compiled and executed on the GPU. All other
code is executed by the Python interpreter on the CPU, i.e. it is not compiled.
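As a further illustration of using arrays and scalars together, here is a minimal sketch of a scaled vector addition. The function name scale_add and the parameter alpha are hypothetical, and the sequential fallback at the top is an assumption added only so the snippet still runs where APPy is not installed; it is not part of APPy.

```python
import numpy as np

try:
    from appy import jit, prange
except ImportError:
    # Assumption for illustration only: without APPy, run the loop sequentially.
    def jit(fn):
        return fn
    prange = range


@jit
def scale_add(x, y, alpha):
    # x and y are NumPy arrays, alpha is a plain Python scalar;
    # all three are used directly inside the prange region.
    for i in prange(x.shape[0]):
        y[i] = alpha * x[i] + y[i]


x = np.arange(5, dtype=np.float64)
y = np.ones(5)
scale_add(x, y, 2.0)

# Compare against the equivalent whole-array NumPy expression.
expected = 2.0 * np.arange(5, dtype=np.float64) + 1.0
print(np.allclose(y, expected))
```

The call site is ordinary Python: the arrays are created with NumPy on the CPU and passed in unchanged.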
Reductions can be parallelized as well with prange. Here’s an example:
@jit
def sum_vector(a, N):
    sum = 0
    for i in prange(N):
        sum += a[i]
    return sum

a = np.ones(10)
sum_vector(a, 10)
# returns 10
APPy automatically detects reductions on scalar variables and generates atomic operations for them.
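The same pattern should extend to other scalar accumulations. The sketch below computes a dot product; the function name dot and the variable total are hypothetical, and the sequential fallback is an assumption that only keeps the snippet runnable without APPy. Because atomic updates on a GPU may be applied in any order, a floating-point result is best compared with a tolerance rather than for exact equality.

```python
import numpy as np

try:
    from appy import jit, prange
except ImportError:
    # Assumption for illustration only: without APPy, run the loop sequentially.
    def jit(fn):
        return fn
    prange = range


@jit
def dot(a, b, N):
    total = 0.0
    for i in prange(N):
        # Scalar accumulation across iterations: a reduction on `total`.
        total += a[i] * b[i]
    return total


a = np.arange(4, dtype=np.float64)
b = np.full(4, 2.0)
result = dot(a, b, 4)

# Tolerance-based check against NumPy's own dot product.
print(np.isclose(result, np.dot(a, b)))
```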
Unsupported Features
In general, it’s fine to use common NumPy array operations, slicing, and indexing inside prange.
Control flow such as if statements is also supported, though some constructs may prevent automatic vectorization.
Unsupported Python language constructs inside prange include:
- Containers such as list and dict.
- break and return statements (continue is fine).
- try and except blocks.
- String operations.
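One way to work within these restrictions is to write results into a preallocated NumPy array inside prange and build any Python containers afterwards. The sketch below marks elements above a threshold instead of appending their indices to a list; the names mark_above, flags, and threshold are hypothetical, and the sequential fallback is an assumption used only when APPy is not installed.

```python
import numpy as np

try:
    from appy import jit, prange
except ImportError:
    # Assumption for illustration only: without APPy, run the loop sequentially.
    def jit(fn):
        return fn
    prange = range


@jit
def mark_above(a, flags, threshold):
    for i in prange(a.shape[0]):
        # An if statement with an array store; no list, break, or return here.
        if a[i] > threshold:
            flags[i] = 1


a = np.array([0.5, 2.0, 1.5, 0.1, 3.0])
flags = np.zeros(a.shape[0], dtype=np.int32)
mark_above(a, flags, 1.0)

# The Python list is built outside the prange region, on the CPU.
indices = np.flatnonzero(flags).tolist()
print(indices)
```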
Unsupported NumPy functions inside prange include:
- np.linalg.
- np.random.
- np.fft.
Code outside of the prange region is simply executed by the Python interpreter, so none of these limitations apply there.
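In practice this means unsupported NumPy functions can be called before or after the parallel loop. The sketch below generates input with np.random and post-processes with np.linalg in ordinary Python, keeping only element-wise arithmetic inside prange. The function name square is hypothetical, and the sequential fallback is an assumption used only when APPy is not installed.

```python
import numpy as np

try:
    from appy import jit, prange
except ImportError:
    # Assumption for illustration only: without APPy, run the loop sequentially.
    def jit(fn):
        return fn
    prange = range


@jit
def square(a, out):
    for i in prange(a.shape[0]):
        out[i] = a[i] * a[i]


# np.random is used outside the prange region, in plain Python.
rng = np.random.default_rng(0)
a = rng.random(8)
out = np.empty_like(a)

square(a, out)

# np.linalg is likewise called outside the prange region.
norm = np.linalg.norm(out)
print(np.allclose(out, a ** 2))
```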