High-Level Programming Interface

With APPy’s high-level programming interface, all you need to do to parallelize a loop on GPUs is to replace range with appy.prange and annotate the loop with @appy.jit. NumPy arrays and scalar values can be directly used inside the prange region. Here’s an example:

import numpy as np
from appy import jit, prange

@jit
def add_one(a):
     for i in prange(a.shape[0]):
          a[i] += 1

a = np.zeros(10)
add_one(a)

# a is now [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In this example, a is a NumPy array and is used directly inside the prange region. The compiler will handle the data movement between the CPU and the GPU automatically under the hood.

Note

Only code inside the prange region will be compiled and executed on the GPU. Other code will be just executed by the Python interpreter on the CPU, i.e. no compilation happens.

Reductions can be parallelized as well with prange. Here’s an example:

@jit
def sum_vector(a, N):
    sum = 0
    for i in prange(N):
        sum += a[i]
    return sum

a = np.ones(10)
sum_vector(a, 10)

# sum is now 10

APPy automatically detects reductions on scalar variables and generates atomic operations for them.

Unsupported Features

In general, it’s fine to use common NumPy array operations, slicing, and indexing inside prange. Control flows such as if structures are also supported, though using some of them could prevent automatic vectorization.

Unsupported Python language constructs inside prange include:

Containers such as list and dict.
break and return statements (continue is fine).
try and except blocks.
String operations.

Unsupported NumPy functions inside prange include:

np.linalg.
np.random.
np.fft.

Outside of the prange region, the code will simply be executed by the Python interpreter so there’s no limitation.