Supported Address Space Qualifiers

The address space qualifier may be used to specify the region of memory that is used to allocate the object.

Numba-dppy supports three disjoint named address spaces:

Global Address Space

Global Address Space refers to memory objects allocated from the global memory pool and will be shared among all work-items. Arguments passed to any kernel are allocated in the global address space. In the below example, arguments a, b and c will be allocated in the global address space:

#! /usr/bin/env python
# Copyright 2020, 2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import dpctl
import numpy as np
import numpy.testing as testing

import numba_dppy as dppy


@dppy.kernel
def data_parallel_sum(a, b, c):
    """
    Vector addition using the ``kernel`` decorator.
    """
    i = dppy.get_global_id(0)
    c[i] = a[i] + b[i]


def driver(a, b, c, global_size):
    print("A : ", a)
    print("B : ", b)
    data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
    print("A + B = ")
    print("C ", c)
    testing.assert_equal(c, a + b)


def main():
    global_size = 10
    N = global_size
    print("N", N)

    a = np.array(np.random.random(N), dtype=np.float32)
    b = np.array(np.random.random(N), dtype=np.float32)
    c = np.ones_like(a)

    # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
    # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
    device = dpctl.select_default_device()
    print("Using device ...")
    device.print_device_info()

    with dppy.offload_to_sycl_device(device):
        driver(a, b, c, global_size)

    print("Done...")


if __name__ == "__main__":
    main()

Local Address Space

Local Address Space refers to memory objects that need to be allocated in local memory pool and are shared by all work-items of a work-group. Numba-dppy does not support passing arguments that are allocated in the local address space to @numba_dppy.kernel. Users are allowed to allocate static arrays in the local address space inside the @numba_dppy.kernel. In the example below @numba_dppy.local.array(shape, dtype) is the API used to allocate a static array in the local address space:

def local_memory():
    """
    This example demonstrates the usage of numba-dppy's `local.array`
    intrinsic function. The function is used to create a static array
    allocated on the devices local address space.
    """
    blocksize = 10

    @dppy.kernel
    def reverse_array(A):
        lm = dppy.local.array(shape=10, dtype=float32)
        i = dppy.get_global_id(0)

        # preload
        lm[i] = A[i]
        # barrier local or global will both work as we only have one work group
        dppy.barrier(dppy.CLK_LOCAL_MEM_FENCE)  # local mem fence
        # write
        A[i] += lm[blocksize - 1 - i]

    arr = np.arange(blocksize).astype(np.float32)
    print(arr)

    # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
    # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
    device = dpctl.select_default_device()
    print("Using device ...")
    device.print_device_info()

    with dppy.offload_to_sycl_device(device):
        reverse_array[blocksize, dppy.DEFAULT_LOCAL_SIZE](arr)

    # the output should be `orig[::-1] + orig, i.e. [9, 9, 9, ...]``
    print(arr)

Private Address Space

Private Address Space refers to memory objects that are local to each work-item and is not shared with any other work-item. In the example below @numba_dppy.private.array(shape, dtype) is the API used to allocate a static array in the private address space:

# Copyright 2020, 2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import dpctl
import numpy as np
from numba import float32

import numba_dppy


def private_memory():
    """
    This example demonstrates the usage of numba-dppy's `private.array`
    intrinsic function. The function is used to create a static array
    allocated on the devices private address space.
    """

    @numba_dppy.kernel
    def private_memory_kernel(A):
        prvt_mem = numba_dppy.private.array(shape=1, dtype=np.float32)
        i = numba_dppy.get_global_id(0)

        # preload
        prvt_mem[0] = i
        numba_dppy.barrier(numba_dppy.CLK_LOCAL_MEM_FENCE)  # local mem fence

        # prvt_mem will not hold correct deterministic result if it is not
        # private to each thread.
        A[i] = prvt_mem[0] * 2

    N = 4
    arr = np.zeros(N).astype(np.float32)
    orig = np.arange(N).astype(np.float32)

    # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
    # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
    device = dpctl.select_default_device()
    print("Using device ...")
    device.print_device_info()

    with numba_dppy.offload_to_sycl_device(device):
        private_memory_kernel[N, N](arr)

    np.testing.assert_allclose(orig * 2, arr)
    # the output should be `orig[i] * 2, i.e. [0, 2, 4, ..]``
    print(arr)


def main():
    private_memory()

    print("Done...")


if __name__ == "__main__":
    main()