Supported Address Space Qualifiers

An address space qualifier specifies the region of memory in which an object is allocated.

Numba-dppy supports three disjoint named address spaces:

  1. Global Address Space

    Global Address Space refers to memory objects that are allocated from the global memory pool and shared among all work-items. Arguments passed to any kernel are allocated in the global address space. In the example below, the arguments a, b and c are allocated in the global address space:

    #! /usr/bin/env python
    # Copyright 2020, 2021 Intel Corporation
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    import dpctl
    import numpy as np
    import numpy.testing as testing
    
    import numba_dppy as dppy
    
    
    @dppy.kernel
    def data_parallel_sum(a, b, c):
        """
        Vector addition using the ``kernel`` decorator.
        """
        i = dppy.get_global_id(0)
        c[i] = a[i] + b[i]
    
    
    def driver(a, b, c, global_size):
        print("A : ", a)
        print("B : ", b)
        data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
        print("A + B = ")
        print("C ", c)
        testing.assert_equal(c, a + b)
    
    
    def main():
        global_size = 10
        N = global_size
        print("N", N)
    
        a = np.array(np.random.random(N), dtype=np.float32)
        b = np.array(np.random.random(N), dtype=np.float32)
        c = np.ones_like(a)
    
        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()
    
        with dppy.offload_to_sycl_device(device):
            driver(a, b, c, global_size)
    
        print("Done...")
    
    
    if __name__ == "__main__":
        main()
    
  2. Local Address Space

    Local Address Space refers to memory objects that are allocated in the local memory pool and shared by all work-items of a work-group. Numba-dppy does not support passing arguments allocated in the local address space to a @numba_dppy.kernel function. Users can instead allocate static arrays in the local address space inside the kernel. In the example below, numba_dppy.local.array(shape, dtype) is the API used to allocate a static array in the local address space:

    import dpctl
    import numpy as np
    from numba import float32
    
    import numba_dppy as dppy
    
    
    def local_memory():
        """
        This example demonstrates the usage of numba-dppy's `local.array`
        intrinsic function. The function is used to create a static array
        allocated in the device's local address space.
        """
        blocksize = 10
    
        @dppy.kernel
        def reverse_array(A):
            lm = dppy.local.array(shape=10, dtype=float32)
            i = dppy.get_global_id(0)
    
            # preload
            lm[i] = A[i]
            # either a local or a global fence works here, as there is only one work-group
            dppy.barrier(dppy.CLK_LOCAL_MEM_FENCE)  # local mem fence
            # write
            A[i] += lm[blocksize - 1 - i]
    
        arr = np.arange(blocksize).astype(np.float32)
        print(arr)
    
        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()
    
        with dppy.offload_to_sycl_device(device):
            reverse_array[blocksize, dppy.DEFAULT_LOCAL_SIZE](arr)
    
        # the output should be `orig[::-1] + orig`, i.e. [9, 9, 9, ...]
        print(arr)
    
  3. Private Address Space

    Private Address Space refers to memory objects that are local to each work-item and are not shared with any other work-item. In the example below, numba_dppy.private.array(shape, dtype) is the API used to allocate a static array in the private address space:

    
    import dpctl
    import numpy as np
    from numba import float32
    
    import numba_dppy
    
    
    def private_memory():
        """
        This example demonstrates the usage of numba-dppy's `private.array`
        intrinsic function. The function is used to create a static array
        allocated in the device's private address space.
        """
    
        @numba_dppy.kernel
        def private_memory_kernel(A):
            prvt_mem = numba_dppy.private.array(shape=1, dtype=np.float32)
            i = numba_dppy.get_global_id(0)
    
            # preload
            prvt_mem[0] = i
            numba_dppy.barrier(numba_dppy.CLK_LOCAL_MEM_FENCE)  # local mem fence
    
            # prvt_mem would not hold a deterministic result if it were not
            # private to each work-item.
            A[i] = prvt_mem[0] * 2
    
        N = 4
        arr = np.zeros(N).astype(np.float32)
        orig = np.arange(N).astype(np.float32)
    
        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()
    
        with numba_dppy.offload_to_sycl_device(device):
            private_memory_kernel[N, N](arr)
    
        np.testing.assert_allclose(orig * 2, arr)
        # the output should be `orig[i] * 2`, i.e. [0, 2, 4, ...]
        print(arr)
    
    
    def main():
        private_memory()
    
        print("Done...")
    
    
    if __name__ == "__main__":
        main()
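
The visibility rules of the three address spaces can be sketched on the host with plain NumPy. The block below is a minimal model under stated assumptions, not numba-dppy code: `launch`, `preload`, `add_reversed`, `store_id` and `double_id` are hypothetical names introduced for illustration, work-items run one after another rather than in parallel, and the end of each phase stands in for a barrier.

```python
import numpy as np


def launch(phases, global_size, local_size, *global_args):
    """Host-side model of a 1-D kernel launch (hypothetical helper).

    `phases` is a list of functions; the boundary between two phases
    plays the role of a work-group barrier.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")

    for group_start in range(0, global_size, local_size):
        # Local address space: one buffer per work-group.
        local_mem = np.zeros(local_size, dtype=np.float32)
        # Private address space: one buffer per work-item.
        private_mems = [np.zeros(1, dtype=np.float32) for _ in range(local_size)]

        for phase in phases:
            for local_id in range(local_size):
                global_id = group_start + local_id
                phase(global_id, local_id, local_mem,
                      private_mems[local_id], *global_args)


# Model of the local-memory example: preload, barrier, write.
def preload(gid, lid, local_mem, private_mem, A):
    local_mem[lid] = A[gid]


def add_reversed(gid, lid, local_mem, private_mem, A):
    A[gid] += local_mem[len(local_mem) - 1 - lid]


# Model of the private-memory example.
def store_id(gid, lid, local_mem, private_mem, A):
    private_mem[0] = gid


def double_id(gid, lid, local_mem, private_mem, A):
    A[gid] = private_mem[0] * 2


# One work-group of 10 work-items: every element becomes 9.
one_group = np.arange(10, dtype=np.float32)
launch([preload, add_reversed], 10, 10, one_group)

# Two work-groups of 5: each group only sees its own local buffer,
# so the reversal happens within each half separately.
two_groups = np.arange(10, dtype=np.float32)
launch([preload, add_reversed], 10, 5, two_groups)

# Each work-item keeps its own private value across the barrier.
doubled = np.zeros(4, dtype=np.float32)
launch([store_id, double_id], 4, 4, doubled)

print(one_group)   # [9. 9. 9. 9. 9. 9. 9. 9. 9. 9.]
print(two_groups)  # [ 4.  4.  4.  4.  4. 14. 14. 14. 14. 14.]
print(doubled)     # [0. 2. 4. 6.]
```

In this model the array passed as `A` plays the role of a global-address-space argument: every work-item in every work-group reads and writes the same buffer. The two-group run shows why a local array cannot be used to exchange data between work-groups.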