Supported Address Space Qualifiers
An address space qualifier specifies the region of memory in which an object is allocated.
Numba-dppy supports three disjoint named address spaces:
- Global Address Space
Global Address Space refers to memory objects that are allocated from the global memory pool and shared among all work-items. Arguments passed to any kernel are allocated in the global address space. In the example below, the arguments a, b and c are allocated in the global address space:
    #! /usr/bin/env python
    # Copyright 2020, 2021 Intel Corporation
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    import dpctl
    import numpy as np
    import numpy.testing as testing

    import numba_dppy as dppy


    @dppy.kernel
    def data_parallel_sum(a, b, c):
        """
        Vector addition using the ``kernel`` decorator.
        """
        i = dppy.get_global_id(0)
        c[i] = a[i] + b[i]


    def driver(a, b, c, global_size):
        print("A : ", a)
        print("B : ", b)
        data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
        print("A + B = ")
        print("C ", c)
        testing.assert_equal(c, a + b)


    def main():
        global_size = 10
        N = global_size
        print("N", N)

        a = np.array(np.random.random(N), dtype=np.float32)
        b = np.array(np.random.random(N), dtype=np.float32)
        c = np.ones_like(a)

        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()

        with dppy.offload_to_sycl_device(device):
            driver(a, b, c, global_size)

        print("Done...")


    if __name__ == "__main__":
        main()
- Local Address Space
Local Address Space refers to memory objects that are allocated in the local memory pool and shared by all work-items of a work-group. Numba-dppy does not support passing arguments allocated in the local address space to a @numba_dppy.kernel function. Static arrays can, however, be allocated in the local address space inside a @numba_dppy.kernel function. In the example below, numba_dppy.local.array(shape, dtype) is the API used to allocate a static array in the local address space:
    # Imports required by this excerpt.
    import dpctl
    import numpy as np
    from numba import float32

    import numba_dppy as dppy


    def local_memory():
        """
        This example demonstrates the usage of numba-dppy's `local.array`
        intrinsic function. The function is used to create a static array
        allocated on the device's local address space.
        """
        blocksize = 10

        @dppy.kernel
        def reverse_array(A):
            lm = dppy.local.array(shape=10, dtype=float32)
            i = dppy.get_global_id(0)

            # preload
            lm[i] = A[i]
            # barrier local or global will both work as we only have one work group
            dppy.barrier(dppy.CLK_LOCAL_MEM_FENCE)  # local mem fence
            # write
            A[i] += lm[blocksize - 1 - i]

        arr = np.arange(blocksize).astype(np.float32)
        print(arr)

        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()

        with dppy.offload_to_sycl_device(device):
            reverse_array[blocksize, dppy.DEFAULT_LOCAL_SIZE](arr)

        # the output should be `orig[::-1] + orig`, i.e. [9, 9, 9, ...]
        print(arr)


    if __name__ == "__main__":
        local_memory()
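To check the result of the local memory example without a device, the kernel can be emulated on the host. The following is a minimal sketch in plain Python, assuming a single work-group whose work-items run one after another in two phases separated at the point where the barrier sits. The helper `emulate_reverse_array` is a hypothetical name used only for illustration and is not part of numba-dppy.

```python
def emulate_reverse_array(values):
    """Sequentially emulate the reverse_array kernel for a single work-group."""
    blocksize = len(values)

    # Phase 1 (before the barrier): each work-item copies its own element
    # into the buffer shared by the whole work-group.
    local_buffer = [0.0] * blocksize
    for i in range(blocksize):
        local_buffer[i] = values[i]

    # Phase 2 (after the barrier): each work-item reads the mirrored slot,
    # which was written by a different work-item, and adds it to its element.
    result = list(values)
    for i in range(blocksize):
        result[i] += local_buffer[blocksize - 1 - i]
    return result


expected = emulate_reverse_array([float(i) for i in range(10)])
print(expected)  # ten copies of 9.0
```

The split into two loops mirrors why the barrier is needed: a work-item may only read a slot of the shared buffer after the work-item that owns that slot has written it.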
- Private Address Space
Private Address Space refers to memory objects that are local to each work-item and are not shared with any other work-item. In the example below, numba_dppy.private.array(shape, dtype) is the API used to allocate a static array in the private address space:
    # Copyright 2020, 2021 Intel Corporation
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #      http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    import dpctl
    import numpy as np
    from numba import float32

    import numba_dppy


    def private_memory():
        """
        This example demonstrates the usage of numba-dppy's `private.array`
        intrinsic function. The function is used to create a static array
        allocated on the device's private address space.
        """

        @numba_dppy.kernel
        def private_memory_kernel(A):
            prvt_mem = numba_dppy.private.array(shape=1, dtype=np.float32)
            i = numba_dppy.get_global_id(0)

            # preload
            prvt_mem[0] = i
            numba_dppy.barrier(numba_dppy.CLK_LOCAL_MEM_FENCE)  # local mem fence

            # prvt_mem will not hold correct deterministic result if it is not
            # private to each thread.
            A[i] = prvt_mem[0] * 2

        N = 4
        arr = np.zeros(N).astype(np.float32)
        orig = np.arange(N).astype(np.float32)

        # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
        # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
        device = dpctl.select_default_device()
        print("Using device ...")
        device.print_device_info()

        with numba_dppy.offload_to_sycl_device(device):
            private_memory_kernel[N, N](arr)

        np.testing.assert_allclose(orig * 2, arr)
        # the output should be `orig[i] * 2`, i.e. [0, 2, 4, ...]
        print(arr)


    def main():
        private_memory()

        print("Done...")


    if __name__ == "__main__":
        main()
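The comment in the kernel above notes that the result would not be deterministic if the buffer were not private to each work-item. The following plain-Python sketch illustrates that point under simplifying assumptions: work-items run sequentially on the host, in two phases split at the barrier. The function `run_kernel` and its `private` flag are hypothetical and exist only for this illustration; they are not part of numba-dppy.

```python
def run_kernel(n, private):
    """Emulate private_memory_kernel for n work-items in two phases."""
    shared_slot = [0.0]
    # With private=True each work-item owns a separate one-element buffer.
    # With private=False every work-item aliases the same buffer.
    buffers = [[0.0] if private else shared_slot for _ in range(n)]

    # Phase 1 (before the barrier): each work-item stores its global id.
    for i in range(n):
        buffers[i][0] = float(i)

    # Phase 2 (after the barrier): each work-item doubles the value it reads back.
    return [buffers[i][0] * 2 for i in range(n)]


private_result = run_kernel(4, private=True)
shared_result = run_kernel(4, private=False)
print(private_result)  # [0.0, 2.0, 4.0, 6.0]
print(shared_result)   # [6.0, 6.0, 6.0, 6.0]
```

In this sequential emulation the last writer always wins in the shared case. On a real device the order in which work-items write is not defined, so the shared variant could produce any of the stored values.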