Welcome to Numba-dppy’s documentation!

Numba-dppy is an Intel®-developed extension to the Numba JIT compiler that adds “XPU” programming capabilities to it. The XPU vision is to make it extremely easy for programmers to write efficient and portable code for a mix of architectures across CPUs, GPUs, FPGAs and other accelerators. To provide XPU programming capabilities, Numba-dppy relies on SYCL, an industry standard for writing cross-platform code using standard C++. Using a SYCL runtime library, Numba-dppy can launch data-parallel kernels generated directly from Python bytecode on supported data-parallel architectures. Currently, SYCL support is restricted to Intel’s DPC++ via the dpctl package. Support for other SYCL runtime libraries may be added in the future.

The main feature of Numba-dppy is that it lets programmers write data-parallel kernels directly in Python. Such kernels can be written in two different ways: with an explicit API superficially similar to OpenCL, or with an implicit API that generates kernels from NumPy library calls, Numba’s prange statement, and other “data-parallel by construction” expressions that Numba is able to parallelize. The following two examples show both ways of writing kernels in a Numba-dppy program.

  • Defining a data-parallel kernel explicitly in Numba-dppy.

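    import numpy as np
    import numba_dppy as dppy
    import dpctl
    
    # Each work-item computes one element of the output array.
    # (Named data_parallel_sum to avoid shadowing Python's built-in sum.)
    @dppy.kernel
    def data_parallel_sum(a, b, c):
        i = dppy.get_global_id(0)  # index of this work-item in dimension 0
        c[i] = a[i] + b[i]
    
    a = np.array(np.random.random(20), dtype=np.float32)
    b = np.array(np.random.random(20), dtype=np.float32)
    c = np.ones_like(a)
    
    # Launch 20 work-items on an OpenCL GPU device; DEFAULT_LOCAL_SIZE
    # leaves the work-group size to the runtime.
    with dpctl.device_context("opencl:gpu"):
        data_parallel_sum[20, dppy.DEFAULT_LOCAL_SIZE](a, b, c)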
    
  • Writing implicitly data-parallel expressions in the fashion of Numba parallel loops.

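    from numba import njit
    import numpy as np
    import dpctl
    
    @njit
    def f1(a, b):
        # A "data-parallel by construction" array expression: Numba-dppy
        # generates a kernel from it when f1 is called in a device context.
        c = a + b
        return c
    
    N = 64 * 32  # 2048 elements
    a = np.ones(N, dtype=np.float32)
    b = np.ones(N, dtype=np.float32)
    # Run on the GPU with index 0 of the OpenCL backend.
    with dpctl.device_context("opencl:gpu:0"):
        c = f1(a, b)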
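To make the indexing in the explicit kernel concrete, here is a sequential, pure-NumPy emulation of a 1-D launch. It is only a mental model, not how Numba-dppy or SYCL actually schedules work: `emulate_1d_launch` and `sum_body` are hypothetical names, and the local (work-group) size of 4 is an arbitrary illustrative choice, since with `DEFAULT_LOCAL_SIZE` the runtime picks the work-group size. Each work-item's global id is its work-group offset plus its local id, and because every output element depends only on the inputs at the same index, the result matches the implicit `a + b` version.

```python
import numpy as np


def emulate_1d_launch(kernel_body, global_size, local_size, *args):
    """Hypothetical helper: run kernel_body once per work-item, sequentially.

    Mimics the index assignment of a launch like kernel[global_size, local_size].
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    num_groups = global_size // local_size
    for group_id in range(num_groups):
        for local_id in range(local_size):
            # Global id = offset of this work-group + position inside it.
            global_id = group_id * local_size + local_id
            kernel_body(global_id, *args)


def sum_body(i, a, b, c):
    # Same per-work-item statement as the explicit kernel.
    c[i] = a[i] + b[i]


rng = np.random.default_rng(0)
a = rng.random(20, dtype=np.float32)
b = rng.random(20, dtype=np.float32)
c = np.ones_like(a)

# 20 work-items split into 5 work-groups of 4 (illustrative sizes).
emulate_1d_launch(sum_body, 20, 4, a, b, c)

# The element-wise result equals the implicit array expression.
assert np.array_equal(c, a + b)
```

Running the work-groups in any other order would produce the same `c`, which is exactly the independence that lets a device execute them concurrently.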
    

About

Numba-dppy is developed by Intel and is part of the Intel Distribution for Python.

Contributing

Refer to the contributing guide for information on the coding style and standards used in Numba-dppy.

License

Numba-dppy is licensed under the Apache License 2.0, which can be found in LICENSE. All usage of and contributions to the project are subject to the terms and conditions of this license.
