Local variables
Note
NUMBA_OPT=0 (“no optimization” level) - all local variables of the kernel function are available.
NUMBA_OPT=1 or higher - some variables may be optimized out.
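The optimization level is passed through the environment of the process that compiles the kernel, so it has to be in place before Python starts. A minimal sketch of a launcher helper follows. `debug_environment` is a hypothetical name, the accepted range 0 to 3 is an assumption about the integer levels, and the command list only mirrors the gdb-oneapi invocation used in this section.

```python
import os


def debug_environment(opt_level=0, base_env=None):
    """Return a copy of the environment with NUMBA_OPT set to opt_level."""
    # Assumption: integer optimization levels 0..3 are the ones of interest here.
    if opt_level not in (0, 1, 2, 3):
        raise ValueError("NUMBA_OPT is expected to be 0, 1, 2 or 3")
    env = dict(os.environ if base_env is None else base_env)
    env["NUMBA_OPT"] = str(opt_level)
    return env


# Debugger invocation used in this section.
GDB_COMMAND = ["gdb-oneapi", "-q", "python"]

env = debug_environment(0, base_env={"PATH": "/usr/bin"})
print(env["NUMBA_OPT"])
```

On a machine where gdb-oneapi is installed, the returned mapping could be passed as `env=` to `subprocess.run(GDB_COMMAND, env=env)`.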
Consider the Numba-dppy kernel code in sum_local_vars.py:
import dpctl
import numpy as np

import numba_dppy as dppy


@dppy.kernel(debug=True)
def data_parallel_sum(a, b, c):
    i = dppy.get_global_id(0)
    l1 = a[i] + 2.5
    l2 = b[i] * 0.3
    c[i] = l1 + l2


global_size = 10
N = global_size

a = np.array(np.random.random(N), dtype=np.float32)
b = np.array(np.random.random(N), dtype=np.float32)
c = np.ones_like(a)

device = dpctl.SyclDevice("opencl:gpu")
with dppy.offload_to_sycl_device(device):
    data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)

print("Done...")
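To judge whether the values reported by the debugger are plausible, it can help to evaluate the kernel body on the host. The sketch below is a plain-Python emulation, not Numba-dppy code: `emulate_data_parallel_sum` is a hypothetical helper, `array("f")` stands in for the float32 arrays, and the loop index plays the role of dppy.get_global_id(0).

```python
from array import array


def emulate_data_parallel_sum(a, b, c):
    """Run the kernel body once per work item and record (i, l1, l2)."""
    trace = []
    for i in range(len(c)):  # i stands in for dppy.get_global_id(0)
        l1 = a[i] + 2.5      # float32 element is widened to a 64-bit float
        l2 = b[i] * 0.3
        c[i] = l1 + l2       # stored back with float32 precision
        trace.append((i, l1, l2))
    return trace


a = array("f", [0.5, 0.25])
b = array("f", [1.0, 2.0])
c = array("f", [1.0, 1.0])

for i, l1, l2 in emulate_data_parallel_sum(a, b, c):
    print(i, l1, l2, c[i])
```

In this emulation l1 and l2 are 64-bit floats, which is consistent with the double type that ptype reports for l1 in the debugger.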
info locals
Run the debugger:
$ NUMBA_OPT=0 gdb-oneapi -q python
(gdb) set breakpoint pending on
(gdb) break sum_local_vars.py:22
(gdb) run sum_local_vars.py
...
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0-7], __main__::data_parallel_sum () at sum_local_vars.py:22
Run the info locals command. At the “no optimization” level (NUMBA_OPT=0), the sample output is as follows:
(gdb) info locals
a = '\000' <repeats 55 times>
b = '\000' <repeats 55 times>
c = '\000' <repeats 55 times>
i = 0
l1 = 0
l2 = 0
__ocl_dbg_gid0 = 0
__ocl_dbg_gid1 = 0
__ocl_dbg_gid2 = 0
__ocl_dbg_lid0 = 0
__ocl_dbg_lid1 = 0
__ocl_dbg_lid2 = 0
__ocl_dbg_grid0 = 0
__ocl_dbg_grid1 = 0
__ocl_dbg_grid2 = 0
(gdb) next
Thread 2.3 hit Breakpoint 1, with SIMD lanes [0-1], __main__::data_parallel_sum () at sum_local_vars.py:22
22 i = dppy.get_global_id(0)
(gdb) next
23 l1 = a[i] + 2.5
(gdb) next
24 l2 = b[i] * 0.3
(gdb) info locals
a = '\000' <repeats 55 times>
b = '\000' <repeats 16 times>, "\n\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\000\240\016XUU\000\000\n\000\000\000\000\000\000\000\004\000\000\000\000\000\000"
c = '\000' <repeats 16 times>, "\n\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\000@\256WUU\000\000\n\000\000\000\000\000\000\000\004\000\000\000\000\000\000"
i = 8
l1 = 2.5931931659579277
l2 = 0
__ocl_dbg_gid0 = 0
__ocl_dbg_gid1 = 0
__ocl_dbg_gid2 = 0
__ocl_dbg_lid0 = 42949672970
__ocl_dbg_lid1 = 0
__ocl_dbg_lid2 = 93825037590528
__ocl_dbg_grid0 = 4612811918334230528
__ocl_dbg_grid1 = 0
__ocl_dbg_grid2 = 0
(gdb) next
25 c[i] = l1 + l2
A variable reads as 0 until the line that assigns it has executed. At the breakpoint on line 22, l1 is therefore 0. Its actual value appears once the debugger has stepped past line 23, and the value of l2 follows after line 24:
(gdb) info locals
a = '\000' <repeats 55 times>
b = '\000' <repeats 55 times>
c = '\000' <repeats 16 times>, "\n\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\000@\256WUU\000\000\n\000\000\000\000\000\000\000\004\000\000\000\000\000\000"
i = 8
l1 = 2.5931931659579277
l2 = 0.22954882979393004
__ocl_dbg_gid0 = 0
__ocl_dbg_gid1 = 8
__ocl_dbg_gid2 = 8
__ocl_dbg_lid0 = 93825034429928
__ocl_dbg_lid1 = 0
__ocl_dbg_lid2 = 93825034429936
__ocl_dbg_grid0 = 4599075939470750515
__ocl_dbg_grid1 = 0
__ocl_dbg_grid2 = 0
(gdb) print a
$1 = '\000' <repeats 55 times>
When the debugger reaches the last line of the kernel, the info locals command returns all the local variables with their values.
Note
The debugger can show variable values, but a value may read as 0 after the variable is explicitly deleted or the function scope ends. For more information, see Lifetime of local variables.
At the “O1 optimization” level (NUMBA_OPT=1), the info locals command produces the following output:
(gdb) info locals
__ocl_dbg_gid0 = 8
__ocl_dbg_gid1 = 0
__ocl_dbg_gid2 = 0
__ocl_dbg_lid0 = 8
__ocl_dbg_lid1 = 0
__ocl_dbg_lid2 = 0
__ocl_dbg_grid0 = 0
__ocl_dbg_grid1 = 0
__ocl_dbg_grid2 = 0
i = 0
l1 = 0
l2 = 0
(gdb) continue
...
Done...
Note
The debugger does not show the local variables a, b and c because they are optimized out at the “O1 optimization” level.
print <variable>
To print the value of a variable, run the print <variable> command.
(gdb) print l1
$3 = 2.5931931659579277
(gdb) print l2
$4 = 0.22954882979393004
Note
Kernel variables are shown in intermediate representation view (with the “$” sign). The actual values of the arrays are currently not available.
ptype <variable>
To print the type of a variable, run the ptype <variable> or whatis <variable> command:
(gdb) ptype a
type = byte [56]
(gdb) whatis a
type = byte [56]
(gdb) ptype l1
type = double
(gdb) whatis l1
type = double
(gdb) continue
...
Done...
Lifetime of local variables
Numba uses live variable analysis. The lifetime of a Python variable differs from the lifetime of the corresponding variable in compiled code.
Note
For more information, refer to Numba variable policy.
Live variable analysis affects the debugging experience in the following way.
Consider the Numba-dppy kernel code from sum_local_vars.py:
@dppy.kernel(debug=True)
def data_parallel_sum(a, b, c):
    i = dppy.get_global_id(0)
    l1 = a[i] + 2.5
    l2 = b[i] * 0.3
    c[i] = l1 + l2
Run this code with the environment variable NUMBA_DUMP_ANNOTATION=1 to see where Numba inserts del statements for variables.
1-----------------------------------ANNOTATION-----------------------------------
2# File: numba_dppy/examples/debug/sum_local_vars.py
3# --- LINE 20 ---
4
5@dppy.kernel(debug=True)
6
7# --- LINE 21 ---
8
9def data_parallel_sum(a, b, c):
10
11 # --- LINE 22 ---
12 # label 0
13 # a = arg(0, name=a) :: array(float32, 1d, C)
14 # b = arg(1, name=b) :: array(float32, 1d, C)
15 # c = arg(2, name=c) :: array(float32, 1d, C)
16 # $2load_global.0 = global(dppy: <module 'numba_dppy' from '.../numba-dppy/numba_dppy/__init__.py'>) :: Module(<module 'numba_dppy' from '.../numba-dppy/numba_dppy/__init__.py'>)
17 # $4load_method.1 = getattr(value=$2load_global.0, attr=get_global_id) :: Function(<function get_global_id at 0x7f82b8bae430>)
18 # del $2load_global.0
19 # $const6.2 = const(int, 0) :: Literal[int](0)
20 # i = call $4load_method.1($const6.2, func=$4load_method.1, args=[Var($const6.2, sum_local_vars.py:22)], kws=(), vararg=None, target=None) :: (uint32,) -> int64
21 # del $const6.2
22 # del $4load_method.1
23
24 i = dppy.get_global_id(0)
25
26 # --- LINE 23 ---
27 # $16binary_subscr.6 = getitem(value=a, index=i, fn=<built-in function getitem>) :: float32
28 # del a
29 # $const18.7 = const(float, 2.5) :: float64
30 # l1 = $16binary_subscr.6 + $const18.7 :: float64
31 # del $const18.7
32 # del $16binary_subscr.6
33
34 l1 = a[i] + 2.5
35
36 # --- LINE 24 ---
37 # $28binary_subscr.11 = getitem(value=b, index=i, fn=<built-in function getitem>) :: float32
38 # del b
39 # $const30.12 = const(float, 0.3) :: float64
40 # l2 = $28binary_subscr.11 * $const30.12 :: float64
41 # del $const30.12
42 # del $28binary_subscr.11
43
44 l2 = b[i] * 0.3
45
46 # --- LINE 25 ---
47 # $40binary_add.16 = l1 + l2 :: float64
48 # del l2
49 # del l1
50 # c[i] = $40binary_add.16 :: (array(float32, 1d, C), int64, float64) -> none
51 # del i
52 # del c
53 # del $40binary_add.16
54 # $const48.19 = const(NoneType, None) :: none
55 # $50return_value.20 = cast(value=$const48.19) :: none
56 # del $const48.19
57 # return $50return_value.20
58
59 c[i] = l1 + l2
In the annotated code, variable a is used for the last time at LINE 23, so Numba inserts del a there (line 28 of the dump). As a result, the debugger shows the value 0 for a when you set a breakpoint at LINE 24.
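The del positions can also be collected from an annotation dump programmatically. The following sketch assumes the text layout shown in the dump above (`# --- LINE n ---` markers followed by `# del name` comments). `find_del_lines` is a hypothetical helper, and the sample text is abbreviated from the dump in this section.

```python
import re

# Abbreviated sample in the layout printed by NUMBA_DUMP_ANNOTATION=1.
SAMPLE_DUMP = """\
# --- LINE 23 ---
#   $16binary_subscr.6 = getitem(value=a, index=i, fn=<built-in function getitem>)  :: float32
#   del a
#   del $16binary_subscr.6
# --- LINE 24 ---
#   del b
# --- LINE 25 ---
#   del l2
#   del l1
#   del i
#   del c
"""

LINE_MARKER = re.compile(r"#\s*--- LINE (\d+) ---")
DEL_STATEMENT = re.compile(r"#\s*del\s+(\S+)\s*$")


def find_del_lines(dump, include_temporaries=False):
    """Map each deleted variable to the source line whose IR contains its del."""
    deleted = {}
    current_line = None
    for text in dump.splitlines():
        marker = LINE_MARKER.search(text)
        if marker:
            current_line = int(marker.group(1))
            continue
        statement = DEL_STATEMENT.search(text)
        if statement and current_line is not None:
            name = statement.group(1)
            # Names starting with "$" are compiler temporaries.
            if include_temporaries or not name.startswith("$"):
                deleted[name] = current_line
    return deleted


print(find_del_lines(SAMPLE_DUMP))
```

A variable mapped to LINE n can be expected to read as 0 in the debugger at any breakpoint placed after that line.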
As a workaround, you can extend the lifetime of a variable by using it at the end of the function, for example by passing it to a dummy function revive(). Numba then does not insert del a until the end of the function.
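@dppy.func
def revive(x):
    return x


@dppy.kernel(debug=True)
def data_parallel_sum(a, b, c):
    i = dppy.get_global_id(0)
    l1 = a[i] + 2.5
    l2 = b[i] * 0.3
    c[i] = l1 + l2
    revive(a)  # pass variable to dummy function
The annotation for sum_local_vars_revive.py shows that del a now appears only after the revive(a) call: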
1-----------------------------------ANNOTATION-----------------------------------
2# File: numba_dppy/examples/debug/sum_local_vars_revive.py
3# --- LINE 24 ---
4
5@dppy.kernel(debug=True)
6
7# --- LINE 25 ---
8
9def data_parallel_sum(a, b, c):
10
11 # --- LINE 26 ---
12 # label 0
13 # a = arg(0, name=a) :: array(float32, 1d, C)
14 # b = arg(1, name=b) :: array(float32, 1d, C)
15 # c = arg(2, name=c) :: array(float32, 1d, C)
16 # $2load_global.0 = global(dppy: <module 'numba_dppy' from '.../numba-dppy/numba_dppy/__init__.py'>) :: Module(<module 'numba_dppy' from '.../numba-dppy/numba_dppy/__init__.py'>)
17 # $4load_method.1 = getattr(value=$2load_global.0, attr=get_global_id) :: Function(<function get_global_id at 0x7fcdf7e8c4c0>)
18 # del $2load_global.0
19 # $const6.2 = const(int, 0) :: Literal[int](0)
20 # i = call $4load_method.1($const6.2, func=$4load_method.1, args=[Var($const6.2, sum_local_vars_revive.py:26)], kws=(), vararg=None, target=None) :: (uint32,) -> int64
21 # del $const6.2
22 # del $4load_method.1
23
24 i = dppy.get_global_id(0)
25
26 # --- LINE 27 ---
27 # $16binary_subscr.6 = getitem(value=a, index=i, fn=<built-in function getitem>) :: float32
28 # $const18.7 = const(float, 2.5) :: float64
29 # l1 = $16binary_subscr.6 + $const18.7 :: float64
30 # del $const18.7
31 # del $16binary_subscr.6
32
33 l1 = a[i] + 2.5
34
35 # --- LINE 28 ---
36 # $28binary_subscr.11 = getitem(value=b, index=i, fn=<built-in function getitem>) :: float32
37 # del b
38 # $const30.12 = const(float, 0.3) :: float64
39 # l2 = $28binary_subscr.11 * $const30.12 :: float64
40 # del $const30.12
41 # del $28binary_subscr.11
42
43 l2 = b[i] * 0.3
44
45 # --- LINE 29 ---
46 # $40binary_add.16 = l1 + l2 :: float64
47 # del l2
48 # del l1
49 # c[i] = $40binary_add.16 :: (array(float32, 1d, C), int64, float64) -> none
50 # del i
51 # del c
52 # del $40binary_add.16
53
54 c[i] = l1 + l2
55
56 # --- LINE 30 ---
57 # $48load_global.19 = global(revive: <numba_dppy.compiler.DPPYFunctionTemplate object at 0x7fce12e5cc40>) :: Function(<numba_dppy.compiler.DPPYFunctionTemplate object at 0x7fce12e5cc40>)
58 # $52call_function.21 = call $48load_global.19(a, func=$48load_global.19, args=[Var(a, sum_local_vars_revive.py:26)], kws=(), vararg=None, target=None) :: (array(float32, 1d, C),) -> array(float32, 1d, C)
59 # del a
60 # del $52call_function.21
61 # del $48load_global.19
62 # $const56.22 = const(NoneType, None) :: none
63 # $58return_value.23 = cast(value=$const56.22) :: none
64 # del $const56.22
65 # return $58return_value.23
66
67 revive(a) # pass variable to dummy function
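The effect of the workaround can be approximated without compiling anything. The sketch below uses Python's ast module to find the last source line that mentions each variable. This is a much simplified stand-in for Numba's live variable analysis and is only valid for straight-line code like this kernel. The helper name `last_use_lines` is hypothetical, and line numbers are relative to each snippet rather than to the files in numba_dppy/examples/debug.

```python
import ast

ORIGINAL = """\
def data_parallel_sum(a, b, c):
    i = get_global_id(0)
    l1 = a[i] + 2.5
    l2 = b[i] * 0.3
    c[i] = l1 + l2
"""

# Same kernel with the dummy call appended at the end.
WITH_REVIVE = ORIGINAL + "    revive(a)\n"


def last_use_lines(source):
    """Map each argument and local variable to the last line that mentions it."""
    func = ast.parse(source).body[0]
    names = [node for node in ast.walk(func) if isinstance(node, ast.Name)]
    # Track function arguments plus every name that is assigned in the body.
    tracked = {arg.arg for arg in func.args.args}
    tracked |= {node.id for node in names if isinstance(node.ctx, ast.Store)}
    last = {}
    for node in names:
        if node.id in tracked:
            last[node.id] = max(last.get(node.id, 0), node.lineno)
    return last


print("last use of a without revive:", last_use_lines(ORIGINAL)["a"])
print("last use of a with revive:   ", last_use_lines(WITH_REVIVE)["a"])
```

The source is only parsed, never executed, so get_global_id and revive do not need to be defined. The last use of a moves from the line that computes l1 to the final line, while b is unaffected, which is consistent with the positions of del a and del b in the two annotation dumps.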