try to use AOT for kernels #4

Closed
shssf wants to merge 1 commit into master from aot_prelink_kernels

Conversation

@shssf
Contributor

@shssf shssf commented Sep 8, 2020

It looks like AOT is usable for development only.

AOT reduces the module loading time from 21 s to 0.5 s.
With the CPU-only option, it compiles successfully and runs on the CPU, but a GPU run fails:

terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Native API failed. Native API returns: -42 (CL_INVALID_BINARY) -42 (CL_INVALID_BINARY)

With both options, the build fails:

Platform name: Intel(R) OpenCL
Device name: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
Driver version: 2020.11.8.0.27
OpenCL program was successfully created from SPIR-V file /tmp/backend_iface_fptr-ec9979.spv
Using build options:  -I "/tmp"
Compilation started
Compilation done
Linking started
Linking done
Device build started
Options used by backend compiler:
Device build done
Kernel <_ZTS25custom_blas_gemm_c_kernelIlE> was successfully vectorized (8)
Kernel <_ZTS25custom_blas_gemm_c_kernelIiE> was successfully vectorized (8)
Kernel <_ZTS24custom_blas_dot_c_kernelIlE> was successfully vectorized (8)
Kernel <_ZTS24custom_blas_dot_c_kernelIiE> was successfully vectorized (8)
Kernel <_ZTSN6oneapi3dpl20__par_backend_hetero24__parallel_sort_kernel_1IJ23custom_argsort_c_kernelIdlEEEE> was successfully vectorized (8)
Kernel <_ZTSN6oneapi3dpl20__par_backend_hetero24__parallel_sort_kernel_2IJ23custom_argsort_c_kernelIdlEEEE> was successfully vectorized (8)
...
Kernel <_ZTSN6oneapi3dpl20__par_backend_hetero26__parallel_reduce_kernel_1IJ19custom_sum_c_kernelIiEEEE> was successfully vectorized (8)
Kernel <_ZTSN6oneapi3dpl20__par_backend_hetero26__parallel_reduce_kernel_2IJ19custom_sum_c_kernelIiEEEE> was successfully vectorized (8)
Done.
OpenCL program binary file was successfully created: /tmp/backend_iface_fptr-5230b0.out
Error: Device name missing.
clang++: error: gen compiler command failed with exit code 226 (use -v to see invocation)
error: command 'clang++' failed with exit status 226

@shssf shssf closed this Sep 14, 2020
@shssf shssf deleted the aot_prelink_kernels branch September 18, 2020 14:44
antonwolfy referenced this pull request in antonwolfy/dpnp Sep 14, 2022
abagusetty pushed a commit to abagusetty/dpnp that referenced this pull request May 27, 2026
… __del__ against shutdown races

Closes audit items IntelPython#5 and IntelPython#29 from the prior solver review. Item IntelPython#4
(_matmat default uses a per-column matvec loop) is closed as wontfix:
SciPy's scipy.sparse.linalg.LinearOperator and cupyx's analogue both
ship the same hstack-of-matvecs default, so dpnp matches the
reference exactly and there is no portable improvement to make
without subclass-level _matmat overrides (which _CustomLinearOperator
already exposes via its matmat= constructor argument).
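
For illustration, the "hstack-of-matvecs" default mentioned above can be sketched roughly as follows. This is a minimal NumPy sketch, not dpnp's code; `default_matmat` and the `matvec` callable are hypothetical names used only here.

```python
import numpy as np


def default_matmat(matvec, X):
    """Apply a matrix-vector callable to X one column at a time.

    `matvec` is assumed to map an (N, 1) column to an (M, 1) column.
    """
    # Iterating over X.T yields the columns of X as 1-D arrays.
    columns = [matvec(col.reshape(-1, 1)) for col in X.T]
    # Stack the (M, 1) results side by side into an (M, K) matrix.
    return np.hstack(columns)


A = np.arange(6.0).reshape(2, 3)
X = np.arange(12.0).reshape(3, 4)
Y = default_matmat(lambda v: A @ v, X)
```

A subclass that knows a faster dense product can bypass this loop by supplying its own matmat implementation.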

scipy/sparse/linalg/_interface.py
  - Set __array_ufunc__ = None on the LinearOperator base class.
    This is the SciPy contract: a host numpy.ndarray on the left of
    np_array * linop or np_array @ linop previously triggered
    NumPy's ufunc dispatch first, which would attempt to broadcast
    the operator element-wise before falling back to its reflected
    operator method -- producing either an opaque error or a wrong-
    typed result. With __array_ufunc__ = None NumPy returns
    NotImplemented from the ufunc protocol and Python's operator
    dispatch falls through cleanly to LinearOperator.__rmul__ /
    __rmatmul__. dpnp.ndarray itself sets __array_ufunc__ = None
    (see dpnp/dpnp_array.py:222) for the same reason, so the two
    dispatch systems now agree.
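
The dispatch behaviour described above can be reproduced with a small stand-in class. This is only a sketch: `ScaledOp` is a hypothetical class, not the LinearOperator from this commit, and it handles scalar scaling only.

```python
import numpy as np


class ScaledOp:
    """Hypothetical operator-like object that opts out of NumPy ufuncs."""

    # Tells NumPy not to handle this object in ufuncs or binary operators.
    __array_ufunc__ = None

    def __init__(self, scale=1.0):
        self.scale = scale

    def __mul__(self, other):
        if np.isscalar(other):
            return ScaledOp(self.scale * other)
        return NotImplemented

    # Scalar scaling is commutative, so the reflected method can be shared.
    __rmul__ = __mul__


# numpy.float64.__mul__ defers, so Python falls back to ScaledOp.__rmul__.
result = np.float64(2.0) * ScaledOp(3.0)
```

Here `result` is a `ScaledOp` rather than a NumPy array, which is the outcome the opt-out is meant to guarantee.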

scipy/sparse/_csr.py, scipy/sparse/linalg/_iterative.py
  - Harden __del__ in csr_matrix and in _CachedSpMV against the
    interpreter-shutdown race where the compiled _sparse_impl
    extension is garbage-collected before the matrix instance whose
    oneMKL handle it owns. Previous code used a single
    except Exception: pass which silenced two qualitatively
    different failure modes:
      1. shutdown race -- extension gone, si._sparse_gemv_release
         evaluates to None or AttributeError; the handle is
         unrecoverable and leaving the OS to reclaim it at process
         exit is the only sane option;
      2. genuine backend error while the interpreter is healthy --
         a real bug we want to surface eventually, but raising from
         __del__ produces only an 'Exception ignored in:' warning
         and the handle is gone either way.

    The new code probes getattr(si, '_sparse_gemv_release', None)
    explicitly so case (1) takes the fast non-call path, and then
    splits the except into (AttributeError, TypeError) for case (1)-
    style residuals (queue / handle attribute access racing the
    shutdown) versus a final broad except for case (2). Both still
    return silently from __del__ -- raising is never valid here --
    but the intent is now documented and a real backend regression
    is no longer indistinguishable from the GC race in code review.
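
A rough sketch of the cleanup pattern described above, assuming a backend object that may already be torn down. The names `release_quietly` and `OwnsHandle` and the returned status strings are hypothetical and exist only to make the branches visible; the actual code in the commit may be structured differently.

```python
from types import SimpleNamespace


def release_quietly(backend, handle):
    """Release `handle` without ever raising; report which branch ran."""
    # Probe explicitly: during shutdown the module or its attribute may be None.
    release = getattr(backend, "_sparse_gemv_release", None)
    if release is None:
        return "skipped"  # case 1: extension gone, nothing to call
    try:
        release(handle)
    except (AttributeError, TypeError):
        return "shutdown-residual"  # case 1 residue: attributes racing shutdown
    except Exception:
        return "backend-error"  # case 2: real backend failure, still silent
    return "released"


class OwnsHandle:
    """Hypothetical owner of a native handle."""

    def __init__(self, backend, handle):
        self._backend = backend
        self._handle = handle

    def __del__(self):
        # getattr guards against a partially constructed instance.
        release_quietly(getattr(self, "_backend", None),
                        getattr(self, "_handle", None))


released = []
healthy = SimpleNamespace(_sparse_gemv_release=released.append)
status = release_quietly(healthy, 7)
```

Separating the two except clauses does not change runtime behaviour, since both paths stay silent. It only makes the two failure modes distinguishable to a reader.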

tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py
  - test_array_ufunc_opt_out: asserts the __array_ufunc__ = None
    marker is present on LinearOperator. Mirrors SciPy's own test
    suite test_interface.py::test_array_ufunc_opt_out.
  - test_numpy_scalar_times_linop_dispatches_to_rmul: the concrete
    runtime consequence -- numpy.float64(2.0) * linop must
    produce a scaled LinearOperator, not raise or yield an array.