Overview of Intel® oneMKL BLAS Routines for Data Parallel C++

The following pages describe the Intel® oneMKL BLAS routines for Data Parallel C++ (DPC++), all of which are declared in the header file mkl_blas_sycl.hpp.

Several conventions are used throughout this document:

  • All Intel® oneMKL for DPC++ data types and non-domain-specific functions are in the oneapi::mkl namespace.

  • All Intel® oneMKL BLAS functions for DPC++ are inside the oneapi::mkl::blas namespace.

  • For brevity, the cl::sycl namespace is omitted from DPC++ object types such as buffers and queues. For example, a single-precision 1D buffer A is written as buffer<float,1> &A instead of cl::sycl::buffer<float,1> &A.

  • The routines are templated on precision. Each routine has a table detailing the supported precisions.

Device Support

DPC++ supports several types of devices:

  • Host device: Performs computations directly on the current CPU.

  • CPU device: Performs computations on a CPU using OpenCL™.

  • GPU device: Performs computations on a GPU.

The description of each routine lists the device types it currently supports.

In the current release of Intel® oneMKL BLAS for DPC++, all standard Level 1, Level 2, and Level 3 BLAS routines and the BLAS extensions gemmt, gemm_bias, axpy_batch, gemm_batch, and trsm_batch support the host, CPU, and GPU devices.