xiAPI CUDA support


On supported configurations, XIMEA API can write image data directly to memory allocated by the NVIDIA CUDA runtime library.
This saves one cudaMemcpy call per acquired frame, because the data no longer has to be copied from CPU memory to device memory.
This is especially useful on systems with physically unified memory, such as the NVIDIA Jetson TX1, TX2 or Xavier.
XIMEA API also supports GPUDirect technology, which is described on a separate page.

Requirements

  • 64-bit Linux or Windows system with an NVIDIA GPU
  • Camera models from the following lines: xiB, xiB-64, xiC, xiX, xiJ, xiT
  • XIMEA API package version 4.17.01 (4.17.14 for Windows) or later
  • Proprietary NVIDIA video drivers
  • CUDA toolkit version 6 or later (tested with versions 8.0 and 9.2)

How to enable

Set the relevant xiAPI parameters in your code:

xiSetParamInt(handle, XI_PRM_BUFFER_POLICY, XI_BP_UNSAFE);
xiSetParamInt(handle, XI_PRM_IMAGE_DATA_FORMAT, XI_FRM_TRANSPORT_DATA);
xiSetParamInt(handle, XI_PRM_TRANSPORT_DATA_TARGET, XI_TRANSPORT_DATA_TARGET_ZEROCOPY); // or XI_TRANSPORT_DATA_TARGET_UNIFIED

Note that this feature works only with the unsafe buffer policy and the transport data image format; the safe buffer policy and other image data formats cannot be used.
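For context, the three parameter calls can be placed into a complete acquisition sequence roughly as follows. This is only a minimal sketch, not taken from the XIMEA sample: the device index 0, the 5000 ms timeout, the ok() helper and the include path (the usual location on Linux; on Windows the header is typically included as "xiApi.h") are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstring>
#include <m3api/xiApi.h>  // assumed Linux include path

// Illustrative helper (not part of xiAPI): report a failed call.
static bool ok(XI_RETURN status, const char *what) {
    if (status == XI_OK) return true;
    std::fprintf(stderr, "%s failed with error %d\n", what, (int)status);
    return false;
}

int main() {
    HANDLE handle = NULL;
    // Device index 0 is an assumption: the first camera found.
    if (!ok(xiOpenDevice(0, &handle), "xiOpenDevice")) return 1;

    // The three settings from this page, applied before acquisition starts.
    bool good =
        ok(xiSetParamInt(handle, XI_PRM_BUFFER_POLICY, XI_BP_UNSAFE),
           "set buffer policy") &&
        ok(xiSetParamInt(handle, XI_PRM_IMAGE_DATA_FORMAT, XI_FRM_TRANSPORT_DATA),
           "set image data format") &&
        ok(xiSetParamInt(handle, XI_PRM_TRANSPORT_DATA_TARGET,
                         XI_TRANSPORT_DATA_TARGET_ZEROCOPY),
           "set transport data target") &&
        ok(xiStartAcquisition(handle), "xiStartAcquisition");

    if (good) {
        XI_IMG image;
        std::memset(&image, 0, sizeof(image));
        image.size = sizeof(XI_IMG);  // xiAPI expects the structure size here

        // With the unsafe buffer policy, image.bp points into memory owned by
        // the API, which here is CUDA-allocated memory.
        good = ok(xiGetImage(handle, 5000, &image), "xiGetImage");
        if (good) {
            std::printf("frame %u: %ux%u at %p\n", (unsigned)image.nframe,
                        (unsigned)image.width, (unsigned)image.height, image.bp);
        }
        xiStopAcquisition(handle);
    }

    xiCloseDevice(handle);
    return good ? 0 : 1;
}
```

Because the buffer belongs to the API under the unsafe policy, the pointer in image.bp should be treated as valid only until the next xiGetImage call.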

Unified memory

XIMEA API allocates unified CUDA memory with the cudaMallocManaged function and then pins it to CPU RAM by calling cudaMemAdvise with the cudaMemAdviseSetPreferredLocation advice and cudaCpuDeviceId as the target.
The CUDA device must have the cudaDevAttrConcurrentManagedAccess attribute set to true for this feature to work.
The pointer returned in the bp field of the XI_IMG structure by xiGetImage can therefore be used in both CPU and GPU code.
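Whether a given GPU meets the attribute requirement can be checked before selecting the unified target. The following is a minimal sketch using the CUDA runtime call cudaDeviceGetAttribute; the helper name supportsUnifiedTarget is hypothetical, and falling back to the zerocopy target when the check fails is a suggestion, not documented XIMEA behavior.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true when the device reports the attribute that the
// unified transport data target depends on.
static bool supportsUnifiedTarget(int device) {
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &value, cudaDevAttrConcurrentManagedAccess, device);
    return err == cudaSuccess && value != 0;
}

int main() {
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) {
        std::fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }

    if (supportsUnifiedTarget(device)) {
        std::printf("device %d: XI_TRANSPORT_DATA_TARGET_UNIFIED can be used\n",
                    device);
    } else {
        // Suggested fallback (an assumption, not from the XIMEA documentation).
        std::printf("device %d: consider XI_TRANSPORT_DATA_TARGET_ZEROCOPY\n",
                    device);
    }
    return 0;
}
```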

Zerocopy memory

XIMEA API allocates zerocopy CUDA memory with the cudaHostAlloc function and the cudaHostAllocMapped flag set.
The pointer returned in the bp field of the XI_IMG structure by xiGetImage is intended for access from the CPU.
For use in CUDA kernels, it must be converted into a device pointer with the cudaHostGetDevicePointer function.
The only exception is when the cudaDevAttrCanUseHostPointerForRegisteredMem device attribute is true; in that case the pointer can be used as is.
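The pointer handling described above could look roughly like the sketch below. To stay runnable without a camera, it allocates its own mapped host buffer as a stand-in for the bp pointer from xiGetImage. The helper deviceViewOf and the byteHistogram kernel are hypothetical names, and the kernel treats the buffer as raw bytes, whereas the real pixel layout depends on the camera's transport format.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: return a pointer usable in kernels for a mapped host
// buffer, converting it only when the device cannot use the host pointer.
static void *deviceViewOf(void *hostPtr, int device) {
    int usable = 0;
    if (cudaDeviceGetAttribute(&usable,
                               cudaDevAttrCanUseHostPointerForRegisteredMem,
                               device) == cudaSuccess && usable) {
        return hostPtr;
    }
    void *devPtr = nullptr;
    if (cudaHostGetDevicePointer(&devPtr, hostPtr, 0) != cudaSuccess) {
        return nullptr;
    }
    return devPtr;
}

// Grid-stride kernel counting byte values; an illustration only.
__global__ void byteHistogram(const unsigned char *data, size_t n,
                              unsigned int *bins) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (; i < n; i += stride) {
        atomicAdd(&bins[data[i]], 1u);
    }
}

int main() {
    // Needed on older setups to allow mapped host memory; harmless otherwise.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return 1;

    // Stand-in for image.bp: a buffer allocated the way this page describes.
    const size_t n = 1 << 20;
    unsigned char *host = nullptr;
    if (cudaHostAlloc((void **)&host, n, cudaHostAllocMapped) != cudaSuccess) {
        return 1;
    }
    std::memset(host, 7, n);

    unsigned int *bins = nullptr;
    if (cudaMalloc((void **)&bins, 256 * sizeof(unsigned int)) != cudaSuccess) {
        cudaFreeHost(host);
        return 1;
    }
    cudaMemset(bins, 0, 256 * sizeof(unsigned int));

    const unsigned char *dev =
        static_cast<const unsigned char *>(deviceViewOf(host, device));
    if (dev != nullptr) {
        byteHistogram<<<64, 256>>>(dev, n, bins);
        unsigned int result[256];
        // cudaMemcpy on the default stream waits for the kernel to finish.
        if (cudaMemcpy(result, bins, sizeof(result),
                       cudaMemcpyDeviceToHost) == cudaSuccess) {
            std::printf("bin 7 holds %u samples\n", result[7]);
        }
    }

    cudaFree(bins);
    cudaFreeHost(host);
    return dev != nullptr ? 0 : 1;
}
```

In a real application, deviceViewOf would be called on image.bp after each xiGetImage call, since the API may hand out a different buffer for each frame.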

Sample application

Attached is an example demonstrating the features discussed above: xiCUDASample.tar.bz2.
It is based on 3_Imaging/histogram from the CUDA samples, which computes a 64-bin histogram on the GPU.
Results are displayed in the terminal as ASCII art.
The application also prints time measurements, so you can compare the computation time with different options.
Please refer to readme.txt included in the tarball for build instructions.