
Jetson Nano with Embedded vision cameras - Benchmarks


Fig.1. Jetson Nano Developer Kit

NVIDIA Jetson Nano - new module

Embedded imaging applications are among the main beneficiaries of the NVIDIA Jetson Nano hardware release.
The Jetson Nano is a very small but capable computer with an integrated GPU that lets you run multiple neural networks in parallel for applications such as image classification, object detection, segmentation, and speech processing.

The XIMEA camera families tested so far are the xiQ, xiMU and xiC models.

Below are some of the benchmarks and results from testing the Fastvideo Image & Video Processing SDK on the Jetson Nano Developer Kit.
They are specific to camera applications.



Fig.2. NVIDIA Jetson Nano module


Useful links:
Jetson Nano Presentation
Jetson Nano Product Brief
Getting Started with AI on Jetson Nano
Jetson Family Presentation

NVIDIA Jetson Nano specifications

According to the CUDA Device Query application, the tested Jetson Nano module identifies itself as NVIDIA Tegra X1 with CUDA Capability 5.3.
It therefore resembles the Jetson TX1, but with half the CUDA cores.

  • 128-core Maxwell GPU (for display and computing)
  • Quad-core ARM A57 @ 1.43 GHz (main CPU)
  • 4 GB LPDDR4 (rated at 25.6 GB/s)
  • Gigabit Ethernet
  • 4x USB 3.0, USB 2.0 Micro-B (the Micro USB port can be used for 5V power input and for data)
  • HDMI 2.0 & eDP 1.4 (4K monitor support over HDMI or DisplayPort)
  • Support of MIPI CSI-2 and PCIe Gen2 high-speed I/O
  • DC Barrel jack for 5V power input
  • microSD storage
  • Dimensions: 100 mm × 80 mm × 29 mm (including the carrier board)

Video Encoding and Decoding Options

Following are NVIDIA NVENC and NVDEC benchmarks:

  • Video Encode:
    4K at 30 fps, 4x for 1080p at 30 fps, 9x for 720p at 30 fps (H.264 / H.265)

  • Video Decode:
    4K at 60 fps, 2x for 4K at 30 fps, 8x for 1080p at 30 fps, 18x for 720p at 30 fps (H.264 / H.265)
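The encode and decode modes listed above can be compared by the pixel rate each one implies. The sketch below is only an arithmetic check, assuming the standard 16:9 frame sizes for 4K, 1080p and 720p (the source names the modes but not the exact dimensions); the function and variable names are illustrative.

```python
# Pixel throughput implied by each encode/decode mode listed above.
# Frame sizes are assumed standard 16:9 dimensions, not taken from the source.
RESOLUTIONS = {"4K": (3840, 2160), "1080p": (1920, 1080), "720p": (1280, 720)}


def pixel_rate(mode: str, fps: int, streams: int = 1) -> int:
    """Return pixels per second for `streams` parallel streams of `mode`."""
    width, height = RESOLUTIONS[mode]
    return width * height * fps * streams


# (mode, frames per second, number of parallel streams)
encode_modes = [("4K", 30, 1), ("1080p", 30, 4), ("720p", 30, 9)]
decode_modes = [("4K", 60, 1), ("4K", 30, 2), ("1080p", 30, 8), ("720p", 30, 18)]

for label, modes in (("encode", encode_modes), ("decode", decode_modes)):
    for mode, fps, streams in modes:
        rate = pixel_rate(mode, fps, streams) / 1e6
        print(f"{label}: {streams} x {mode} @ {fps} fps = {rate:.1f} Mpixel/s")
```

Under these assumptions every encode mode works out to the same pixel rate, and every decode mode to twice that rate, which suggests the listed stream counts are alternative ways of spending one fixed hardware throughput.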

Hardware and software used for benchmarking

  • CPU/GPU: NVIDIA Jetson Nano Developer Kit
  • OS: L4T (Ubuntu 18.04)
  • JetPack 4.2 with CUDA Toolkit 10.0
  • Fastvideo SDK 0.14.1

NVIDIA Jetson Nano Power Consumption and Power Management

For power management, the Jetson Nano uses Dynamic Voltage and Frequency Scaling (DVFS).
This technique is used in most modern computer hardware to maximize power savings: the voltage and clock frequency supplied to a component are raised or lowered depending on operating conditions.

By default, the Jetson Nano Developer Kit is configured to accept power via the Micro USB connector.
Some Micro USB power supplies are designed to output slightly more than 5V to compensate for voltage loss across the cable.
This matters because the Jetson Nano module requires a minimum of 4.75V to operate.
It is recommended to use a power supply capable of delivering 5V at the J28 Micro-USB connector.

There are other power supply options for the Jetson Nano.
If the total load is expected to exceed 2A, e.g., due to peripherals attached to the carrier board or due to demanding computational tasks, fit a jumper across the J48 Power Select pins; this disables power supply via Micro USB and enables 5V-4A via the J25 power jack.
Another option is to supply 5V-6A via the J41 expansion header: two 5V pins can be used to power the developer kit at 3A each.
The NVIDIA Jetson Nano Developer Kit is equipped with a passive heatsink, to which a fan can be mounted.


Fig.3. Top View of Jetson Nano Developer Kit


The NVIDIA Jetson Nano module is designed for power efficiency and supports two software-defined power modes.
The default mode provides a 10W power budget for the module; the other provides a 5W budget.
The power modes enforce the 10W or 5W budget by capping the GPU and CPU frequencies and the number of online CPU cores.
Individual parts of the CORE power domain, such as video encode (NVENC) and video decode (NVDEC), are not covered by these budgets.

The carrier board consumes between 0.5W (at 2A) and 1.25W (at 4A) with no peripherals attached.
According to the tests, normal operation of the Jetson Nano Developer Kit in 10W mode requires more power than the Micro USB supply can offer (5V at 2A).
A USB-powered Jetson Nano cannot run continuously under heavy workload, even at default clocks (without jetson_clocks applied).
In 5W mode a USB-powered Jetson Nano runs reliably, but with lower performance.
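The power figures above can be put side by side in a short calculation. This is a rough sketch, not a measurement: it only adds the module's software power budget to the carrier board's idle draw and compares the sum with what a 5V supply can deliver at its current limit. Peripherals and the blocks outside the budget (NVENC, NVDEC) are ignored, so the real demand can be higher. All names are illustrative.

```python
SUPPLY_VOLTAGE_V = 5.0  # every supply option described above is nominally 5 V


def available_power_w(current_a: float) -> float:
    """Power a 5 V supply can deliver at the given current limit."""
    return SUPPLY_VOLTAGE_V * current_a


def fits_supply(module_budget_w: float, carrier_w: float, current_a: float) -> bool:
    """True if module budget plus carrier board draw stays within the supply."""
    return module_budget_w + carrier_w <= available_power_w(current_a)


# (label, module budget in W, carrier board draw in W, supply current limit in A)
cases = [
    ("10W mode on Micro USB (2 A)", 10.0, 0.5, 2.0),
    ("5W mode on Micro USB (2 A)", 5.0, 0.5, 2.0),
    ("10W mode on barrel jack (4 A)", 10.0, 1.25, 4.0),
]

for label, module_w, carrier_w, current_a in cases:
    verdict = "OK" if fits_supply(module_w, carrier_w, current_a) else "over budget"
    print(f"{label}: need {module_w + carrier_w:.2f} W, "
          f"have {available_power_w(current_a):.1f} W -> {verdict}")
```

Even with no peripherals, the 10W mode plus carrier board draw already exceeds the 10 W a 2A Micro USB supply provides, which is consistent with the test observations above.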

The benchmarks below were measured with an external 5V, 4A power supply.
Even better performance could be achieved by supplying more power.

Clock speed and power consumption are managed with the nvpmodel and jetson_clocks tools.
For maximum performance, run nvpmodel -m0 and then jetson_clocks.

NVIDIA Jetson Nano Benchmark Performance Clarification

The following image processing kernels, which are conventional for camera applications, were used as examples for benchmarks:
white balance, demosaic, color correction, LUT, resize, gamma, jpeg / jpeg2000 / h.264 encoding, etc.

To evaluate the total time for a chosen set of modules, the GPU kernel time of each image processing module was measured.
The performance of some modules depends on image content.
CUDA initialization and GPU memory buffer allocation are not included in the benchmarks.
They are usually done only once, before the measurements, so they do not affect the measured GPU performance.

All computations were performed with 16-bit precision.
Before JPEG compression, the 16-bit data was converted to 8 bits per channel to comply with the JPEG standard.
JPEG2000 compression benchmarks were measured for 24-bit images with 4:4:4 subsampling.
The last row of each Table shows the total values for the GPU kernel pipeline.
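The benchmark columns are linked by simple arithmetic: frames per second is the reciprocal of the kernel time, and throughput is the number of bytes processed divided by that time. The sketch below illustrates this with kernel times from Table 1. It assumes 1 MB = 10^6 bytes and, for the copy example, an 8-bit raw frame of 1920×1080 bytes; the source does not state which buffer size was used for each row's MB/s figure, and the published values appear to be rounded, so results only approximate the table. Function names are illustrative.

```python
def fps_from_kernel_time(kernel_time_ms: float) -> float:
    """Frames per second if one frame takes `kernel_time_ms` milliseconds."""
    return 1000.0 / kernel_time_ms


def throughput_mb_s(num_bytes: int, kernel_time_ms: float) -> float:
    """Throughput in MB/s (1 MB = 10**6 bytes, an assumption)."""
    return num_bytes / (kernel_time_ms / 1000.0) / 1e6


# Kernel times taken from Table 1 (2K frames).
white_balance_fps = fps_from_kernel_time(0.6)
hqli_debayer_fps = fps_from_kernel_time(1.8)

# Host to Device copy of one 8-bit raw frame: 1 byte per pixel (assumed).
raw_frame_bytes = 1920 * 1080
host_to_device_mb_s = throughput_mb_s(raw_frame_bytes, 0.2)

print(f"White Balance: {white_balance_fps:.0f} fps")
print(f"HQLI Debayer:  {hqli_debayer_fps:.0f} fps")
print(f"Host to Device: {host_to_device_mb_s:.0f} MB/s")
```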

Table for 2K RAW

Table 1. NVIDIA Jetson Nano performance benchmarks for 2K raw image processing (1920×1080, 8-bit)

Algorithm and parameters                 Kernel time (ms)  Performance (MB/s)  Frames per second
Host to Device                           0.2               10,000              --
White Balance                            0.6               6,500               1,660
HQLI Debayer                             1.8               2,200               550
DFPD Debayer                             4.7               850                 212
MG Debayer                               12.7              315                 78
Color Correction with 3×4 matrix         1.7               7,000               588
Resize from 2K to 960×540                10.0              600                 100
Resize from 2K to 1919×1079              19.8              303                 50
Gamma (1920×1080)                        1.4               8,500               710
JPEG Encoding (1920×1080, 90%, 4:2:0)    4.3               1,400               230
JPEG Encoding (1920×1080, 90%, 4:4:4)    6.8               880                 147
JPEG2000 (lossy, 32×32, single mode)     81                74                  12
JPEG2000 (lossless, 32×32, single mode)  190               31                  5
Device to Host                           0.1               10,000              --

In real camera applications, the Host to Device copy can be eliminated by using Jetson Zero-Copy. In that case, an image from the camera is written via DMA directly to a pinned buffer in system memory.
The pinned buffer is accessible to both the CPU and the GPU.
Alternatively, the Device to Host copy can be hidden by overlapping data transfer with computation in a multi-threaded application.
The NVIDIA Jetson Nano supports concurrent copy and kernel execution with one copy engine.
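The effect of these two options on per-frame time can be estimated with a simplified timing model. The sketch below is an idealization, not a description of how the SDK schedules work: it assumes steady-state streaming, perfect overlap of copies with kernels, and that both copy directions share the single copy engine and therefore run one after another. It also ignores any change in kernel time when reading from pinned memory. The kernel set (white balance, HQLI debayer, gamma, JPEG 4:2:0) is an assumed example using Table 1 timings.

```python
def serial_frame_time_ms(h2d: float, kernels: float, d2h: float) -> float:
    """No overlap: copy in, process, copy out, one after another."""
    return h2d + kernels + d2h


def overlapped_frame_time_ms(h2d: float, kernels: float, d2h: float) -> float:
    """Copies fully overlapped with kernels; one copy engine serializes H2D and D2H."""
    return max(kernels, h2d + d2h)


def zero_copy_frame_time_ms(kernels: float, d2h: float) -> float:
    """Zero-copy input: the Host to Device copy disappears, the rest stays serial."""
    return kernels + d2h


# Table 1 timings in ms; the choice of kernels is an assumption.
H2D_MS, D2H_MS = 0.2, 0.1
KERNELS_MS = 0.6 + 1.8 + 1.4 + 4.3  # white balance + HQLI debayer + gamma + JPEG 4:2:0

for name, t in [
    ("serial", serial_frame_time_ms(H2D_MS, KERNELS_MS, D2H_MS)),
    ("zero-copy input", zero_copy_frame_time_ms(KERNELS_MS, D2H_MS)),
    ("overlapped copies", overlapped_frame_time_ms(H2D_MS, KERNELS_MS, D2H_MS)),
]:
    print(f"{name}: {t:.1f} ms per frame, {1000.0 / t:.0f} fps")
```

For 2K frames the copies are small compared with the kernels, so under this model the gain from hiding them is modest; it matters more when the kernel chain is short.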

The simplest image processing pipeline for 2K images on the NVIDIA Jetson Nano can reach 100 fps.
If the same pipeline uses hardware H.264 encoding via NVENC instead of Fastvideo CUDA-based Motion JPEG encoding, it can reach 120 fps, which is the limit of the NVENC H.264 encoder at 2K resolution.
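A pipeline estimate of this kind can be reproduced by summing the kernel times of the chosen stages from Table 1. The source does not list the stages of its simplest pipeline, so the stage list below is an assumption, as are all names in the code. The NVENC variant simply drops the JPEG stage and caps the result at the 120 fps encoder limit stated above.

```python
# Kernel times in ms from Table 1 (2K frames).
TABLE_1_MS = {
    "host_to_device": 0.2,
    "white_balance": 0.6,
    "hqli_debayer": 1.8,
    "gamma": 1.4,
    "jpeg_420": 4.3,
    "device_to_host": 0.1,
}

# Assumed stage list; the source does not define its "simplest pipeline".
ASSUMED_PIPELINE = [
    "host_to_device", "white_balance", "hqli_debayer",
    "gamma", "jpeg_420", "device_to_host",
]

NVENC_2K_LIMIT_FPS = 120.0  # encoder limit for 2K stated in the text


def pipeline_fps(stages: list, table: dict) -> float:
    """Frame rate if the listed stages run back to back for every frame."""
    total_ms = sum(table[stage] for stage in stages)
    return 1000.0 / total_ms


jpeg_fps = pipeline_fps(ASSUMED_PIPELINE, TABLE_1_MS)

# With NVENC the GPU no longer runs the JPEG kernel; the encoder sets the cap.
stages_without_jpeg = [s for s in ASSUMED_PIPELINE if s != "jpeg_420"]
nvenc_fps = min(pipeline_fps(stages_without_jpeg, TABLE_1_MS), NVENC_2K_LIMIT_FPS)

print(f"GPU pipeline with JPEG: {jpeg_fps:.0f} fps")
print(f"GPU pipeline with NVENC: {nvenc_fps:.0f} fps")
```

With this stage list the JPEG variant lands somewhat above the 100 fps quoted above; adding further stages such as color correction brings the estimate down, so the exact figure depends on the pipeline chosen.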

Table for 4K RAW

Table 2. NVIDIA Jetson Nano performance benchmarks for 4K raw image processing (3840×2160, 8-bit)

Algorithm and parameters                 Kernel time (ms)  Performance (MB/s)  Frames per second
Host to Device                           0.8               10,000              --
White Balance                            2.2               7,200               455
HQLI Debayer                             7.1               2,250               141
DFPD Debayer                             18.2              880                 55
MG Debayer                               50.3              318                 20
Color Correction with 3×4 matrix         6.9               7,000               145
Resize from 4K to 1920×1080              39.4              610                 25
Resize from 4K to 3839×2159              77.9              308                 12
Gamma (3840×2160)                        5.7               8,400               175
JPEG Encoding (3840×2160, 90%, 4:2:0)    17.1              1,400               58
JPEG Encoding (3840×2160, 90%, 4:4:4)    27.3              880                 36
JPEG2000 (lossy, 32×32, single mode)     309               77                  3
JPEG2000 (lossless, 32×32, single mode)  620               38                  1.6
Device to Host                           0.2               10,000              --

The same image processing pipeline for 4K RAW images on the NVIDIA Jetson Nano can achieve 30 fps.
If hardware H.264 encoding via NVENC is used instead of Fastvideo JPEG or MJPEG on the GPU, the result stays at 30 fps because that is the maximum for the NVENC H.264 encoder at 4K resolution, but the GPU is less heavily loaded.
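The difference in GPU load between the two 4K variants can be estimated from Table 2. The sketch below treats GPU occupancy as the fraction of each second the GPU spends on the pipeline at 30 fps, which is an interpretation rather than a definition from the source. The stage selection (white balance, HQLI debayer, gamma, optional JPEG 4:2:0, plus both copies) is likewise an assumption, and all names are illustrative.

```python
# Per-frame times in ms from Table 2 (4K frames); stage selection is assumed.
COMMON_STAGES_MS = {
    "host_to_device": 0.8,
    "white_balance": 2.2,
    "hqli_debayer": 7.1,
    "gamma": 5.7,
    "device_to_host": 0.2,
}
JPEG_420_MS = 17.1
TARGET_FPS = 30.0  # frame rate stated in the text for 4K


def gpu_busy_fraction(frame_time_ms: float, fps: float) -> float:
    """Fraction of wall-clock time spent on the pipeline at the given frame rate."""
    return frame_time_ms * fps / 1000.0


common_ms = sum(COMMON_STAGES_MS.values())
with_jpeg_ms = common_ms + JPEG_420_MS  # JPEG encoded by a CUDA kernel
with_nvenc_ms = common_ms               # encoding moved to the NVENC block

print(f"JPEG on GPU: {with_jpeg_ms:.1f} ms/frame, "
      f"busy {gpu_busy_fraction(with_jpeg_ms, TARGET_FPS):.0%}")
print(f"NVENC:       {with_nvenc_ms:.1f} ms/frame, "
      f"busy {gpu_busy_fraction(with_nvenc_ms, TARGET_FPS):.0%}")
```

Under these assumptions the JPEG variant keeps the GPU almost fully busy at 30 fps, while the NVENC variant leaves roughly half of the GPU time free for additional processing.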

Summary

The benchmarks show that the NVIDIA Jetson Nano has sufficient performance for image processing in camera applications.
For resolutions up to 4K, RAW to RGB conversion with JPEG or H.264 compression can run in real time.

Only a small part of the Jetson Nano benchmarks performed with the Fastvideo SDK is published here.
You can test the Fastvideo SDK with XIMEA cameras and your own image processing pipeline.

Credits
Fastvideo Blog:
https://www.fastcompression.com/blog/jetson-nano-benchmarks-image-processing.htm