
Neural Networks

A picture illustrating artificial intelligence, with a high-tech robot on a blue dataset background

Preparing datasets for deep learning
using embedded vision and multi camera setup

Preparing datasets is an important step in any deep learning program. Embedded vision and/or a multi camera setup offer the chance to gather high quality data for applications in almost all areas of life.

How to prepare a dataset for AI applications and deep learning?

Applications based on AI (Artificial Intelligence) are becoming increasingly popular across many fields.
They already solve tasks with different levels of complexity, often being faster and more reliable than humans.

This article focuses on applications based on image and video processing, such as UAV control, self-driving cars, driverless trains, boats, and mobile robots.

Data is key in deep learning projects

Any of the automated systems mentioned above needs a large amount of training data to learn how to behave.
Obtaining high-quality data for a particular set of situations is a crucial starting point, which raises an important question: where and how can such a dataset be obtained?

A standard training set can be used, but it usually does not correspond to the real situation the system has to handle.
Training the neural network on such material does not provide enough confidence that the system will behave correctly.
Most such datasets, especially the complex ones, are therefore built from a collection of real-life examples.
For autonomous cars, this means installing the necessary number of embedded cameras on the vehicle, thus creating a multi camera setup, and making a large number of recordings.

A picture illustrating the deep learning approach with the help of two images of dogs, one image of a cat, one image of a honey badger and two diagrams

Important parts of a typical multi camera setup

The resulting dataset then depends on the particular camera and image processing algorithms, and every camera system introduces its own artifacts.
This is why the following components and aspects need to be considered when assembling a camera setup for data gathering:

  • Camera (image sensor, bit depth, resolution, FPS, S/N, firmware, mode of operation, etc.)
  • Lens, its control and settings
  • Camera and lens calibration and testing
  • Software for image/video processing
  • Different illumination conditions
  • How to handle multicamera or embedded solutions
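One practical consequence of the camera parameters listed above is the amount of RAW data a multi camera setup produces. The following is a minimal sketch for estimating it, assuming tightly packed, uncompressed pixels. The `CameraConfig` class, the camera names and all numeric values are hypothetical and chosen for illustration only; they do not describe any particular XIMEA model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraConfig:
    """Hypothetical description of one camera in the rig."""
    name: str
    width: int        # pixels
    height: int       # pixels
    bit_depth: int    # bits per pixel in the RAW stream
    fps: float        # frames per second

    def raw_bytes_per_second(self) -> float:
        """Uncompressed RAW data rate, assuming tightly packed pixels."""
        bits_per_frame = self.width * self.height * self.bit_depth
        return bits_per_frame * self.fps / 8


def total_rate_mb_per_s(cameras):
    """Sum of the RAW data rates of all cameras, in MB/s (1 MB = 10**6 bytes)."""
    return sum(c.raw_bytes_per_second() for c in cameras) / 1e6


# Illustrative four-camera rig (made-up values).
rig = [
    CameraConfig("front", 1920, 1080, 12, 60),
    CameraConfig("rear", 1920, 1080, 12, 60),
    CameraConfig("left", 1280, 720, 10, 30),
    CameraConfig("right", 1280, 720, 10, 30),
]
print(f"{total_rate_mb_per_s(rig):.1f} MB/s")
```

An estimate like this helps to check early whether the recording interface and storage can keep up with the chosen resolution, bit depth and frame rate.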

How to train a neural network?

In the NVIDIA DALI project, for example, the workflow starts from a standard image database.
JPEG images are decoded and several image processing transforms are then applied, so that the network is trained on modified images derived from the original set via the following operations:

  • jpeg decoding
  • exposure change
  • resize
  • rotation
  • color correction (augment)

A picture illustrating the workflow of the NVIDIA DALI project, with the help of a diagram, starting with the standard image database and ending with the neural network

This is an artificial way to significantly increase the number of images in the database.
The increase is virtual, but the resulting images are not identical to the originals, so the approach can be useful.

Something similar can be done for video as well: capture the video in RAW and then apply different sets of parameters during GPU-based RAW processing to generate multiple new image series.
Provided the original RAW video is of high enough quality, many more different videos can be prepared for use in neural network training. Such GPU-based RAW processing takes very little time.
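How quickly the number of variants grows can be shown by enumerating parameter combinations. The sketch below assumes a small, hypothetical parameter grid; the parameter names and values are illustrative and are not taken from any SDK.

```python
from itertools import product

# Hypothetical parameter ranges; real values depend on the camera and scene.
param_grid = {
    "exposure_gain": [0.5, 1.0, 2.0],
    "gamma": [1.8, 2.2],
    "white_balance_k": [3200, 5600, 6500],
    "rotation_deg": [0, 90, 180, 270],
}


def parameter_sets(grid):
    """Yield one dict per combination of parameter values."""
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


sets = list(parameter_sets(param_grid))
print(len(sets))  # 3 * 2 * 3 * 4 = 72 processed variants per RAW clip
```

Even this modest grid turns one RAW recording into 72 differently processed versions, and each additional parameter multiplies the count again.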

List of transforms applied to RAW data

By combining XIMEA embedded cameras for video recording with the Fastvideo SDK for RAW image/video processing, the following can be achieved:

  • exposure correction
  • denoising
  • color correction
  • color space transforms
  • 1D and 3D LUT in RGB/HSV
  • crop and resize
  • rotation
  • geometric transforms
  • lens distortion/undistortion
  • sharpening
  • any image processing filter
  • gamma

This approach also makes it possible to simulate different lighting conditions in software, in terms of both exposure control and the spectral characteristics of the illumination.
Various lenses and orientations can be simulated as well, so the total number of new videos and pictures for training could be increased substantially.
There is no need to save these processed videos: they can be generated on the fly by real-time RAW processing on the GPU.
These are the basics of how to prepare a dataset for deep learning and what type of equipment is needed for a multi camera setup.
