High-resolution cameras are becoming increasingly popular, and sensor resolution keeps growing thanks to advances in sensor technology.
The newest models offer 50 megapixels or more, pushing data bandwidth to the point where it can become a bottleneck.
In the past, high resolution meant slow transfer speed (low fps), which was far from ideal for a smooth video stream on a monitor, where you would expect at least 20-30 fps with low latency rather than a sequence of separate frames.
Even the simplest task of getting the full RAW stream from the image sensor to the computer with maximum bit depth and at full image resolution can be complicated.

Fig.1. The high resolution 48 MP camera from XIMEA with active EF-mount
Camera vendor XIMEA offers a portfolio of various types of industrial cameras, including the xiB series.
The xiB camera line includes a model called CB500 with a 48 MP global shutter CMOS image sensor that delivers 22 fps (frames per second) at 12-bit readout or 30 fps at 8-bit.
This results in a substantial data stream, which is why the camera is equipped with a PCI Express interface offering 20 Gbit/s of throughput.
The real data stream of this 8K camera model can reach 1550 MB/s.
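The data rate quoted above can be sanity-checked with simple arithmetic. The sketch below is only an estimate: it assumes a nominal 48 million pixels (the exact sensor resolution is not given here), tightly packed samples, and 1 MB = 10^6 bytes. The function and constant names are illustrative.

```python
def raw_rate_mb_per_s(pixels: int, bits_per_pixel: int, fps: float) -> float:
    """Return the raw sensor data rate in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_frame = pixels * bits_per_pixel / 8
    return bytes_per_frame * fps / 1e6


# Assumption: nominal "48 MP"; the real pixel count may differ slightly.
NOMINAL_PIXELS = 48_000_000

# 20 Gbit/s link expressed in MB/s, before any protocol overhead.
LINK_MB_PER_S = 20e9 / 8 / 1e6

rate_12bit = raw_rate_mb_per_s(NOMINAL_PIXELS, 12, 22)  # 12-bit readout at 22 fps
rate_8bit = raw_rate_mb_per_s(NOMINAL_PIXELS, 8, 30)    # 8-bit readout at 30 fps

print(f"12-bit @ 22 fps: {rate_12bit:.0f} MB/s")
print(f" 8-bit @ 30 fps: {rate_8bit:.0f} MB/s")
print(f"link capacity  : {LINK_MB_PER_S:.0f} MB/s")
```

Both nominal estimates land in the same range as the roughly 1550 MB/s figure and stay below the raw link capacity. The small gap to the quoted figure may simply come from rounding the pixel count to 48 MP.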
One option for handling all this RAW data is to store it on a high-end SSD, but there can be a more effective solution.
It is possible to process the data on a GPU and save the compressed color frames to a conventional SSD, with a final data rate below 500 MB/s.
This approach covers the key tasks of a real-time application: image acquisition, RAW image processing, image compression and storage.
Another option is to send the data from the XIMEA CB500 camera directly to system memory and then copy it to NVIDIA GPU memory, so that all image processing is done on the GPU.
Below are two pipelines and the corresponding benchmarks for the CB500 camera on an NVIDIA GeForce RTX 2080 Ti.
The first is an example of a common pipeline and the second is a representation of a Preview mode.

Fig.2. Fast CinemaDNG Processor on CUDA
This is not a full image processing pipeline, but an example of a common one.
It includes camera calibration data (dark frame, flat field, DCP profile, LCP profile) and has an option for JPEG compression to bring the output bandwidth down to around 400-450 MB/s, which a conventional SSD should be able to sustain.
These benchmarks are for the camera application at full resolution and 12-bit output.
The pipeline can be tested offline with the Fast CinemaDNG Processor software to tune the parameters and check image quality and performance before implementing it in real time.
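To get a feel for how much reduction the 400-450 MB/s target implies, the sketch below compares it with the input stream. The 1550 MB/s figure is the RAW rate quoted earlier. The second comparison assumes the demosaiced frames are 8-bit RGB (3 bytes per pixel) at a nominal 48 MP and 22 fps; that intermediate format is an assumption for illustration, not something the benchmarks state.

```python
def compression_ratio(input_mb_s: float, output_mb_s: float) -> float:
    """Ratio between an input data rate and the compressed output rate."""
    return input_mb_s / output_mb_s


RAW_MB_S = 1550.0  # RAW stream rate quoted in the text

# Assumption: 8-bit RGB after demosaicing, nominal 48 MP at 22 fps.
RGB_MB_S = 48_000_000 * 3 * 22 / 1e6

for target in (400.0, 450.0):
    vs_raw = compression_ratio(RAW_MB_S, target)
    vs_rgb = compression_ratio(RGB_MB_S, target)
    print(f"target {target:.0f} MB/s: {vs_raw:.2f}x vs RAW, {vs_rgb:.2f}x vs RGB")
```

Under these assumptions the JPEG stage needs a ratio of roughly 7-8x relative to the RGB frames, which corresponds to about 3.4-3.9x relative to the original RAW stream.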

Fig.3. Fastvideo SDK for Jetson
GPU processing time on the NVIDIA GeForce RTX 2080 Ti could be around 30-40 ms per frame, which is shorter than the roughly 45 ms frame interval at the camera's maximum 22 fps in 12-bit mode.
For a more complicated image processing pipeline that can include bad pixel removal, denoising, intermediate color space transforms, defringe, resize, rotate, crop, sharpening, histogram, parade, image and video compression, etc., a second GPU could help accomplish these tasks in real time.
If there is only one GPU, the total time to process one frame (GPU + CPU) could reach 60-70 ms, so it is important to optimize both software and hardware to get the maximum performance from the system. To create a fast multithreaded solution with the XIMEA CB500 camera, both high-end software and hardware (CPU, GPU, SSD) are essential.
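The timing figures above translate into a simple frame budget. The sketch below is a rough model: the split of the 60-70 ms total into a 35 ms GPU stage and a 30 ms CPU stage is hypothetical, chosen only to illustrate the arithmetic. It compares running the stages back to back for each frame with running them in parallel threads, where throughput is limited by the slowest stage.

```python
def frame_interval_ms(fps: float) -> float:
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps


def throughput_fps(stage_ms: list[float], pipelined: bool) -> float:
    """Sustained frame rate for a set of per-frame stage times.

    Sequential execution pays the sum of all stages per frame;
    a pipelined (multithreaded) layout is bounded by the slowest stage.
    """
    bottleneck = max(stage_ms) if pipelined else sum(stage_ms)
    return 1000.0 / bottleneck


CAMERA_FPS = 22.0        # 12-bit mode, from the text
stages = [35.0, 30.0]    # hypothetical GPU and CPU stage times in ms

print(f"frame interval : {frame_interval_ms(CAMERA_FPS):.1f} ms")
print(f"sequential     : {throughput_fps(stages, pipelined=False):.1f} fps")
print(f"pipelined      : {throughput_fps(stages, pipelined=True):.1f} fps")
```

With these illustrative numbers, sequential processing falls short of 22 fps while the pipelined layout keeps up, which is the motivation for splitting the work across threads.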
For workflows such as these, JPEG storage to a fast SSD or NVMe drive is implemented in a separate CPU thread, making it asynchronous, so the time spent writing JPEG files is not added to the total time.
Video output is also implemented in a separate thread. The main idea is to divide the whole task into parts and process them in parallel on both the CPU and the GPU to get the maximum performance.
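The asynchronous storage idea can be sketched with a producer/consumer queue. This is a minimal illustration using Python's standard library, not the actual Fastvideo implementation: the function names, the file naming scheme and the `max_pending` limit are all hypothetical, and the "frames" are just byte strings standing in for compressed JPEG data.

```python
import queue
import tempfile
import threading
from pathlib import Path

_STOP = object()  # sentinel that tells the writer thread to finish


def writer_loop(pending: queue.Queue, out_dir: Path) -> None:
    """Consume (index, bytes) items and write each one to its own file."""
    while True:
        item = pending.get()
        if item is _STOP:
            break
        index, data = item
        (out_dir / f"frame_{index:06d}.jpg").write_bytes(data)


def run_pipeline(frames, out_dir: Path, max_pending: int = 8) -> int:
    """Hand compressed frames to a background writer; return frames queued."""
    pending: queue.Queue = queue.Queue(maxsize=max_pending)
    writer = threading.Thread(target=writer_loop, args=(pending, out_dir))
    writer.start()
    count = 0
    for index, payload in enumerate(frames):
        # put() blocks only when the writer is max_pending frames behind.
        pending.put((index, payload))
        count += 1
    pending.put(_STOP)
    writer.join()  # wait until everything queued has been written
    return count


if __name__ == "__main__":
    fake_frames = [bytes([i]) * 1024 for i in range(10)]  # stand-in payloads
    with tempfile.TemporaryDirectory() as tmp:
        n = run_pipeline(fake_frames, Path(tmp))
        print(n, "frames written to", tmp)
```

The bounded queue is a deliberate choice: if the disk cannot keep up, the processing loop is throttled instead of letting memory use grow without limit.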
Here you can review the results of time measurements when running in preview mode.
This mode is used when it is not necessary to compress and store the processed frames. In that case, the image processing pipeline is very simple and performance is higher.
It's possible to get even better results by overlapping the host-to-device transfer with computations to exclude the time this takes from the benchmarks.
The benchmarks above still include this transfer time.
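The potential gain from overlapping transfers with computation can be estimated with a small timing model. In CUDA this kind of overlap is typically built from pinned host memory, `cudaMemcpyAsync` and multiple streams, although the text does not say how it would be done here. The per-frame times of 8 ms for the transfer and 30 ms for the computation are hypothetical values used only to show the arithmetic.

```python
def total_sequential_ms(n_frames: int, transfer_ms: float, compute_ms: float) -> float:
    """Each frame is copied to the GPU, then processed, with no overlap."""
    return n_frames * (transfer_ms + compute_ms)


def total_overlapped_ms(n_frames: int, transfer_ms: float, compute_ms: float) -> float:
    """Double-buffered model: frame i+1 is copied while frame i is processed.

    Only the first transfer and the last computation are fully exposed;
    every step in between costs the slower of the two operations.
    """
    if n_frames <= 0:
        return 0.0
    step = max(transfer_ms, compute_ms)
    return transfer_ms + (n_frames - 1) * step + compute_ms


N = 100
T_COPY, T_GPU = 8.0, 30.0  # hypothetical per-frame times in ms

seq = total_sequential_ms(N, T_COPY, T_GPU)
ovl = total_overlapped_ms(N, T_COPY, T_GPU)
print(f"sequential: {seq:.0f} ms, overlapped: {ovl:.0f} ms")
```

As long as the copy takes less time than the computation, the model predicts that nearly all of the transfer time disappears from the per-frame cost.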

Fig.4. Application example: printed circuit board (PCB) inspection
Many tasks require a camera that combines high image resolution, 12-bit output and high speed.
For example, the CB500 camera model is successfully used in applications such as:
Aerial mapping, 3D scanning, flat panel display (FPD) inspection, solar panel analysis, printed circuit board (PCB) examination, wide area surveillance, persistent stadium and border security, cinematography, sports and entertainment, 360 panorama, UAVs, autonomous and unmanned vehicles, etc.
Fastvideo offers high-performance software solutions for such applications. Most of them are based on the GPU image processing pipeline from the PRO version of the Fast CinemaDNG Processor software, which is highly optimized and includes a digital cinema workflow for excellent image quality.
Additionally, the CB500 camera is available with a flat ribbon flex cable (MX500 model), making it a good fit for embedded vision systems or multi-camera setups.
Software for these kinds of complex, integrated solutions is based on the Fastvideo SDK for Jetson and is available for NVIDIA TK1, TX1, TX2, TX2i and AGX Xavier hardware.
With the help of Fastvideo, it is also possible to implement the desired pipeline for any specific task using the CB500 camera and similar models in a single or multi-camera system.
For example, you can use 12-bit per channel JPEG compression on the GPU at the end of the image processing pipeline to store more information in every frame.
Credits
Fastvideo Blog:
https://www.fastcompression.com/blog/48-mpix-ximea-camera-gpu-processing.htm