Peter Steinbach, Jeffrey Kelling (presenter)
steinbach@scionics.de
presenter
author
service provider to the Max Planck Institute of Molecular Cell Biology and Genetics
member of the GPU Center of Excellence (community of industrial and academic developers/scientists using GPUs)
code snippets
presentation links
open an issue for questions
Scientific Motivation
Sqeazy library
3D rendering of Drosophila embryogenesis time-lapse data, reconstructed from a 5-angle SPIM recording
today:
scientists would like to capture long time-lapses of 1-2 days (or more)
total data volume per 1-2 day capture:
150-300 TiB raw volume
= 57 - 114 kEUR in SSDs
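The figures above can be sanity-checked with a short calculation. The sketch below assumes that the lower bounds belong together (150 TiB and 57 kEUR for 1 day) and likewise the upper bounds (300 TiB and 114 kEUR for 2 days); that pairing is an assumption, not something stated on the slide. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the storage figures quoted above.
TIB = 2**40                      # bytes per tebibyte
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gb_per_s(volume_tib: float, days: float) -> float:
    """Average data rate (decimal GB/s) needed to fill volume_tib within days."""
    return volume_tib * TIB / (days * SECONDS_PER_DAY) / 1e9


def cost_per_tib_eur(cost_keur: float, volume_tib: float) -> float:
    """SSD price per TiB implied by a total cost (kEUR) and a volume (TiB)."""
    return cost_keur * 1000 / volume_tib


if __name__ == "__main__":
    # (days, TiB, kEUR): assumed pairing of the ranges on the slide
    for days, tib, keur in [(1, 150, 57), (2, 300, 114)]:
        rate = sustained_rate_gb_per_s(tib, days)
        price = cost_per_tib_eur(keur, tib)
        print(f"{days} d, {tib} TiB: {rate:.2f} GB/s sustained, {price:.0f} EUR/TiB")
```

Under that pairing, both ends of the range work out to roughly 1.9 GB/s of sustained acquisition and about 380 EUR per TiB of SSD storage.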
3D in space = 2D in space + time!
sqeazy uses the ffmpeg framework as its codec interface to:
support CPU and GPU based encoding/decoding
enable future directions to non-x86 platforms
Linux, macOS, Windows supported
steep learning curve for using libavcodec API
for this talk: ffmpeg 3.0.7
hardly any single library supports hardware-accelerated video encoding uniformly across platforms
ffmpeg+nvenc meets our production requirements
encapsulates external dependencies (easier comparison)
hardware
software
simple workflow based on ffmpeg performed on all:
x265 is slow, but does provide high compression
codec preset study ongoing with downstream analysis/processing
GPUs to the rescue?
$ time ffmpeg -i input.y4m -c:v nvenc_h264 -preset llhp -2pass 0 ...
$ nvprof --print-api-trace ffmpeg -i input.y4m -c:v nvenc_h264 ...
nvprof API trace: time delta between cuCtxCreate and cuCtxDestroy
the nvenc codec accounts for only 30-50% of the ffmpeg process time
ffmpeg adds considerable overhead on top of nvenc!
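The share quoted above is the lifetime of the CUDA context divided by the wall-clock time of the whole ffmpeg process. A minimal sketch of that ratio follows; it assumes the two timestamps are read by hand from the nvprof API trace and the wall time from `time`. The function name is hypothetical and the numbers in the example are purely illustrative, not measurements from the talk.

```python
def nvenc_share(ctx_create_s: float, ctx_destroy_s: float, wall_s: float) -> float:
    """Fraction of the process wall time spent between cuCtxCreate and cuCtxDestroy.

    ctx_create_s / ctx_destroy_s: timestamps (seconds) taken from the API trace.
    wall_s: total wall-clock time (seconds) of the ffmpeg run, e.g. from `time`.
    """
    if wall_s <= 0:
        raise ValueError("wall time must be positive")
    lifetime = ctx_destroy_s - ctx_create_s
    if lifetime < 0:
        raise ValueError("cuCtxDestroy must not precede cuCtxCreate")
    return lifetime / wall_s


if __name__ == "__main__":
    # Illustrative values only: context alive from 0.6 s to 1.0 s of a 1.0 s run.
    share = nvenc_share(ctx_create_s=0.6, ctx_destroy_s=1.0, wall_s=1.0)
    print(f"nvenc share of process time: {share:.0%}")
```

Everything outside the context lifetime (demuxing, format conversion, process start-up) counts as ffmpeg overhead in this view.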
$ nvprof ffmpeg -i input.y4m -c:v nvenc_h264 -preset llhp -2pass 0 -gpu 1 -y output.h264
NvEncodeLowLatency timings:
tough business given modern CMOS cameras (around 1 GB/s at 16-bit greyscale)
multi-core implementations very competitive
(either in compression ratio or speed)
many codecs available
many configuration parameters
many bit depths in use (8, 10, 12 bits)
nvenc through ffmpeg difficult to use/measure
(memory traffic, implementation quality poor?)
raw nvenc API suitable for high-bandwidth compression
NvEncodeLowLatency timings ignore driver and memory initialisation
(represents scenario of constant streaming/encoding)
nvenc API useful on the microscope only, i.e. in streaming mode
(ideally with the compression pipeline on the device as well)
PCIe bus apparently a bottleneck
For questions, concerns or suggestions: