How do the super resolution filters in FFmpeg work?

Question

I need to incorporate a small amount of 1080p HD shots in a 2160p 4K composition. I'm not impressed by default upscaling techniques like lanczos/bicubic/trilinear; it seems like my television does a better job of upscaling 1080p in real time. It may be time to learn some new tricks.

The FFmpeg manual mentions a technique called Super Resolution, and it comes in two forms:

SRCNN: Super-Resolution Convolutional Neural Network model
ESPCN: Efficient Sub-Pixel Convolutional Neural Network model

Usually I can find many online sources for figuring out how to do something with FFmpeg. But I can't seem to find any tutorials on how to use this. I don't really understand it. I need to train (how?) a model (from what?), or get a pre-trained model (where?). I'd like to know how to get from 1080p to 2160p using this technique, assuming I know nothing about it.

Mark · Accepted Answer

I realise this question is pretty old now, but it still comes up quite high in search results, so I would like to document how I got the "sr" filter to work (August 2020) in case it helps someone else. Before proceeding, it is worth saying that I was not overly impressed by the result. On my test videos I felt that lanczos did an ever so slightly better job. Bear in mind that my test was upscaling a 640x360 flv file to a 1920x1080 mp4. I have not tested many other videos, and I still have quite a lot to learn about SR, such as whether feeding it different training material would give a better result.

What you will need

A CUDA-enabled GPU with a compute capability of at least 6.0. I worked all the way through this once with a GeForce GT710 (capability 3.5), only for the final ffmpeg upscaling step to tell me I needed a capability of 6.0. It did fall back to the CPU but was very slow. I then went through it again with a GeForce GTX 1050TI 4GB card and saw a noticeable performance gain.
At least 70GB free disk space. The training scripts download about 42GB worth of videos and 9GB worth of images. The rest is used up generating the model.
Patience. Generating the datasets took a good few hours and training just one model took nearly the whole day.
I had a couple of spare computers (one server and one pc) so I was able to build this from scratch and let it run. First I tried it on Ubuntu Server 20.04 and 18.04. Final run through with the new GPU was on Xubuntu 20.04. The only reason I went with Xubuntu in the end was because I was also investigating a new USB capture device for VHS but the instructions below will work with Server.
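
The disk-space requirement above is easy to check before starting. Here is a minimal sketch, assuming everything is downloaded to a single filesystem; the 70GB threshold is just the estimate from the list above, and has_enough_space is a hypothetical helper name, not part of any tool used in this guide.

```python
import shutil

REQUIRED_GB = 70  # estimate from this answer: videos + images + generated models


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


# Example: check the current directory before running the dataset scripts.
print("Enough free space:", has_enough_space("."))
```

Point it at whichever directory you plan to extract the training scripts into.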

Setup

THIS GUIDE IS OFFERED AS IS WITH NO GUARANTEES IT WILL WORK FOR YOU. PLEASE TAKE CARE AND ONLY FOLLOW IF YOU KNOW WHAT YOU ARE DOING.

My starting point is a fresh install of Xubuntu 20.04. I have a GeForce GTX 1050TI GPU that is not my main GPU and there is no monitor connected to it. I do not think it should matter though if it is your main GPU or if there is a monitor connected.

FROM https://www.tensorflow.org/install/gpu

Add NVIDIA package repositories

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

Install NVIDIA driver

sudo apt-get install --no-install-recommends nvidia-driver-450

REBOOT, then check that GPUs are visible using the command: nvidia-smi. Do not proceed if you do not see a table output. If you have a message saying there is no device then you need to troubleshoot this stage. For example in my first build the BIOS was set to use the onboard graphics card first and changing it to use the offboard card got it working. I didn't have this problem in the last build on a different PC.
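
If you want to script the check above, something like the following may help. It is only a sketch: it assumes nvidia-smi is on the PATH and relies on `nvidia-smi -L`, which prints one line per detected GPU. The function name gpu_visible is my own.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # driver tools not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` prints one line per GPU, each starting with "GPU <index>:"
    return result.returncode == 0 and any(
        line.startswith("GPU ") for line in result.stdout.splitlines()
    )


print("GPU visible:", gpu_visible())
```

A False result here means the same as the missing table: stop and troubleshoot the driver before going further.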

Install development and runtime libraries (~4GB). Here we are installing both CUDA 10.0 and 10.1. The training scripts and filter are a couple of years old now, so we need to accommodate them.

sudo apt-get install --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.6.5.32-1+cuda10.0 \
    libcudnn7-dev=7.6.5.32-1+cuda10.0

Install TensorRT. Requires that libcudnn7 is installed above.

sudo apt-get install --no-install-recommends libnvinfer6=6.0.1-1+cuda10.0 \
    libnvinfer-dev=6.0.1-1+cuda10.0 \
    libnvinfer-plugin6=6.0.1-1+cuda10.0

And now the 10.1 versions

sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.5.32-1+cuda10.1 \
    libcudnn7-dev=7.6.5.32-1+cuda10.1

sudo apt-get install --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1

FROM https://www.tensorflow.org/install/pip#system-install

The guide above suggests installing venv here, but this was not a concern for me so I left it out. If you are familiar with Python then go ahead and use venv.

sudo apt install python3-dev python3-pip
sudo pip3 install --upgrade tensorflow

Add some libraries temporarily to LD_LIBRARY_PATH. We will make this permanent a little later.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib:/usr/local/cuda-10.0/lib64:/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.2/lib64:/usr/local/cuda/extras/CUPTI/lib64

You will notice I am adding cuda 10.2 here as well but we did not install it. For some reason one of the library files (libcublas) when installing 10.1 is taken from 10.2.

We can verify with the below (check for errors). There is a lot of output, mainly information and you want to check that it has seen your GPU and has successfully opened all the libraries.

python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

Add LD_LIBRARY_PATH to ld.so.conf.d

sudo vi /etc/ld.so.conf.d/cuda-additional.conf

Add the lines

/usr/local/cuda/extras/CUPTI/lib64
/usr/local/cuda-10.2/lib64

The other 2 paths are automatically added for us during install.
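
To double-check that the library directories mentioned above actually exist on your machine, a small sketch like this can be used. The directory list is copied from the LD_LIBRARY_PATH export earlier in this answer, and missing_dirs is a hypothetical helper name.

```python
import os

# Directories taken from the LD_LIBRARY_PATH line earlier in this answer.
CUDA_LIB_DIRS = [
    "/usr/local/cuda-10.0/lib64",
    "/usr/local/cuda-10.1/lib64",
    "/usr/local/cuda-10.2/lib64",
    "/usr/local/cuda/extras/CUPTI/lib64",
]


def missing_dirs(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]


print("Missing library directories:", missing_dirs(CUDA_LIB_DIRS))
```

An empty list means all four directories are present. Changes under /etc/ld.so.conf.d are normally only picked up once the linker cache is rebuilt, so you may also need to run sudo ldconfig after editing the file.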

FROM https://ffmpeg.org/ffmpeg-filters.html#sr-1

Install the TensorFlow for C library. The link given there points to TensorFlow version 2.3.0 (or later); however, as mentioned earlier, the filter code is old and requires version 1.15.0. So:

wget https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz
sudo tar -C /usr/local -xzf libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz

FROM https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu

Now we fetch and compile ffmpeg. Install some dependencies first:

sudo apt-get update -qq && sudo apt-get -y install \
  autoconf \
  automake \
  build-essential \
  cmake \
  git-core \
  libass-dev \
  libfreetype6-dev \
  libgnutls28-dev \
  libsdl2-dev \
  libtool \
  libva-dev \
  libvdpau-dev \
  libvorbis-dev \
  libxcb1-dev \
  libxcb-shm0-dev \
  libxcb-xfixes0-dev \
  pkg-config \
  texinfo \
  wget \
  yasm \
  zlib1g-dev

mkdir -p ~/ffmpeg_sources ~/bin
sudo apt-get install nasm libx264-dev libx265-dev libnuma-dev libvpx-dev libfdk-aac-dev libmp3lame-dev libopus-dev libunistring-dev

cd ~/ffmpeg_sources && \
wget -O ffmpeg-snapshot.tar.bz2 https://ffmpeg.org/releases/ffmpeg-snapshot.tar.bz2 && \
tar xjvf ffmpeg-snapshot.tar.bz2 && \
cd ffmpeg && \
PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure \
  --prefix="$HOME/ffmpeg_build" \
  --pkg-config-flags="--static" \
  --extra-cflags="-I$HOME/ffmpeg_build/include" \
  --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
  --extra-libs="-lpthread -lm" \
  --bindir="$HOME/bin" \
  --enable-gpl \
  --enable-gnutls \
  --enable-libass \
  --enable-libfdk-aac \
  --enable-libfreetype \
  --enable-libmp3lame \
  --enable-libopus \
  --enable-libvorbis \
  --enable-libvpx \
  --enable-libx264 \
  --enable-libx265 \
  --enable-libtensorflow \
  --enable-nonfree && \
PATH="$HOME/bin:$PATH" make && \
make install && \
hash -r

Notice that I have left out libaom and added libtensorflow. This has created new binaries under ~/bin and added it to your profile and path so it can be called from anywhere. If you do not want to replace your existing ffmpeg then be sure to change the bindir and paths as required.
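
Before moving on, it can be worth confirming that the new binary really was configured with libtensorflow. The sketch below reads the output of `ffmpeg -buildconf` and looks for the flag; the helper names are my own, and it assumes the freshly built ffmpeg is the first one on your PATH.

```python
import subprocess


def has_libtensorflow(buildconf_text):
    """Return True if the configure flags include --enable-libtensorflow."""
    return "--enable-libtensorflow" in buildconf_text.split()


def ffmpeg_buildconf(ffmpeg="ffmpeg"):
    """Return the output of `ffmpeg -buildconf`, or an empty string on failure."""
    try:
        result = subprocess.run(
            [ffmpeg, "-hide_banner", "-buildconf"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    # Combine both streams so the check does not depend on where ffmpeg prints.
    return result.stdout + result.stderr


print("libtensorflow enabled:", has_libtensorflow(ffmpeg_buildconf()))
```

If this prints False, an older ffmpeg is probably still being found first, or the configure step did not pick up the TensorFlow C library.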

Reload our profile

source ~/.profile

FROM https://github.com/XueweiMeng/sr/tree/sr_dnn_native

So now we are almost at the stage where we can start to generate and train the models; however, the scripts are old, so we need to modify them somewhat. Get them first (you can specify any path to download them to, or get them manually with a GUI):

cd ~
wget https://github.com/XueweiMeng/sr/archive/sr_dnn_native.zip
unzip sr_dnn_native.zip
rm sr_dnn_native.zip
cd sr-sr_dnn_native

Now edit all python files that have "import tensorflow as tf", i.e.

datasets/prepare_dataset.py
datasets/prepare_div2k_dataset.py
evaluate.py
generate_header_and_model.py
models/model_vespcn.py
models/model_espcn.py
models/image_warp.py
models/model_srcnn.py
models/model.py
models/dataset.py
models/model_vsrnet.py
train.py

After the imports in each script add the line:

tf = tf.compat.v1

For example using datasets/prepare_dataset.py:

import os
import argparse
from tqdm import tqdm
import cv2
import numpy as np
import json
import tensorflow as tf
from PIL import Image

tf = tf.compat.v1


class SceneChangeDetector:
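
Editing a dozen files by hand is tedious, so the change could also be scripted. This is only a sketch of one way to do it, not something I ran as part of this guide: it inserts the compat line directly after the "import tensorflow as tf" line (rather than after the whole import block, which should be equivalent as long as no later import rebinds tf), and it assumes that import sits at the top level of each file. add_compat_line and patch_file are hypothetical helper names.

```python
from pathlib import Path

IMPORT_LINE = "import tensorflow as tf"
COMPAT_LINE = "tf = tf.compat.v1"


def add_compat_line(source):
    """Insert COMPAT_LINE right after IMPORT_LINE unless it is already present."""
    lines = source.splitlines()
    if COMPAT_LINE in (line.strip() for line in lines):
        return source  # already patched, leave untouched
    patched = []
    for line in lines:
        patched.append(line)
        if line.strip() == IMPORT_LINE:
            patched.append(COMPAT_LINE)
    return "\n".join(patched) + "\n"


def patch_file(path):
    """Rewrite one script in place with the compat line added."""
    path = Path(path)
    path.write_text(add_compat_line(path.read_text()))
```

Calling patch_file on each of the files listed above from inside the sr-sr_dnn_native folder would apply the change; running it twice is harmless because already-patched files are returned unchanged.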

There is one other deprecated function we need to take care of and that is imresize.

Edit the file datasets/prepare_dataset.py

Replace line 8: from scipy.misc import imresize

With: from PIL import Image

Replace line 127: frame_lr = imresize(frames[k], (lr_h, lr_w), interp='bicubic')

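With: frame_lr = np.array(Image.fromarray(frames[k]).resize((lr_w, lr_h), Image.BICUBIC))

Note that PIL's resize takes the size as (width, height), the reverse of imresize's (height, width), and that Image.BICUBIC is passed to keep the bicubic interpolation of the original call.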

We can finally start generating. Install some dependencies.

sudo pip3 install Pillow tqdm opencv-python

Generate the datasets (expect this to take a good few hours; you will likely see a lot of informational output and warnings about soon-to-be deprecated functions, which are fine):

sh generate_datasets.sh

This will download 11 videos totalling around 43GB and images totalling around 9GB and can take a few hours to run depending on hardware.

And now for the training (expect these to take about a day each). The models after this will be found in the extracted ~/sr-sr_dnn_native folder if that is where you saved the Git zip file to:

SRCNN

sh train_srcnn.sh
python3 generate_header_and_model.py --model=srcnn --ckpt_path=logdir/srcnn_batch_32_lr_1e-3_decay_adam/train

ESPCN

sh train_espcn.sh
python3 generate_header_and_model.py --model=espcn --ckpt_path=logdir/espcn_batch_32_lr_1e-3_decay_adam/train

Once the training is complete you can use the filter. It seems as though ESPCN only upscales by a factor of 2 whereas with SRCNN you can specify 2, 3, or 4. I moved the models generated into the same folder as my videos for ease of use but you can simply point to the models wherever they are.

e.g.

ffmpeg -i <input_video> -vf sr=dnn_backend=tensorflow:scale_factor=3:model=srcnn.pb -q 15 -preset slow <output_video>
ffmpeg -i <input_video> -vf sr=dnn_backend=tensorflow:model=espcn.pb -q 15 -preset slow <output_video>
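
For the original question (1080p to 2160p) the scale factor is simply 2160 / 1080 = 2. If you want to drive the filter from a script, the commands above can be assembled as an argument list. This is a sketch only: build_sr_command and required_scale_factor are hypothetical helpers, and the encoder options (-q, -preset) are left out so you can add your own.

```python
def required_scale_factor(src_height, dst_height):
    """Return the integer scale factor, or raise if the ratio is not whole."""
    if dst_height % src_height != 0:
        raise ValueError("target height is not an integer multiple of the source")
    return dst_height // src_height


def build_sr_command(src, dst, model, scale_factor=None):
    """Build the ffmpeg argument list for the sr filter with the TensorFlow backend."""
    options = ["dnn_backend=tensorflow"]
    if scale_factor is not None:
        options.append(f"scale_factor={scale_factor}")
    options.append(f"model={model}")
    return ["ffmpeg", "-i", src, "-vf", "sr=" + ":".join(options), dst]


factor = required_scale_factor(1080, 2160)
print(build_sr_command("input.mp4", "output.mp4", "srcnn.pb", factor))
```

The resulting list can be passed to subprocess.run. Whether a given model accepts a given factor depends on how it was trained, as noted above.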

Here is an interesting video of this in use; it is one of the only places I have seen anyone use this filter. In it, the presenter suggests it can only work on a single plane, but I did not find that to be the case. His results actually appear to be quite good.

This is the ffmpeg command he refers to in the video at 14:44:

ffmpeg -i my_video_540p.mp4 -filter_complex "format=pix_fmts=yuv420p,extractplanes=y+u+v[y][u][v];[y] sr=dnn_backend=tensorflow:scale_factor=2:model=espcn.model [y_scale];[u] scale=960:-2 [u_scale];[v] scale=960:-2 [v_scale];[y_scale][u_scale][v_scale] mergeplanes=0x001020:yuv420p [merged]" -map "[merged]" -c:v libx264 -crf 18 -pix_fmt yuv420p my_video_1080p.mp4
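
The numbers in that command make more sense once you work out the plane sizes. In yuv420p the u and v planes are half the width and half the height of the y plane, so after extractplanes the chroma planes of the source are quarter-area images that only need to reach half the final resolution. The sketch below does that arithmetic, assuming the 540p source is 960x540 (16:9); the function names are my own.

```python
def yuv420p_plane_sizes(width, height):
    """Return (luma, chroma) plane sizes for a yuv420p frame with even dimensions."""
    return (width, height), (width // 2, height // 2)


def upscaled_plane_sizes(width, height, factor):
    """Return the (luma, chroma) plane sizes the merged output frame needs."""
    return yuv420p_plane_sizes(width * factor, height * factor)


src_luma, src_chroma = yuv420p_plane_sizes(960, 540)
dst_luma, dst_chroma = upscaled_plane_sizes(960, 540, 2)
print("source planes:", src_luma, src_chroma)
print("output planes:", dst_luma, dst_chroma)
```

With those assumptions, the y plane goes from 960x540 to 1920x1080 through the sr filter, while the u and v planes go from 480x270 to 960x540, which is what scale=960:-2 produces before mergeplanes puts the three back together.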

Witiko · Answer

Since Google Summer of Code 2018, FFmpeg has supported the sr filter for applying super-resolution methods based on convolutional neural networks. However, as you have discovered, few super-resolution tutorials exist, and compiling FFmpeg with the proper libraries and preparing models for super-resolution requires expert knowledge.
To make super-resolution in FFmpeg easier, Mikuláš and I have taken the excellent answer of Mark, and we used it to prepare a Docker image with FFmpeg and Libtensorflow. We also wrote a tutorial that explains step-by-step how to use the Docker image for super-resolution in FFmpeg: https://github.com/MIR-MU/ffmpeg-tensorflow#ffmpeg-with-libtensorflow.
After installation, super-resolution in FFmpeg becomes as easy as this:
$ wget https://media.xiph.org/video/derf/y4m/flower_cif.y4m
$ ffmpeg-tensorflow -i flower_cif.y4m -filter_complex '
>   [0:v] format=pix_fmts=yuv420p, extractplanes=y+u+v [y][u][v];
>   [y] sr=dnn_backend=tensorflow:scale_factor=2:model=espcn.pb [y_scaled];
>   [u] scale=iw*2:ih*2 [u_scaled];
>   [v] scale=iw*2:ih*2 [v_scaled];
>   [y_scaled][u_scaled][v_scaled] mergeplanes=0x001020:yuv420p [merged]
> ' -map [merged] -sws_flags lanczos \
> -c:v libx264 -crf 17 -preset ultrafast -tune film \
> -c:a copy \
> -y flower_cif_2x.mp4
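
If you need the same filtergraph for more than one model or factor, it can be generated rather than typed. The following is a sketch under the assumption that the model you pass was trained for the factor you ask for; build_sr_filtergraph is a hypothetical helper, not part of the Docker image.

```python
def build_sr_filtergraph(model, factor=2):
    """Return a filter_complex string: sr on luma, ordinary scaling on chroma."""
    return ";".join([
        "[0:v]format=pix_fmts=yuv420p,extractplanes=y+u+v[y][u][v]",
        f"[y]sr=dnn_backend=tensorflow:scale_factor={factor}:model={model}[y_scaled]",
        f"[u]scale=iw*{factor}:ih*{factor}[u_scaled]",
        f"[v]scale=iw*{factor}:ih*{factor}[v_scaled]",
        "[y_scaled][u_scaled][v_scaled]mergeplanes=0x001020:yuv420p[merged]",
    ])


print(build_sr_filtergraph("espcn.pb", 2))
```

The printed string can be passed as the argument to -filter_complex, with the output still selected through -map "[merged]" as in the command above.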

Compare upscaling using Lanczos (left) with the ESPCN super-resolution model (right):

Jason Conrad · Answer

If all you need to do is superscale an image, going through the trouble of training neural networks is re-inventing the wheel.  By all means, if you're studying computer science and are interested in AI/ML, I'd encourage you to look into it, but to just superscale an image, you don't need to train a neural network.  There are tools available.

In DaVinci Resolve, for instance, all you need to do is right-click on a clip in the media pool, select "clip attributes...," then at the bottom of the "video" tab, change "Super Scale: None" to 2x, 3x, or 4x.  I'm not sure if this feature is available in the free version, but I wouldn't be surprised if it is, because all they've done is integrate the open source ML bits of FFmpeg for you, train the models for you, and bundle it as a feature.  I'm sure Adobe has an equivalent feature, though I haven't used it.
