I need to incorporate a small amount of 1080p HD shots in a 2160p 4K composition. I’m not impressed by default upscaling techniques like lanczos/bicubic/trilinear; it seems like my television does a better job of upscaling 1080p in real time. It may be time to learn some new tricks.
The FFmpeg manual mentions a technique called Super Resolution, and it comes in two forms:
SRCNN: Super-Resolution Convolutional Neural Network model
ESPCN: Efficient Sub-Pixel Convolutional Neural Network model
Usually I can find many online sources for figuring out how to do something with FFmpeg. But I can’t seem to find any tutorials on how to use this. I don’t really understand it. I need to train (how?) a model (from what?), or get a pre-trained model (where?). I’d like to know how to get from 1080p to 2160p using this technique, assuming I know nothing about it.
I realise this question is pretty old now, but it still comes up quite high in search results, so I would like to document how I got the "sr" filter to work (August 2020) in case it can help someone else. Before proceeding, it is worth saying that I was not overly impressed by the result: on my test videos, lanczos did an ever so slightly better job. Bear in mind that my test was upscaling a 640x360 flv file to a 1920x1080 mp4. I have not tested many other videos, and I still have quite a lot to learn about SR, such as whether different training material would give a better result.
What you will need
THIS GUIDE IS OFFERED AS IS WITH NO GUARANTEES IT WILL WORK FOR YOU. PLEASE TAKE CARE AND ONLY FOLLOW IF YOU KNOW WHAT YOU ARE DOING.
My starting point is a fresh install of Xubuntu 20.04. I have a GeForce GTX 1050TI GPU that is not my main GPU and there is no monitor connected to it. I do not think it should matter though if it is your main GPU or if there is a monitor connected.
Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-450
REBOOT, then check that GPUs are visible using the command:
nvidia-smi
Do not proceed if you do not see a table output. If you get a message saying there is no device, then you need to troubleshoot this stage. For example, in my first build the BIOS was set to use the onboard graphics card first, and changing it to use the offboard card got it working. I didn't have this problem in the last build on a different PC.
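If you would rather script this check than read the table by eye, something along these lines might do. It is only a sketch: nvidia_smi_ok is a hypothetical helper name, and it only tests that the nvidia-smi command exists and exits successfully, not what the table contains.

```python
import shutil
import subprocess


def nvidia_smi_ok(timeout=30):
    """Return True if nvidia-smi is on PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Driver tools are not installed (or not on PATH).
        return False
    try:
        result = subprocess.run(
            [exe], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if nvidia_smi_ok():
        print("nvidia-smi ran successfully")
    else:
        print("nvidia-smi failed; troubleshoot before continuing")
```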
Install development and runtime libraries (~4GB). Here we are installing both CUDA 10.0 and 10.1. The training scripts and filter are a couple of years old now, so we need to accommodate them.
sudo apt-get install --no-install-recommends cuda-10-0 libcudnn7=7.6.4.38-1+cuda10.0 libcudnn7-dev=7.6.4.38-1+cuda10.0
Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install --no-install-recommends libnvinfer6=6.0.1-1+cuda10.0 libnvinfer-dev=6.0.1-1+cuda10.0 libnvinfer-plugin6=6.0.1-1+cuda10.0
And now the 10.1 versions
sudo apt-get install --no-install-recommends cuda-10-1 libcudnn7=7.6.4.38-1+cuda10.1 libcudnn7-dev=7.6.4.38-1+cuda10.1
sudo apt-get install --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 libnvinfer-dev=6.0.1-1+cuda10.1 libnvinfer-plugin6=6.0.1-1+cuda10.1
The guide above suggests to install venv here but this was not a concern for me so I left it out. If you are familiar with Python then go ahead and use venv.
sudo apt install python3-dev python3-pip
sudo pip3 install --upgrade tensorflow
Add some libraries temporarily to LD_LIBRARY_PATH. We will make this permanent a little later.
You will notice I am adding cuda 10.2 here as well but we did not install it. For some reason one of the library files (libcublas) when installing 10.1 is taken from 10.2.
We can verify with the below (check for errors). There is a lot of output, mainly information and you want to check that it has seen your GPU and has successfully opened all the libraries.
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
Add LD_LIBRARY_PATH to ld.so.conf.d
sudo vi /etc/ld.so.conf.d/cuda-additional.conf
Add the lines
The other 2 paths are automatically added for us during install.
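Once the linker cache has been refreshed (sudo ldconfig), a quick way to see whether the CUDA libraries can be located is to ask Python's ctypes. This is a rough sketch: missing_libraries is a hypothetical helper, and the library names cudart, cublas and cudnn are my assumption about what is worth checking, not a list taken from the filter itself.

```python
import ctypes.util


def missing_libraries(names):
    """Return the subset of library names the system linker cannot locate."""
    return [name for name in names if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    # Assumed names: libcudart, libcublas and libcudnn without the "lib" prefix.
    wanted = ["cudart", "cublas", "cudnn"]
    missing = missing_libraries(wanted)
    if missing:
        print("not found:", ", ".join(missing))
    else:
        print("all libraries found")
```

An empty "not found" list only tells you the files can be located; TensorFlow may still want a specific version of each.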
FROM https://ffmpeg.org/ffmpeg-filters.html#sr-1 Install the TensorFlow for C library. The link given there shows TensorFlow version 2.3.0 (or later); however, as mentioned earlier, the filter code is old and requires version 1.15.0. So:
wget https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz
sudo tar -C /usr/local -xzf libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz
FROM https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu Now we fetch and compile ffmpeg. So install some dependencies first:
sudo apt-get update -qq && sudo apt-get -y install autoconf automake build-essential cmake git-core libass-dev libfreetype6-dev libgnutls28-dev libsdl2-dev libtool libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev pkg-config texinfo wget yasm zlib1g-dev
mkdir -p ~/ffmpeg_sources ~/bin
sudo apt-get install nasm libx264-dev libx265-dev libnuma-dev libvpx-dev libfdk-aac-dev libmp3lame-dev libopus-dev libunistring-dev
cd ~/ffmpeg_sources && \
wget -O ffmpeg-snapshot.tar.bz2 https://ffmpeg.org/releases/ffmpeg-snapshot.tar.bz2 && \
tar xjvf ffmpeg-snapshot.tar.bz2 && \
cd ffmpeg && \
PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure \
  --prefix="$HOME/ffmpeg_build" \
  --pkg-config-flags="--static" \
  --extra-cflags="-I$HOME/ffmpeg_build/include" \
  --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
  --extra-libs="-lpthread -lm" \
  --bindir="$HOME/bin" \
  --enable-gpl \
  --enable-gnutls \
  --enable-libass \
  --enable-libfdk-aac \
  --enable-libfreetype \
  --enable-libmp3lame \
  --enable-libopus \
  --enable-libvorbis \
  --enable-libvpx \
  --enable-libx264 \
  --enable-libx265 \
  --enable-libtensorflow \
  --enable-nonfree && \
PATH="$HOME/bin:$PATH" make && \
make install && \
hash -r
Notice that I have left out libaom and added libtensorflow. This has created new binaries under ~/bin and added it to your profile and path so it can be called from anywhere. If you do not want to replace your existing ffmpeg then be sure to change the bindir and paths as required.
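Before moving on, it may be worth confirming that the ffmpeg you are about to run really was built with libtensorflow. The sketch below reads the output of ffmpeg -buildconf, which lists the configure options one per line; has_libtensorflow and ffmpeg_buildconf are hypothetical helper names of my own.

```python
import shutil
import subprocess


def has_libtensorflow(buildconf):
    """Return True if the configure flags include --enable-libtensorflow."""
    return "--enable-libtensorflow" in buildconf.split()


def ffmpeg_buildconf(exe="ffmpeg"):
    """Return the build configuration text, or None if ffmpeg is not on PATH."""
    path = shutil.which(exe)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-hide_banner", "-buildconf"], capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    conf = ffmpeg_buildconf()
    if conf is None:
        print("ffmpeg not found on PATH")
    elif has_libtensorflow(conf):
        print("this ffmpeg was configured with --enable-libtensorflow")
    else:
        print("this ffmpeg lacks libtensorflow; check which binary is on PATH")
```

If the last message appears, the most likely cause is that an older system ffmpeg is earlier on your PATH than the one in ~/bin.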
Reload our profile
So now we are almost at the stage where we can start to generate and train the models; however, the scripts are old, so we need to modify them somewhat. Get them first (you can download to any path you like, or fetch the archive manually with a GUI):
cd ~
wget https://github.com/XueweiMeng/sr/archive/sr_dnn_native.zip
unzip sr_dnn_native.zip
rm sr_dnn_native.zip
cd sr-sr_dnn_native
Now edit all python files that have "import tensorflow as tf", i.e.
datasets/prepare_dataset.py
datasets/prepare_div2k_dataset.py
evaluate.py
generate_header_and_model.py
models/model_vespcn.py
models/model_espcn.py
models/image_warp.py
models/model_srcnn.py
models/model.py
models/dataset.py
models/model_vsrnet.py
train.py
After the imports in each script add the line:
tf = tf.compat.v1
For example using datasets/prepare_dataset.py:
import os
import argparse
from tqdm import tqdm
import cv2
import numpy as np
import json
import tensorflow as tf
from PIL import Image

tf = tf.compat.v1

class SceneChangeDetector:
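Editing a dozen files by hand is tedious, so the same change could be scripted. The following is a sketch only: add_compat_line is a hypothetical helper, it assumes every import is a single unindented line (a parenthesised multi-line import would confuse it), and you should keep a copy of the scripts before letting it rewrite anything.

```python
from pathlib import Path

COMPAT_LINE = "tf = tf.compat.v1"


def add_compat_line(source):
    """Insert the compat alias after the last top-level import line.

    Sources that do not import tensorflow as tf, or that already contain
    the alias, are returned unchanged.
    """
    lines = source.splitlines()
    if "import tensorflow as tf" not in lines or COMPAT_LINE in lines:
        return source
    last_import = max(
        i for i, line in enumerate(lines)
        if line.startswith(("import ", "from "))
    )
    lines.insert(last_import + 1, COMPAT_LINE)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed location: where the zip was extracted earlier in this answer.
    root = Path.home() / "sr-sr_dnn_native"
    for path in root.rglob("*.py"):
        original = path.read_text()
        patched = add_compat_line(original)
        if patched != original:
            path.write_text(patched)
            print("patched", path)
```

Running it twice is harmless, because files that already contain the alias are skipped.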
There is one other deprecated function we need to take care of, and that is imresize.
Edit the file
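Replace line 8:
from scipy.misc import imresize
with:
from PIL import Image
Replace line 127:
frame_lr = imresize(frames[k], (lr_h, lr_w), interp='bicubic')
with:
frame_lr = np.array(Image.fromarray(frames[k]).resize((lr_w, lr_h), resample=Image.BICUBIC))
Note that PIL's resize takes the size as (width, height), the reverse of imresize's (height, width), and that the bicubic filter has to be requested explicitly.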
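One detail worth checking when swapping imresize for PIL is the order of the size arguments: imresize takes (height, width), whereas Image.resize expects (width, height). The sketch below shows the difference on a dummy frame; downscale_bicubic is a hypothetical helper name and is not part of the training scripts.

```python
import numpy as np
from PIL import Image


def downscale_bicubic(frame, lr_h, lr_w):
    """Resize an HxWxC uint8 array to lr_h x lr_w using bicubic resampling."""
    image = Image.fromarray(frame)
    # PIL wants (width, height), so the arguments are swapped here.
    resized = image.resize((lr_w, lr_h), resample=Image.BICUBIC)
    return np.array(resized)


# Dummy 640x360 frame (NumPy shape is height, width, channels).
frame = np.zeros((360, 640, 3), dtype=np.uint8)
small = downscale_bicubic(frame, 180, 320)
print(small.shape)  # (180, 320, 3)
```

Passing (lr_h, lr_w) straight through to resize would give an array of shape (320, 180, 3) here, which the training code would not expect.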
We can finally start generating. Install some dependencies.
sudo pip3 install Pillow tqdm opencv-python
Generate (expect it to take a good few hours; you will likely see a lot of informational output and warnings about soon-to-be-deprecated functions, and these are OK):
This will download 11 videos totalling around 43GB and images totalling around 9GB and can take a few hours to run depending on hardware.
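Since the downloads add up to roughly 52GB (43GB of videos plus 9GB of images), it may be worth checking free disk space before starting. A minimal sketch, in which has_room is a hypothetical helper and 52 is simply the sum of the two figures above with no headroom for anything the scripts extract afterwards:

```python
import shutil


def has_room(path, needed_gb):
    """Return True if the filesystem holding path has at least needed_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 1000**3


if __name__ == "__main__":
    # 43 GB of videos + 9 GB of images, per the figures quoted above.
    if has_room(".", 52):
        print("enough free space for the downloads")
    else:
        print("less than 52 GB free; clear some space first")
```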
And now for the training (expect these to take about a day each). Afterwards the models will be found in the extracted ~/sr-sr_dnn_native folder, if that is where you saved the Git zip file to:
sh train_srcnn.sh
python3 generate_header_and_model.py --model=srcnn --ckpt_path=logdir/srcnn_batch_32_lr_1e-3_decay_adam/train
sh train_espcn.sh
python3 generate_header_and_model.py --model=espcn --ckpt_path=logdir/espcn_batch_32_lr_1e-3_decay_adam/train
Once the training is complete you can use the filter. It seems as though ESPCN only upscales by a factor of 2 whereas with SRCNN you can specify 2, 3, or 4. I moved the models generated into the same folder as my videos for ease of use but you can simply point to the models wherever they are.
ffmpeg -i <input_video> -vf sr=dnn_backend=tensorflow:scale_factor=3:model=srcnn.pb -q 15 -preset slow <output_video>
ffmpeg -i <input_video> -vf sr=dnn_backend=tensorflow:model=espcn.pb -q 15 -preset slow <output_video>
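For the case in the question (1080p to 2160p) the scale factor works out to 2160 / 1080 = 2. If you want to build the command from a script, a sketch could look like the following. sr_command is a hypothetical helper; it only assembles the argument list using the filter options shown above, leaves out the encoder options, and rejects anything other than the whole-number factors 2, 3 and 4 mentioned in this answer.

```python
import subprocess


def sr_command(src, dst, model, src_height, dst_height):
    """Build an ffmpeg argument list that upscales src with the sr filter."""
    factor, remainder = divmod(dst_height, src_height)
    if remainder or factor not in (2, 3, 4):
        raise ValueError(
            f"{src_height} -> {dst_height} is not a whole 2x, 3x or 4x upscale"
        )
    vf = f"sr=dnn_backend=tensorflow:scale_factor={factor}:model={model}"
    return ["ffmpeg", "-i", src, "-vf", vf, dst]


if __name__ == "__main__":
    # File names are placeholders; the model file comes from the training step.
    cmd = sr_command("input_1080p.mp4", "output_2160p.mp4", "espcn.pb", 1080, 2160)
    print(" ".join(cmd))
    # Uncomment to actually run the conversion:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.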
Here is an interesting video of this in use; it is probably one of the only places I have seen anyone use the filter. In it he suggests it can only work on a single plane, but I did not find that to be the case. His results actually appear to be quite good:
This is the ffmpeg command he refers to in the video at 14:44:
ffmpeg -i my_video_540p.mp4 -filter_complex "format=pix_fmts=yuv420p,extractplanes=y+u+v[y][u][v];[y] sr=dnn_backend=tensorflow:scale_factor=2:model=espcn.model [y_scale];[u] scale=960:-2 [u_scale];[v] scale=960:-2 [v_scale];[y_scale][u_scale][v_scale] mergeplanes=0x001020:yuv420p [merged]" -map "[merged]" -c:v libx264 -crf 18 -pix_fmt yuv420p my_video_1080p.mp4
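The filter graph in that command is easier to follow when built up piece by piece: the luma plane (y) goes through the sr filter, the two chroma planes (u and v) go through the ordinary scale filter, and mergeplanes puts them back together. Here is a sketch that generates such a graph; plane_split_graph is a hypothetical helper, and it scales chroma with iw*N:ih*N rather than the fixed 960:-2 used in the video.

```python
def plane_split_graph(model, factor=2):
    """Build a -filter_complex string that applies sr to the luma plane only."""
    chroma = f"scale=iw*{factor}:ih*{factor}"
    stages = [
        # Split the frame into its Y, U and V planes.
        "[0:v]format=pix_fmts=yuv420p,extractplanes=y+u+v[y][u][v]",
        # Super-resolve luma with the trained model.
        f"[y]sr=dnn_backend=tensorflow:scale_factor={factor}:model={model}[y_scaled]",
        # Upscale both chroma planes conventionally by the same factor.
        f"[u]{chroma}[u_scaled]",
        f"[v]{chroma}[v_scaled]",
        # Recombine the three planes into one yuv420p stream.
        "[y_scaled][u_scaled][v_scaled]mergeplanes=0x001020:yuv420p[merged]",
    ]
    return ";".join(stages)


if __name__ == "__main__":
    print(plane_split_graph("espcn.pb"))
```

The resulting string would be passed to -filter_complex, with -map "[merged]" selecting the output as in the command above.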
Correct answer by Mark on November 29, 2020
Since Google Summer of Code 2018, FFmpeg has supported the sr filter for applying super-resolution methods based on convolutional neural networks. However, as you have discovered, few super-resolution tutorials exist, and compiling FFmpeg with the proper libraries and preparing models for super-resolution requires expert knowledge.
To make super-resolution in FFmpeg easier, Mikuláš and I have taken Mark's excellent answer and used it to prepare a Docker image with FFmpeg and Libtensorflow. We also wrote a tutorial that explains step-by-step how to use the Docker image for super-resolution in FFmpeg: https://github.com/MIR-MU/ffmpeg-tensorflow#ffmpeg-with-libtensorflow.
After installation, super-resolution in FFmpeg becomes as easy as this:
$ wget https://media.xiph.org/video/derf/y4m/flower_cif.y4m
$ ffmpeg-tensorflow -i flower_cif.y4m -filter_complex '
>   [0:v] format=pix_fmts=yuv420p, extractplanes=y+u+v [y][u][v];
>   [y] sr=dnn_backend=tensorflow:scale_factor=2:model=espcn.pb [y_scaled];
>   [u] scale=iw*2:ih*2 [u_scaled];
>   [v] scale=iw*2:ih*2 [v_scaled];
>   [y_scaled][u_scaled][v_scaled] mergeplanes=0x001020:yuv420p [merged]
> ' -map [merged] -sws_flags lanczos \
> -c:v libx264 -crf 17 -preset ultrafast -tune film \
> -c:a copy \
> -y flower_cif_2x.mp4
Compare upscaling using Lanczos (left) with the ESPCN super-resolution model (right):
Answered by Witiko on November 29, 2020
If all you need to do is superscale an image, going through the trouble of training neural networks is re-inventing the wheel. By all means, if you're studying computer science and are interested in AI/ML, I'd encourage you to look into it, but to just superscale an image, you don't need to train a neural network. There are tools available.
In DaVinci Resolve, for instance, all you need to do is right-click on a clip in the media pool, select "clip attributes...," then at the bottom of the "video" tab, change "Super Scale: None" to 2x, 3x, or 4x. I'm not sure if this feature is available in the free version, but I wouldn't be surprised if it is, because all they've done is integrate the open source ML bits of FFmpeg for you, train the models for you, and bundle it as a feature. I'm sure Adobe has an equivalent feature, though I haven't used it.
Answered by Jason Conrad on November 29, 2020