
FFmpeg is getting multithreaded transcoding pipelines

jbk
19 replies
1d6h

It's difficult to understand what this is about without the presentation from Anton, at VideoLAN Dev Days 2023, that you can watch here: https://www.youtube.com/watch?v=Z4DS3jiZhfo&t=1221

bobsmooth
8 replies
1d6h

What is difficult to understand? Does multithreaded mean something different in the realm of video transcoding?

t43562
5 replies
1d5h

His work is not primarily about multithreading but about cleaning up ffmpeg to be true to its own architecture so that normal human beings have a chance of being able to maintain it. Things like making data flow one way in a pipeline, separating public and private state and having clearly defined interfaces.

Things had got so bad that every change was super difficult to make.

Multithreading comes out as a natural benefit of the cleanup.

pjc50
4 replies
1d5h

^ This: I just spent the ~20 minutes necessary to watch that part of the talk at a reasonable 1.5x speed, and that's the summary. FFmpeg was suffering from 20 years of incremental change and a lack of cleanup/refactoring, and that's what he's spent two years addressing.

A couple of great lines, including "my test for deprecating an option is if it's been broken for years and nobody is complaining, then definitely nobody is using it".

andrewstuart
3 replies
1d5h

I often look for obscure options to get specialist tasks done in all sorts of software.

Better to fix the option.

Just because I didn’t go to the very significant effort of complaining doesn’t mean I didn’t want the option, or didn’t burn hours trying to use it.

pjc50
0 replies
1d5h

The verdict of the presentation was that many options are (bad) duplicates of the filter graph, and you should configure the software through the filter graph.

We saw in openssl what the consequences of never removing any code for decades were. It has a real cost.

Always a case-by-case decision, though.

johnmaguire
0 replies
22h54m

Fixing the option takes time and expertise, as well as an idea of what users actually are trying to do.

If you don't give feedback to developers, how do you expect them to know you wanted to use the option?

Better to remove a broken feature than let users burn hours futilely trying to make it work.

Zagitta
0 replies
1d4h

You can't expect open source developers to be omniscient and know you want to use a specific feature if you don't communicate that to them. Would you rather have them add telemetry?

defrost
0 replies
1d5h

There are many different types of pipelined processing tasks with many differing kinds of threading approaches, and I guess the video clears up what kinds of approaches work best for transcoding.

bootloop
0 replies
1d6h

"multi-threading it's not really all about just multi-threading - I will say more about that later - but that is the marketable term"

That's what's said in the video, at least in the first 10 seconds, so it might be that multi-threading is just too reductive a term for the work here. (But I haven't watched the full video yet, so this is just an observation.)

dylan604
5 replies
1d

Man, I really want to watch this presentation, but the piss-poor audio just causes my brain to have a fit. How, in this day and age, is it still possible to screw this up so badly?

jugad
1 replies
23h19m

What are you talking about... the audio might not be a professional studio-level 10/10, but I don't see anything significantly wrong with it, given that it's more like a standard presentation mic. It's clearly good enough.

dylan604
0 replies
23h2m

Every time he turns away from the mic and continues talking while looking at the projection his volume goes way down and at best sounds like a mumble. It is very taxing to keep up with him when he's turned away. It's the wrong mic for the task.

jbk
1 replies
22h40m

We'd be very happy for you to volunteer to record our next developer conference. Tickets are free!

dylan604
0 replies
22h28m

If you weren't on a different continent separated by a very large body of water, I'd be there. I'll donate by suggesting the use of a lav mic instead of a podium mic.

It is very difficult for some people to understand voices that are muddled by off-axis audio recording. It's a real condition. I have a hard time hearing voices in a crowded room, even from people across the table from me. We spend time worrying about the aria tags in our markup, but we just assume that everyone has the same hearing abilities? I get that most people probably don't think about this when they don't have a hearing condition, but being dismissive about it is an entirely different level of egregiousness.

Could my initial criticism have been delivered with an entirely different tack? Absolutely. But after the mental exhaustion of that video, that was all the energy I could afford at the time.

wildekek
0 replies
22h51m

Agreed. Maybe some are not as sensitive to this, but it is a major energy suck for me. A little post-processing for noise and compression would go a long way. This recording is as raw as the Ramsay meme.

gbersac
1 replies
1d6h

I love this guy! He's very talented and has devoted his life to open source.

lofaszvanitt
0 replies
1d5h

Hope he doesn't have to beg for donations.

gooseus
0 replies
1d4h

Nice, great presentation! Curious what he has in mind for the "dynamic pipelines" and "scripting (Lua?)" he mentions in the "Future directions" section. I'm imagining something more powerful for animating properties?

aidenn0
0 replies
1d

Oh, I've run into so many issues related to the "Extras" listed on the slide at ~33m into that video.

raphaelj
12 replies
1d5h

About 2x faster on my 4-cores ARM server, without any significant parallelism overhead:

    $ time ffmpeg_threading/ffmpeg -i input.mp4 -ar 1000 -vn -acodec flac -f flac -y /dev/null -hide_banner -loglevel quiet
    14.90s user 2.08s system 218% cpu 7.771 total
    
    $ time ffmpeg -i input.mp4 -ar 1000 -vn -acodec flac -f flac -y /dev/null -hide_banner -loglevel quiet 
    14.05s user 1.80s system 114% cpu 13.841 total

cm2187
9 replies
1d5h

But what part gets multithreading? Video compression is already multithreaded; video decompression I'm not sure about. And I think anything else is fairly small in comparison in terms of performance cost. All improvements are welcome, but I would expect the impact to be fairly immaterial in practice.

raphaelj
5 replies
1d4h

Well, that's the very specific command I'm using in one of my webapps (https://datethis.app), and it's one of the main performance hotspots, so it's very *not* immaterial.

j1elo
3 replies
1d4h

Very interesting! I had seen the "learn more" video already, but it stayed in a corner of my mind.

To compare any given piece of sound with reference sounds for ENF analysis, the references must have been recorded to start with.

The fact that a webapp like yours can exist... does it mean that we, indeed, have recordings of electrical hum spanning years and years? Are they freely available, or are they commercial products?

It seems so crazy to me that someone decided to put a recorder next to a humming line just to be able to later in the future match the sound with some other recordings...

raphaelj
2 replies
1d3h

For Europe, there are academic and public organizations that have published this ENF backlog since about 2017.

For the US, I couldn't find any open dataset. For those regions, I'm basically recording the sound of an A/C motor to get the reference data, but I only have a few months of backlog.

See here for the coverage of the webapp: https://datethis.app/coverage

garblegarble
1 replies
1d2h

I notice in your coverage plot, the UK National Grid data appears to end mid-2023... have they stopped providing this data?

raphaelj
0 replies
1d2h

No, but they do not provide the data in real time.

timvdalen
0 replies
1d4h

Wow, I learned something today, did not know this was a thing!

pjc50
0 replies
1d3h

It's multithreading the filter graph itself at the top level, so "decode", "sample rate convert", and "encode" can run in separate threads.
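As a toy illustration of that top-level split (the stage functions below are stand-ins, not FFmpeg's actual API): each stage runs in its own thread, with bounded queues passing frames downstream.

```python
import queue
import threading

SENTINEL = None  # marks end-of-stream

def stage(func, inq, outq):
    """One pipeline stage: pull an item, transform it, push it downstream."""
    while True:
        item = inq.get()
        if item is SENTINEL:
            outq.put(SENTINEL)
            return
        outq.put(func(item))

def run_pipeline(frames, stages):
    """Chain stages with bounded queues so they all run concurrently."""
    queues = [queue.Queue(maxsize=8) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=stage, args=(f, qin, qout))
               for f, qin, qout in zip(stages, queues, queues[1:])]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(SENTINEL)
    out = []
    while (item := queues[-1].get()) is not SENTINEL:
        out.append(item)
    for t in threads:
        t.join()
    return out

# Toy stand-ins for "decode", "sample rate convert", "encode":
result = run_pipeline(range(5), [lambda x: x * 2, lambda x: x + 1, str])
```

The bounded queues also give you backpressure for free: a fast decoder blocks once the downstream stage falls 8 frames behind.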

izacus
0 replies
1d4h

Threading depends on the implementation of each encoder/decoder. Most video encoders and decoders are multithreaded; audio ones, not so much. At least that was the state of the world the last time I looked into ffmpeg internals.

drewtato
0 replies
1d4h

This is removing the video stream (-vn) so that's not involved. Not sure which parts are in parallel here, but I'm guessing decoding and encoding the audio.

CrendKing
1 replies
1d1h

You are not using hardware acceleration on the decoding side, and you're removing the video output here. I wonder what happens if we use hardware acceleration for both video decoding and encoding, i.e. something like this on an NVIDIA card:

    ffmpeg -hwaccel cuda -i $inputFile -codec:a copy -codec:v hevc_nvenc $output

TD-Linux
0 replies
22h29m

No video is being transcoded in the parent's command (-vn).

keepamovin
10 replies
1d4h

Can anyone clarify the licensing requirements of large scale ffmpeg deployments? In what cases are fees required?

bsenftner
8 replies
1d4h

The general use case for ffmpeg inside proprietary software is that the version of ffmpeg used needs to be statically compiled and linked into the software's executable, or it needs to be a separate executable called by the proprietary software.

Daemon404
5 replies
1d3h

You have that backwards - it must be dynamically linked. Static linking without providing your source would violate the LGPL.

keepamovin
4 replies
1d2h

Can you drill down a bit more into this? I would consider static linking to be including unmodified ffmpeg with my application bundle and calling it from my code (either as a pre-built binary from ffmpeg official or one compiled by us for whatever reason, called either via a code interface or from a child process using a command-line interface). bsenftner's comment seems to roughly confirm this, though their original comment does make the distinction between the two modes.

What's someone to do?

Daemon404
1 replies
1d2h

It is widely known and accepted that you need to dynamically link to satisfy the LGPL (you can statically link if you are willing to provide your object files on request). There is a tl;dr here that isn't bad: https://fossa.com/blog/open-source-software-licenses-101-lgp...

But specifically, the bit in the LGPL that matters is section 5: https://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html#S... - particularly paragraph 2.

As always, IANAL, but I also have worked with a lot of FOSS via lawyers.

Also, this is and always has been the view of upstream FFmpeg. (Source: I work on upstream FFmpeg.)

bsenftner
0 replies
20h16m

If one statically links ffmpeg into a larger proprietary application, the only source files one needs to supply are the ffmpeg sources, modified or not. The rest of the application's source does not have to be released. In my (now ex) employer's case, only the low-level av_read_frame() function was modified. The entire ffmpeg version used, plus a notice about that being the only modification, is in the software as well as on the employer's web site in multiple places. They're a US DOD contractor, so their legal team is pretty serious.

ta1243
0 replies
1d2h

What's someone to do?

Release your code as GPL

ndriscoll
0 replies
23h4m

Static linking means combining compiled object files (e.g. your program and ffmpeg) into a single executable. Loading a .so or .dll file at runtime would be dynamic linking. Invoking through a child process is not linking at all.

Basically you must allow the user to swap out the ffmpeg portion with their own version. So you can dynamically link with a .dll/.so, which the user can replace, and you can invoke a CLI command, which the user can replace. Any modifications you make to the ffmpeg code itself must be provided.

keepamovin
1 replies
1d4h

Thanks. We're doing the second but I heard that at a certain scale you might need to pay fees anyway?

bsenftner
0 replies
1d3h

I used to work at an FR video security company, where our product was in a significant percentage of the world's airports and high traffic hubs. Statically linked ffmpeg for the win.

hutzlibu
0 replies
1d4h

"FFmpeg is licensed under the GNU Lesser General Public License (LGPL) version 2.1 or later. However, FFmpeg incorporates several optional parts and optimizations that are covered by the GNU General Public License (GPL) version 2 or later. If those parts get used the GPL applies to all of FFmpeg."

http://ffmpeg.org/legal.html

Meaning it is free, but if you use some modules, you might have problems mixing it with proprietary code.

amelius
8 replies
1d3h

Isn't video transcoding easily parallelizable? I mean just split the video into N equal parts (at keyframes) and divide the work.

xuhu
4 replies
1d2h

Not a video encoding expert, but for live streams you can't merge the output until you've processed all N parts, so you introduce delays. And if any part of the input pipeline (like an overlay containing a logo or text) is generated dynamically, i.e. not a static mp4, it basically counts as a live stream.

bambax
2 replies
1d1h

Why not cut the image in rectangles and process those simultaneously? Wouldn't that work for live streams? (There may be artefacts at the seams though?)

slimscsi
1 replies
1d

Yes, and we do, but that is not the slow part: https://en.wikipedia.org/wiki/Motion_compensation

And as for seams: https://en.wikipedia.org/wiki/Deblocking_filter

bambax
0 replies
22h11m

Thanks! Very informative! It's really a fascinating topic.

amelius
0 replies
1d2h

Yes, good point about live streams.

MaxikCZ
2 replies
1d1h

You are right, but ideally you should first perform scene change detection (to put keyframes at proper positions), and that alone takes quite some processing.

angrais
1 replies
1d

Exactly, as some encoders (H264) place keyframes at fixed intervals (e.g., every 30 frames) rather than where the action occurs.

As such, they are suboptimal by default if a lot of motion occurs.

dylan604
0 replies
23h6m

some encoders (H264)

What H.264 encoder are you using that does not have a scene change detection option?

Alifatisk
7 replies
1d5h

I’ve been looking for more ways to speed up the transcoding process. One solution I found was GPU acceleration; another was using more threads, but it’s hard to find the optimal number to provide.

pjc50
4 replies
1d5h

For most workloads, setting the number of threads to the number of vCPUs (i.e. count each hyperthreaded core as 2) works. But GPU acceleration is much better if it's available to you.
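In script form, that rule of thumb looks something like the sketch below (`-threads` is ffmpeg's real global thread option; the helper name is made up):

```python
import os

def transcode_cmd(src, dst, threads=None):
    """Build an ffmpeg invocation with an explicit thread count,
    defaulting to the number of logical CPUs (vCPUs)."""
    threads = threads or os.cpu_count() or 1
    return ["ffmpeg", "-threads", str(threads), "-i", src, dst]
```

For many codecs, `-threads 0` instead asks ffmpeg to auto-detect a suitable count.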

cm2187
2 replies
1d4h

Though in my tests I found that GPU acceleration of video decoding actually hurts performance. It seems software decoding is faster than hardware for some codecs. That's of course not the case for encoding.

themoonisachees
1 replies
1d2h

That heavily depends on the GPU being used and whether or not it has hardware support for your codec. Maybe your GPU is just old/weak compared to your CPU?

cm2187
0 replies
1d

That's possible, but if you look at NVIDIA, the whole range uses the same hardware accelerator, so at most it's a difference in chip generation, not so much GPU model.

I'm not saying GPU hardware decoding isn't useful; it certainly is in terms of power consumption, and the CPU might be better used for something else happening at the same time. But in terms of raw throughput, it's not clear that a GPU beats a recent CPU.

angrais
0 replies
1d

GPU acceleration may produce worse quality and slightly larger files. So there's a trade-off to be had.

marcyb5st
1 replies
1d5h

Can't you just use hyperparameter optimization to find the best value? Tools like Sherpa or scikit-optimize can explore a search space of thread count / input type / CPU type (the latter might be fixed on your machine).
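For a single integer knob like thread count, even a plain exhaustive sweep often suffices; here's a minimal sketch with a stubbed benchmark standing in for a real timed ffmpeg run (the stub's cost model is invented for illustration):

```python
import os

def best_thread_count(benchmark, max_threads=None):
    """Time each thread count once and return the fastest.
    benchmark(n) should return wall-clock seconds when using n threads."""
    max_threads = max_threads or os.cpu_count() or 1
    timings = {n: benchmark(n) for n in range(1, max_threads + 1)}
    return min(timings, key=timings.get)

# Stub model: speedup saturates at 4 threads, then overhead creeps in.
fake_benchmark = lambda n: 60 / min(n, 4) + 0.5 * max(0, n - 4)
```

Swapping `fake_benchmark` for a function that shells out to `time ffmpeg -threads n ...` on a representative input gives a real (if slow) tuner.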

corndoge
0 replies
1d3h

I don't think "just" is appropriate here; it makes it sound like this should be a trivial task for anyone, while it is not. Using "just" like this minimizes the work involved and makes people feel stupid, which leads to various negative outcomes.

Sorry for lecturing

jokoon
3 replies
23h38m

Is there any video editing software that takes advantage of ffmpeg? I once thought about making something to draw geometry through SVG and then use ffmpeg, maybe with some UI or whatever, or just to add text, but I never started.

Avidemux feels a bit like that.

Since ffmpeg internals are quite raw and not written to be accessed through a GUI, any video editor based on it would probably be quite clunky and weird and hard to maintain.

Maybe an editor that uses modules to build some kind of preview, with a command explainer or some pipeline viewer.

ffmpeg is quite powerful, but it's a bit stuck because it only works from the command line, which is fine, but I guess that somehow prevents it from being used by some people.

I've already written a Python script to take a random set of clips and build a mosaic with the xstack filter. It was not easy.
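For reference, the fiddly part is xstack's layout string, which positions each input by sums of the widths/heights of earlier inputs. A rough sketch of generating such a command for a grid of equal-size clips (grid shape and output filename are illustrative):

```python
def mosaic_cmd(clips, cols=2, output="mosaic.mp4"):
    """Build an ffmpeg xstack command arranging equal-size clips in a grid."""
    layout = []
    for i in range(len(clips)):
        row, col = divmod(i, cols)
        # x offset: sum of widths of the clips to the left in this row
        x = "0" if col == 0 else "+".join(f"w{j}" for j in range(col))
        # y offset: sum of heights of the clips above in this column
        y = "0" if row == 0 else "+".join(f"h{j * cols}" for j in range(row))
        layout.append(f"{x}_{y}")
    inputs = [arg for clip in clips for arg in ("-i", clip)]
    filt = f"xstack=inputs={len(clips)}:layout={'|'.join(layout)}"
    return ["ffmpeg", *inputs, "-filter_complex", filt, output]
```

For four clips this produces the canonical 2x2 layout `0_0|w0_0|0_h0|w0_h0` from the ffmpeg filter docs.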

up_o
0 replies
14h8m

KDEnlive. Fantastic foss video editing software.

jy14898
0 replies
17h57m

https://shotcut.org/ also uses ffmpeg

brucethemoose2
0 replies
23h23m

VapourSynth is intended to be the middle ground you might be seeking. You manipulate the video in Python instead of ffmpeg's CLI, but it's often more extensible and powerful than pure ffmpeg due to the extensions:

https://vsdb.top/

https://www.vapoursynth.com/

I have seen some niche software built on ffmpeg like losslesscut:

https://github.com/mifi/lossless-cut

Staxrip is also big:

https://github.com/staxrip/staxrip

But I don't know anything "comprehensive."

not2b
2 replies
20h49m

Probably would have been better to link to

https://ffmpeg.org/pipermail/ffmpeg-devel/2023-November/3165...

instead of the tweet (or the xit, or whatever they are called now), since the only substance in the tweet is the link.

telotortium
0 replies
18h26m

tweet (or the xit, or whatever they are called now)

Just "post". They're no longer limited to 280 characters now either, although longer posts are collapsed by default.

Sabinus
0 replies
18h52m

I wonder if the X should be pronounced in the Chinese manner, as in Xi Jinping.

pjc50
1 replies
1d5h

kierank
0 replies
1d5h

The tweet links to the mailing list which is the official source of the patchset (you can choose "Next in Thread" to continue).

sharkski
0 replies
1d1h

Awesome to see improvements to FFmpeg! I'm hoping to see Dolby AC4 support soon.

m3kw9
0 replies
1d1h

Is it automatic or does one need to do command line parameter gymnastics?

kjuulh
0 replies
1d5h

Very nice.

Hopefully this will make my small transcoding needs faster for plex (as I don't have hardware transcoding support on my graphics card) =D

goeiedaggoeie
0 replies
19h42m

You could already place demuxing on a different thread as well.
