Disclaimer: I used to work at a live video streaming company as a financial analyst, so I'm quite familiar with this.
The biggest cost is, as you'd imagine, the streaming - getting the video to the viewer. It was a large part of our variable cost and we had a (literal) mad genius dev ops person holed up in his own office cave who managed the whole operation.
I've long forgotten the special optimizations he did, but he would keep finding ways to improve margin / efficiency.
Encoding is a cost but I don’t recall it being significant
Storage isn't generally expensive. Think about how cheap you as a consumer can go get 2 TB of storage, and extrapolate.
The other big expense - people! All those engineers to build back and front end systems. That’s what ruined us - too many people were needed and not enough money coming in so we were burning cash.
I'm guessing live video looks a lot different from a more static video site. I think encoding and storage are both quite expensive. You want to encode videos that are likely to be watched in the most efficient ways possible to reduce network bandwidth usage, and every video needs at least some encoding.
Based on some power laws etc., I would guess most videos have only a handful of views, so storing them forever and the cost to encode them initially is probably significant.
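To make the power-law hunch concrete, here's a rough sketch assuming a Zipf-like popularity curve (the catalog size and exponent are invented for illustration, not real YouTube data):

    # Rough illustration of how a Zipf-like (power-law) popularity curve
    # concentrates views in a tiny head of the catalog. The catalog size
    # and exponent below are invented, not measured data.
    def zipf_view_shares(num_videos: int, exponent: float = 1.0) -> list[float]:
        weights = [1.0 / (rank ** exponent) for rank in range(1, num_videos + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    shares = zipf_view_shares(num_videos=1_000_000)
    top_1pct = sum(shares[: len(shares) // 100])
    print(f"Top 1% of videos get ~{top_1pct:.0%} of views")        # roughly 2/3 under these assumptions
    bottom_half = sum(shares[len(shares) // 2:])
    print(f"Bottom 50% of videos get ~{bottom_half:.0%} of views")  # only a few percent

Under assumptions like these, the head of the catalog dominates views, while the long tail still eats the one-off encode cost and the storage forever.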
Encoding and storage aren't significant, relative to the bandwidth costs. Bandwidth is the high order bit.
The primary difference between live and static video is the bursts -- get to a certain scale as a static video provider, and you can roughly estimate your bandwidth 95th percentiles. But one big live event can blow you out of the water, and push you over into very expensive tiers that will kill your economics.
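To illustrate the 95th-percentile point, here's a minimal sketch of burstable billing; the sample interval and numbers are hypothetical:

    # 95th-percentile ("burstable") billing sketch: providers typically sample
    # throughput every 5 minutes, drop the top 5% of samples, and bill the
    # highest remaining sample. Numbers here are made up.
    def billable_mbps(samples_mbps: list[float]) -> float:
        ordered = sorted(samples_mbps)
        cutoff = int(len(ordered) * 0.95)          # index of the 95th-percentile sample
        return ordered[cutoff - 1] if cutoff else ordered[-1]

    steady_month = [200.0] * 8640                  # ~8640 five-minute samples in 30 days
    print(billable_mbps(steady_month))             # 200.0

    spiky_month = [200.0] * 8000 + [5000.0] * 640  # one big live event, ~7% of samples
    print(billable_mbps(spiky_month))              # 5000.0 -- the spike sets the whole month's bill

A steady service pays for its baseline; one live event that spikes more than ~5% of the samples sets the bill for the entire month.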
But if you're broadcasting something live and what's killing you is that everyone wants to watch it at the same time... wouldn't you serve it P2P so that everyone is downloading it from each other rather than you?
I doubt that live(!) P2P video sharing would work. You will have some users who get the video stream directly from you. These primary peers will then need to relay the same data through their tiny consumer DSL line (slow upload!) to secondary peers. These secondary peers will have a noticeable lag. It will get even worse when you have tertiary peers.
One great thing about P2P is you can provide more peers. You can surge inexpensive machines near your market and drastically reduce the load on your main servers.
And home connections —while still largely asymmetric— are much faster than they used to be. Having 10mbps up means one client can serve two more. And there's a lot more FTTP with 100-1000mbps up too. These really make a difference when you have a large swarm.
A problem with live is that everyone wants the content at the same time. One client can only serve two more after it has the content. Any drop in connection is also very disruptive because you don't have a big buffer and everyone wants the content now.
A place this could work is streaming a conference, live-ish is the goal and the producers aren't rich. Sports would be the worst case.
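To put rough numbers on the lag problem, here's a back-of-the-envelope sketch of a live P2P relay tree; the fan-out, seed count, and per-hop delay are all assumptions:

    import math

    # Back-of-the-envelope: the origin feeds a handful of primary peers, each
    # peer re-uploads the stream to `fanout` others, and every hop adds roughly
    # one chunk-transfer delay. All numbers are illustrative assumptions.
    def p2p_tree_lag(viewers: int, origin_seeds: int, fanout: int, hop_delay_s: float) -> float:
        # depth needed so that origin_seeds * fanout**depth >= viewers
        depth = math.ceil(math.log(viewers / origin_seeds, fanout))
        return depth * hop_delay_s

    # 1M viewers, origin directly feeds 1,000 peers, 10 Mbps up / 5 Mbps stream
    # lets each peer feed 2 more, and each hop adds ~2 s of chunk delay:
    print(p2p_tree_lag(viewers=1_000_000, origin_seeds=1_000, fanout=2, hop_delay_s=2.0))
    # ~20 s behind the leading edge at the leaves, under these assumptions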
Isn't the point of the P2P approach that it gets better the more this is true?
No, not really on those timescales. If it's about a popular show that's released the whole season today, yeah absolutely. Pulling ep1 from my neighbour while they watch ep2 makes sense.
It doesn't really work for something you want to watch simultaneously and reliably. I have to wait for my neighbour to get the chunk I want, then I get it. If they got it from someone else, we form a bigger chain, and then you have all the broadcasting etc to figure out who to actually get a chunk of video from.
Hearing the street cheer while I watch my national team captain take a runup for a penalty is really quite bad.
But the problem is that you have a gigantic audience. Many of them will make effective primary peers. If that weren't true, you wouldn't have a problem in the first place.
P2P is going to be a big challenge for tons of reasons. Set top boxes aren't going to play. Lots of people are behind NATs that make it hard to connect. Mobile is battery sensitive and sending to peers is going to eat more battery. Some users pay for every byte they send, and they won't want to pay for you to save operating costs. Plus all the other stuff everyone said about latency.
If they're not significant, then why does youtube build ASICs for doing video encoding? See e.g., https://arstechnica.com/gadgets/2021/04/youtube-is-now-build...
VA-API, NVENC,
nvenc > See also: https://en.wikipedia.org/wiki/Nvidia_NVENC#See_also
NVIDIA Video Codec SDK v12.1 > NVENC Application Note: https://docs.nvidia.com/video-technologies/video-codec-sdk/1... :
FFMPEG > Platform [hw video encoder] API Availability table: https://trac.ffmpeg.org/wiki/HWAccelIntro#PlatformAPIAvailab... :
Because significance varies, as does optimisation. At YouTube scale it might matter more, or the benefits might be bigger, even if just to save some energy or carbon footprint (and even that might be just for a compliance or marketing line).
Because when you are Youtube, even relatively marginal cost improvements can be huge in absolute. There is also the UX of having to wait X minutes for an uploaded video to be ready that is improved by this.
If you make a billion, a 1% saving is 10 million. You can hire and fund a lot of activity with 10 million.
If you make 1 million, 10k isn't going to go very far towards paying devs to save you 1%
AFAICT, the answer to "why does Google do X" is basically always "because someone needed a launch to point at when they're up for promotion".
Doing so wouldn’t hurt and would make a sizable impact at the scale of Google?
Live has a _huge_ advantage in the storage side. In a purely "live" sense all of the content is temporally synchronised; every viewer is requesting approximately the same segments at the same time. Store the current chunks, and the last few minutes of seek time, in memory and put out on the wire to all of the viewers. Twitch talked about this a bit just before/after the AMZN acquisition.
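A minimal sketch of that idea - keep only a sliding window of recent segments in RAM and serve every viewer from it. This is not Twitch's actual design; the segment and window sizes are assumptions:

    from collections import OrderedDict

    # "Store the current chunks and the last few minutes": a live channel keeps
    # a sliding window of recent segments in memory and serves all viewers from
    # it. Segment duration and window size are illustrative assumptions.
    class LiveSegmentWindow:
        def __init__(self, segment_seconds: int = 4, window_minutes: int = 5):
            self.max_segments = (window_minutes * 60) // segment_seconds
            self._segments: OrderedDict[int, bytes] = OrderedDict()

        def push(self, seq: int, data: bytes) -> None:
            self._segments[seq] = data
            while len(self._segments) > self.max_segments:
                self._segments.popitem(last=False)      # evict the oldest segment

        def get(self, seq: int) -> bytes | None:
            # Every concurrent viewer asks for roughly the same few seqs, so this
            # is nearly a 100% RAM hit rate with no disk in the hot path.
            return self._segments.get(seq)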
In a prerecorded video CDN, managing that catalog is a PITA and does drive meaningful infrastructure cost. You need the "right" content to be in the correct location for low cost peering/transit/distribution, on the correct media for the total throughput:size, in the optimal number of encodings for efficient/quality playback, etc. This job is a lot easier when the provider controls the catalog and has a limited catalog size. See some of the OpenConnect talks where they're "preloading" content offpeak to optimize IO allocation on the appliances. It was an absolute nightmare to try and manage with a many-PB catalog of 3P content whose release/popularity the service didn't control.
Edit: source: I was a principal at AWS and was responsible for a lot of the Prime Video delivery once upon a time.
With Netflix's Live events, you can seek anywhere up to time zero, not just the last few minutes.
Netflix has already solved on-demand streaming at scale, though; for them it's harder to do live events given that they're new to it.
Apple does the same with the MLS soccer matches. I think you’re conflating “live” with “live and available on-demand anytime thereafter”
Interestingly enough, the Apple (and I assume Netflix) live streams come from the colo equipment in your ISP. So each box has its own recording as it happens.
Used to work at a live-streaming company on our stream infra.
I mostly disagree, unless it's pure live with no replay at all and no closely timed events required. Usually live platforms will offer some sort of VOD (VODs, replays, rebroadcasts), all of which require a storage solution. Couple that with the fact that anything requiring more complex timing than "show video live~ish" can get messy fast with sync and latency issues.
Yes, I was referring to “live only” and not VOD/“low latency HLS” cases. This was a decade ago, but my examples off hand are things like video games, game shows, and contests. It was definitely a category; the infrastructure looked a lot closer to multicast-ish RTMP than today's dynamic-manifest MPEG segment CDNs.
Edit: the above notwithstanding, live sports etc. is _still_ better on the storage side, as viewers are so heavily synchronized. Lots of nice cache efficiencies when everyone is watching the same content at the same time.
Does the recommendation algorithm account for this? If I'm in a specific place, am I more likely to see content that's already in the right data center?
I can't speak to content recommendations; I worked on the “backend” infrastructure storing bytes and delivering bits. But in that realm, yes, absolutely. Any CDN's #1 job is to route an end user request to a nearby (or otherwise optimal) datacenter, usually via DNS response. For streaming content I believe “everyone” is doing (part of) this at the streaming client/API layer these days. When you request to start playing, the returned URL will include/encode hints that help the CDN send your request to the correct part of the CDN that holds your requested catalog title, i.e. CDNs aren't homogeneous and not all content will be stored in every edge location. The service API servers may/will even allocate different requests to different CDNs, e.g. the streaming service might use any combination of 1P (OpenConnect, CloudFront) and 3P (Limelight, Akamai, Level 3) CDNs.
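As a toy illustration of the URL-hint idea; the hostname scheme, shard count, and CDN names are invented for illustration, not any real service's:

    import hashlib

    # Toy version of "the returned URL encodes hints so the CDN can route you to
    # the part of the CDN that actually holds that title". Names and scheme are
    # invented for illustration.
    CDNS = ["cdn-a.example.net", "cdn-b.example.net"]

    def _bucket(key: str, n: int) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16) % n

    def playback_url(title_id: str, viewer_region: str, num_shards: int = 16) -> str:
        shard = _bucket(title_id, num_shards)                       # which slice of the catalog holds this title
        cdn = CDNS[_bucket(viewer_region + title_id, len(CDNS))]    # traffic split across 1P/3P CDNs
        # DNS for this hostname then resolves to an edge near the viewer that carries this shard
        return f"https://shard{shard:02d}.{viewer_region}.{cdn}/v/{title_id}/master.m3u8"

    print(playback_url("some-title-id", "eu-west"))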
The minimum possible expenditure on encoding is "we require videos to be encoded like so; here's our help page on how you can do that".
It's not even slightly expensive.
It’s not that expensive at YouTube scale. We are talking fractions of a penny per GiB transferred.
"not that expensive" is relative; it's still a lot of money. Sure, it's not trillions of dollars, but it's still billions of dollars. YouTube has historically not returned a net profit (and I haven't heard of that situation changing).
YouTube basically became unusable without Premium, though.
I have premium, my wife accepts all of those ads
I have uBlock Origin.
That doesn't work on my LG tv
I did that for a while and then even just hearing my wife suffer through it made me upgrade to the family plan. Now I invited some other family members and we are all enjoying that premium bitrate no-ad lifestyle.
Still on uBlock Origin; I don't see any ads and can watch all the videos.
I'm sure that's what Google's accountants would love us (and the IRS) to continue to believe.
Not sure, one could say they use their dominant search position and revenue to serve video at a loss and distort the market, making it very hard for anyone without an existing money printing machine to bootstrap a profitable video site. See vimeo.
That was my assumption when I read they weren't profitable,
and also that streaming and storing video at that scale is almost a natural monopoly, with how much it must cost and how hard it would be to compete without existing resources
Do you have a public source for that? From what I've heard, YouTube has been profitable for years at this point.
YT financials and P&L were not broken out in audited financial statements back in the day.
Still aren’t. Alphabet only publishes YouTube revenue.
Yeah, YouTube is big enough to put their own cache nodes directly in ISP datacenters
Which would help for the crazy popular meme videos, but I bet the long tail on YouTube is insanely big, even if you did have the “watch next” engine getting in on the game steering you toward content already present in your nearby caches.
YouTube is estimated to have 1 exabyte [1] of data. Petabyte-level storage is not unheard of [2], and a gateway server with 5 PB of storage would cover ~0.5% of YouTube's data, which should be sufficient to serve a very high percentage of the most popular videos.
They can still afford to serve the occasional obscure video from the origin servers.
[1] https://www.qqtube.com/blog/how-much-storage-does-youtube-ha...
[2] https://www.qnap.com/solution/petabyte-storage/en/
Bingo, and also in Internet Exchanges - every ISP at the KCIX exchange has a direct handoff to YouTube, Google, Netflix, Cloudflare, ...
Yup! This is the reason why it's so cheap for them. Other companies in similar positions have cache nodes in the ISPs, and this dramatically lowers the cost.
Next iteration of this will be video generated on demand with GenAI running closer to the request, ideally at the request.
The important question is how much an ISP can charge for broadband without YouTube and Netflix service. They don't even pay the fractions of a penny everyone else has to.
I disagree. Storage is expensive. Think of an old video uploaded 15 years ago with a total view count of 1k. You can't just put it in cheap cold storage. Someday, somebody is going to watch it, and you have to retrieve it instantly or that somebody will be disappointed.
You can.
YouTube, for example, deletes your 720p after a while and replaces it with a potato.
And if you watch an old, not particularly relevant YouTube video and it starts after 10 seconds instead of now, no one really cares.
You can put that old, heavily encoded potato on your huge and cheap storage system located somewhere around the globe where energy is cheap.
You can also factor in the time for a tape robot and only store half, or the first minute, of that potato on your cheap storage and let the robot grab the rest of it.
After all if video is your main thing plenty of weird optimizations start to make sense.
Tape robot time is NOT measured in minutes but hours. Also you wouldn't pull individual files like this out of tape.
Tape is for long term sunk storage, not cold infrequent access like a youtube video.
I know AWS Glacier has an "expedited retrieval time" of 1-5 minutes, but that is not how typical tape setups work. Frankly, I would be very interested in what actually hides behind that product.
You can use big disks, but not be able to access all the files on the disk with the same frequency, so you have 20% of the disk dedicated to hot storage and 80% of it to cold storage. Cold storage access is queued, so the 1-5 minutes can come from there.
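A sketch of that hot/cold split with queued cold reads; purely illustrative, and the helper functions passed to the worker are hypothetical:

    import queue

    # "20% of the disk is hot, 80% is cold, cold access goes through a queue":
    # hot titles are served immediately, cold ones are enqueued and drained by a
    # background worker, which is one place a minutes-long retrieval window
    # could come from. Purely illustrative, not any provider's design.
    hot_set: dict[str, bytes] = {}                 # ~20% of capacity, served instantly
    cold_requests: "queue.Queue[str]" = queue.Queue()

    def request_video(video_id: str) -> bytes | None:
        if video_id in hot_set:
            return hot_set[video_id]               # immediate playback
        cold_requests.put(video_id)                # picked up by the background drain loop
        return None                                # client shows a spinner (or an ad) meanwhile

    def cold_drain_worker(read_cold, promote_to_hot) -> None:
        # read_cold / promote_to_hot are hypothetical callables: slow batched
        # reads from the cold tier, and insertion into the hot set.
        while True:
            vid = cold_requests.get()
            data = read_cold(vid)
            promote_to_hot(vid, data)              # the next request is a hot hit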
Yeah I looked closer and I think they are basically just packaging up some kind of offering on top of spinning rust for the two first glacier tiers.
Glacier is not actually tape, the fancy tape-robot videos notwithstanding. Most of it is just regular old S3 running on outdated storage hardware.
Google gave up on tape a while back. The latest Google search indicates it is only used for air-gapped backups. I don't think OP was suggesting using tape though, especially with technologies such as hybrid SMR.
That's all true, but I don't think anyone mentioned tape storage.
Or run 2-3 ads for 2 minutes each. That gives you plenty of time to fetch the video in the background.
No way, man! They don't want to interfere with the viewing experience of their primary revenue source! (Ads) :-)
They are definitely not fetching video in the background.
Fetch from cold storage to their CDN whilst they fill your bandwidth with ads.
I WISH it would prefetch video in the background while showing ads.
But no, it always goes to the spinny wheel buffering after the ad ends. Oh, and that's after some spinny wheel to load the ad in the first place, ffs.
Always bugs me to hell when I encounter a "high definition" video that has worse quality than PAL/NTSC.
But in a few years we will have 8k for all those videos with super AI upscalers.
I regularly run into videos on youtube that obviously came from tape because they stop for minutes to load.
100% Google puts videos on colder storage. Hot videos are cached in memory at all possible locations, and cold videos are stored compressed in a much cheaper storage container. The difference is maybe 500 to 3000 ms.
I seem to remember Google owns some network infrastructure? That saves some money. On top of that, at their size you are going to get things cheaper.
There should be economies of scale on that. It's harder to build and maintain bigger systems, but the work required does not scale linearly with size.
Google own vast network infrastructure. The day Google acquired YouTube (I was there) they discovered that YT was on the verge of collapse and would run out of bandwidth entirely within months. There was an emergency programme put in place with hundreds of engineers reallocated to an effort to avoid site collapse, with the clever/amusing name of BandAid.
BandAid was a success. YouTube's history might look very different if it wasn't. It consisted of a massive crash buildout of a global CDN, something that Google historically hadn't had (CDNs aren't useful if everything you serve is dynamically generated).
One reason BandAid could happen was that Google had a large and sophisticated NetOps operation already in place, which was already acquiring long haul unused fibre wavelengths at scale for the purposes of moving the web search index about. So CDN nodes didn't need to be far from a Google backbone POP, and at that point moving bits around on that backbone was to some extent "free" because the bulk of the cost was in the fibre rental and the rackspace+equipment on either end. Also it was being paid for already by the needs of search+ads.
Over time Google steadily moved more stuff over to their infrastructure and off YouTube's own, but it was all driven by whatever would break next.
Then you have all the costs that YouTube has that basic video sites don't. ContentID alone had costs that would break the back of nearly any other company but was made effectively mandatory by the copyright lawsuits against YouTube early on, which Google won but (according to internal legal analysis at least) that was largely due to the enormous good-faith and successful effort demonstrated by ContentID. And then of course you need a global ad sales team, a ton of work on ad targeting, etc.
This story explains why Google can afford to do YouTube and you can't. The reality is that whilst they certainly have some sort of internal number for what YouTube costs, all such figures are inevitably kind of fantastical, because so much infrastructure is shared and cross-subsidised by other businesses. You can't just magic up a global team of network engineers and fibre contracts in a few months, which is what YouTube needed in order to survive (and one of the main reasons they were forced to sell). Whatever price you come up with in internal bookkeeping will always be kinda dubious, because such things aren't sold on the market.
Great write up! I worked for a few months at a Google datacenter and a few times got to see the fiber endpoints.
Though the idea of such networks not being sold on the market makes me ponder whether Starlink will come to provide such a service. They'd need to scale out their laser links and ground stations.
There are hard physical limits on how much bandwidth Starlink can provide to do with spectrum allocations, so it will always be a somewhat boutique service and indeed prices for end users might end up climbing as word spreads and demand grows. They already practice regular dynamic pricing changes depending on demand within a cell. It doesn't make sense for corporate backbones.
Definitely a few factual errors here that ought to be corrected.
On day one of the acquisition, Youtube's egress network was at least 4x the size of Google's, built and run by two guys. This shouldn't be a shock, you need a lot more bits to serve video than search results. For the hottest bits of content, third-party CDNs were serving videos and thumbnails.
There was no collapse imminent, but there were concerns about getting YouTube and Google infrastructure on a common path. BandAid was named as such because the goal was "not to break the internet." It was a small project with maybe a dozen members early on, all solid people.
YouTube had its own contemporaneous project, née VCR - "video cache rack". We did not necessarily believe that BandAid would arrive in a reasonable amount of time. Generally Google has two versions of every system - the deprecated one and the one that doesn't work yet.
VCR was a true YouTube project, 3 or 4 people working with one purpose. It was conceived, written, physically built and deployed in about 3 weeks with its own hardware, network setup and custom software. I believe it was lighttpd with a custom 404 handler that would fetch a video when missing. That first rack maxed out at 80Gbps a few hours into its test.
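Not the real VCR code obviously, but a minimal sketch of that pull-through pattern - serve from local disk, fetch from an origin on a miss - using Python's standard library instead of lighttpd; the origin URL and cache directory are placeholders:

    import http.server
    import os
    import urllib.request

    ORIGIN = "https://origin.example.net"      # placeholder origin, not a real host
    CACHE_DIR = "/var/cache/videos"            # placeholder cache directory

    class PullThroughHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            local_path = os.path.join(CACHE_DIR, self.path.lstrip("/"))
            if not os.path.exists(local_path):                        # cache miss -- the "404" case
                os.makedirs(os.path.dirname(local_path), exist_ok=True)
                with urllib.request.urlopen(ORIGIN + self.path) as src, open(local_path, "wb") as dst:
                    dst.write(src.read())                             # fetch once from origin, keep it
            with open(local_path, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        http.server.ThreadingHTTPServer(("", 8080), PullThroughHandler).serve_forever()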
After several months, Bandaid debuted. It launched at ~800Mbps and grew steadily from then on into what is certainly one of the largest networks in the world.
YouTube mostly moved things to Google based on what made good engineering sense. Yes, a few of them were based on what would break next - thumbnails leap to mind. Search, which we thought was a no-brainer and would be "easy", took more than a year to migrate - mostly due to quality issues. Many migrations were purely whimsical or some kind of nebulous "promo projects." Many more stayed separate for more than a decade. When a company gets large enough, the foolish consistency tends to trump a lot of other engineering arguments.
To the ancestral poster, do not despair. You can transcode, store and serve video, as you've likely surmised it's not all that difficult technically. In fact, it's so much easier now than in 2005.
What makes a great product is hard to describe and not always obvious. The economics will depend on your premise and success. "the cloud" gets pricey as you scale. There will be a long path of cost optimization if you get big enough, but that's the price of doing business at scale.
Thanks for the corrections. I was indeed thinking of thumbnails w.r.t. "what would break next".
This comment is a great answer to those commenters who say "Google bought YouTube/Android/..., they haven't invented anything since Search"; they miss the actual hard part entirely.
(same for Meta wrt Instagram)
These products that have been scaled by multiple orders of magnitude since original acquisition are like ships of Theseus; almost everything about how they work, how they scale and how they make money, have completely changed.
Curious: was your distribution client-server or peer-to-peer?
Or both, similar to Skype's supernode model?
The overwhelming majority of "legitimate" video streaming sites operate on a client-server model, which allows videos to be watched in web browsers, and on mobile devices (which don't generally do well in P2P as they find uploading difficult).
And generally torrent-based streamers don't hire financial analysts :)
Thankfully the FCC definition of "broadband" is getting more symmetrical over time. And doesn't webrtc take care of connecting browsers pretty well?
The current definition requires 20Mbps of upload, and uploading a youtube-quality video to two other people would not take a big fraction of that. Though it would help if ISPs stop trying to set bandwidth caps at <5% utilization levels.
It's not only the amount of upload bandwidth and the usage caps (although those are both big issues).
It's that you're also probably going to get CGNAT - and maybe even a firewall blocking unusual ports.
And you're going to be running the power-hungry data connection at least twice as much, bad for battery life.
And mobile connections are less reliable - transitions between towers, going through tunnels, switching between 4G and WiFi.
And mobile OSes are eager to suspend things - especially things that are using a lot of data and battery.
That's a problem if all your users are mobile, yeah.
I'm thinking of the situation where most of the users are using home connections and have power cables always in or in reach.
Interesting background. I worked twice in digital video, once ~2000-2001 (ancient history - early IP, ISDN, the dead-end of H.323, bonded GSM channels, etc.) and once ~2009-2010. The second episode was fascinating, we specialised in mobile video at a time when it was just appearing on the consumer market. Most of the global mobile device manufacturers were clients. It got to the point where they would build the hardware and we would get airdropped in to their R&D to make it work - they had no idea how performant the architecture was going to be, because they'd never tried it. We also built the server side, the billing architecture with revenue share, carrier billing support (only possible with device preloaded apps due to Google Play (then "Google Apps"?) store restrictions on third party payment mechanisms), etc.
Encoding, scaling and transcoding are relatively cheap for stored content, and relatively expensive if you want real or near-real time.
If you want DRM (digital rights management = ~ineffective copy protection) then you need to add a bit more overhead for that, both in terms of processing and latency. If you need multi-DRM (different DRM systems for different devices the consumer owns) and a good cross-device experience (like pause and resume across devices), it gets real hard real fast.
It helps to be targeting a standard platform, for example a modern widescreen TV with H.265 support and solid 4K decoding. Otherwise you need a different version for every resolution, a different version for every CODEC, a different version for every bitrate, etc. We had great experience adjusting bitrates and encoding parameters for different device categories, for example if you had a certain phone and you ran it at max spec it might look great but if you were looking to preserve battery and were running on battery save mode the decode would fail and you'd get choppy performance and stuttering audio. This sort of thing was rife then.
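For a sense of what "a different version for every resolution, CODEC, and bitrate" looks like in practice, here's a rough sketch that drives ffmpeg once per ladder rung; the ladder itself is a made-up example, not a recommendation:

    import subprocess

    # Rough sketch of generating an ABR "ladder" -- one encode per resolution/
    # bitrate pair -- by shelling out to ffmpeg. The rungs below are invented
    # for illustration; real ladders are tuned per codec, content, and device.
    LADDER = [
        # (height, video kbps, audio kbps)
        (1080, 5000, 192),
        (720,  2800, 128),
        (480,  1400, 128),
        (360,   800,  96),
    ]

    def encode_ladder(src: str) -> None:
        for height, v_kbps, a_kbps in LADDER:
            out = f"{src.rsplit('.', 1)[0]}_{height}p.mp4"
            cmd = [
                "ffmpeg", "-y", "-i", src,
                "-vf", f"scale=-2:{height}",          # keep aspect ratio, fix the height
                "-c:v", "libx264", "-b:v", f"{v_kbps}k",
                "-maxrate", f"{int(v_kbps * 1.1)}k", "-bufsize", f"{v_kbps * 2}k",
                "-c:a", "aac", "-b:a", f"{a_kbps}k",
                out,
            ]
            subprocess.run(cmd, check=True)

    # encode_ladder("talk.mov")   # would produce talk_1080p.mp4, talk_720p.mp4, ...

Multiply that by every codec (H.264, H.265, VP9, ...) and every device profile, and the per-title encode count adds up quickly.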
As a series of specialist video providers emerged, ~all the cloud providers went and added these services, basically 95% of which are frontends to ffmpeg and some internal cloud storage scheme or similar.
Finally, billing is hard. Users have an expectation of free content now.
No experience with real time stream economics, but saw the inside of LA's stadium video control center one day. Didn't look inexpensive, I'll tell you that much. Probably for events with multiple cameras you're mostly paying site fees, ie. reliable bandwidth, humans, mixing desk if required. For studio broadcast these costs will be reduced. Both will have a slight real time encoding tax vs. stored content. If you want to figure out how to do it cheaply, look at the porn industry.
I wonder what the approximate net global economic benefit of ffmpeg is to this point?
Or the net global economic benefit of discrete cosine transform... "We're not in Kansas anymore, Toto" https://en.wikipedia.org/wiki/Discrete_cosine_transform#Hist... https://en.wikipedia.org/wiki/Discrete_cosine_transform#Appl...
quick someone post the xkcd
(https://xkcd.com/2347/)
I'm actually kind of surprised serving media isn't trivial and solved yet.
Routers have ASIC switching, why can't we have dedicated cache appliances with a bunch of RAM and some kind of modified GPU with network access and crypto acceleration in each core?
Have you seen Netflix OCAs? Current off-the-shelf hardware goes a really long way.
https://openconnect.netflix.com/en/appliances/
https://news.ycombinator.com/item?id=32519881
Humorous
Youtube owns a huge CDN to deliver video quickly.
https://support.google.com/interconnect/answer/9058809?hl=en
Sort of makes Cloudflare's R2 look more impressive, since they do not charge for egress.
I'm digressing from the topic, but R2 looked good on paper; it has a long way to go in terms of reliability.
Gilfoyle: ":smiling"
Thanks for sharing. Is it possible to join his team?
Moderation seems like another big issue although the solution afaict seems to mostly involve shipping this work off to the Philippines or wherever and making people look at the most horrifying content imaginable for 40 hours a week at very low wages.