Why is low-latency livestreaming so hard, while at the same time cloud gaming tech like Nvidia GameStream can offer such a flawless experience?
I've used Moonlight + Nvidia Gamestream with ~40ms RTT and couldn't feel a difference in competitive shooters, so total latency must be pretty low.
Does it have something to do with the bandwidth requirements? (1 stream vs. potentially hundreds)
There's no way people can't tell the difference. I can, with various streaming methods from my PC to my Shield/TV, wired, in the same house. Mouse-to-photon latency on a good PC will be in the range of 10-20ms; best case, streaming doubles or triples that.
I can feel 40ms for sure, and there's no way you play a competitive shooter with a 40ms delay. Hell, even a Bluetooth mouse gets annoying.
Maybe if you’re playing Microsoft Office it’s ok.
Nah, to be fair it's fine for a lot of games, which are also played on old-gen consoles with terrible gamepad-to-TV latency. Sure, twitchy multiplayer games are definitely not among them. I'm not big on competitive multiplayer, only Rocket League, and I can't do that over local streaming. Pretty much everything else I play is OK though.
You, my dear Internet friend, are confidently expressing your lack of experience. No one who has played multiplayer LAN games, or low-latency Internet games, could or would ever say that game streaming, whether the dead Stadia, or Moonlight, whatever, is comparable to the alternative. Nah, they couldn't.
I don't think that I could feel the difference between 40ms and 10ms RTT when playing something like DOTA2 or AoE2.
Most online games use client-side prediction, so any input made by the client happens almost instantly on the client and feels really good, and can be rolled back if the server disagrees. If you stream your game remotely with 40ms it will add 40ms to your input, and that just feels bad (not to mention jitter, especially if you're on semi-congested wifi), but it's not unplayable or even that noticeable in many games. Would I play some casual Dota like that? Sure. But not high-ranked games.
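To make the prediction point concrete, here's a minimal sketch (illustrative Python, not taken from any real engine): the client applies its own inputs immediately, remembers the unacknowledged ones, and replays them on top of whatever authoritative state the server sends back.

    # Minimal client-side prediction / reconciliation sketch (illustrative only,
    # not from any real engine). State is a 1D position; inputs move it by +/-1.
    class PredictedClient:
        def __init__(self):
            self.position = 0      # locally predicted state, shown to the player
            self.pending = []      # (sequence, move) inputs the server hasn't acked
            self.next_seq = 0

        def apply_input(self, move):
            # Applied immediately: the player sees the result with no added delay.
            self.position += move
            self.pending.append((self.next_seq, move))
            self.next_seq += 1

        def on_server_snapshot(self, server_position, last_acked_seq):
            # Roll back to the authoritative state, drop acked inputs,
            # then replay everything the server hasn't processed yet.
            self.position = server_position
            self.pending = [(s, m) for (s, m) in self.pending if s > last_acked_seq]
            for _, move in self.pending:
                self.position += move

    client = PredictedClient()
    client.apply_input(+1)   # feels instant locally
    client.apply_input(+1)
    client.on_server_snapshot(server_position=1, last_acked_seq=0)  # server lags behind
    print(client.position)   # 2: the unacked input was replayed on top of server state

With remote streaming there is no local game state to predict against, so every input has to pay the full round trip before anything changes on screen.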
You conflate local streaming vs internet streaming, and I specifically excluded twitchy multiplayer games...
You're not going to be able to do the best combos with that kind of latency, but I guess it's ok for mid-level play.
Yeah, I just feel the lag. Even on monitors claiming 1ms I can feel it while playing FPS games, and it's really annoying to me; if a game isn't fluid I won't play it.
Slightly tangential, but do you think Moonlight (with, say, Sunshine) is good enough for proper work? I've used a few "second screen" apps like spacedesk on my iPad, but generally when the resolution is good enough for text, it's too laggy for scrolling (and vice versa).
(For more details, I'm planning to stream from an old laptop after turning it into a hackintosh. I'm hoping staying on the home network's going to help with latency.)
Absolutely yes, especially if you use a GPU with a recent video encoder.
I have all my PCs connected to each other via Moonlight+Sunshine and the latency on the local network is unnoticeable. I code on my Linux workstation from my laptop, play games on the gaming PC from my workstation, etcetera and it is all basically perfect.
Thank you! When you say GPU with a recent video encoder, do you mean them separately (i.e. don't use software/cpu streaming; and use an efficient encoder), or do you mean use a GPU that supports recent encoders? I'm afraid my Intel HD 520 isn't particularly new and likely doesn't support modern encoders.
A GPU with a dedicated encoder for x264 or preferably x265/AV1. You can do without it, but you'll spend a core or two on software encoding and the overhead will add a few tens of ms of lag.
With full hardware capture and encode (the default on Windows, can require tweaking on Linux) it's virtually free resource-wise.
You're off by one letter: the codecs are h264/h265. x264 and x265 are CPU software encoders for those codecs.
Without a doubt! I use Moonlight for ten-plus hours a week. I never use it for gaming, and it has never failed once.
Thanks, that's great to hear/know! Would you be okay sharing your hardware (CPU/GPU) setup on the server (Sunshine) side? Thanks!
Yes. I host both on my NixOS desktop that has a 13900KF and an RTX4080 (using nvenc and AV1), as well as from my MacBook Pro M3 Pro.
The GPU does the transcoding, builds the network packets, and copies the data over PCIe, all in hardware, avoiding memory copies.
OBS+WebRTC is mostly software doing the heavy lifting.
Imagine if the camera built WebRTC UDP packets directly and zero-copied them to the NIC; that would lower latency quite a bit.
I wouldn't be surprised to learn that Nvidia is doing exactly that on their cloud: compressing the video on the GPU using NVENC, building a packet around it, and then passing it to a NIC under the same PCIe switch (Mellanox used to call that PeerDirect) and sending it on its way.
The tech is all there, it just requires some arcane knowledge.
This is premature optimisation. The bus bandwidth and latency needed to get a few Mbps of compressed video to the PC is microscopic. It's completely unnecessary to lock yourself into NVIDIA just to create some UDP packets.
I was talking about Nvidia's Cloud gaming offer (GeForce Now). For them it's certainly not a premature optimization.
"arcane knowledge" is too strong of a phrase. You need someone who is familiar with Nvidia hardware and is willing to write software that only works on Nvidia hardware.
It is arcane in the sense that information about how all of this works on their specific hardware is not publicly available, though it's probably widespread internally.
Exactly this, with "…NVIDIA GPUDirect for Video, IO devices are fully synchronized with the GPU and the CPU to minimize wasting cycles copying data between device drivers". [1]
1. https://developer.nvidia.com/gpudirectforvideo
TL;DR: there are a lot of moving pieces, but people are working on it at the moment. I'll try to summarize some of the challenges below.
Bandwidth requirements are a big one. For broadcasts you want your assets to be cacheable in the CDN and on the device, and without custom edge code + client code + a custom media package, that means traditional URLs, each containing a short (e.g. 2s) mp4 segment of the stream.
The container format used is typically mp4, and you cannot write the mp4 metadata without knowing the size of each frame, which you don't know until encoding finishes. Let's call this "segment packaging latency".
To avoid this, it's necessary to use (typically invent) a new protocol other than DASH/HLS + mp4. You also need cache logic on the CDN to handle this new format.
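As a toy illustration of the packaging constraint (not a real mp4 muxer, and the numbers are assumed): the packager can't write the segment metadata until it knows every frame's encoded size, so the first frame of a 2s segment waits roughly 2s before anything can be published.

    # Toy model of "segment packaging latency" (not a real mp4 muxer, sizes made up).
    # The segment header needs every frame's encoded size, so nothing can be
    # published until the last frame of the segment has been encoded.
    FPS = 60
    SEGMENT_SECONDS = 2
    FRAMES_PER_SEGMENT = FPS * SEGMENT_SECONDS

    def package_segment(encoded_frame_sizes):
        # Stand-in for writing moov/moof/sidx-style metadata, which needs all sizes.
        return {"count": len(encoded_frame_sizes), "frame_sizes": encoded_frame_sizes}

    sizes = []
    for frame_index in range(FRAMES_PER_SEGMENT):
        sizes.append(1500)   # pretend every encoded frame came out at 1500 bytes
        # can't package here: the sizes of the remaining frames are still unknown

    segment = package_segment(sizes)             # only now can the segment be published
    added_latency_s = FRAMES_PER_SEGMENT / FPS   # the first frame waited ~2 s
    print(added_latency_s)                       # 2.0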
For smooth playback without interruptions, devices want to buffer as much as possible, especially for unreliable connections. Let's call this "playback buffer latency".
Playback buffer latency can be minimized by writing a custom playback client, it's just a lot of work.
Then there is the ABR part, where a manifest is fetched that contains a list of all available bitrates. This needs to be updated, devices need to fetch it, and then fetch the next content. Let's call this "manifest RTT latency".
Lastly (?) there is the latency from video encoding itself. For the most efficient encoding / highest quality, B-frames should be used. But those are "lookahead" frames, and a typical 3 frame lookahead already adds ~50 ms at 60fps. Not to mention the milliseconds spent doing the encoding calculations themselves.
Big players are rewriting large parts of the stack to have lower latency, including inventing new protocols other than DASH/HLS for streaming, to avoid the manifest RTT latency hit.
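To put rough, assumed numbers on the pieces named above (every figure is illustrative, not measured):

    # Back-of-the-envelope latency budget for a traditional HLS/DASH live stream.
    # Every number here is an assumption, just to show where the seconds go.
    segment_duration_s  = 2.0
    packaging_latency_s = segment_duration_s      # can't publish an unfinished segment
    playback_buffer_s   = 3 * segment_duration_s  # players often buffer a few segments
    manifest_rtt_s      = 0.2                     # fetch updated manifest, then content
    encoder_lookahead_s = 3 / 60                  # 3-frame B-frame lookahead at 60 fps

    total_s = packaging_latency_s + playback_buffer_s + manifest_rtt_s + encoder_lookahead_s
    print(f"{total_s:.2f} s")   # 8.25 s, dominated by segmenting and buffering

The exact total doesn't matter; the point is that the segmenting and buffering terms dwarf the encode-side milliseconds, which is why the protocol and packaging layers are where the rewrites happen.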
For HLS you can use MPEG-TS, but mp4 is also an option (with the problem you talk about).
IMO one of the issues is that transcoding to lower resolutions usually happens on the server side. That takes time. If the client transcoded, that latency would (mostly) go away.
Cloud gaming is streaming from a server in a data center to one nearby client. Twitch-style live streaming is from a client, to a data center, to a CDN, to multiple clients.
A lot of it is buffering to work around crappy connections. Cloud gaming requires low latency so buffering is kept to a minimum.
Because there are so many middlemen in series, each buffering frames, and also because the access circuit between the user terminal and the nearest CDN is jittery too. The latency must be a few times the max jitter for a par-for-the-course user experience.
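As a sketch of that rule of thumb (the multiplier and the jitter figures are assumptions, not measurements):

    # Rule of thumb from above: latency (the playback buffer) needs to be a few times
    # the worst-case jitter or playback stalls. Factor and jitter values are assumed.
    def target_buffer_ms(max_jitter_ms, safety_factor=3):
        return safety_factor * max_jitter_ms

    print(target_buffer_ms(15))    # 45 ms: fine for cloud gaming on a clean local link
    print(target_buffer_ms(300))   # 900 ms: once a jittery last mile is in the path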
In my head the answer is simple: Moonlight is a one-to-one stream, while broadcasting to many clients at once is a whole different setup.