Ah, this is quite interesting! I had a use case where I needed GPU-over-IP, but only for transcoding videos. I had a not-so-powerful AMD GPU in my homelab server that somehow kept crashing the kernel any time I tried to encode video with it, and an NVIDIA RTX 3080 in a gaming machine.
So I wrote https://github.com/steelbrain/ffmpeg-over-ip and ran the server on the Windows machine and the client on the media server (could be Plex, Emby, Jellyfin, etc.), and it worked flawlessly.
This is more or less what I was hoping for when I saw the submission title. Was disappointed to see that the submission wasn't actually a useful generic tool but instead a paid cloud service. Of course the real content is in the comments.
As an aside, are there any uses for GPU-over-network other than video encoding? The increased latency seems like it would prohibit anything machine learning related or graphics intensive.
How do you use it for video encoding/decoding? Won't the uncompressed video (the input to encoding, or the output of decoding) be too large to transmit over the network practically?
Well, the ffmpeg-over-ip tool in the GP does it by just not sending uncompressed video. It's more of an ffmpeg server where the server is implicitly expected to have access to a GPU that the client doesn't have, and only compressed video is sent back and forth, in the form of the video streams that would normally be ffmpeg's input and output.

It's not a generic GPU server that tries to push a whole PCIe bus over the network, which I personally think is a bit of a fool's errand, doomed to never be particularly useful to existing generic workloads. It would work if you very carefully redesigned the workload to not take advantage of a GPU's typical high bandwidth and low latency, but if you have to do that, what's the point of trying to abstract over the device layer? Better to work at a higher level of abstraction where you can optimize for your particular application, rather than at a lower level that you can't possibly implement well and then have to completely redo the higher levels anyway to work with it.
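To put rough numbers on why shipping raw frames is a non-starter: uncompressed 1080p30 runs somewhere around 0.7–1.5 Gbit/s depending on pixel format, and 4K is several times that, while the compressed streams ffmpeg actually reads and writes are orders of magnitude smaller. As a sketch of the general shape of such a setup (this is not ffmpeg-over-ip's actual protocol; the port, hostname, and JSON-line message format here are invented for illustration), the client just forwards the ffmpeg arguments and the server runs ffmpeg against a filesystem both machines can see:

```typescript
// Hypothetical sketch only -- not ffmpeg-over-ip's actual wire protocol. The port,
// hostname, and JSON-line message format are invented for illustration. Assumes
// both machines mount the same media share at the same paths.
import { createServer, createConnection } from "node:net";
import { spawn } from "node:child_process";

const PORT = 9099; // made-up port

// Server side: runs on the machine that has the working GPU.
createServer((socket) => {
  let buf = "";
  socket.on("data", (chunk) => {
    buf += chunk.toString();
    const nl = buf.indexOf("\n");
    if (nl === -1) return; // wait until a full JSON line has arrived
    const args: string[] = JSON.parse(buf.slice(0, nl)); // e.g. ["-i", "/media/in.mkv", ...]
    const ff = spawn("ffmpeg", args, { stdio: "inherit" });
    ff.on("exit", (code) => socket.end(JSON.stringify({ code }))); // report exit status back
  });
}).listen(PORT);

// Client side: a drop-in "ffmpeg" stand-in on the media server (Plex/Jellyfin box).
export function remoteFfmpeg(args: string[]): Promise<number | null> {
  return new Promise((resolve) => {
    const conn = createConnection({ host: "gpu-box.lan", port: PORT }, () => {
      conn.write(JSON.stringify(args) + "\n");
    });
    let reply = "";
    conn.on("data", (c) => { reply += c.toString(); });
    conn.on("end", () => resolve(JSON.parse(reply).code));
  });
}
```

Only the argument list and the exit status ever cross the wire; the heavy data stays as compressed files (or streams) that ffmpeg reads and writes itself.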
Ah, you mean transcoding scenarios. Like, it can't encode my screen capture.
I'm growing increasingly tired of these "cloud" services, paid or not. :/
Well, feel free to spend your own time writing such a tool and releasing it as open source. That would be a really cool project! Until then, don't complain that others aren't willing to donate a significant amount of their work to the public.
There is a vast gap between walled-garden cloud-service rent-seeking and giving away software as open source. In the olden days you could buy a software license and run it wherever you wanted.
Some computation tasks can tolerate the latency if they're written with enough overlap and can keep enough of the data resident, but they usually need more performant networking than this. See older efforts like rCUDA for remote CUDA over InfiniBand as an example. It's not ideal, but sometimes worth it. Usually the win is in taking a multi-GPU app and giving it 16 or 32 remote GPUs, rather than a single one, though.
There is GPU-over-network software called Juice [1]. I've used it on AWS to run CPU-intensive workloads that also happen to need some GPU, without needing a huge GPU instance. I was able to use a small GPU instance, which had just 4 CPU cores, and stream its GPU to one with 128 CPU cores.
I found Juice to work decently for graphical applications too (e.g., games, CAD software). Latency was about what you'd expect for video encode + decode + network: 5-20ms on a LAN if I recall correctly.
[1] - https://github.com/Juice-Labs/Juice-Labs
I mean, anything you use a GPU/TPU for could benefit.
IPMI and the like could use it; Proxmox, for example, could use it. Machine-learning tasks (like Frigate) and hashcat could also benefit. All in theory, of course. Many such tasks use VNC or SPICE right now. The ability to expose your GPU over TCP/IP, the Unix way, is powerful. Though Node.js would not be my choice for implementing it.
Interesting. Do you know if your tool supports conversions resulting in multiple files, such as HLS and its myriad of timeslice files?
Since it’s sharing the underlying file system and just running ffmpeg remotely, it should support any variation of outputs.
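For example, an HLS job is just a normal ffmpeg invocation whose many output files happen to land on the shared mount. The sketch below is only an illustration (the paths are made up, and it spawns ffmpeg directly rather than going through the tool), but the HLS flags are standard ffmpeg options:

```typescript
// Hedged illustration, not code from ffmpeg-over-ip itself: the paths are invented,
// but the HLS flags are standard ffmpeg options. Run on the server side, the .m3u8
// playlist plus every .ts segment is written to the shared mount, so the client
// sees the full set of files exactly as if ffmpeg had run locally.
import { spawn } from "node:child_process";

const ff = spawn("ffmpeg", [
  "-i", "/media/movies/input.mkv",
  "-c:v", "h264_nvenc",                 // hardware encode on the remote GPU
  "-c:a", "aac",
  "-f", "hls",
  "-hls_time", "6",                     // ~6-second segments
  "-hls_segment_filename", "/media/streams/movie_%03d.ts",
  "/media/streams/movie.m3u8",
], { stdio: "inherit" });

ff.on("exit", (code) => console.log(`ffmpeg exited with code ${code}`));
```

Since only the command and its exit status cross the wire, a job that writes one file and a job that writes a thousand segments look the same to the tool.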
Have you done a Show HN yet? If not, please consider doing so!
https://gist.github.com/tzmartin/88abb7ef63e41e27c2ec9a5ce5d...
https://news.ycombinator.com/showhn.html
https://news.ycombinator.com/item?id=22336638