apt install is not working for me, is this by design?
nvtop : Depends: libnvidia-compute-418 but it is not going to be installed E: Unable to correct problems, you have held broken packages.
<rant>I find broken installs a huge turnoff, especially those related to NVIDIA. With their 2.3T market cap they can't afford someone to write a universal point-and-click install script for ML usage? Every time I reinstall Linux I have to spend a whole day sorting NVIDIA out. Why do they have so many layers - driver, CUDA, CUDA toolkit, cuDNN - with conflicting versioning? It's a total mess. Instead of a nice install script we have a million install guides 10 pages long, all outdated.</>
Because all these problems don’t hinder their bottom line.
Cluster admins or Ph.D. students handle these problems, allowing people to work. For most people, all this infra is already buried under Conda, Jupyter, etc.
Sincerely,
Your friendly HPC admin.
That's just because they don't have competition.
Back in the dark days of 2015 we used to spend a day or two just getting tensorflow working on a GPU because of all the install issues, driver issues, etc. Theano was no better, but it was academic research code, we didn't expect better.
Once pytorch started gaining ground, it forced them to adapt - Keras was written to hide tensorflow's awfulness. Then Google realized it's an unrecoverable situation of technical debt and they started building JAX.
With AMD, Intel, Tenstorrent, and several other AI chip specialists coming with pytorch compatibility, NVIDIA will eventually have to adapt. They still have the advantage of 15 years of CUDA code already written, but pytorch as an abstraction layer can make the switch easier.
PyTorch and CUDA solve completely different problems. CUDA is a general purpose programming environment. PyTorch is for machine learning. PyTorch won't ever displace CUDA because there are things other than machine learning models that GPUs are good at accelerating.
Yeah, the amount of tunnel vision from AI/ML users thinking that Nvidia exists solely for their use is funny to watch. Try writing anything other than ML in pytorch. You can't? You can in CUDA. There's a much bigger world than ML out there.
Nvidia's stock price isn't at an all-time high because of all the people writing fluid dynamics in CUDA.
Nor is it because of all the tensorflow models people are writing, to be honest.
Of course it's all of the mining, but that's not using pytorch either. It's using CUDA.
GPU mining went waaay down since Ethereum went PoS (Proof of Stake) almost 2 years ago. Does BTC even use GPUs for mining? I am pretty sure they use ASICs.
What is being mined using CUDA?
And similarly from people who consider Nvidia to be the "Gaming GPU" company, not understanding why it's so big now.
It was an example.
I don't see how Nvidia has to do anything since PyTorch works just fine on their GPUs, thanks to CUDA. If anything, they're still one of the best platforms and that's definitely not because CUDA isn't competitive.
I hate stuff that only works on certain GPUs as much as the next person, but sadly competition has only really started to catch up to CUDA very recently.
The problem is that NVidia is a single company participating in multiple interdependent markets. They are participating in the market of hardware specification, and they are participating in the market of driver implementation, and they are participating in the market of userland software. This is called "vertical integration".
Because of copyright, NVidia gets an explicit government-enforced monopoly over the driver implementation market. Sure, 3rd-party projects like nouveau get to "compete", but NVidia is given free rein to cripple that competition, simply by refusing to share necessary hardware (and firmware) specs; and also by compelling experienced engineers (anyone who works on NVidia's driver implementation) to sign NDAs, legally enforcing the secrecy of their specs.
On top of this, NVidia gets to be anti-competitive with the driver-compatibility of its userland software, including CUDA, GSync, DLSS, etc.
When a company's market participation is vertically integrated, that participation becomes anticompetitive. The only way we can resolve this problem is by dissolving the company into multiple market-specific companies.
I recently traded a friend my Nvidia 3070 for his Radeon 6700 XT, because I'd returned to Linux a few months ago and was tired of Nvidia. Nvidia will likely get much better as NVK grows, but I think it's better to just not use their products unless you want to have Microsoft spywareOS installed on your computer.
Everybody's experience is different.
I've had one or two upgrade problems in the last 10 years, but otherwise the Nvidia drivers have worked great for me. My biggest complaint is they dropped support for the GPU in my Macbook, and I had to install the nouveau drivers (which I can never spell correctly).
At least it's not from the FSF, and GPUs aren't gendered, or you'd have to choose from multiple gendered drivers:
Installing CUDA is not that hard? You follow the official instructions and it's done within minutes.
Never had a problem with the initial installation. Updates can get messy, but aside from that it's pretty much smooth sailing.
I've just spent the morning uninstalling and reinstalling different versions of the Nvidia driver (Linux) to get nvcc back for llama.cpp after Linux Mint did an update - I had CUDA 12.3 and 12.4 (5GB each), in conflict, with no guidance. 550 was the charm, not 535 that was fine in January. This is the third time I'm doing this since December. It is painful. I'm not in a hurry to return to my cuDF experiments as I'm pretty sure that'll be broken too (as it has been in the past). I'm the co-author of O'Reilly's High Performance Python book and this experience mirrors what I was having with pyCUDA a decade back.
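For anyone hitting the same thing, here is a rough Python sketch of a sanity check, not an official tool. It assumes `nvcc --version` prints a "release X.Y" string and that the `nvidia-smi` header prints "CUDA Version: X.Y" (the highest CUDA version the installed driver supports). The function names are made up for illustration, and it only compares the two numbers; it knows nothing about NVIDIA's compatibility packages.

```python
import re
import subprocess


def parse_version(text: str, pattern: str):
    """Return (major, minor) from the first match of pattern, or None."""
    match = re.search(pattern, text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_newer_than_driver(nvcc_text: str, smi_text: str) -> bool:
    """True if the toolkit version exceeds the CUDA version the driver reports."""
    # Assumed formats: "release 12.4" (nvcc) and "CUDA Version: 12.4" (nvidia-smi).
    toolkit = parse_version(nvcc_text, r"release (\d+)\.(\d+)")
    driver = parse_version(smi_text, r"CUDA Version:\s*(\d+)\.(\d+)")
    if toolkit is None or driver is None:
        raise ValueError("could not find a version string in the given output")
    return toolkit > driver


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

On a machine with both tools installed, `toolkit_newer_than_driver(run(["nvcc", "--version"]), run(["nvidia-smi"]))` returning `True` would suggest the toolkit is ahead of what the driver supports.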
There are plenty of problems with NVIDIA on Linux, but I'm sad to tell you I think this one is your own fault.
The error message is telling you that you've held back broken packages that are conflicting with dependencies nvtop is trying to install. If you sort that out, nvtop should install.
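To make that concrete, here is a minimal Python sketch of the diagnosis, not a definitive tool. It assumes apt's usual "pkg : Depends: dep but it is not going to be installed" wording, as in the error quoted at the top of the thread, and that `apt-mark showhold` is available. The function names are hypothetical.

```python
import re
import shutil
import subprocess

# Assumed wording: "pkg : Depends: dep but it is not going to be installed".
UNMET = re.compile(
    r"^\s*(\S+)\s*:\s*Depends:\s*(\S+)(?:\s*\(([^)]*)\))?\s+but\b",
    re.MULTILINE,
)


def unmet_dependencies(apt_output: str) -> list[tuple[str, str]]:
    """Return (package, missing_dependency) pairs found in apt's error text."""
    return [(m.group(1), m.group(2)) for m in UNMET.finditer(apt_output)]


def held_packages() -> list[str]:
    """List packages on hold via `apt-mark showhold`; empty if apt-mark is absent."""
    if shutil.which("apt-mark") is None:
        return []
    result = subprocess.run(
        ["apt-mark", "showhold"], capture_output=True, text=True, check=False
    )
    return result.stdout.split()


error = "nvtop : Depends: libnvidia-compute-418 but it is not going to be installed"
for package, dependency in unmet_dependencies(error):
    print(f"{package} needs {dependency}")
```

If `held_packages()` comes back non-empty, `apt-mark unhold <package>` releases a hold. If it is empty, the dependency probably cannot be satisfied from the configured repositories, and `apt-cache policy libnvidia-compute-418` may help confirm that.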
I have nvtop installed on Debian via apt, and it works just fine.
Nah, screw install scripts, too. If I didn't prefer AMD anyway, I'd want an apt repo.