NVIDIA Transitions Fully Towards Open-Source Linux GPU Kernel Modules

hypeatei
55 replies
23h35m

How is the NVIDIA driver situation on Linux these days? I built a new desktop with an AMD GPU since I didn't want to deal with all the weirdness of closed source or lacking/obsolete open source drivers.

mepian
20 replies
23h17m

The current stable proprietary driver is a nightmare on Wayland with my 3070, constant flickering and stuttering everywhere. Apparently the upcoming version 555 is much better, so I'm sticking with X11 until it comes out. I haven't tried the open-source one yet; not sure if it supports my GPU at all.

JasonSage
10 replies
20h43m

In defense of the parent, "upcoming" can still be a relative term, albeit a bit misleading. For example: I'm running the 550 drivers still because my upstream nixos-unstable doesn't have 555 for me yet.

mananaysiempre
5 replies
19h39m

nixos-unstable doesn't have 555

Version 555.58.02 is under “latest” in nixos-unstable as of about three weeks ago[1]. (Somebody should check with qyliss if she knows the PR tracker is dead... But the last nixos-unstable bump was two days ago, so it’s there.)

[1] https://github.com/NixOS/nixpkgs/commit/4e15c4a8ad30c02d6c26...

JasonSage
4 replies
18h50m

`nvidia-smi` shows that my driver version is 550.78. I ran `nixos-rebuild switch --upgrade` yesterday. My nixos channel is `nixos-unstable`.

Do you know something I don't? I'd love to be on the latest version.

I should have written my post better; it implies that 555 does not exist in nixpkgs, which I never meant. There's certainly a phrasing that captures what I'm seeing more accurately.

mananaysiempre
1 replies
6h25m

I did not mean to chastise you or anything, just to suggest that you could have a newer driver, in case you had missed the possibility.

The thing is, AFAIU, NVIDIA has several release channels for their Linux driver[1] and 555 is not (yet?) the "production" one, which is what NixOS defaults to (550 is). If you want a different degree of freshness for your NVIDIA driver, you need to say so explicitly[2]. The necessary incantation should be

  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.latest;
This is somewhat similar to how you get a newer kernel by setting boot.kernelPackages to linuxPackages_latest, in case you've ever done that.
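Putting the two side by side, the relevant bits of configuration.nix would look something like this (a sketch; pick whichever channels you actually want):

  # both "freshness" knobs together in configuration.nix
  boot.kernelPackages = pkgs.linuxPackages_latest;
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.latest;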

[1] https://www.nvidia.com/en-us/drivers/unix/

[2] https://nixos.wiki/wiki/Nvidia

JasonSage
0 replies
2h32m

I had this configuration but was missing a flake update to move my nixpkgs forward, despite the channel setting; looking back, I understand it much better now.

Thanks for the additional info, this HN thread has helped me quite a bit.

atrus
1 replies
16h43m

Are you using flakes? If you don't do `nix flake update` there won't be all that much to update.
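Roughly (assuming your flake lives in the current directory; "myhost" is a placeholder for your configuration name):

  nix flake update                             # refresh flake.lock, nixpkgs input included
  sudo nixos-rebuild switch --flake .#myhost   # rebuild against the updated lock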

JasonSage
0 replies
15h39m

I am! I forgot about this. Mental model check happening.

(Still on 550.)

zxexz
1 replies
15h52m

I love NixOS, and the nvidia-x11 package is truly wonderful and captures so many options. But having such a complex package makes updating and regression testing take time. For ML stuff I ended up using it as the basis for an overlay and ripping out literally everything I don't need, which usually makes it a matter of minutes to make the changes required to upgrade when a new driver is released. I'm running completely headless because these are H100 nodes, and I just need persistenced and fabricmanager, and GDRMA (which wasn't working at all, causing me to go down this rabbit hole of stripping everything away until I could figure out why).

postcert
0 replies
2h5m

I was going to say specialisations might be useful for you to keep a previous driver version around for testing, but you might be past that point!

Having the ability to keep alternate configurations for $previous_kernel and $nvidia_stable has been super helpful for diagnosing instead of rolling back.
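For reference, a minimal sketch of that kind of specialisation in configuration.nix (the kernel and driver channel choices here are just illustrative):

  # a fallback boot entry with the default kernel and the "production" driver
  specialisation.fallback.configuration = {
    boot.kernelPackages = lib.mkForce pkgs.linuxPackages;
    hardware.nvidia.package =
      lib.mkForce config.boot.kernelPackages.nvidiaPackages.production;
  };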

mepian
0 replies
19h51m

Yep, I'm on openSUSE Tumbleweed, and it's not rolled out there yet. I would rather wait than update my drivers out-of-band.

llmblockchain
3 replies
23h15m

I have a 3070 on X and it has been great.

levkk
2 replies
22h15m

Same setup here. Multiple displays don't work well for me. One of the displays often doesn't get detected after resuming from the screen saver.

llmblockchain
1 replies
21h37m

I have two monitors connected to the 3070 and it works well. The only issue I had was suspending: the GPU would "fall off the bus" and not get its power back when the PC woke up. I had to add the kernel parameter "pcie_aspm=off" to prevent the GPU from falling asleep.

So... not perfect, but it works.
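For anyone wanting to do the same, this is roughly the flow on a GRUB-based distro (file paths and regeneration commands vary by distro):

  # /etc/default/grub: append the parameter to the kernel command line
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off"
  # then regenerate the config and reboot
  sudo update-grub   # or: sudo grub2-mkconfig -o /boot/grub2/grub.cfg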

josephg
0 replies
19h54m

Huh. I’m using 2 monitors connected to a 4090 on Linux mint - which is still using X11. It works flawlessly, including DPI scaling. Wake from sleep is fine too.

I haven’t tried wayland yet. Sounds like it might be time soon given other comments in this thread.

misterbishop
2 replies
22h48m

this is resolved in 555 (currently running 555.58.02). my asus zephyrus g15 w/ 3060 is looking real good on Fedora 40. there's still optimizations needed around clocking, power, and thermals. but the graphics presentation layer has no issues on wayland. that's with hybrid/optimus/prime switching, which has NEVER worked seamlessly for me on any laptop on linux going back to 2010. gnome window animations remain snappy and not glitchy while running a game. i'm getting 60fps+ running baldurs gate 3 @ 1440p on the low preset.

robviren
1 replies
22h33m

Had a similar experience with my Legion 5i 3070 with Wayland and Nvidia 555, but my HDMI out is all screwed up now, of course. It works on 550. One step forward and one step back.

misterbishop
0 replies
16h39m

is there a mux switch?

gmokki
0 replies
10h23m

I switched to Wayland 10 years ago when it became an option on Fedora. The first thing I had to do was drop NVIDIA and switch to an Intel GPU, and for the past 5 years an AMD GPU. It makes a big difference if the hardware is supported by the upstream kernel.

Maybe NVIDIA drivers have kind of worked on the 12-month-old kernels that Ubuntu uses on average.

anon291
17 replies
23h11m

I've literally never had an issue in decades of using NVIDIA and Linux. They're closed source, but the drivers work very consistently for me. NVIDIA's just the only option if you want something that's actually good and can run ML workloads as well.

sqeaky
10 replies
23h5m

but the drivers work very consistently for me

The problem with comments like this is that, on your particular graphics card or laptop, you never know whether you'll end up with my experience or yours.

I have tried nvidia a few times and kept getting burnt. AMD just works. I don't get the fastest ML machine, but I am just a tinkerer there and OpenCL works fine for my little toy apps and my 7900XTX blazes through every wine game.

If you need it professionally then you need it, warts and all. For any casual user, that 10% extra gaming performance needs to be weighed against reliability.

Workaccount2
7 replies
22h40m

It also depends heavily on the user.

A mechanic might say "This car has never given me a problem" because the mechanic doesn't consider cleaning an idle bypass circuit or adjusting valve clearances to be a "problem". To 99% of the population, though, those are expensive and annoying problems, because they have no idea what those words even mean, much less have the ability to troubleshoot, diagnose, and repair.

chasil
3 replies
22h0m

If you use a search engine for "Torvalds Nvidia" you will discern a certain attitude towards Nvidia as a corporation and its products.

This might provide you a suggestion that alternate manufacturers should be considered.

I have confirmed this to be the case on Google and Bing, so DuckDuckGo and Startpage will also exhibit this phenomenon.

Dylan16807
1 replies
16h40m

An opinion on support from over ten years ago is not a very strong suggestion.

chasil
0 replies
34m

Your problem there is that both search engines place this image and backstory at the top of the results, so neither Google nor Bing agree with any of you.

If you think they're wrong, be sure to let them know.

dahart
0 replies
2h51m

Torvalds has said nasty mean things to a lot of people in the past, and expressed regret over his temper & hyperbole. Try searching for something more recent https://youtu.be/wvQ0N56pW74

lyu07282
2 replies
20h59m

A lot of it probably has to do with not really understanding their distribution's package manager, and LKMs specifically. I've also always suspected that most Linux users don't know whether they are using Wayland or X11, and that the issues they had were actually Wayland-specific ones they wouldn't have with Nvidia/X11. And come to think of it, how would they even know it's a GPU driver issue in the first place? Guess I'm the mechanic in your analogy.

vetinari
0 replies
3h32m

If there's an issue with Nvidia/Wayland and there isn't with AMD/Wayland or Intel/Wayland, then it is an Nvidia issue, not a Wayland one.

sqeaky
0 replies
20h55m

When I run Gentoo or Arch, I know. But when I run Ubuntu or Fedora, should I have needed to know?

On plenty of distros, "I want to install it and forget about it" is reasonable, and on both Gentoo and Ubuntu I have rebooted from a working system into a system where the display stopped working; at least on Gentoo I was ready, because I broke it somehow.

lmm
1 replies
18h45m

AMD just works. I don't get the fastest ML machine, but I am just a tinkerer there and OpenCL works fine for my little toy apps and my 7900XTX blazes through every wine game.

That's the opposite of my experience. I'd love to support open-source. But the AMD experience is just too flaky, too card-dependent. NVidia is rock-solid (maybe not for Wayland, but I never wanted Wayland in the first place).

sqeaky
0 replies
3h28m

What kind of flakiness? The only AMD GPU problem I have had involved a lightning strike killing a card while I was gaming.

My nvidia problems are generally software and update related. The NVidia stuff usually works on popular distros, but as soon as anything custom happens or a surprise update lands, there is a chance things break.

resoluteteeth
0 replies
18h45m

Are you using wayland or are you still on x11? My experience was that the closed source drivers were fine with x11 but a nightmare with wayland.

pizza234
0 replies
19h45m

Up to a couple of years ago, before permanently moving to AMD GPUs, I couldn't even boot Ubuntu with an Nvidia GPU. This was because Ubuntu booted by default with Nouveau, which didn't support a few/several series (I had at least two different series).

The cards worked fine with binary drivers once the system was installed, but AFAIR, I had to integrate the binary driver packages into the Ubuntu ISO in order to boot.

I presume that the situation is much better now, but requiring binary drivers can be a problem in itself.

l33tman
0 replies
21h31m

Same here, been using the nvidia binary drivers on a dozen computers with various other HW and distros for decades with never any problems whatsoever.

isatty
0 replies
9h43m

Likewise. Rock solid for decades with Intel + Nvidia proprietary drivers, even when doing things like hot plugging for passthroughs.

bobajeff
0 replies
22h41m

I did, when my card stopped being supported by all the distros because it was too old, while the legacy driver didn't fully work the same.

Keyframe
0 replies
10h51m

Me too. Now I have a laptop with discrete nvidia and an eGPU with 3090 in it, a desktop with 4090, another laptop with another discrete nvidia.. all switching combinations work, acceleration works, game performance is on par with windows (even with proton to within a small percentage or even sometimes better). All out of the box with stock Ubuntu and installing driver from Nvidia site.

The only "trick" is I'm still on X11 and probably will stay. Note that I did try wayland on few occasions but I steered away (mostly due to other issues with it at the time).

segmondy
1 replies
22h27m

Plug, install, then play. I've got 3 different Nvidia GPU sets, all running without any issue; nothing crazy to do but follow the installation instructions.

anonym29
0 replies
15h44m

To some of us, running any closed source software in userland qualifies as quite crazy indeed.

green-salt
1 replies
22h56m

Whatever pop_os uses has been quite stable for my 4070.

tormeh
0 replies
22h38m

Pop uses X by default because of Nvidia.

drdaeman
1 replies
19h34m

3090 owner here.

Wayland is an even worse mess than it normally is. It used to flicker real bad before 555.58.02, less so with the latest driver, but it still has some glitches with games. A bunch of older Electron apps still fail to render anything and require hardware acceleration to be disabled. I gave up trying to make it all work: I can't get rid of all the flicker and drawing issues, plus Wayland seems to be a real pain in the ass with HiDPI displays.

X11 sort of works, but I had to entirely disable DPMS or one of my monitors never comes back online after going to sleep. I thought it was my KVM messing up, but that happened even with a direct connection... no idea what's going on there.

CUDA works fine, save for the regular version compatibility hiccups.

senectus1
0 replies
18h52m

4070 Ti Super here, X11 is fine, I have zero issues.

Wayland is mostly fine, though I get some window frame glitches when maximizing windows to the monitor, and another issue that I'm pretty sure is Wayland, but it has only happened a couple of times and it locks the whole device up. I can't prove it yet.

art0rz
1 replies
23h9m

I've been running Arch with KDE under Wayland on two different laptops both with NVIDIA GPUs using proprietary drivers for years and have not run into issues. Maybe I'm lucky? It's been flawless for me.

lyu07282
0 replies
21h35m

The experiences always vary quite a lot; it depends so much on what you do with it. For example, Discord doesn't support screen sharing with Wayland. It's just one small example, but those can add up over time. Another example is display rotation, which was broken in KDE for a long time (recently fixed).

tgsovlerkhgsel
0 replies
11h29m

My experience with an AMD iGPU on Linux was so bad that my next laptop will be Intel. Horrible instability to the point where I could reliably crash my machine by using Google Maps for a few minutes, on both Chrome and Firefox. It got fixed eventually - with the next Ubuntu release, so I had a computer where I was afraid to use anything with WebGL for half a year.

tadasv
0 replies
23h30m

great. rtx 4090 works out of the box after installing drivers from non-free. That's on debian bookworm.

mathfailure
0 replies
19h37m

Depends on the version of the drivers: the 550 version results in a black screen (you have to kill and restart the X server) after waking up from sleep. The 535 version doesn't have this bug. Don't know about 555.

Also tearing is a bitch. Still. Even with ForceCompositionPipeline.

jppittma
0 replies
23h23m

4070 worked out of the box on my arch system. I used the closed source drivers and X11 and I've not encountered a single problem.

My prediction is that it will continue to improve if only because people want to run nvidia on workstations.

jcranmer
0 replies
23h20m

I built my new-ish computer with an AMD GPU because I trusted in-kernel drivers more than out-of-kernel DKMS drivers.

That said, my previous experience with the DKMS driver stuff hasn't been bad. If you use Nvidia's proprietary driver stack, then things should generally be fine. The worst issues are that Nvidia has (historically, at least; it might be different for newer cards) refused to implement some graphics features that everybody else uses, which means that you basically need entirely separate codepaths for Nvidia in window managers, and some of them have basically said "fuck no" to doing that.

devwastaken
0 replies
15h25m

KDE Plasma 6 + Nvidia beta 555 works well. I have to make .desktop files to launch some applications explicitly under Wayland.
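Something like this is what I mean (a sketch; the Ozone flags are the usual Chromium/Electron ones, and the app name is a placeholder):

  # ~/.local/share/applications/someapp-wayland.desktop
  [Desktop Entry]
  Type=Application
  Name=SomeApp (Wayland)
  Exec=someapp --enable-features=UseOzonePlatform --ozone-platform=wayland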

adrian_b
0 replies
13h46m

I am not using Wayland and I do not have any intention to use it, therefore I do not care for any problems caused by Wayland not supporting NVIDIA and demanding that NVIDIA must support Wayland.

I am using only Linux or FreeBSD on all my laptop, desktop or server computers.

On desktop and server computers I did not ever have the slightest difficulty with the NVIDIA proprietary drivers, either for OpenGL or for CUDA applications or for video decoding/encoding or for multiple monitor support, with high resolution and high color depth, on either Gentoo/Funtoo Linux or FreeBSD, during the last two decades. I also have AMD GPUs, which I use for compute applications (because they are older models, which still had FP64 support). For graphics applications they frequently had annoying bugs, unlike NVIDIA (however my AMD GPUs have been older models, preceding RDNA, which might be better supported by the open-source AMD drivers).

The only computers on which I had problems with NVIDIA on Linux were those laptops that used the NVIDIA Optimus method of coexistence with the Intel integrated GPUs. Many years ago I needed a couple of days to properly configure the drivers and additional software so that the NVIDIA GPU was selected when desired, instead of the Intel iGPU. I do not know if any laptops with NVIDIA Optimus still exist. The laptops that I bought later had video outputs directly from the NVIDIA GPU, so there was no difference between them and desktops, and the NVIDIA drivers worked flawlessly.

Both on Gentoo/Funtoo Linux and FreeBSD I never had to do anything else but to give the driver update command and everything worked fine. Moreover, NVIDIA has always provided a nice GUI application "NVIDIA X Server Settings", which provides a lot of useful information and which makes very easy any configuration tasks, like setting the desired positions of multiple monitors. A few years ago there was nothing equivalent for the AMD or Intel GPU drivers, but that might have changed meanwhile.

DaoVeles
0 replies
20h57m

I have never had an issue with them. That said, I typically go mid-range on cards, so they are usually a hardened architecture due to a year or two of being in the high end.

bradyriddle
39 replies
23h30m

I remember Nvidia getting hacked pretty bad a few years ago. IIRC, the hackers threatened to release everything they had unless they open sourced their drivers. Maybe they got what they wanted.

[0] https://portswigger.net/daily-swig/nvidia-hackers-allegedly-...

dralley
28 replies
23h10m

I doubt it. It's probably a matter of constantly being prodded by their industry partners (i.e. Red Hat), constantly being shamed by the community, and reducing the amount of maintenance they need to do to keep their driver stack updated and working on new kernels.

The meat of the drivers is still proprietary, this just allows them to be loaded without a proprietary kernel module.

kabes
12 replies
21h46m

It's hard to believe one of the highest valued companies in the world cares about being shamed for not having open source drivers.

commodoreboxer
5 replies
21h18m

They care when it affects their bottom line, and customers leaving for the competition does that.

I don't know if that's what's happening here, honestly, and you're right that they don't care about being shamed. But building a reputation of being hard to work with and hard to target, especially in a growing market like Linux (still tiny, but growing nonetheless, and becoming significantly more important in areas where non-gaming GPU use is concerned), can start to erode sales and B2B relationships; the latter particularly if you make the programmers and PMs hate using your products.

bryanlarsen
3 replies
20h51m

in a growing market like Linux

Isn't Linux 80% of their market? ML et al is 80% of their sales, and ~99% of that is Linux.

fngjdflmdflg
1 replies
20h39m

True, although note that the Linux market itself is increasing in size due to ML. Maybe "increasingly dominant market" is a better phrase here.

bryanlarsen
0 replies
2h36m

Hah, good point. The OP was pedantically correct. The implication in "growing market share" is that "market share" is small, but that's definitely reading between the lines!

lmm
0 replies
18h48m

Right, and that's where most of their growth is.

gessha
0 replies
17h27m

customers leaving for the competition does that

What competition?

I do agree that companies don't really care for public sentiment as long as business is going as usual. Nvidia is printing money with their data center hardware [1], which accounts for half of their yearly revenue.

https://nvidianews.nvidia.com/news/nvidia-announces-financia...

nailer
4 replies
21h35m

Having products that require a bunch of extra work due to proprietary drivers, especially when their competitors don't require that work, is not good.

josefx
3 replies
12h54m

The biggest chunk of that "extra work" would be installing Linux in the first place, given that almost everything comes with Windows out of the box. An additional "sudo apt install nvidia-drivers" isn't going to stop anyone who already got that far.

sam_bristow
0 replies
11h51m

Does the "everything comes with Windows out of the box" still apply for the servers and workstations where I imagine the vast majority of these high-end GPUs are going these days?

nailer
0 replies
3h43m

Most cloud instances come with Linux out of the box.

Arch-TK
0 replies
51m

Tainted kernel. Having to sort out secure boot problems caused by use of an out of tree module. DKMS. Annoying weird issues with different kernel versions and problems running the bleeding edge.

ZeroCool2u
0 replies
18h18m

I mean I've personally given our Nvidia rep some light hearted shit for it. Told him I'd appreciate if he passed the feedback up the chain. Can't hurt to provide feedback!

p_l
11 replies
22h22m

I suspect it's mainly the reduced maintenance and the reduction in workload needed for support, especially with more platforms being supported (not so long ago there was no ARM64 nvidia support; now they are shipping their own ARM64 servers!)

What really changed the situation is that Turing architecture GPUs bring a new, more powerful management CPU, which has enough capacity to essentially run the OS-agnostic parts of the driver that used to be provided as a blob on Linux.

knotimpressed
10 replies
20h2m

Am I correct in reading that as Turing architecture cards include a small CPU on the GPU board, running parts of the driver/other code?

p_l
9 replies
19h14m

In the Turing microarchitecture, nVidia replaced their old "Falcon" CPU with an NV-RISCV RV64 core, running various internal tasks.

The "Open Drivers" from nVidia include different firmware that utilizes the new-found performance.

matheusmoreira
8 replies
10h46m

How well isolated is this secondary computer? Do we have reason to fear the proprietary software running on it?

p_l
7 replies
10h27m

As well isolated as anything else on the bus.

So you better actually use IOMMU
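If you want to see how well your platform actually isolates things, the usual trick is to walk the sysfs IOMMU groups (standard layout on any recent kernel):

  # list each IOMMU group and the PCI devices sharing it
  for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do lspci -nns "${d##*/}"; done
  done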

stragies
3 replies
7h42m

Ah, yes, the magical IOMMU controller, which everybody just assumes to be implemented perfectly across the board. I'm expecting this to be like Hyperthreading, where we find out 20 years later that the feature was faulty/maybe_bugdoored since inception in many/most/all implementations.

Same thing with USB3/TB-controllers, NPUs, etc that everybody just expects to be perfectly implemented to spec, with flawless firmwares.

p_l
2 replies
6h18m

It's not perfect or anything, but it's usually a step up[1]. The funny thing is that GPUs generally had fewer "interesting" compute facilities to jump over from; they were just usually easier to access. My first 64-bit laptop, my first Android smartphone, and the first few iPhones had more MIPS32le cores with possible DMA access to memory than main CPU cores, and that was just counting one component of many (the wifi chip).

Also, Hyperthreading wasn't itself faulty or "bugdoored". The tricks necessary to get high performance out of CPUs were, and then there was Intel deciding to drop various good precautions in the name of still higher single-core performance.

Fortunately, after several years, IOMMU availability has become more common (the laptop I'm writing this on seems to have proper separate groups for every device).

[1] There's always the OpenBSD route of navel-gazing about writing "secure" C code, becoming slowly obsolescent thanks to being behind in performance and features, and ultimately getting pwned because the focus on C, and the refusal to implement "complex" features that help mitigate access, results in a pwnable SMTPd running as root.

stragies
1 replies
3h59m

All fine and well, but I always come back to: "If I were a manufacturer/creator of some work/device/software that does something in the plausible realm of 'telecommunication', how do I make sure that my product can always comply with https://en.wikipedia.org/wiki/Lawful_interception requests? Allow for ingress/egress of data/commands at as low a level as possible!"

So as a director of a chipset company, it would seem like a no-brainer to have to tell my engineers, unfortunately, not to fix some exploitable bug in the IOMMU/chipset; unless I want to never sell devices that could potentially be used to move citizens' internet packets around in a large-scale deployment.

And implement/not_fix something similar in other layers as well, e.g. the ME.

p_l
0 replies
25m

If your product is supposed to comply with Lawful Interception, you're going to implement proper LI interfaces, not leave bullshit DMA bugs in.

The very point of Lawful Interception involves explicit, described interfaces, so that all parties involved can do the work.

The systems with LI interfaces also often end up in jurisdictions that simultaneously put high penalties on giving access to them without specific authorizations - I know, I had to sign some really interesting legalese once due to working in environment where we had to balance both Lawful Interception, post-facto access to data, and telecommunications privacy laws.

Leaving backdoors like that is for Unlawful Interception, and the danger of such approaches is clearly exposed in the form of Chinese intelligence services exploiting an NSA backdoor in Juniper routers (the infamous Dual_EC_DRBG RNG).

matheusmoreira
2 replies
6h9m

you better actually use IOMMU

Is this feature commonly present on PC hardware? I've only ever read about it in the context of smartphone security. I've also read that nvidia doesn't like this sort of thing because it allows virtualizing their cards, which is supposed to be an "enterprise" feature.

brendank310
1 replies
4h56m

Relatively common nowadays. It used to be delineated as a feature in Intel chips as part of their vPro line, but I think it's baked in now. Generally an IOMMU is needed for performant PCI passthrough to VMs, and Windows uses it for Device Guard, which tries to prevent DMA attacks.
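A quick way to check on a given box (assuming a typical distro kernel):

  sudo dmesg | grep -i -e dmar -e iommu   # was an IOMMU initialized at boot?
  ls /sys/kernel/iommu_groups/            # did devices land in isolation groups?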

p_l
0 replies
24m

Seems to me that Zen 4 has no issues at all, but bridges/switches require additional interfaces to further fan-out access controls.

chillfox
2 replies
17h49m

Nvidia has historically given zero fucks about the opinions of their partners.

So my guess is it's to do with LLMs. They are all in on AI, and having more of their code be part of training sets could make tools like ChatGPT/Claude/Copilot better at generating code for Nvidia GPUs.

jmorenoamor
0 replies
12h49m

I also see this as the main reason. GPU drivers for Linux, as far as I know, were just a niche use case; maybe CUDA planted a small seed, and the AI hype is the flower. Now the industry, not the users, demands drivers, so this became a demanded feature instead of a niche user wish.

A bit sad, but hey, welcome anyways.

da_chicken
0 replies
6h54m

Yup. nVidia wants those fat compute center checks to keep coming in. It's an unsaturated market, unlike gaming consoles, home gaming PCs, and design/production workstations. They got a taste of that blockchain dollar, and now AI looks to double down on the demand.

The best solution is to have the industry eat their dogfood.

nicce
4 replies
23h14m

The kernel modules are not the user-space drivers, which are still proprietary.

bradyriddle
2 replies
19h44m

Ooops. Missed that part.

Re-reading that story is kind of wild. I don't know how valuable what they allegedly got would be (silicon, graphics, and chipset files), but the hackers accused Nvidia of 'hacking back' and encrypting their data.

Reminds me of a story I heard about Nvidia hiring a private military company to guard their cards after entire shipments started getting 'lost' somewhere in Asia.

spookie
1 replies
17h18m

Wait what? That PMC story got me. Where can I find more info on that lmao?

porphyra
0 replies
23h9m

Much of the black magic has been moved from the drivers to the firmware anyway.

justinclift
3 replies
16h23m

For Nvidia, the most likely reason they've strongly avoided Open Sourcing their drivers isn't anything like that.

It's simply a function of their history. They used to have high priced professional level graphics cards ("Nvidia Quadro") using exactly the same chips as their consumer graphics cards.

The BIOS of the cards was different, enabling different features. So people wanting those features cheaply would buy the consumer graphics cards and flash the matching Quadro BIOS to them. Worked perfectly fine.

Nvidia naturally wasn't happy about those "lost sales", so began a game of whack-a-mole to stop BIOS flashing from working. They did stuff like adding resistors to the boards to tell the card whether it was a Geforce or Quadro card, and when that was promptly reverse engineered they started getting creative in other ways.

Meanwhile, they couldn't really Open Source their drivers because then people could see what the "Geforce vs Quadro" software checks were. That would open up software countermeasures being developed.

---

In the most recent few years the professional cards and gaming cards now use different chips. So the BIOS tricks are no longer relevant.

Which means Nvidia can "safely" Open Source their drivers now, and they've begun doing so.

--

Note that this is a copy of my comment from several months ago, as it's just as relevant now as it was then: https://news.ycombinator.com/item?id=38418278

1oooqooq
1 replies
15h1m

interesting timing to recall that story. now the same trick is used for h100 vs whatever the throttled-for-embargo-wink-wink Chinese version is called.

but those companies are really averse to open sourcing because they can't be sure they own all the code. it's decades of copy-pasting reference implementations after all

rfoo
0 replies
12h2m

now the same trick is used for h100 vs whatever the throttled-for-embargo-wink-wink Chinese version

No. H20 is a different chip designed to be less compute-dense (by having different combinations of SM/L2$/HBM controller). It is not a throttled chip.

A800 and H800 are A100/H100 with some area of the chip physically blown up and reconfigured. They are also not simply throttled.

SuperNinKenDo
0 replies
14h49m

Very interesting, thanks for the perspective. I suspect the recent loss of face they experienced with the transition to Wayland, happening around the same time this motivation evaporated, probably plays a part too.

I swore off ever again buying Nvidia, or any laptops that come with Nvidia, after all this. Maybe in 10 years they'll have managed to right the brand perceptions of people like myself.

nicman23
0 replies
7h24m

they did release it. a magic drive i have seen, but totally do not own, has it

snailmailman
12 replies
23h18m

Better as of extremely recently. Explicit sync fixes most of the issues with flickering that I’ve had on Wayland. I’ve been using the latest (beta?) driver for a while because of it.

I’m using Hyprland though so explicit sync support isn’t entirely there for me yet. It’s actively being worked on. But in the last few months it’s gotten a lot better

JasonSage
11 replies
22h14m

Better as of extremely recently.

Yup. Anecdotally, I see a lot of folks trying to run wine/games on Wayland reporting flickering issues that are gone as of version 555, which is the most recent release save for 560 coming out this week. It's a good time to be on the bleeding edge.

hulitu
3 replies
21h57m

You can always use X11. /s

bornfreddy
2 replies
11h20m

I know that was a joke, but - as someone who is still on X, what am I missing? Any practical advantages to using Wayland when using a single monitor on desktop computer?

vetinari
1 replies
3h39m

Even that single monitor can be HiDPI, VRR, or HDR (this one is still WIP).

Arch-TK
0 replies
31m

I have a 165 DPI monitor. This honestly just works with far less hassle on X. I don't have to listen to anyone try to explain to me how fractional scaling doesn't make sense (the real explanation for why it wasn't supported). I don't have to deal with some silly explanation for why XWayland applications just can't be non-blurry with a fractional or non-1 scaling factor. I can just set the DPI to the value I calculated and things work in 99% of cases. In 0.9% of cases I need to set an environment variable or pass a flag to fix a buggy application, and in the remaining 0.1% I need to make a change to the code.

VRR has always worked for me on single monitor X. I use it on my gaming computer (so about twice a year).

Fr0styMatt88
3 replies
16h53m

On latest NixOS unstable and KDE + Wayland is still a bit of a dumpster fire for me (3070 + latest NV drivers). In particular there’s a buffer wait bug in EGL that needs fixing on the Nvidia side that causes the Plasma UI to become unresponsive. Panels are also broken for me, with icons not showing.

Having said that, the latest is a pain on X11 right now as well, with frequent crashing of Plasma, which at least restarts itself.

There’s a lot of bleeding on the bleeding edge right at this moment :)

JasonSage
2 replies
16h48m

That's interesting, maybe it's hardware-dependent? I'm doing nixos + KDE + Wayland and I've had almost no issues in day-to-day usage and productivity.

I agree with you that there's a lot of bleeding. Linux is nicer than it used to be and there's less fiddling required to get to a usable base, but still plenty of fiddling as you get into more niche usage, especially when it involves any GPU hardware/software. Yet somehow one can run Elden Ring on Steam via Proton with a few mouse clicks and no issues, which would've been inconceivable to me only a few years ago.

Fr0styMatt88
1 replies
12h36m

Yeah it’s pretty awesome overall. I think the issues are from a few things on my end:

- I’ve upgraded through a few iterations starting with Plasma 6, so my dotfiles might be a bit wonky. I’m not using Home Manager so my dotfiles are stateful.

- Could be very particular to my dock setup as I have two docks + one of the clock widgets.

- Could be the particular wallpaper I’m using (it’s one of the dynamic ones that comes with KDE).

- It wouldn’t surprise me if it’s related to audio somehow as I have Bluetooth set-up for when I need it.

I’m sure it’ll settle soon enough :)

postcert
0 replies
2h34m

I've been having a similar flakiness with plasma on Nixos (proprietary + 3070 as well). Sadly can't say whether it did{n't} happen on another distro as I last used Arch around the v535 driver.

I found it funny how silently it would fail at times. After coming out of a game or focusing on something, I'd scratch my head as to where the docks/background went. I'd say you're lucky in that it recovered itself; generally I needed to run `plasmashell` in the alt+f2 run prompt.

asyx
2 replies
21h55m

I think it's X11 stuff that is using Vulkan for rendering that is still flickering in 555. This probably affects pretty much all of Proton / Wine gaming.

doix
1 replies
4h31m

Any specific examples that you know should be broken? I am on X11 with 555 drivers and an nvidia gpu. I don't have any flickering when I'm gaming, it's actually why I stay on X11 instead of transitioning to wayland.

johnny22
0 replies
39m

They are probably talking about running the game in a wayland session via xwayland, since wine's wayland driver is not part of proton yet.

modzu
3 replies
10h4m

why switch to amd and not just switch to X? :D

whalesalad
1 replies
6h56m

once you go Wayland you usually don’t go back :)

kiney
0 replies
6h4m

I tested wayland for a while to see what the hype is about. No upside, lots of small workflows broken. Back to Xorg it was.

account42
0 replies
9h33m

Why not both?

joecool1029
0 replies
21h36m

It's still buggy with sway on nvidia. I really thought the 555 driver would iron out the last of the issues, but it still has further to go. I switched to KDE Plasma 6 on wayland since then and it's been great, not buggy at all.

XorNot
0 replies
15h50m

Easy Linux use is what keeps me firmly on AMD. This move may earn them a customer.

berkeleyjunk
16 replies
23h34m

As someone who is pretty skeptical and reads the fine print, I think this is a good move and I really do not see a downside (other than the fact that this probably strengthens the nVidia monoculture).

vlovich123
15 replies
23h23m

AFAIK, all they did was move the closed-source user-space driver code to their opaque firmware blob, leaving a thin shim in the kernel.

In essence I don’t believe that much has really changed here.

stkdump
8 replies
23h12m

But the firmware runs directly on the hardware, right? So they effectively rearchitected their system to move what used to be 'above' the kernel to 'below' the kernel, which seems like a huge effort.

vlovich123
5 replies
22h56m

It's some effort, but I bet they added a classical serial CPU to run the existing code. In fact, [1] suggests that's exactly what they did. I suspect they had other reasons to add the GSP, so the amortized cost of moving the driver code to firmware was actually not that large all things considered, and in the long term it reduces their costs (e.g. they further reduce the burden of supporting multiple OSes, they can theoretically improve performance further, etc.)

[1] https://download.nvidia.com/XFree86/Linux-x86_64/525.78.01/R...

p_l
4 replies
22h12m

That's exactly what happened: the Turing microarchitecture brought in a new[1] "GSP" which is capable enough to run the task. A similar architecture exists, AFAIK, on Apple M-series, where the GPU runs its own instance of an RTOS talking with the "application OS" over RPC.

[1] The Turing GSP is not the first "classical serial CPU" in nvidia chips, just the first that has enough juice to do the task. Unfortunately, without recalling the name of the component it seems impossible to find it again, thanks to search results being full of nvidia ARM and GSP pages...

mepian
1 replies
21h58m

the name of the component

Falcon?

p_l
0 replies
20h50m

THANK YOU, that was the name I was forgetting :)

here's[1] a presentation from nvidia regarding a plan (unsure if completed or not) to replace Falcon with RISC-V; [2] suggests the GSP is in fact the "NV-RISC" mentioned in [1]. Some work on reversing Falcon was apparently done for Switch hacking[3]?

[1] https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_... [2] https://www.techpowerup.com/291088/nvidia-unlocks-gpu-system... [3] https://github.com/vbe0201/faucon

knotimpressed
1 replies
19h48m

Would you happen to have a source or any further readings about Apple M-series GPUs running their own RTOS instance?

imtringued
1 replies
22h43m

Why? It should make it much easier to support Nvidia GPUs on Windows, Linux, Arm/x86/RISC-V and more OSes with a single firmware codebase per GPU now.

stkdump
0 replies
22h19m

Yes makes sense, in the long run it should make their life easier. I just suspect that the move itself was a big effort. But probably they can afford that nowadays.

adrian_b
5 replies
13h5m

Having all of the kernel, more precisely all of the privileged code, as open source is much more important for security than having all the firmware of the peripheral devices as open source.

Any closed-source privileged code cannot be audited and it may contain either intentional backdoors, or, more likely, bugs that can cause various undesirable effects, like crashes or privilege escalation.

On the other hand, in a properly designed modern computer any bad firmware of a peripheral device cannot have a worse effect than making that peripheral unusable.

The kernel should take care, e.g. by using the I/O MMU, that the peripheral cannot access anything where it could do damage, like the DRAM not assigned to it or the non-volatile memory (e.g. SSDs) or the network interfaces for communicating with external parties.

Even when the peripheral is as important as the display, a crash in its firmware would have no effect if the kernel reserved some key combination to reset the GPU. (While I am not aware of such a feature in Linux, its effect can frequently be achieved by switching, e.g. with Alt+F1, to a virtual console and then back to the GUI; the saving and restoring of the GPU state, together with the switching of video modes, is enough to clear some corruption caused by a buggy GPU driver or a buggy mouse or keyboard driver.)

In conclusion, making the NVIDIA kernel driver open source does not deserve to have its importance minimized. It is an important contribution to a more secure OS kernel.

The only closed-source firmware that must be feared is that which comes from the CPU manufacturer, e.g. from Intel, AMD, Apple or Qualcomm.

All such firmware currently includes various features for remote management that are not publicly documented, so you can never be sure whether they can be properly disabled, especially when the remote management can be done wirelessly, like through the WiFi interface of Intel laptop CPUs, where you cannot interpose an external firewall to filter any "magic" packets out of the network traffic.

A paranoid laptop user can circumvent the lack of control over the firmware blobs from the CPU manufacturer by disconnecting the internal antennas and using an external cheap and small single-board computer for all wired and wireless network access, which must run a firewall with tight rules. Such a SBC should be chosen among those for which complete hardware documentation is provided, i.e. including its schematics.

stragies
2 replies
7h59m

Everything you wrote assumes that the IOMMUs across the board are 100% correctly implemented, without errors/bugdoors.

People used to believe similar things about Hyperthreading, glitchability, ME, Cisco, boot-loaders, ... the list goes on.

adrian_b
1 replies
4h39m

There still is a huge difference between running privileged code on the CPU, where there is nothing limiting what it can do, and code that runs on a device, which should normally be contained by the I/O MMU, unless the I/O MMU is buggy.

The functions of an I/O MMU for checking and filtering the transfers are very simple, so the probability of non-intentional bugs is extremely small in comparison with the other things enumerated by you.

stragies
0 replies
4h16m

Agreed that the feature set of an IOMMU is fairly small, but isn't this function usually included in one of the chipset ICs, which run a lot of other code/functions alongside a (hopefully) faithfully correct IOMMU routine?

Which, to my eyes, would increase the possibility of other system parts mucking with IOMMU restrictions and/or triggering bugs.

saagarjha
1 replies
10h44m

Did you run this through a LLM? I'm not sure what the point is of arguing with yourself and bringing up points that seem tangential to what you started off talking about (…security of GPUs?)

adrian_b
0 replies
8h25m

I have not argued with myself. I do not see what made you believe this.

I have argued with "I don’t believe that much has really changed here", which is the text to which I have replied.

As I have explained, an open-source kernel module, even together with closed-source device firmware, is much more secure than a closed-source kernel module.

Therefore the truth is that a lot has changed here, contrary to the statement to which I have replied, as this change makes the OS kernel much more secure.

shanoaice
9 replies
6h57m

There is little point in NVIDIA open-sourcing only the kernel driver portion of their stack, since they heavily rely on proprietary firmware and a proprietary userspace library (most important!) to do the real job. Firmware is a relatively small issue; this is much the same for AMD and Intel, since encapsulation reduces the work done on the driver side, and open-sourcing firmware could allow people to make some really unanticipated modifications that might heavily threaten even commercial card sales. Nonetheless, at least AMD still keeps a fair share of the work in the driver compared to Nvidia. The userspace library is the worst problem, since it handles a lot of GPU-control functionality and the graphics APIs, and it is still kept closed source.

The best we can hope is that improvements to NVK and Red Hat's Nova driver can put pressure on NVIDIA to release their userspace components.

gpderetta
4 replies
5h10m

It is meaningful because, as you note, it enables a fully open-source userspace driver. Of course the firmware is still proprietary, and it contains more and more logic.

sscarduzio
0 replies
4h44m

Which in a way is good, because the hardware will increasingly perform identically on Linux and on Windows.

pabs3
0 replies
4h21m

The firmware is also signed, so you can't even do reverse engineering to replace it.

matheusmoreira
0 replies
4h11m

Doesn't seem like a bad tradeoff so long as the proprietary stuff is kept completely isolated with no access to any other parts of my system.

bayindirh
0 replies
2h43m

The GLX libraries are the elephant(s) in the room. Open source kernel modules mean nothing without these libraries. On the other hand, AMD and Intel use the "platform GLX" natively, and with great success.

matheusmoreira
1 replies
4h14m

Why is the user space component required? Won't they provide sysfs interfaces to control the hardware?

cesarb
0 replies
4h4m

It's something common to all modern GPUs, not just NVIDIA: most of the logic is in a user space library loaded by the OpenGL or Vulkan loader into each program. That library writes a stream of commands into a buffer (plus all the necessary data) directly into memory accessible to the GPU, and there's a single system call at the end to ask the operating system kernel to tell the GPU to start reading from that command buffer. That is, other than memory allocation and a few other privileged operations, the user space programs talk directly to the GPU.

AshamedCaptain
1 replies
4h13m

I really don't know where this crap about "moving everything to the firmware" is coming from. The kernel part of the nvidia driver has always been small, and this is the only thing they are open-sourcing (they have been announcing it for months now...). The vast majority of the user-space driver is still closed, and no one has seen any indication that this may change.

I see no indication either that nvidia or any of the other manufacturers has moved any respectable amount of functionality into the firmware. If you look at the open-source drivers you can even confirm this yourself: the binary blobs for AMD cards are minuscule, for example, and long gone are the days of ATOMBIOS. The drivers are literally generating bytecode-level binaries for the shader units in the GPU; what do you expect the firmware could even do at this point? Re-optimize the compiler output?

There was an example of a GPU that did move everything to the firmware -- the videocore on the raspberry pi, and it was clearly a completely distinct paradigm, as the "driver" would almost literally pass through OpenGL calls to a mailbox, read by the secondary ARM core (more powerful than the main ARM core!) that was basically running the actual driver as "firmware". Nothing I see on nvidia indicates a similar trend, otherwise RE-ing it would be trivial, as happened with the VC.

ploxiln
0 replies
4h4m

https://lwn.net/Articles/953144/

Recently, though, the company has rearchitected its products, adding a large RISC-V processor (the GPU system processor, or GSP) and moving much of the functionality once handled by drivers into the GSP firmware. The company allows that firmware to be used by Linux and shipped by distributors. This arrangement brings a number of advantages; for example, it is now possible for the kernel to do reclocking of NVIDIA GPUs, running them at full speed just like the proprietary drivers can. It is, he said, a big improvement over the Nouveau-only firmware that was provided previously.

There are a number of disadvantages too, though. The firmware provides no stable ABI, and a lot of the calls it provides are not documented. The firmware files themselves are large, in the range of 20-30MB, and two of them are required for any given device. That significantly bloats a system's /boot directory and initramfs image (which must provide every version of the firmware that the kernel might need), and forces the Nouveau developers to be strict and careful about picking up firmware updates.
floam
6 replies
22h57m

NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules

or

NVIDIA Transitions Towards Fully Open-Source GPU Kernel Modules?

slashdave
3 replies
21h41m

Not much point in a "partially" open-source kernel module.

floam
2 replies
20h3m

But “fully towards” is pretty ambiguous, like an entire partial implementation.

Anyhow I read the article, I think they’re saying fully as in exclusively, like there eventually will not be both a closed source and open source driver co-maintained. So “fully open source” does make more sense. The current driver situation IS partially open source, because their offerings currently include open and closed source drivers and in the future the closed source drivers may be deprecated?

einpoklum
1 replies
19h50m

See my answer. It's not going to be fully-open-source drivers; rather, all drivers will have open-source kernel modules.

slashdave
0 replies
19m

You can argue against proprietary firmware, but is this all that different from other types of devices?

j4hdufd8
1 replies
22h30m

haven't read it but probably the former

throwadobe
0 replies
22h19m

"towards" basically negates the "fully" before it for all real intents and purposes

benjiweber
4 replies
22h33m

I wonder if we'll ever get hdcp on nvidia. As much as I enjoy 480p video from streaming services.

viraptor
2 replies
18h42m

Which service goes that low? The ones I know limit you from using 4k, but anything up to 1080p works fine.

9991
1 replies
10h0m

Nonsense that a 1080p limit is acceptable for (and accepted by) paying customers.

viraptor
0 replies
9h50m

Depends. I disagree with HDCP in theory on ideological grounds. In practice, my main movie device is below 720p (projector), so it will take another decade before it affects me in any way.

ozgrakkurt
0 replies
14h22m

Just download it to your PC. It's a better user experience and costs less.

asaiacai
4 replies
20h4m

I really hope this makes it easier to install/upgrade NVIDIA drivers on Linux. It's a nightmare to figure out version mismatches between drivers, utils, container-runtime...

riddley
2 replies
17h16m

A nightmare how? When I used their cards, I'd just download the .run and run it. Done.
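For anyone who hasn't done it, that flow is just (the filename varies with the driver version; typically run from a text console with the display server stopped):

  chmod +x NVIDIA-Linux-x86_64-555.58.02.run
  sudo ./NVIDIA-Linux-x86_64-555.58.02.run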

jaimex2
0 replies
16h11m

After a reboot, of course :)

Everything breaks immediately otherwise.

amelius
0 replies
5h28m

And when it doesn't work, what do you do then?

Exactly, that's when the nightmare starts.

einpoklum
0 replies
19h52m

From my limited experience with their open-sourcing of kernel modules so far: it doesn't make things easier, but the silver lining is that, for the most part, it doesn't make installation and configuration harder! Which is no small thing, actually.

Animats
4 replies
23h13m

NVidia revenue is now 78% from "AI" devices.[1] NVidia's market cap is now US$2.92 trillion. (Yes, trillion.) Only Apple and Microsoft can beat that. Their ROI climbed from about 10% to 90% in the last two years. That growth has all been on the AI side.

Open-sourcing graphics drivers may indicate that NVidia is moving away from GPUs for graphics. That's not where the money is now.

[1] https://www.visualcapitalist.com/nvidia-revenue-by-product-l...

[2] https://www.macrotrends.net/stocks/charts/NVDA/nvidia/roi

joe_the_user
2 replies
23h3m

Well, Nvidia seems to be claiming in the article that this is everything, not just graphics drivers: "NVIDIA GPUs share a common driver architecture and capability set. The same driver for your desktop or laptop runs the world’s most advanced AI workloads in the cloud. It’s been incredibly important to us that we get it just right."

And: "For cutting-edge platforms such as NVIDIA Grace Hopper or NVIDIA Blackwell, you must use the open-source GPU kernel modules. The proprietary drivers are unsupported on these platforms." (These are the two most advanced NVIDIA architectures currently.)

Animats
1 replies
22h30m

That's interesting. I've been expecting the AI cards to diverge more from the graphics cards. AI doesn't need triangle fill, Z-buffering, HDMI out, etc. 16 bit 4x4 multiply/add units are probably enough. What's going on in that area?

p_l
0 replies
19h16m

TL;DR - there seems to be not that much improvement from dropping the "graphics-only" parts of the chip if you already have a GPU instead of breaking into AI market as your first product.

1. nVidia compute dominance is not due to hyperfocus on AI (that's Google's TPU for you, or things like intel's NPU in Meteor Lake), but because CUDA offers considerable general purpose compute. In fact, considerable revenue came and still comes from non-AI compute. This also means that if you figure out a novel mechanism for AI that isn't based around 4x4 matrix addition, or which mixes it with various other operations, you can do them inline. This also includes any pre and post processing you might want to do on the data.

2. The whole advantage they have in software ecosystem builds upon their PTX assembly. Having it compile to CPU and only implement the specific variant of one or two instructions that map to "tensor cores" would be pretty much nonsensical (especially given that AI is not the only market they target with tensor cores - DSP for example is another).

Additionally, a huge part of why nvidia built such a strong ecosystem is that you could take cheapest G80-based card and just start learning CUDA. Only some highest-end features are limited to most expensive cards, like RDMA and NVMe integration.

Compare this with AMD, where for many purposes only the most expensive compute-only cards are really supported. Or specialized AI only chips that are often programmable either in very low-level way or essentially as "set a graph of large-scale matrix operations that are limited subset of operations exposed by Torch/Tensorflow" (Google TPU, Intel Meteor Lake NPU, etc).

3. CUDA literally began with how the evolution of the shader model led to a general-purpose "shader processor" instead of specialized vertex and pixel processors. The space taken by specialized hardware for graphics that isn't also usable for general-purpose compute is pretty minimal, although some of it is omitted, AFAIK, in compute-only cards.

In fact, some of the "graphics only" things like Z-buffering are done by the same logic that is used for compute (with limited amount of operations done by fixed-function ROP block), and certain fixed-function graphical components like texture mapping units are also used for high-performance array access.

4. Simplified manufacturing and logistics - nVidia uses essentially the same chips in most compute and graphics cards, possibly with minor changes achieved by changing chicken bits to route pins to different functions (as you mentioned, you don't need DP-outs of RTX4090 on an L40 card, but you can probably reuse the SERDES units to run NVLink on the same pins).

orbital-decay
0 replies
22h54m

It indicates nothing; they started this a few years ago. They just transferred the most important parts of their driver to the (closed source) firmware, to be handled by the onboard GSP processor, and open sourced the rest.

sillywalk
3 replies
22h48m

From the github repo[0]:

Most of NVIDIA's kernel modules are split into two components:

    An "OS-agnostic" component: this is the component of each kernel module that is independent of operating system.

    A "kernel interface layer": this is the component of each kernel module that is specific to the Linux kernel version and configuration.
When packaged in the NVIDIA .run installation package, the OS-agnostic component is provided as a binary:

[0] https://github.com/NVIDIA/open-gpu-kernel-modules

p_l
2 replies
22h25m

That was the "classic" drivers.

The new open-source ones effectively move the majority of the OS-agnostic component to run as a blob on the GPU.

arghwhat
1 replies
21h50m

Not quite - it moves some logic to the GSP firmware, but the user-space driver is still a significant portion of code.

The exciting bit there is the work on NVK.

p_l
0 replies
20h58m

Yes, I was not including the userspace driver in this, as it's a bit "out of scope" for the conversation :D

brrrrrm
3 replies
23h35m

"Kernel" is an overloaded term for GPUs. This is about the Linux kernel.

karamanolev
1 replies
23h23m

"... Linux GPU Kernel Modules" is pretty unambiguous to me.

brrrrrm
0 replies
21h22m

Yep, the title was updated.

brrrrrm
0 replies
4h47m

Guh, wish I could delete this now that the title has been updated. The original title (shown on the linked page) wasn't super clear.

muhehe
2 replies
7h27m

What is a GPU kernel module? Is it something like a driver for the GPU?

qalmakka
1 replies
5h42m

Yes. In modern operating systems, GPU drivers usually consist of a kernel component that is loaded inside the kernel or in a privileged context, and a userspace component that talks to it and implements the GPU-specific parts of the APIs that the windowing system and applications use. In the case of NVIDIA, they have decided to drop their proprietary kernel module in favour of an open one. Unfortunately, it's out of tree.

In Linux and BSD, you usually get all of your drivers with the system; you don't have to install anything, it's all mostly plug and play. For instance, this has been the case for AMD and Intel GPUs, which have a 100% open-source stack. NVIDIA is particularly annoying due to the need to install the drivers separately and the fact that they've got different implementations of things compared to everyone else, so NVIDIA users are often left behind by FOSS projects due to GeForce cards being more annoying to work with.
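
To picture the split from the application side, a hedged sketch (mine, not the commenter's): a plain C program linked against the closed userspace component libcuda.so, whose calls the library ultimately translates into ioctls on the /dev/nvidia* device nodes created by the kernel module:

    /* Sketch: the userspace half of the driver in action. This links against
     * libcuda.so (closed source); under the hood, the library talks to the
     * kernel module through /dev/nvidiactl and /dev/nvidia0.
     * Build: gcc probe.c -o probe -lcuda */
    #include <cuda.h>
    #include <stdio.h>

    int main(void) {
        int count = 0;
        char name[128];
        CUdevice dev;

        cuInit(0);                  /* userspace lib opens the device nodes */
        cuDeviceGetCount(&count);
        printf("CUDA devices: %d\n", count);
        if (count > 0) {
            cuDeviceGet(&dev, 0);
            cuDeviceGetName(name, (int)sizeof(name), dev);
            printf("device 0: %s\n", name);
        }
        return 0;
    }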

muhehe
0 replies
5h5m

Thanks. I'm not well versed in these things. It sounded like something you load into the GPU (it reminded me of an old HP printer, which required a firmware upload after startup).

jdonaldson
2 replies
12h35m

It's kind of surprising that these haven't just been reverse-engineered yet by language models.

special-K
1 replies
12h28m

That's simply not how LLMs work; they're actually awful at reverse engineering of any kind.

jdonaldson
0 replies
4h24m

Are you saying that they can't explain the contents of machine code in a human-readable format? Are you saying that they can't be used in a system that iteratively evaluates combinations of inputs and checks their results?

jcalvinowens
2 replies
20h32m

Throwing the tarball over the wall and saying "fetch!" is meaningless to me. Until they actually contribute a driver to the upstream kernel, I'll be buying AMD.

aseipp
1 replies
6h54m

You can just use Nouveau and NVK if you only need workstation graphics (and the open-gpu-kernel-modules source code and separate GSP firmware release have been a big uplift for Nouveau too, at least).

jcalvinowens
0 replies
3h59m

Nouveau is great, and I absolutely admire what the community around it has been able to achieve. But I can't imagine choosing that over AMD's first class upstream driver support today.

enoeht
2 replies
23h29m

Didn't they say that many times before?

vlovich123
1 replies
23h21m

Not sure, but since the Turing series they support a cryptographically signed binary blob that they load onto the GPU. So whereas before their kernel driver was a thin shim for the user-space driver, now it's a thin shim for the black-box firmware loaded on the GPU.

p_l
0 replies
19h5m

The scope of what the kernel interface provides didn't change, but what was previously a blob wrapped by the source-provided "OS interface layer" has now moved to run on the GSP (RISC-V based) inside the GPU.

xyst
1 replies
20h25m

Nvidia has finally realized they couldn't write drivers for their own hardware, especially for Linux.

Never thought I would see the day.

TeMPOraL
0 replies
20h19m

Suddenly they went from powering gaming to being the winners of the AI revolution; AI is Serious Cloud Stuff, and Serious Cloud Stuff means Linux, so...

smcleod
1 replies
22h20m

So does this mean actually getting rid of the binary blobs of microcode that are in their current ‘open’ drivers?

p_l
0 replies
22h12m

No, it means the blob from the "closed" drivers is moved to run on the GSP.

pluto_modadic
1 replies
22h53m

damn, only for new GPUs.

mynameisvlad
0 replies
22h26m

For varying definitions of "new". It supports Turing and up, which was released in 2018 with the 20xx line. That's two generations back at this point.

magicloop
1 replies
20h35m

Remember that time when Linus looked at the camera and gave Nvidia the finger? Has that time now passed? Is it time to reconcile? Or are there still some gotchas?

jaimex2
0 replies
16h13m

These are just the kernel modules, not the actual drivers. So the finger remains up.

Varloom
1 replies
7h50m

They know the CUDA monopoly won't last forever.

aseipp
0 replies
6h53m

CUDA lives in userspace; this kernel driver release does not contain any of that. It's still very useful to release an open source DKMS driver, but this doesn't change anything at all about the CUDA situation.

v3ss0n
0 replies
6h41m

Thank you, Nvidia hackers! You did it! The Lapsus$ team threatened a few years back that if Nvidia didn't open-source its drivers, they were going to release the stolen code. That led Nvidia to release its first open-source kernel module a few months later, but it was quite incomplete. Now it seems they are open-sourcing even more.

sylware
0 replies
5h58m

Hopefully we get a plain and simple C99 user-space Vulkan implementation.

shmerl
0 replies
20h20m

That's not upstream yet. But they've supposedly shown some interest in Nova too.

rldjbpin
0 replies
10h46m

Mind the wording they've used here - "fully towards open-source" and not "towards fully open-source".

Big difference. Almost nobody is going to give you the sauce hidden behind blobs. But I hope the dumb issues of the past (imagine using it on laptops with switchable graphics) slowly go away with this, and that it is not only about pleasing the enterprise crowd.

risho
0 replies
22h15m

Does this mean you will be able to use NVK/Mesa and CUDA at the same time? The non-Mesa proprietary side of Nvidia's Linux drivers is such a mess, and NVK is improving by the day, but I really need CUDA.

resource_waste
0 replies
3h12m

Does this mean Fedora can bundle it?

qalmakka
0 replies
19h41m

Well, it is something, even if it's still only the kernel module, and it will probably never be upstreamed anyway.

nikolayasdf123
0 replies
9h32m

Hope Linux gets first-class open-source GPU drivers... and dare I hope that Go adds native support for GPUs too?

nicman23
0 replies
7h24m

They are worthless; the main code is in userspace.

n3storm
0 replies
9h54m

I read "NVIDIA transitions fully Torvalds..."

matheusmoreira
0 replies
11h6m

The transition is not done until their drivers are upstreamed into the mainline kernel and ALL features work out of the box, especially power management and hybrid graphics.

john2x
0 replies
21h59m

Maybe that’s one way to retain engineers who are effectively millionaires.

gorkish
0 replies
3h8m

This is great. I've been having to build my own .debs of the OSS driver for some time because of the crapola NVIDIA puts in their proprietary driver that prevents it from working in a VM as a passthrough device (just a regular whole-card passthrough, not trying to use GRID/vGPU on a consumer card or anything).

NVIDIA can no longer get away with that nonsense when they have to show their code.

gigatexal
0 replies
9h3m

Will this mean we'll be able to remove the arbitrary distinctions between Quadro and GeForce cards, maybe by hacking some configs or such in the drivers?

exabrial
0 replies
5h42m

Are Nvidia Grace CPUs even available? I thought it was interesting that they mentioned them.

einpoklum
0 replies
19h54m

The title of this statement is misleading:

NVIDIA is not transitioning to open-source drivers for its GPUs; most or all user-space parts of the drivers (and most importantly for me, libcuda.so) are closed-source; and as I understand from others, most of the logic is now in a binary blob that gets sent to the GPU.

Now, I'm sure this open-sourcing has its uses, but for people who want to do something like a different hardware backend for CUDA with the same API, or to clear up "corners" of the API semantics, or to write things in a different language without going through the C API - this does not help us.

doctoboggan
0 replies
15h24m

My guess is Meta and/or Amazon told Nvidia that they would contribute considerable resources to development as long as the results were open source. Both companies' bottom lines would benefit from improved kernel modules, and like another commenter said elsewhere, Nvidia doesn't have much to lose.

aussieguy1234
0 replies
13h2m

I'll update as soon as it's in NixOS unstable. Hopefully this will change the minds of the Sway maintainers to start supporting Nvidia cards; I'm using i3 and X but would like to try out Wayland.

Narhem
0 replies
13h5m

I can't wait to use Linux without having to spend multiple weekends trying to get the right drivers to work.

CivBase
0 replies
15h40m

Too late for me. I tried switching to Linux years ago but failed because of the awful state of NVIDIA's drivers. Switched to AMD last year and it's been a breeze ever since.

Gaming on Linux with an NVIDIA card (especially an old one) is awful. Of course, Linux gamers aren't the demographic driving this recent change of heart, so I expect it to stay awful for a while yet.