AMD Funded a Drop-In CUDA Implementation Built on ROCm: It's Now Open-Source

Why would this not be AMD’s top priority among priorities? Someone recently likened the situation to an Iron Age where NVIDIA owns all the iron. And this sounds like AMD knowing about a new source of ore and not even being willing to sink a single engineer’s salary into exploration.

My only guess is they have a parallel skunkworks working on the same thing, but in a way that they can keep it closed-source - that this was a hedge they think they no longer need, and they are missing the forest for the trees on the benefits of cross-pollination and open source ethos to their business.

The problem with effectively supporting CUDA is that it encourages CUDA adoption all the more strongly. Meanwhile, AMD will always be playing catch-up, forever having to patch issues, work around Nvidia/AMD differences, and accept the performance penalty that comes from having code optimised for another vendor's hardware. AMD needs to encourage developers to use their own ecosystem or an open standard.

With Nvidia controlling 90%+ of the market, this is not a viable option. They'd better lean hard into CUDA support if they want to be relevant.

A bit of story telling here:

IBM and Microsoft made OS/2. The first version worked on 286s and was stable but useless.

The second version worked only on 386s and was quite good, and even had wonderful Windows 3.x compatibility. "A better Windows than Windows!"

At that point Microsoft wanted out of the deal and they wanted to make their newer version of Windows, NT, which they did.

IBM now had a competitor to "new" Windows and a very compatible version of "old" Windows. Microsoft killed OS/2 in a variety of ways (including just letting IBM be IBM) but also by making it very difficult for last month's version of OS/2 to run next month's bunch of Windows programs.

To bring this back to the point -- IBM vs Microsoft is akin to AMD vs Nvidia -- where Nvidia has the standard that AMD is implementing, and so no matter what, if you play in the backward compatibility realm you're always going to be playing catch-up and likely always in a position where winning is exceedingly hard.

As WOPR once said, "A strange game. The only winning move is not to play."

IBM also made a whole bunch of strategic mistakes beyond that. Most importantly their hardware division didn't give a flying f about OS/2. Even when they had a 'better Windows' they did not actually use it themselves and didn't push it to other vendors.

Windows NT wasn't really relevant in that competition until much later; only XP was finally aimed at end consumers.

where nvidia has the standard that AMD is implementing, and so no matter what if you play in the backward compatibility realm you're always going to be playing catch-up

That's not true. If AMD starts adding their own features and has their own advantages, that can flip.

It only takes a single generation of hardware, or a single feature for things to flip.

Look at Linux and Unix. It started out with Linux implementing Unix, and now the Unixes are trying to add compatibility with Linux.

Is SGI still the driving force behind OpenGL/Vulkan? Did you think it was a bad idea for other companies to use OpenGL?

AMD was successful against Intel with x86_64.

There are lots of examples of a company making something popular but not being able to take full advantage of it in the long run.

Slapping a price tag of over $300 on OS/2 didn’t do IBM any favors either.

That's what happens when your primary business model is selling to the military. They had to pay what IBM charged them (within a small bit of reason) and it was incredibly difficult for them to pivot away from any path they chose in the 80's once they had chosen it.

However, that same logic doesn't apply to consumers. Since IBM continued to fail to learn that lesson, it no longer even targets the consumer market: it never learned how to be competitive and could only ever function effectively when it had a monopoly or at least vendor lock-in.

https://en.wikipedia.org/wiki/Acquisition_of_the_IBM_PC_busi...

Windows before NT was crap, so users had an incentive to upgrade. If there had existed a Windows 7 alternative that was near fully compatible and FOSS, I would wager Microsoft would have lost to it with Windows 8 and even 10. The only reason to update for most people was Microsoft dropping support.

For CUDA, it is not just AMD who would need to catch up. Developers also are not necessarily going to target the latest feature set immediately, especially if it only benefits (or requires) new hardware.

I accept the final statement, but that also means AMD for compute is gonna be dead like OS/2. Their stack just will not reach critical mass.

Today's Linux OSes would have competed incredibly strongly against Vista and probably would have gone blow for blow against 7.

Proton, Wine, and all of the compatibility fixes and driver improvements that the community has made in the last 16 years have been amazing, and every day is another day where you can say that it has never been easier to switch away from Windows.

However, Microsoft has definitely been drinking the IBM koolaid a little too long and has lost the mandate of heaven. I think in the next 7-10 years we will reach a point where there is nothing Windows can do that Linux cannot do better and easier without spying on you, and we may be 3-5 years from a "killer app" that is specifically built to be incompatible with Windows just as a big FU to them, possibly in the VR world, possibly in AR, and once that happens maybe, maybe, maybe it will finally actually be the year of the Linux desktop.

However, Microsoft has definitely been drinking the IBM koolaid a little too long and has lost the mandate of heaven. I think in the next 7-10 years we will reach a point where there is nothing Windows can do that Linux cannot do better and easier without spying on you

that's a fascinating statement with the clear ascendancy of neural-assisted algorithms etc.

my prediction is that in 10 years we are looking at the rise of tag+collection based filesystems and operating system paradigms. all of us generate a huge amount of "digital garbage" constantly, and you either sort it out into the important stuff, keep temporarily, and toss, or you accumulate a giant digital garbage pile. AI systems are gonna automate that process, it's gonna start on traditional tree-based systems but eventually you don't need the tree at all, AI is what's going to make that pivot to true tag/collection systems possible.

microsoft and macos are both clearly racing for this with the "AI os" concept. It's not just better relevance searches etc, it's going to be cognitive-load-reduction for the operator. I simply cannot see linux being able to keep up with this change, in the same way the kernel can't just switch to rust - at some point you are too calcified to ever do the big-bang rewrite if there is not a BDFL telling you that it's got to happen.

the downside of being "the bazaar" is that you are standards-driven and have to deal with corralling a million whiny nerds constantly complaining about "spying on me just like microsoft" and continuing to push in their own other directions (sysvinit/upstart/systemd factions, etc) and whatever else, on top of all the other technical issues of doing a big-bang rewrite. linux is too calcified to ever pivot away from being a tree-based OS and it's going to be another 2-3 decades before they catch up with "proper support for new file-organization paradigms" etc even in the smaller sense.

that's really just the tip of the iceberg on the things AI is going to change, and linux is probably going to be left out of most of those commercial applications despite being where the research is done. It's just too much of a mess and too many nerdlingers pushing back to ever get anything done. Unix will be represented in this new paradigm but not Linux - the commercial operators who have the centralization and fortitude to build a cathedral will get there much quicker, and that looks like MacOS or Solaris not linux.

IBM was also incompetent. The OS/2 team in Boca had some exceptional engineers but was packed with mostly mediocre-to-bad ones, which is why so many things in OS/2 were bad and why IBM got upset at Microsoft for contributing "negative work" to the project: their lines-of-code contribution was negative because they were rewriting a lot of inefficient, bloated IBM code.

A lot went wrong with OS/2. For CUDA, I think a better analogy is VHS. The standard, in the effective not open sense, is what it is. AMD sucks at software and views it as an expense rather than an advantage.

You would think that by now AMD realizes that poor software is what left them behind in the dust, and would have changed that mindset.

Most businesses understand the pain points of their suppliers very well, as they feel that pain and have organized themselves around it.

They have a hard time understanding the pain points of their consumers, as they don't feel that pain, look through their own organisation-coloured glasses, and can't tell the real pain points from the whiny-customer ones.

AMD probably thinks software ecosystems are the easy part, ready to take it on whenever they feel like it and throw a token amount at it. They've built a great engine, see the bodywork as beneath them, and don't understand why the lazy customer wants them to build the rest of the car too.

I'm not in the GPU programming realm, so this observation might be inaccurate:

I think the case of CUDA vs an open standard is different from OS/2 vs Windows because the customers of CUDA are programmers with access to source code, while the customers of OS/2 were end users trying to run apps written by others.

If your shrink-wrapped software didn't run on OS/2, you'd have no choice but to go buy Windows. On the other hand, if your AI model doesn't run on an AMD device and the issue is something minor, you can edit the shader code.

Intel embraced AMD64, ditching Itanium. Wasn't it a good decision that worked out well? Is it comparable?

In hindsight, yes, but just because a specific technology is leading an industry doesn’t mean it’s going to be the best option. It has to play out long enough for the market to indicate a preference. In this case, for better or worse, it looks like CUDA’s the preference.

It has to play out long enough for the market to indicate a preference

By what measures hasn't that happened already? CUDA has been around and constantly improving for more than 15 years, and there are no competitors in sight so far. It's basically the de facto standard in many ecosystems.

There haven’t been any as successful, but there have been competitors. OpenCL, DirectX come to mind.

SYCL is the latest attempt that I'm aware of. It's still pretty active and may just work, as it doesn't rely on video card manufacturers to work out.

SYCL is the quasi-successor to OpenCL, built on the same flavor of SPIR-V. Various efforts are trying to run it on top of Vulkan Compute (which tends to be broadly supported by modern GPUs) but it's non-trivial because the technologies are independently developed and there are some incompatibilities.

Intel & AMD have a cross-license agreement covering everything x86 (and x86_64) thanks to lots and lots of lawsuits over their many years of competition.

So while Intel had to bow to AMD's success and give up Itanium, they weren't then limited by that and could proceed to iterate on top of it.

Meanwhile it'll be a cold day in hell before Nvidia licenses anything about CUDA to AMD, much less allows AMD to iterate on top of it.

The problem with effectively supporting CUDA is that encourages CUDA adoption all the more strongly

Worked fine for MS with Excel supporting Lotus 123 and Word supporting WordPerfect's formats when those were dominant...

But MS controlled the underlying OS. That let them both throw money at the problem and (by accounts at the time) frequently tweak the OS in ways that made life difficult for Lotus, WordPerfect, Ashton-Tate, etc.

Last I checked, Lotus did themselves in by not innovating, and by betting on the wrong horse (OS/2), then not doing well on a pivot to Windows.

Meanwhile Excel was gaining features and winning users with them even before Windows was in play.

betting on the wrong horse (OS/2)

Ahhhh, your hindsight is well developed. I would be interested to know the background on the reasons why Lotus made that bet. We can't know the counterfactual, but Lotus delivering on a platform owned by their deadly competitor Microsoft would seem to me to be a clearly worrisome idea to Lotus at the time. Turned out it was an existentially bad idea. Did Lotus fear Microsoft? "DOS ain't done till Lotus won't run" is a myth[1] for a reason. Edit: DRDOS errors[2] were one reason Lotus might fear Microsoft. We can just imagine a narrative of a different timeline where Lotus delivered on Windows but did some things differently to beat Excel. I agree, Lotus made other mistakes and Microsoft made some great decisions, but the point remains.

We can also suspect that AMD have a similar choice now where they are forked. Depending on Nvidia/CUDA may be a similar choice for AMD - fail if they do and fail if they don't.

[1] http://www.proudlyserving.com/archives/2005/08/dos_aint_done...

[2] https://www.theregister.com/1999/11/05/how_ms_played_the_inc...

I've seen rumours from self-claimed ex-Lotus employees that IBM made a deal with Lotus to prioritise OS/2

This is a key point. Before Windows we had all the DOS players - WordPerfect was king. Microsoft was more focused on the Mac. I’ve always assumed that Microsoft understood that a GUI was coming and trained a generation of developers on the main GUI of the day. Once Windows came out, the DOS-focused apps could not adapt in time.

Microsoft could do that because they had the operating system monopoly to leverage and take out both Lotus 123 and WordPerfect. Without the monopoly of the operating system they wouldn't have been able to Embrace, Extend, Extinguish.

https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...

Is it? Apple Silicon exists, but Apple created a translation layer above it so the transition could be smoother.

This is extremely different, apple was targeting end consumers that just want their app to run. The performance between apple rosetta and native cpu were still multiple times different.

People writing CUDA apps don't just want stuff to run, performance is an extremely important factor else they would target CPUs which are easier to program for.

From their readme:

> On Server GPUs, ZLUDA can compile CUDA GPU code to run in one of two modes:
> Fast mode, which is faster, but can make exotic (but correct) GPU code hang.
> Slow mode, which should make GPU code more stable, but can prevent some applications from running on ZLUDA.

The performance between apple rosetta and native cpu were still multiple times different.

Not at all, the performance hit was in the low tens of percent. Before they natively supported Apple Silicon, most of the apps I use for music/video/photography didn't seem to have a performance impact at all, all the more so since the M1 machines were so much faster than the Intels.

The performance between apple rosetta and native cpu were still multiple times different.

Rosetta 2 runs apps at 80-90% their native speed.
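To put the two claims side by side, here is a quick sketch that converts "runs at X% of native speed" into a runtime multiplier. The 80-90% range is the figure quoted above; the function name is made up for illustration, and it assumes "speed" means throughput on the same workload.

```python
def slowdown_factor(relative_speed: float) -> float:
    """Convert relative speed (fraction of native throughput) into a runtime multiplier."""
    if not 0 < relative_speed <= 1:
        raise ValueError("relative_speed must be in (0, 1]")
    return 1.0 / relative_speed


# The 80-90% range comes from the comment above.
for speed in (0.80, 0.90):
    print(f"{speed:.0%} of native speed -> {slowdown_factor(speed):.2f}x runtime")
```

Under that reading, 80-90% of native speed works out to roughly a 1.11x to 1.25x longer runtime, which is well short of "multiple times".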

Not really the same, in that Apple was absolutely required to do this in order for people to transition smoothly, and it wasn't competing against another company or platform. It just needed apps from its previous platform to work while people recompiled them for the current one, which they will.

If you replace CUDA -> x86 and NVIDIA -> Intel, you'll see a familiar story which AMD has already proved it can work through.

These were precisely the arguments for 'x86 will entrench Intel for all time', and we've seen AMD succeed at that game just fine.

And indeed more than succeed, they invented x86_64.

And indeed more than succeed, they invented x86_64.

If AMD invented the analogue of x86_64 for CUDA, this would increase competition and progress in AI by some huge fraction.

Transmeta was Intel's bogeyman in the 90s.

These were precisely the arguments for 'x86 will entrench Intel for all time', and we've seen AMD succeed at that game just fine.

... after a couple decades of legal proceedings and a looming FTC monopoly case convinced Intel to throw in the towel, cross-license, and compete more fairly with AMD.

https://jolt.law.harvard.edu/digest/intel-and-amd-settlement

AMD didn't just magically do it on its own.

The latest version of CUDA is 12.3, and version 12.2 came out 6 months prior. How many people are running an older version of CUDA right now on NVIDIA hardware for whatever particular reason?

Even if AMD lagged support on CUDA versioning, I think it would be widely accepted if the performance per dollar at certain price points was better.

Taking the whole market from NVIDIA is not really an option, it's better to attack certain price points and niches and then expand from there. The CUDA ship sailed a long time ago in my view.

I just went through this this weekend - if you're running Windows and want to use DeepSpeed, you still have to use CUDA 12.1 because DeepSpeed 13.1 is the latest that works with 12.1. There's no DeepSpeed for Windows that works with 12.3.

I tried to get it working this weekend but it was a huge PITA, so I switched to putting everything into WSL2, with Arch on there and PyTorch etc. in containers, so I could flip versions easily now that I know how SPECIFIC the versions are to one another.

I'm still working on that part, halfway into it my WSL2 completely broke and I had to reinstall windows. I'm scared to mount the vhdx right now. I did ALL of my work and ALL of my documentation is inside of the WSL2 archlinux and NOT on my windows machine. I have EVERYTHING I need to quickly put another server up (dotfiles, configs) sitting in a chezmoi git repo ON THE VM. That I only git committed one init like 5 mins into everything. THAT was a learning experience, now I have no idea if I should follow the "best practice" of keeping projects in wsl or having wsl reach out to windows, there's a performance drop. The p9 networking stopped working and no matter what I reinstalled, reset, removed features, reset windows, etc, it wouldn't start. But at least I have that WSL2 .vhdx image that will hopefully mount and start. And probably break WSL2 again. I even SPECIFICALLY took backups of the image as tarballs every hour in case I broke LINUX, not WSL.

If anyone has done SD containers in WSL2 already, let me know. I've tried to use WSL for dev work (I use OS X) like this 2-3 times in the last 4-5 years and I always run into some catastrophically broken thing that makes my WSL stop working. I hadn't used it in years so hoped it was super reliable by now. This is on 3 different desktops with completely different hardware, etc. I was terrified it would break this weekend and IT DID. At least I can be up in Windows in 20 minutes thanks to Chocolatey and chezmoi.

Sorry I'm venting now this was my entire weekend.

This repo is from a deepspeed contrib (iirc) and lists the reqs for deepspeed + windows that mention the version matches

https://github.com/S95Sedan/Deepspeed-Windows

It may sound weird to do any of this in Windows, or maybe not, but if it does just remember that it's a lot of gamers like me with 4090s who just want to learn ML stuff as a hobby. I have absolutely no idea what I'm doing but thank god I know containers and linux like the back of my hand.

Vent away! Sounds frustrating for sure.

As much as I love Microsoft/Windows for the work they have put into WSL, I ended up just putting Kubuntu on my devices and use QEMU with GPU passthrough whenever I need Windows. Gaming perf is good. You need an iGPU or a cheap second GPU for Linux in order to hand off a 4090 etc. to Windows (unless maybe your motherboard happens to support headless boot but if it's a consumer board it doesn't). Dual boot with Windows always gave me trouble.

I recently gave this a go as I’d not had a windows desktop for a long time, have a beefy Proxmox server and wanted to play some windows only games - works shockingly well with an a4000 and 35m optical hdmi cables! - however I’m getting random audio crackling and popping and I’ve yet to figure out what’s causing it.

At first I thought it was hardware related, but it also happens in a Remote Desktop session, leading me to think it's some weird audio driver thing.

have you encountered anything like this at all?

Great comment.

I bet there are at least two markets (or niches):

1. People who want the absolute best performance and the latest possible version and are willing to pay the premium for it;

2. People who want to trade performance for cost and accept working with not-the-latest versions.

In fact, I bet the market for (2) is much larger than (1).

The problem with effectively supporting CUDA is that encourages CUDA adoption all the more strongly.

I'm curious about this. Sure some CUDA code has already been written. If something new comes along that provides better performance per dollar spent, why continue writing CUDA for new projects? I don't think the argument that "this is what we know how to write" works in this case. These aren't scripts you want someone to knock out quickly.

If something new comes along that provides better performance per dollar spent

They won’t be able to do that, their hardware isn’t fast enough.

Nvidia is beating them at hardware performance, AND ALSO has an exclusive SDK (CUDA) that is used by almost all deep learning projects. If AMD can get their cards to run CUDA via ROCm, then they can begin to compete with Nvidia on price (though not performance). Then, and only then, if they can start actually producing cards with equivalent performance (also a big stretch) they can try for an Embrace Extend Extinguish play against CUDA.

They won’t be able to do that, their hardware isn’t fast enough.

Well, then I guess CUDA is not really the problem, so being able to run CUDA on AMD hardware wouldn't solve anything.

try for an Embrace Extend Extinguish play against CUDA

They wouldn't need to go that route. They just need a way to run existing CUDA code on AMD hardware. Once that happens, their customers have the option to save money by writing ROCm or whatever AMD is working on at that time.

Intel has the same software issue as AMD but their hardware is genuinely competitive if a generation behind. Cost and power wise, Intel is there; software? No.

If something new comes along that provides better performance per dollar, but you have no confidence that it'll continue to be available in the future, it's far less appealing. There's also little point in being cheaper if it just doesn't have the raw performance to justify the effort in implementing in that language.

CUDA currently has the better raw performance, better availability, and a long record indicating that the platform won't just disappear in a couple of years. You can use it on pretty much any NVIDIA GPU and it's properly supported. The same CUDA code that ran on a GTX680 can run on an RTX4090 with minimal changes if any (maybe even the same binary).

In comparison, AMD has a very spotty record with their compute technologies, stuff gets released and becomes effectively abandonware, or after just a few years support gets dropped regardless of the hardware's popularity. For several generations they basically led people on with promises of full support on consumer hardware that either never arrived or arrived when the next generation of cards were already available, and despite the general popularity of the rx580 and the popularity of the Radeon VII in compute applications, they dropped 'official' support. AMD treats its 'consumer' cards as third class citizens for compute support, but you aren't going to convince people to seriously look into your platform like that. Plus, it's a lot more appealing to have "GPU acceleration will allow us to take advantage of newer supercomputers, while also offering massive benefits to regular users" than just the former.

This was ultimately what removed AMD as a consideration for us when we were deciding on which to focus on for GPU acceleration in our application. Many of us already had access to an NVIDIA GPU of some sort, which would make development easier, while the entire facility had one ROCm capable AMD GPU at the time, specifically so they could occasionally check in on its status.

There are some great replies to my comment - my original comment was too reductive. However, I still think that entrenching CUDA as the de-facto language for heterogeneous computing is a mistake. We need an open ecosystem for AI and HPC, where vendors compete on producing the best hardware.

The problem with open standards is that someone has to write them.

And that someone usually isn't a manufacturer, lest the committee be accused of bias.

Consequently, you get (a) outdated features that SotA has already moved beyond, (b) designed in a way that doesn't correspond to actual practice, and (c) that are overly generalized.

There are some notable exceptions (e.g. IETF), but the general rule has been that open specs please no one, slowly.

IMHO, FRAND and liberal cross-licensing produce better results.

Vulkan already has some standard compute functionality. Not sure if it's low level enough to be able to e.g. recompile and run CUDA kernels, but I think if people were looking for a vendor-neutral standard to build GPGPU compute features on top of, I mean, that seems to be the obvious modern choice.

There is already a work-in-progress implementation of HIP on top of OpenCL https://github.com/CHIP-SPV/chipStar and the Mesa RustiCL folks are quite interested in getting that to run on top of Vulkan.

(To be clear, HIP is about converting CUDA source code not running CUDA-compiled binaries but the ZLUDA project discussed in OP heavily relies on it.)
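To make the source-versus-binary distinction concrete, here is a toy sketch of what source-level conversion means: renaming CUDA runtime identifiers to their HIP counterparts in the text of a program. This is not AMD's actual conversion tooling, which handles far more than a lookup table; the function name `toy_hipify` and the tiny mapping are assumptions chosen for illustration, though the HIP names listed are real counterparts of the CUDA ones.

```python
import re

# Deliberately tiny, illustrative mapping of CUDA runtime names to HIP names.
# Real conversion tools cover far more of the API than this.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only, trying longer names first.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rewrite known CUDA identifiers in source text to their HIP equivalents."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(1)], source)


cuda_snippet = """#include <cuda_runtime.h>
cudaMalloc(&d_buf, n);
cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);
cudaFree(d_buf);
"""
print(toy_hipify(cuda_snippet))
```

The point of the sketch is that this approach needs the CUDA source in hand; an already-compiled CUDA binary gives it nothing to rewrite, which is the gap a project like ZLUDA targets.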

When the alternative is failure I suppose you choose the least bad option. Nobody is betting the farm on ROCm!

True. This is the big advantage of an open standard instead of jumping from one vendor's walled garden to another.

That's not guaranteed at all. One could make the same argument about Linux vs Commercial Unix.

If they release their stuff as open source, including firmware, I think they will win out eventually.

And it's also not a guarantee that Nvidia will always produce the superior hardware for that code.

They have already lost. The question is do they want to come in second in the game to control the future of the world or not play at all?

Yep. This is very similar to the "catch-22" that IBM wound up in with OS/2 and the Windows API. On the one hand, by supporting Windows software on OS/2, they gave OS/2 customers access to a ready base of available, popular software. But in doing so, they also reduced the incentive for ISVs to produce OS/2 native software that could take advantage of unique features of OS/2.

It's a classic "between a rock and a hard place" scenario. Quite a conundrum.

Why do you think running after nVidia for this submarket is a good idea for them? The AMD GPU team isn't especially big and the development investment is massive. Moreover, they'll have the opportunity cost for projects they're now dominating in (all game consoles for example).

Do you expect them to be able to capitalize on the AI fad so much (and quickly enough!) that it's worth dropping the ball on projects they're now doing well in? Or perhaps continue investing into the part of the market where they're doing much better than nVidia?

If the alternative is to ignore one of the biggest developing markets then yeah, maybe they should start trying to catch up. Unless you think GPU compute is a fad that's going to fizzle out?

One of the most important decisions a company can make is which markets they'll focus on and which they won't. This is even true for megacorps (see: Google and their parade of messups). There's just not enough time to be in all markets all at once.

So, again, it's not at all clear that AMD being in the compute GPU game is the automatic win for them in the future. There's plenty of companies that killed themselves trying to run after big profitable new fad markets (see: Nokia and Windows Phone, and many other cases).

So let's examine that - does AMD actually have a good shot of taking a significant chunk of market that will offset them not investing in some other market?

AMD is literally the only company on the market poised to exploit the explosion in demand for GPU compute after nVidia (sorry Intel). To not even really try to break in is insanity. nVidia didn't grow their market cap by 5x over the course of a year because people really got into 3D gaming. Even as an also-ran on the coattails of nVidia with a compatibility glue library, the market is clearly demanding more product.

Isn't Intel's next gen GPU supposed to be pretty strong on compute?

Read an article about it recently, but when trying to remember the details / find it again just now I'm not seeing it. :(

Their OneAPI is really interesting!

Intel is trying, but all of their efforts thus far have been pretty sad and abortive. I don't think anybody is taking them seriously at this point.

I'm not an expert like you would find here on HN, I am only really a tinkerer and learner, amateur at best, but I think Intel's compute is very promising on Alchemist. The A770 beats out the 4060 Ti 16GB in video rendering via Davinci Resolve and Adobe; has AV1 support in free Davinci Resolve while Lovelace only has AV1 support in Studio. Then for AI, the A770 has had a good showing in Stable Diffusion against Nvidia's midrange Lovelace since the summer: https://www.tomshardware.com/news/stable-diffusion-for-intel...

The big issue for Intel is pretty similar to that of AMD; everything is made for CUDA, and Intel has to either build their own solutions or convince people to build support for Intel. While I'm working on learning AI and plan to use an Nvidia card, the progress Intel has made in the last couple of years since introducing their first GPU to market has been pretty wild, and I think it should really give AMD pause.

So, again, it's not at all clear that AMD being in the compute GPU game is the automatic win for them in the future. There's

You’re right about that, but it seems pretty clear that not being in the compute GPU game is an automatic loss for them (look at their recent revenue growth in each sector over the past quarter or two).

Investing in what other market?

Are you seriously telling me they shouldn't invest in one of their core markets? The necessary investments are probably insignificant. Let's say you need a budget of 10 million dollars (50 developers) to assemble a dev team to fix ROCm. How many 7900 XTXs would it take to break even on revenue? Roughly 9000. How many did they sell? I'm too lazy to count, but Mindfactory, a German online shop, alone sold around 6k units.
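For anyone who wants to check that back-of-the-envelope figure, here is a small sketch of the arithmetic. The budget, headcount and unit count come from the comment above; the $1,000 per-card figure is a hypothetical round number added purely for comparison, and like the comment this counts revenue, not profit.

```python
import math

BUDGET_USD = 10_000_000  # from the comment above
DEVELOPERS = 50          # from the comment above
CLAIMED_UNITS = 9_000    # "roughly 9000" from the comment above

# What the comment's numbers imply per developer and per card.
cost_per_developer = BUDGET_USD / DEVELOPERS
implied_revenue_per_card = BUDGET_USD / CLAIMED_UNITS


def units_to_break_even(budget: float, revenue_per_card: float) -> int:
    """Smallest whole number of cards whose revenue covers the budget."""
    return math.ceil(budget / revenue_per_card)


ASSUMED_PRICE_USD = 1_000  # hypothetical round figure, not from the comment

print(f"Cost per developer: ${cost_per_developer:,.0f}")
print(f"Implied revenue per card at 9000 units: ${implied_revenue_per_card:,.0f}")
print(f"Units needed at ${ASSUMED_PRICE_USD:,} per card: "
      f"{units_to_break_even(BUDGET_USD, ASSUMED_PRICE_USD):,}")
```

So the "roughly 9000" figure corresponds to about $1,111 of revenue per card and $200k per developer; covering the budget out of margin rather than revenue would take proportionally more cards.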

AMD is betting big on GPUs. They recently released the MI300, which has "2x transistors, 2.4x memory and 1.6x memory bandwidth more than the H100, the top-of-the-line artificial-intelligence chip made by Nvidia" (https://www.economist.com/business/2024/01/31/could-amd-brea...).

They very much plan to compete in this space, and hope to ship $3.5B of these chips in the next year. Small compared to Nvidia's revenues of $59B (includes both consumer and data centre), but AMD hopes to match them. It's too big a market to ignore, and they have the hardware chops to match Nvidia. What they lack is software, and it's unclear if they'll ever figure that out.

They are trying to compete in the segment of data center market where the shots are called by bean counters calculating FLOPS per dollar.

That's why I'm going to democratize that business and make it available to anyone who wants access. How do bare metal rentals of MI300x and top end Epyc CPUs sound? We take on the capex/opex/risk and give people what they want, which is access to HPC clusters.

A market where Nvidia chips are all bought out, so what's left?

IIRC (this could be old news) AMD GPUs are preferred in the supercomputer segment because they offer better flops per unit of energy. However, without a CUDA-like stack you're missing out on the AI part of supercompute, which is an increasing proportion.

The margins on supercompute-related sales are very high. Simplifying, but you can basically take a consumer chip, unlock a few things, add more memory capacity, relicense, and your margin goes up by a huge factor.

They are preferred not because of inherent superiority of AMD GPUs. But simply because they have to price lower and have lower margins.

Nvidia could always just halve their prices one day, and wipe out every non-state-funded competitor. But Nvidia prefers to collect their extreme margins and funnel them into even more R&D in AI.

It's more that the resource balance in AMD's compute line of GPUs (the CDNA ones) has been more focused on the double precision operations that most supercomputer code makes heavy use of.

Because their current market valuation was massively inflated because of the AI/GPU boom and/or bubble?

In a rational world their stock price would collapse if they don’t focus on it and are unable to deliver anything competitive in the upcoming year or two.

of the market where they're doing much better than nVidia?

So the market that’s hardly growing, that Nvidia is not competing in, and where Intel still has a bigger market share and is catching up performance-wise? AMD’s valuation is this high only because they are seen as the only company that could directly compete with Nvidia in the data center GPU market.

GPU for compute has been a thing since the 00s. Regardless of whether AI is a fad (it isn't, but we can agree to disagree on this one) not investing more in GPU compute is a weird decision.

everyone buying GPUs for AI and scientific workloads wishes AMD was a viable option, and this has been true for almost a decade now.

the hardware is already good enough, people would be happy to use it and accept that it's not quite as optimized for DL as Nvidia.

people would even accept that the software is not as optimized as CUDA, I think, as long as it is correct and reasonably fast.

the problem is just that every time i've tried it, it's been a pain in the ass to install and there are always weird bugs and crashes. I don't think it's hubris to say that they could fix these sorts of problems if they had the will.

Because the supply for this market is constrained.

It's a pure business decision based on simple math.

If the estimated revenues from selling to the underserved market are higher than the cost of funding the project (they probably are, considering the obscene margins from NVIDIA), then it's a no-brainer.

AMD also has the problem that they make much better margins on their CPUs than on their GPUs and there are only so many TSMC wafers. So in a way making more GPUs is like burning up free money.

It was Microsoft’s strategy for several decades (outsiders called it embrace, extend, extinguish, only partially in jest). It can work for some companies.

According to the article, AMD seems to have pulled the plug on this as they think it will hinder ROCMv6 adoption, which still btw only supports two consumer cards out of their entire line up[1]

1. https://www.phoronix.com/news/AMD-ROCm-6.0-Released

With the most recent card being their one year old flagship ($1k) consumer GPU...

Meanwhile CUDA supports anything with Nvidia stamped on it before it's even released. They'll even go as far as doing things like adding support for new GPUs/compute families to older CUDA versions (see Hopper/Ada and CUDA 11.8).

You can go out and buy any Nvidia GPU the day of release, take it home, plug it in, and everything just works. This is what people expect.

AMD seems to have no clue that this level of usability is what it will take to actually compete with Nvidia and it's a real shame - their hardware is great.

You've got to remember that AMD are behind at all aspects of this, including documenting their work in an easily digestible way.

"Support" means that the card is actively tested and presumably has some sort of SLA-style push to fix bugs for. As their stack matures, a bunch of cards that don't have official support will work well [0]. I have an unsupported card. There are horrible bugs. But the evidence I've seen is that the card will work better with time even though it is never going to be officially supported. I don't think any of my hardware is officially supported by the manufacturer, but the kernel drivers still work fine.

Meanwhile CUDA supports anything with Nvidia stamped on it before it's even released...

A lot of older Nvidia cards don't support CUDA v9 [1]. It isn't like everything supports everything, particularly in the early part of building out capability. The impression I'm getting is that in practice the gap in strategy here is not as large as the current state makes it seem.

[0] If anyone has bought an AMD card for their machine to multiply matrices they've been gambling on whether the capability is there. This comment is reasonable speculation, but I want to caveat the optimism by asserting that I'm not going to put money into AMD compute until there is some actual evidence on the table that GPU lockups are rare.

[1] https://en.wikipedia.org/wiki/CUDA#GPUs_supported

CUDA dropped Tesla (from 2006!) only as of 7.0, which seems to have released around 2015. Fermi support lasted from 2010 until 2017, giving it a solid 7 years still. Kepler support was dropped around 2020, and the first cards were released in 2012.

As such Fermi seems to be the shortest supported architecture, and it was around for 7 years. GCN4 (Polaris) was introduced in 2016, and seems to have been officially dropped around 2021, just 5 years in. While you could still get it working with various workarounds, I don't see the evidence of Nvidia being even remotely as hasty as AMD with removing support, even for early architectures like Tesla and Fermi.

On top of this some Kepler support (for K80s etc) is still maintained in CUDA 11 which was last updated late 2022, and libraries like PyTorch and TensorFlow still support CUDA 11.8 out of the box.
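For comparison, here is the same timeline as a small sketch. The years are the approximate ones quoted in the comment above, not verified release or deprecation dates.

```python
# (first release year, year official support was dropped), as quoted in the
# comment above. These are the commenter's approximate figures.
SUPPORT_WINDOWS = {
    "Tesla (Nvidia)": (2006, 2015),
    "Fermi (Nvidia)": (2010, 2017),
    "Kepler (Nvidia)": (2012, 2020),
    "GCN4 / Polaris (AMD)": (2016, 2021),
}


def support_years(window):
    """Years between introduction and the end of official support."""
    introduced, dropped = window
    return dropped - introduced


lifetimes = {arch: support_years(w) for arch, w in SUPPORT_WINDOWS.items()}

# Print shortest-lived architectures first.
for arch, years in sorted(lifetimes.items(), key=lambda item: item[1]):
    print(f"{arch}: ~{years} years")

shortest = min(lifetimes, key=lifetimes.get)
```

On these numbers Polaris had the shortest official support window, which is the point the comment is making.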

To be fair, if anything, that table still shows you'll have compatibility with at least 3 major releases. Either way, I agree their strategy is getting results, it just takes time. I do prefer their open source commitment, I just hope they continue.

All versions of CUDA support PTX, which is an intermediate bytecode/compiler representation that can be finally-compiled by even CUDA 1.0.

So the contract is: as long as your future program does not touch any intrinsics etc that do not exist in CUDA 1.0, you can export the new program from CUDA 27.0 as PTX, and the 8800 GTX driver will read the PTX and let your GPU run it as CUDA 1.0 code… so it is quite literally just as they describe, unlimited forward and backward compatibility/support as long as you go through PTX in the middle.

https://docs.nvidia.com/cuda/archive/10.1/parallel-thread-ex...

https://en.wikipedia.org/wiki/Parallel_Thread_Execution
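The contract described above can be pictured with a toy model. This is only a sketch of the idea, not how the CUDA driver or PTX versioning actually works; the feature names and version numbers are made up for illustration.

```python
# Toy model only: feature names and version numbers are hypothetical,
# not real PTX instructions or real CUDA release data.
FEATURE_INTRODUCED_IN = {
    "basic_arithmetic": 1.0,
    "shared_memory": 1.0,
    "warp_shuffle": 3.0,
    "tensor_ops": 9.0,
}


def runs_on(program_features, target_version):
    """True if every feature the program uses already exists in the target."""
    return all(
        FEATURE_INTRODUCED_IN[feature] <= target_version
        for feature in program_features
    )


old_style_kernel = {"basic_arithmetic", "shared_memory"}
new_style_kernel = {"basic_arithmetic", "tensor_ops"}

print(runs_on(old_style_kernel, 1.0))  # uses only version-1.0 features
print(runs_on(new_style_kernel, 1.0))  # needs a feature the old target lacks
print(runs_on(new_style_kernel, 9.0))
```

The portability claim only holds for programs like `old_style_kernel`: anything that relies on a newer feature has no way to run on the older target, no matter which toolkit emitted it.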

AMD thinks the reason Nvidia is ahead of them is bad marketing on their part, and good marketing (All is AI) by Nvidia. They don't see the difference in software stacks.

For years I've wanted to get off the Nvidia train for AI, but I'm forced to buy another Nvidia card b/c AMD stuff just doesn't work, and all examples work with Nvidia cards as they should.

At the risk of sounding like Steve Ballmer, the reason I only use NVIDIA for GPGPU work (our company does a lot of it!) is the developer support. They have compilers, tools, documentation, and tech support for developers who want to do any type of GPGPU computing on their hardware that just isn't matched on any other platform.

The most recent "card" is their MI300 line.

It's annoying as hell to you and me that they are not catering to the market of people who want to run stuff on their gaming cards.

But it's not clear it's bad strategy to focus on executing in the high-end first. They have been very successful landing MI300s in the HPC space...

Edit: I just looked it up: 25% of the GPU Compute in the current Top500 Supercomputers is AMD

https://www.top500.org/statistics/list/

Even though the list has plenty of V100 and A100s which came out (much) earlier. Don't have the data at hand, but I wouldn't be surprised if AMD got more of the Top500 new installations than nVidia in the last two years.

I'm building a bare metal business around MI300x and top end Epyc CPUs. We will have them for rental soon. The goal is to build a public super computer that isn't just available to researchers in HPC.

In the embedded space, Nvidia regularly drops support for older hardware. The last supported kernel for their Jetson TX2 was 4.9. Their newer Jetson Xavier line is stuck on 5.10.

The hardware may be great, but their software ecosystem is utter crap. As long as they stay the unchallenged leader in hardware, I expect Nvidia will continue to produce crap software.

I would push to switch our products in a heartbeat, if AMD actually gets their act together. If this alternative offers a path to evaluate our current application software stack on an AMD devkit, I would buy one tomorrow.

I have been using rocm on my 7800xt, it seems to be supported just fine.

AMD should have the funds to push both of these initiatives at once. If the ROCM team has political reasons to kill the competition, it is because they are scared it will succeed. I've seen this happen in big companies.

But management at AMD should be above petty team politics and fund both because at the company level they do not care which solution wins in the end.

If you're AMD you don't want to be compatible till you have a compelling feature of your own.

Good enough CUDA + new feature X gives them leverage in the inevitable court battle(s) and patent-sharing agreement that everyone wants to see.

AMD's already stuck its toe in the water: new CPUs with their AI cores built in. If you can get an AM5 socket to run with 196 gigs, that's a large (albeit slow) model you can run.

Why would they be worried about people using their product? Some CUDA wrapper on top of ROCM isn't going to get them fired. It doesn't get rid of ROCM's function as a GPGPU driver.

That is really out of touch. ROCm is garbage as far as I am concerned. A drop in replacement, especially one that seems to perform quite well, is really interesting however.

AMD truly deserves its misfortune in the GPU market.

Someone built the same a while ago for Intel gpus, I think even still the old pre-Xe ones. With arc/xe on the horizon, people had the same question: why isn't Intel sponsoring this or even building their own. It was speculated that this might get them into legal hot water with Nvidia, Google VS. Oracle was brought up, etc...

They financed the prior iteration of Zluda: https://github.com/vosen/ZLUDA?tab=readme-ov-file#faq

but then stopped

[2021] After some deliberation, Intel decided that there is no business case for running CUDA applications on Intel GPUs.

oof

That's an oof indeed. Are AMD and Intel really that delusional, ie "once we get our own version of Cuda right everybody will just rewrite all their software to make use of it", or do they know something we mere mortals don't?

Maybe their lawyers are afraid of another round of "are APIs copyrightable"?

Are you freaking kidding me!?!? Fire those MBAs immediately.

After two years of development and some deliberation, AMD decided that there is no business case for running CUDA applications on AMD GPUs.

Oof x2

I've been critical of AMD's failure to compete in AI for over a decade now, but I can see why AMD wouldn't want to go the route of cloning CUDA and I'm surprised they even tried. They would be on a never ending treadmill of feature catchup and bug-for-bug compatibility, and wouldn't have the freedom to change the API to suit their hardware.

The right path for AMD has always been to make their own API that runs on all of their own hardware, just as CUDA does for Nvidia, and push support for that API into all the open source ML projects (but mostly PyTorch), while attacking Nvidia's price discrimination by providing features they use to segment the market (e.g. virtualization, high VRAM) at lower price points.

Perhaps one day AMD will realize this. It seems like they're slowly moving in the right direction now, and all it took for them to wake up was Nvidia's market cap skyrocketing to 4th in the world on the back of their AI efforts...

But AMD was formed to shadow Intel's x86?

ISAs are smaller and less stateful and better documented and less buggy and most importantly they evolve much more slowly than software APIs. Much more feasible to clone. Especially back when AMD started.

PTX is just an ISA too. Programming languages and ISA representations are effectively fungible, that’s the lesson of Microsoft CLR/Intermediate Language and Java too. A “machine” is a hardware and a language.

PTX is not a hardware ISA though, it's still software and can change more rapidly.

Not without breaking the support contract? If you change the PTX format then CUDA 1.0 machines can no longer read it and it's no longer PTX.

Again, you are missing the point. Java is both a language (java source) and a machine (the JVM). The latter is a hardware ISA - there are processors that implement Java bytecode as their ISA format. Yet most people who are running Java are not doing so on java-machine hardware, yet they are using the java ISA in the process.

https://en.wikipedia.org/wiki/Java_processor

https://en.wikipedia.org/wiki/Bytecode#Execution

any bytecode is an ISA, the bytecode spec defines the machine and you can physically build such a machine that executes bytecode directly. Or you can translate via an intermediate layer, like how Transmeta Crusoe processors executed x86 as bytecode on a VLIW processor (and how most modern x86 processors actually use RISC micro-ops inside).

these are completely fungible concepts. They are not quite the same thing but bytecode is clearly an ISA in itself. Any given processor can choose to use a particular bytecode as either an ISA or translate it to its native representation, and this includes both PTX, Java, and x86 (among all other bytecodes). And you can do the same for any other ISA (x86 as bytecode representation, etc).
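A minimal sketch of that fungibility, using a three-instruction stack bytecode invented for this example (it is not PTX, JVM bytecode, or x86). The same program is either executed directly, as a machine built for the bytecode would do, or translated into the host's own representation first, here Python source.

```python
# Hypothetical three-instruction stack bytecode, invented for this example.
PUSH, ADD, MUL = "PUSH", "ADD", "MUL"

# Encodes (2 + 3) * 4
PROGRAM = [(PUSH, 2), (PUSH, 3), (ADD,), (PUSH, 4), (MUL,)]


def interpret(program):
    """Execute the bytecode directly, treating it as the machine's ISA."""
    stack = []
    for op, *args in program:
        if op == PUSH:
            stack.append(args[0])
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


def translate(program):
    """Translate the bytecode into the host's native form (Python source)."""
    stack = []
    for op, *args in program:
        if op == PUSH:
            stack.append(str(args[0]))
        elif op in (ADD, MUL):
            right, left = stack.pop(), stack.pop()
            symbol = "+" if op == ADD else "*"
            stack.append(f"({left} {symbol} {right})")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


native_source = translate(PROGRAM)
direct_result = interpret(PROGRAM)
translated_result = eval(native_source)  # run the translated form on the host

print(native_source, direct_result, translated_result)
```

Both routes give the same answer; which one counts as "the ISA" depends on where you draw the boundary.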

furthermore, what most people think of as "ISAs" aren't necessarily so. For example RDNA2 is an ISA family - different processors have different capabilities (for example 5500XT has mesh shader support while 5700XT does not) and the APUs use a still different ISA internally etc. GFX1101 is not the same ISA as GFX1103 and so on. These are properly implementations not ISAs, or if you consider it to be an ISA then there is also a meta-ISA encompassing larger groups (which also applies to x86's numerous variations). But people casually throw it all into the "ISA" bucket and it leads to this imprecision.

like many things in computing, it's all a matter of perspective/position. where is the boundary between "CMT core within a 2-thread module that shares a front-end" and "SMT thread within a core with an ALU pinned to one particular thread"? It's a matter of perspective. Where is the boundary of "software" vs "hardware" when virtually every "software" implementation uses fixed-function accelerator units and every fixed-function accelerator unit is running a control program that defines a flow of execution and has schedulers/scoreboards multiplexing the execution unit across arbitrary data flows? It's a matter of perspective.

AMD's management seems to be only vaguely aware that GPU compute is a thing. All of their efforts in the field feel like afterthoughts. Or maybe they are all just hardware guys who think of software as just a cost center.

Maybe they just can't lure in good software developers with the right skill set, either due to not paying them enough or not having a good work environment in comparison to the other places that could hire them.

I did a cursory glance at Nvidia's and AMD's respective careers pages for software developers at one point - what struck me was they both have similarly high requirements for engineers in fields like GPU compute and AI but Nvidia hires much more widely, geographically speaking, than AMD.

As a total outsider it seems to me that maybe one of AMD's big problems is they just aren't set up to take advantage of the global talent pool in the same way Nvidia is.

That doesn't explain CDNA. They focused on high-throughput FP64 which is not where the market went.

They are aware, but it wasn’t until recently that they had the resources to invest in the space. They had to build Zen and start making buckets of money first

It feels like "Make the AI software work on our GPUs," is on some VP's OKRs, but isn't really being checked on for progress or quality.

DirectX vs OpenGL.

This brings back memories of late 90s / early 00s of Microsoft pushing hard their proprietary graphic libraries (DirectX) vs open standards (OpenGL).

Fast forward 25 years and even today, Microsoft still dominates in PC gaming as a result.

There's a bad track record of open standards for GPUs.

Even Apple themselves gave up on OpenGL and has their own proprietary offering (Metal).

To add to that, Linux gaming today is dominated by a wrapper implementing DirectX.

Vulkan running an emulation of DirectX and being faster

Let's not forget the Fahrenheit maneuver by Microsoft that left SGI stranded and unable to move OpenGL forward.

Because the two CEOs are family? Like literally.

That didn't stop World War I...

It certainly seems ironic that the company that beat Intel at its own compatibility game with x86-64 would abandon compatibility with today's market leader.

The situation is a bit different: AMD got its foot in the door with the x86 market because IBM back in the early 1980s forced Intel to license the technology so AMD could act as a second source of CPUs. In the GPU market, ATI (later bought by AMD) and nVidia emerged as the market leaders after the other 3D graphics pioneers (3Dfx) gave up - but their GPUs were never compatible in the first place, and if AMD tried to make them compatible, nVidia could sue the hell out of them...

Why would this not be AMD’s top priority among priorities?

Same reason it wasn't when it was obvious Nvidia was taking over this space maybe 8 years ago now, when they let OpenCL die and then proceeded to do nothing until it was too late.

Speaking to anyone working in general purpose GPU coding back then they all just said the same thing, OpenCL was a nightmare to work with and CUDA was easy and mature compared to it. Writing was on the wall where things were heading the second you saw a photon based renderer running on GPU vs CPU all the way back then, AMD has only themselves to blame because Nvidia basically showed them the potential with CUDA.

Code portability isn't performance portability, a fact that was driven home back in the bad old OpenCL era. Code is going to have to be rewritten to be efficient on AMD architectures.

At which point why tie yourself to the competitor's language. Probably much more effective to just write a well optimized library that serves the MLIR/whatever is popular API in order to run big ML jobs.

This feels like a massive punch in the gut. An open-source project, not ruined by AMD's internal mismanagement, gets shit done within two years and AMD goes "meh"?!? There are billions of dollars on the line! It's like AMD actively hates its customers.

Now the only thing they need to do is make sure ROCm itself is stable.

Well, the simplest reason would be money. There are few companies rolling in the kind of money Nvidia is, and AMD is not one of them. Cloud vendors would care a bit, but for them it is just business: if Nvidia costs a lot more, they in turn charge their customers a lot more while keeping their margins. I know some people still harbor the notion that competition will lower the price, and it may, just not in the sense customers imagine.

To be fair to AMD, they've been trying to solve ML workload portability at more fundamental levels with the acquisition of Nod.ai and de-facto incorporation of Google's IREE compiler project + MLIR.

ROCm is not spelled out anywhere in their documentation and the best answers in search come from Github and not AMD official documents

"Radeon Open Compute Platform"

https://github.com/ROCm/ROCm/issues/1628

And they wonder why they are losing. Branding absolutely matters.

I have no idea what CUDA stands for, and I live just fine without knowing it.

Countless Updates Developer Agony

This is the right definition.

Compute Unified Device Architecture [1]

[1] https://en.wikipedia.org/wiki/CUDA

Cleverly Undermining Disorganized AMD

Funnily enough it doesn't work on their RDNA ("Radeon DNA") hardware (with some exceptions I think), but it's aimed at their CDNA (Compute DNA). If they would come up with a new name today it probably wouldn't include Radeon.

AMD seems to be a firm believer in separating the consumer chips for gaming and the compute chips for everything else. This probably makes a lot of sense from a chip design and current business perspective, but I think it's shortsighted and a bad idea. GPUs are very competent compute devices, and basically wasting all that performance for "only" gaming is strange to me. AI and other compute is getting more and more important for things like image and video processing, language models, etc. Not only for regular consumers, but for enthusiasts and developers it makes a lot of sense to be able to use your 10 TFLOPS chip even when you're not gaming.

While reading through the AMD CDNA whitepaper I saw this and got a good chuckle. "culmination of years of effort by AMD" indeed.

The computational resources offered by the AMD CDNA family are nothing short of astounding. However, the key to heterogeneous computing is a software stack and ecosystem that easily puts these abilities into the hands of software developers and customers. The AMD ROCm 4.0 software stack is the culmination of years of effort by AMD to provide an open, standards-based, low-friction ecosystem that enables productivity creating portable and efficient high-performance applications for both first- and third-party developers.

https://www.amd.com/content/dam/amd/en/documents/instinct-bu...

ROCm works fine on the RDNA cards. On Ubuntu 23.10 and Debian Sid, the system packages for the ROCm math libraries have been built to run on every discrete Vega, RDNA 1, RDNA 2, CDNA 1, and CDNA 2 GPU. I've manually tested dozens of cards and every single one worked. There were just a handful of bugs in a couple of the libraries that could easily be fixed by a motivated individual. https://slerp.xyz/rocm/logs/full/

The system package for HIP on Debian has been stuck on ROCm 5.2 / clang-15 for a while, but once I get it updated to ROCm 5.7 / clang-17, I expect that all discrete RDNA 3 GPUs will work.

That is intentional. We had to change the name. ROCm is no longer an acronym.

I assume you’re on the team if you’re saying “we”

Can you say why you had to change the name?

Later in the same thread:

ROCm is a brand name for ROCm™ open software platform (for software) or the ROCm™ open platform ecosystem (includes hardware like FPGAs or other CPU architectures).

Note, ROCm no longer functions as an acronym.

> Note, ROCm no longer functions as an acronym.

That is really dumb. Like LLVM.

That, and it only runs on a handful of their GPUs.

If you are talking about the "supported" list of GPUs, those listed are only the ones they fully validate and QA test; others of the same generation are likely to work, but most likely with some bumps along the way. In one of the somewhat older Phoronix posts about ROCm, one of their engineers did say they are trying to expand the list of validated and QA'd cards, as well as distinguishing between "validated", "supported" and "non-functional".

My understanding is that there was some trademark silliness around "open compute", and AMD decided that instead of doing a full rebrand, they would stick to ROCm but pretend that it wasn't ever an acronym.

Yeah it was due to the Open Compute Project AFAIK... Though for a little while AMD was telling me they really meant to call it "Radeon Open eCosystem" before then dropping that too with many still using the original name.

I mean, I also had to look up what CUDA stands for.

Compute unified device architecture ?

Cannot understand why AMD would stop funding this. It seems like this should have a whole team allocated to it.

They would always be at the mercy of NVIDIA's API. Without knowing the inner workings, perhaps a major concern with this approach is the need to implement on NVIDIA's schedule instead of AMD's which is a very reactive stance.

This approach actually would make sense if AMD felt, like most of us perhaps, that the NVIDIA ecosystem is too entrenched, but perhaps they made the decision recently to discontinue funding because they (now?) feel otherwise.

They've been at mercy of Intel x86 APIs for a long time. Didn't kill them.

What happens here is that the original vendor loses control of the API once there are multiple implementations. That's the best possible outcome for AMD.

In either case, they have a limited window to be adopted, and that's more important. The abstraction layer here helps too. AMD code is !@#$%. If this were adopted, it makes it easier to fix things underneath. All that is a lot more important than a dream of disrupting CUDA.

x86 is not the same, the courts forced the release of x86 architecture to AMD during an antitrust lawsuit

You don't think the courts would force the opening of CUDA? Didn't a court already rule that API cannot be patented. I believe it was a Google case. As long as no implementation was stolen, the API itself is not able to be copyrighted.

Here it is: https://arstechnica.com/tech-policy/2021/04/how-the-supreme-...

Regardless of the legal status of APIs, this Phoronix article is about AMD providing a replacement ABI and I wouldn't assume the legal issues are necessarily the same. But because this is a case where AMD is following a software target there's the possibility, if AMD starts to succeed, that NVidia might change their ABI in ways that deliberately hurt AMD's compatibility efforts in ways that would be much more difficult for APIs or hardware. That's, presumably, why AMD is going forward with their API emulation effort instead.

If you read the article, it's about Google's re-implementation of the Java API and runtime. Thus, yes, Google was providing both API and ABI compatibility.

I read the article when it came out and re-skimmed it just now. My understanding at the time and still was that the legal case revolved around the API and the exhibits entered into evidence I saw were all Java function names with their arguments and things of that sort. And I'm given to understand that the Dalvik Java implementation Google was using with Android was register based rather than the stack based standard Java, which sounds to me like it would make actual binary compatibility impossible.
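To illustrate why a register-based and a stack-based machine can expose the same API while their binaries are not interchangeable, here is a sketch with two hypothetical instruction sets. Both are invented for this example and far simpler than real JVM or Dalvik bytecode; named slots stand in for locals and registers.

```python
# Both instruction sets are hypothetical, not real JVM or Dalvik bytecode.

# Stack machine: operands are implicit (whatever is on top of the stack).
STACK_CODE = [("load", "a"), ("load", "b"), ("add",), ("store", "c")]

# Register machine: every instruction names its operands explicitly.
REGISTER_CODE = [("add", "c", "a", "b")]  # c = a + b


def run_stack(code, variables):
    """Run the stack-machine encoding and return the final variables."""
    env, stack = dict(variables), []
    for op, *args in code:
        if op == "load":
            stack.append(env[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "store":
            env[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return env


def run_register(code, variables):
    """Run the register-machine encoding and return the final variables."""
    env = dict(variables)
    for op, dest, src1, src2 in code:
        if op == "add":
            env[dest] = env[src1] + env[src2]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return env


inputs = {"a": 2, "b": 3}
stack_result = run_stack(STACK_CODE, inputs)
register_result = run_register(REGISTER_CODE, inputs)
print(stack_result["c"], register_result["c"])
```

The source-level meaning (`c = a + b`) is identical, but neither encoding can be fed to the other machine, which is the distinction between API and binary compatibility drawn above.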

Didn't a court already rule that API cannot be patented. I believe it was a Google case. As long as no implementation was stolen, the API itself is not able to be copyrighted.

That is... not accurate in the slightest.

Oracle v Google was not about patentability. Software patentability is its own separate minefield, since anyone who looks at the general tenor of SCOTUS cases on the issue should be able to figure out that SCOTUS is at best highly skeptical of software patents, even if it hasn't made any direct ruling on the topic. (Mostly this is a matter of them being able to tell what they don't like but not what they do like, I think). But I've had a patent attorney straight-out tell me that in absence of better guidance, they're just pretending the most recent relevant ruling (which held that X-on-a-computer isn't patentable) doesn't exist. In any case, a patent on software APIs (as opposed to software as a whole) would very clearly fall under the "what are you on, this isn't patentable" category of patentability.

The case was about the copyrightability of software APIs. Except if you read the decision itself, SCOTUS doesn't actually answer the question [1]. Instead, it focuses on whether or not Google's use of the Java APIs was fair use. Fair use is a dangerous thing to rely on for legal precedent, since there's no "automatic" fair use category, but instead a balancing test ostensibly of four factors but practically of one factor: does it hurt the original copyright owner's profits [2].

There's an older decision which held that the "structure, sequence, and organization" of code is copyrightable independent of the larger work of software, which is generally interpreted as saying that software APIs are copyrightable. At the same time, however, it's widespread practice in the industry to assume that "clean room" development of an API doesn't violate any copyright. The SCOTUS decision in Google v Oracle was widely interpreted as endorsing this interpretation of the law.

[1] There's a sentence or two that suggests to me there was previously a section on copyrightability that had been ripped out of the opinion.

[2] See also the more recent Andy Warhol SCOTUS decision which, I kid you not, says that you have to analyze this to figure out whether or not a use is "transformative". Which kind of implicitly overturns Google v Oracle if you think about it, but is unlikely to in practice.

To be fair, there were patent claims in Oracle vs. Google too. That's why the appeals went through the CAFC rather than the 9th circuit. Those claims were simply thrown out pretty early. Whether that says something about software patents more generally or was simply a set of weak claims intended for venue shopping is a legitimate discussion to be had though.

You think x86 would be changed in such a way that it'd break AMD?

Because what else?

If so, then i think that this is crazy because software is harder to change than hardware

My understanding is that with AMD64 there's a circular dependency where AMD need Intel for x86 and Intel need AMD for x86_64?

That's true now, but AMD has been making x86 compatible CPUs since the original 8086.

More than that, a second implementation of CUDA acts as a disincentive for NVIDIA to make breaking changes to it, since it would reduce any incentive for software developers to follow those changes, as it reduces the value of their software by eliminating hardware choice for end-users (which in some cases, like large companies, are also the developers themselves).

At the same time, open source projects can be pretty nimble in chasing things like changing APIs, potentially frustrating the effectiveness of API pivoting by NVIDIA in a second way.

They would always be at the mercy of NVIDIA's API.

They only need to support PyTorch. Not CUDA

Aside from the latest commit, there has been no activity for almost 3 years (latest code change on Feb 22, 2021).

People are criticizing AMD for dropping this, but it makes sense to stop paying for development when the dev has stopped doing the work, no?

And if he means that AMD stopped paying 3 years ago - well, that was before dinosaurs and ChatGPT, and a lot has changed since then.

https://github.com/vosen/ZLUDA/commits/v3

Pretty sure this was developed in private, but because AMD cancelled the contract he has been allowed to open source the code, and this is the "throw it over the fence" code dump.

This.

    762 changed files with 252,017 additions and 39,027 deletions.

https://github.com/vosen/ZLUDA/commit/1b9ba2b2333746c5e2b05a...

Have a look at the latest commit and the level of change.

Effectively the internal commits while he was working for AMD aren't in the repo, but the squashed commit contains all of the changes.

As I wrote in the article, it was privately developed the past 2+ years while being contracted by AMD during that time... In a private GitHub repo. Now that he's able to make it public / open-source, he squashed all the changes into a clean new commit to make it public. The ZLUDA code from 3+ years ago was when he was experimenting with CUDA on Intel GPUs.

If only this exact concern was addressed explicitly in the first FAQ at the bottom of the README...

https://github.com/vosen/ZLUDA/tree/v3?tab=readme-ov-file#fa...

The code prior to this was all for the intel gpu zluda, and then the latest commit is all the amd zluda code, hence why the commit talks about the red team

My thinking is that the dev _did_ work on it for X amount of time, but as part of their contract is not allowed to share the _actual_ history of the repo, thus the massive code dump in their "Nobody expects the Red Team" commit?

I'm really rooting for AMD to break the CUDA monopoly. To this end, I genuinely don't know whether a translation layer is a good thing or not. On the upside, it makes the hardware much more viable instantly and will boost adoption; on the downside, you run the risk that devs will never support ROCm, because you can just use the translation layer.

I think this is essentially the same situation as Proton+DXVK for Linux gaming. I think that that is a net positive for Linux, but I'm less sure about this. Getting good performance out of GPU compute requires much more tuning to the concrete architecture, which I'm afraid devs just won't do for AMD GPUs through this layer, always leaving them behind their Nvidia counterparts.

However, AMD desperately needs to do something. Story time:

On the weekend I wanted to play around with Stable Diffusion. Why pay for cloud compute, when I have a powerful GPU at home, I thought. Said GPU is a 7900 XTX, i.e. the most powerful consumer card from AMD at this time. Only very few AMD GPUs are supported by ROCm at this time, but mine is, thankfully.

So, how hard could it possibly be to get Stable Diffusion running on my GPU? Hard. I don't think my problems were actually caused by AMD: I had ROCm installed and my card recognized by rocminfo in a matter of minutes. But the whole ML world is so focused on Nvidia that it took me ages to get a working installation of PyTorch and friends. The InvokeAI installer, for example, asks if you want to use CUDA or ROCm, but then always installs the CUDA variant whatever you answer. Ultimately, I did get a model to load, but the software crashed my graphical session before generating a single image.

The whole experience left me frustrated and wanting to buy an Nvidia GPU again...

They are focusing on HPC first. Which seems reasonable if your software stack is lacking. Look for sophisticated customers that can help build an ecosystem.

As I mentioned elsewhere, 25% of GPU compute on the Top 500 Supercomputer list is AMD. This is all on the back of a card that came out only three years ago. We are very rapidly moving towards a situation where there are many, many high-performance developers that will target ROCm.

Is a top 500 super computer list a good way of measuring relevancy in the future?

No, it isn't. What is a better measure is to look at businesses like what I'm building (and others), where we take on the capex/opex risk around top end AMD products and bring them to the masses through bare metal rentals. Previously, these sorts of cards were only available to the Top 500.

> I'm really rooting for AMD to break the CUDA monopoly

Personally I want Nvidia to break the x86-64 monopoly. With how amazing properly spec'd Nvidia cards are to work with, I can only dream of a world where Nvidia is my CPU too.

apt username

I would love to be able to have a native Stable Diffusion experience; my RX 580 takes 30s to generate a single image. But it does work after following https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...

I got this up and running on my windows machine in short order and I don't even know what stable diffusion is.

But again, it would be nice to have first class support to locally participate in the fun.

It seems to me that AMD are crazy to stop funding this. CUDA-on-ROCm breaks NVIDIA's moat, and would also act as a disincentive for NVIDIA to make breaking changes to CUDA; what more could AMD want?

When you're #1, you can go all-in on your own proprietary stack, knowing that network effects will drive your market share higher and higher for you for free.

When you're #2, you need to follow de-facto standards and work on creating and following truly open ones, and try to compete on actual value, rather than rent-seeking. AMD of all companies should know this.

> and would also act as a disincentive for NVIDIA to make breaking changes to CUDA

I don't know about that. You could kinda argue the opposite. "We improved CUDA. Oh it stopped working for you on AMD hardware? Too bad. Buy Nvidia next time"

NVIDIA is about ecosystem plays; they have no interest in sabotage or anti-competitive plays. Leave that to Apple and Google and their dumb app stores and mobile OSs.

Also known as OS/2: Redux strategy.

Most CUDA applications do not target the newest CUDA version! Despite 12.1 being out, lots of code still targets 7 or 8 to support old NVIDIA cards. Similar support for AMD isn’t unthinkable (but a rewrite to rocm would be).

Yep, I develop several applications that use CUDA. I see AMD/Radeon powered computers for sale and want to buy one, but I am not going to risk not being able to run those applications or having to rewrite them.

If they want me as a customer, and they have not created a viable alternative to CUDA, they need to pursue this.

I don't really follow this, but isn't it a bad sign for ROCm that, for example, ZLUDA + Blender 4's CUDA back-end delivers better performance than the native Radeon HIP back-end?

Could be that the CUDA backend has seen far more specialized optimization, whereas the seemingly fairly fresh HIP backend hasn't had as many developers looking at it. In the end, a few more control instructions on the CPU side to go through the ZLUDA wrapper will be insignificant compared to all the time spent inside better-optimized GPU kernels.

I'd say it's even worse, since for rendering OptiX is like 30% faster than CUDA. But that requires the tensor cores. At this point AMD is waaay behind hardware-wise.

It really shows how neglected their software stack is, or at least how neglected this implementation is.

Surely this can be attributed to Blender's HIP code just being suboptimal because nobody really cares about it. By extension nobody cares about it because performance is suboptimal.

It's AMD's job to break that circle.

The interest in this thread tells me there are a lot of people who are not cool with the CUDA monopoly.

Those people should have spoken up when their hardware manufacturers abandoned OpenCL. The industry set itself 5-10 years behind by ignoring open GPGPU compute drivers while Nvidia slowly built their empire. Just look at how long it's taken to reimplement a fraction of the CUDA feature set on a small handful of hardware.

CUDA shouldn't exist. We should have hardware manufacturers working together, using common APIs and standardizing instead of going for the throat. The further platforms drift apart, the more valuable Nvidia's vertical integration becomes.

Common API means being replaceable, fungible. There are no margins in that.

Correct. It's why the concept of 'proprietary UNIX' didn't survive long once program portability became an incentive.

Is my impression wrong, that people understood the need for OCL only after CUDA had already cornered and strangled the market?

Why is CUDA so prevalent as opposed to its alternatives?

At first, it was because Nvidia had a wide variety of widely used cards, almost all of which support some form of CUDA. By and large, your gaming GPU could debug and run the same code that you'd scale up to a datacenter, which was a huge boon for researchers and niche industry applications.

With that momentum, CUDA got incorporated into a lot of high-performance computing applications. Few alternatives show up because there aren't many acceleration frameworks that are as large or complete as CUDA. Nvidia pushed forward by scaling down to robotics and edge-compute scale hardware, and now are scaling up with their DGX/Grace platforms.

Today, Nvidia is prevalent because all attempts to subvert them have failed. Khronos Group tried to get the industry to rally around OpenCL as a widely-supported alternative, but too many stakeholders abandoned it before the initial crypto/AI booms kicked off the demand for GPGPU compute.

OpenCL was the alternative. It came along later, and you couldn't write a lot of the programs in it that you can in CUDA. CUDA is legitimately better than OpenCL.

https://github.com/vosen/ZLUDA - source

https://github.com/vosen/ZLUDA/tree/v3

Latest commit message: "Nobody expects the Red Team"

after the CUDA back-end was around for years and after dropping OpenCL, Blender did add a Radeon HIP back-end... But the real kicker here is that using ZLUDA + CUDA back-end was slightly faster than the native Radeon HIP backend.

This is absolutely crazy.

Is AMD just a puppet org to placate antitrust fears? Why are they like this?

Is this really a theory? If so, my $8 AMD stock from 2015(?) is currently worth $176, so they should make more shell companies. They're doing great.

I guess that might answer my "Why would AMD find that having a CUDA competitor isn't a business case unless they couldn't do it or the cards underperformed significantly."

AMD fails to realize that the software toolchain is what makes Nvidia great. AMD thinks the hardware is all that's needed.

Nvidia's toolchain is really not great. Applications are just written to step around the bugs.

ROCm has different bugs, which the application workarounds tend to miss.

From the ARCHITECTURE.md:

Those pointers point to undocumented functions forming CUDA Dark API. It's impossible to tell how many of them exist, but debugging experience suggests there are tens of function pointers across tens of tables. A typical application will use one or two most common. Due to their undocumented nature they are exclusively used by Runtime API and NVIDIA libraries (and by CUDA applications in turn). We don't have names of those functions nor names or types of the arguments. This makes implementing them time-consuming. Dark API functions are reverse-engineered and implemented by ZLUDA on case-by-case basis once we observe an application making use of it.

fertile soil for Alyssa and Asahi Lina :)

https://rosenzweig.io/

https://vt.social/@lina

Question: Why aren't we using LLMs to translate programs to use ROCm?

Isn't translation one of the strengths of LLMs?

You can translate CUDA to HIP using a regex. An LLM is rather overkill.
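
To make the regex point concrete, here is a minimal sketch in Python. The function name `cuda_to_hip` is made up for illustration, and it only rewrites the runtime header and `cuda`-prefixed runtime identifiers (for example `cudaMalloc` to `hipMalloc`). AMD's own hipify tools cover far more cases (libraries, driver API, unsupported calls), so treat this as a toy, not a replacement for them.

```python
import re

# Hypothetical helper for illustration only; not part of any AMD tool.
# Matches the CUDA runtime header include line.
CUDA_HEADER = re.compile(r'#include\s*<cuda_runtime(?:_api)?\.h>')
# Matches runtime identifiers such as cudaMalloc, cudaError_t, cudaSuccess.
CUDA_IDENT = re.compile(r'\bcuda([A-Z]\w*)')


def cuda_to_hip(source: str) -> str:
    """Rewrite the CUDA runtime header and cuda* identifiers to HIP names."""
    source = CUDA_HEADER.sub('#include <hip/hip_runtime.h>', source)
    return CUDA_IDENT.sub(r'hip\1', source)


if __name__ == "__main__":
    sample = (
        "#include <cuda_runtime.h>\n"
        "cudaError_t err = cudaMalloc(&ptr, bytes);\n"
        "cudaMemcpy(ptr, host, bytes, cudaMemcpyHostToDevice);\n"
    )
    print(cuda_to_hip(sample))
```

This works for the simple cases because HIP mirrors the CUDA runtime naming almost one-to-one; anything without a HIP counterpart would still need manual attention.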

"For reasons unknown to me, AMD decided this year to discontinue funding the effort and not release it as any software product."

Managers at AMD never heard of AI?

Sam could get more chips for way less than $7 trillion if he helps fund and mature this

If anyone wants to work in this area, AMD currently has a lot of related job posts open.

I may have missed it in the article, but this post would mean absolutely nothing to me except for the fact that last week I got into stable diffusion so I'm crushing my 4090 with pytorch and deepspeed, etc and dealing with a lot of nvidia ctk/sdk stuff. Well, I'm actually trying to do this in windows w/ wsl2 and deepmind/torch/etc in containers and it's completely broken so not crushing currently.

I guess a while ago it was found that Nvidia was bypassing the kernel's GPL license driver check, and I read that kernel 6.6 was going to lock that driver out if they didn't fix it. From what I've read, there has been no reply or anything done by Nvidia yet, though I think I probably just can't find it.

Am I wrong about that part?

We're on kernel 6.7.4 now and I'm still using the same drivers. Did it get pushed back, did nvidia fix it?

Also, while trying to find answers myself I came across this 21 year old post which is pretty funny and very apt for the topic https://linux-kernel.vger.kernel.narkive.com/eVHsVP1e/why-is...

I'm seeing conflicting info all over the place so I'm not really sure what the status of this GPL nvidia driver block thing is.

Anything that breaks CUDA lock-in is great! This reminds me of how DX/D3D lock-in was broken by dxvk and vkd3d-proton.

> It apparently came down to an AMD business decision to discontinue the effort

Bad decision if that's the case. Maybe someone can pick it up, since it's open now.

So polyglot programming workflows via PTX targeting are equally supported?

The other big need is for a straightforward library for dynamic allocation/sharing of GPUs. Bitfusion was a huge pain in the ass, but at least it was something. Now it’s been discontinued, the last version doesn’t support any recent versions of PyTorch, and there’s only two(?) possible replacements in varying levels of readiness (Juice and RunAI). We’re experimenting now with replacing our Bitfusion installs with a combination of Jupyter Enterprise Gateway and either MIGed GPUs or finding a way to get JEG to talk to a RunAI installation to allow quick allocation and deallocation of portions of GPUs for our researchers.

Wow, this is great news. I really hope that the community will find ways to sustainably fund this project. Suddenly being able to run a lot of innovative CUDA-based projects on AMD GPUs is a big game-changer, especially because you don't have to deal with the poor state of Nvidia support on Linux.

Złuda roughly means "delusion" / "mirage" / "illusion" in Polish, given the author is called Andrzej Janik this may be a pun :)

Fun fact: ZLUDA means something like illusion/delusion/figment. Well played! (I see the main dev is from Poland.)

Keeping my hopes curtailed until I see proper benchmarks…

I feel like AMD's senior executives all own a lot of nVIDIA stock.

This release is, however, a result of AMD having stopped funding it, per "After two years of development and some deliberation, AMD decided that there is no business case for running CUDA applications on AMD GPUs. One of the terms of my contract with AMD was that if AMD did not find it fit for further development, I could release it. Which brings us to today." from https://github.com/vosen/ZLUDA?tab=readme-ov-file#faq

so, same mistake intel made before.

One thing I didn't see mentioned anywhere apart from the repo's README:

PyTorch received very little testing. ZLUDA's coverage of cuDNN APIs is very minimal (just enough to run ResNet-50) and realistically you won't get much running.

Hope this can benefit from the seemingly infinite enthusiasm of Rust programmers.