A lot of people think AMD should support these translation layers but I think it's a bad idea. CUDA is not designed to be vendor agnostic and Nvidia can make things arbitrarily difficult both technically and legally. For example I think it would be against the license agreement of cuDNN or cuBLAS to run them on this. So those and other Nvidia libraries would become part of the API boundary that AMD would need to reimplement and support.
Chasing bug-for-bug compatibility is a fool's errand. The important users of CUDA are open source. AMD can implement support directly in the upstream projects like pytorch or llama.cpp. And once support is there it can be maintained by the community.
Are you aware of HIP? It's officially supported and, for code that avoids obscure features of CUDA like inline PTX, it's pretty much a find-and-replace to get a working build:
https://github.com/ROCm/HIP
Don't believe me? Include this at the top of your CUDA code, build with hipcc, and see what happens:
https://gitlab.com/StanfordLegion/legion/-/blob/master/runti...
It's incomplete because I'm lazy but you can see most things are just a single #ifdef away in the implementation.
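To make the "one #ifdef away" point concrete, here is a minimal sketch of that kind of shim (hypothetical and far smaller than the Legion header linked above): CUDA runtime calls get #defined to their HIP equivalents when building with hipcc, and the kernel source itself is untouched.

    // Minimal compatibility shim (sketch only; the real header maps many more symbols).
    #ifdef __HIP_PLATFORM_AMD__
      #include <hip/hip_runtime.h>
      #define cudaError_t            hipError_t
      #define cudaSuccess            hipSuccess
      #define cudaMalloc             hipMalloc
      #define cudaMemcpy             hipMemcpy
      #define cudaMemcpyHostToDevice hipMemcpyHostToDevice
      #define cudaFree               hipFree
      #define cudaDeviceSynchronize  hipDeviceSynchronize
    #else
      #include <cuda_runtime.h>
    #endif

    // The same kernel source builds unmodified with either nvcc or hipcc.
    __global__ void saxpy(int n, float a, const float* x, float* y) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) y[i] = a * x[i] + y[i];
    }

hipcc also accepts the usual <<<grid, block>>> launch syntax, so host code typically needs few or no changes either.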
if you're talking about building anything, that is already too hard for ML researchers.
you have to be able to pip install something and just have it work, reasonably fast, without crashing, and also it has to not interfere with 100 other weird poorly maintained ML library dependencies.
Don’t most orgs that are deep enough to run custom cuda kernels have dedicated engineers for this stuff. I can’t imagine a person who can write raw cuda not being able to handle things more difficult than pip install.
Engineers who are really, really good at CUDA are worth their weight in gold, so there's more projects for them than they have time. Worth their weight in gold isn't figurative here – the one I know has a ski house more expensive than 180 lbs of gold (~$5,320,814).
Would you (or your friend) be able to drop any good CUDA learning resources? I'd like to be worth my weight in gold...
A working knowledge of C++, plus a bit of online reading about CUDA and the NVidia GPU architecture, plus studying the LCZero chess engine source code (the CUDA neural net part, I mean) seems like enough to get started. I did that and felt like I could contribute to that code, at least at a newbie level, given the hardware and build tools. At least in the pre-NNUE era, the code was pretty readable. I didn't pursue it though.
Of course becoming "really good" is a lot different and like anything else, it presumably takes a lot of callused fingertips (from typing) to get there.
The real challenge is probably getting your hands on a 4090 for a price you can pay before you are worth your weight in gold. Because an arm and a leg in gold is quite a lot.
I was looking into this recently and it seems like the cheapest AWS instance with a CUDA GPU is something on the order of $1/hr. It looks like an H100 instance might be $15/hr (although I’m not sure if I’m looking at a monthly price).
So yeah it’s not ideal if you’re on a budget, but it seems like there are some solutions that don’t involve massive capex.
Look on vast.ai instead of AWS, you can rent machines with older GPUs dirt cheap. I don't see how they even cover the electricity bills. A 4090 machine starts at about $0.25/hour, though I didn't examine the configuration.
A new 4090 costs around $1800 (https://www.centralcomputer.com/asus-tuf-rtx4090-o24g-gaming...) and that's probably affordable to AWS users. I see a 2080 Ti on Craigslist for $300 (https://sfbay.craigslist.org/scz/sop/d/aptos-nvidia-geforce-...) though used GPUs are possibly thrashed by crypto mining. I don't have a suitable host machine, unfortunately.
Thrashed? What type of damage could a mostly-solid state device suffer? Fan problems? Worn PCi connectors? Deteriorating Arctic Ice from repeated heat cycling?
Replying to sibling @dotancohen: they can basically cook themselves, and they suffer from repeated thermal expansion and contraction.
You don't really need a 4090. An older board is plenty. The software is basically the same. I fooled around with what I think was a 1080 on Paperspace for something like 50 cents an hour, but it was mostly with some Pytorch models rather than CUDA directly.
Does this pay more than $500k/yr? I already know C++, could be tempted to learn CUDA.
I kinda doubt it. Nobody paid me to do that though. I was just interested in LCZero. To get that $500k/year, I think you need up to date ML understanding and not just CUDA. CUDA is just another programming language while ML is a big area of active research. You could watch some of the fast.ai ML videos and then enter some Kaggle competitions if you want to go that route.
You're wrong. The people building the models don't write CUDA kernels. The people optimizing the models write CUDA kernels. And you don't need to know a bunch of ML bs to optimize kernels. Source: I optimize GPU kernels. I don't make 500k but I'm not that far from it.
Heh I'm in the wrong business then. Interesting. Used to be that game programmers spent lots of time optimizing non-ML CUDA code. They didn't make anything like 500k at that time. I wonder what the ML industry has done to game development, or for that matter to scientific programming. Wow.
How much performance difference is there between writing a kernel in a high level language/framework like PyTorch (torch.compile) or Triton, and hand optimizing? Are you writing kernels in PTX?
What's your opinion on the future of writing optimized GPU code/kernels - how long before compilers are as good or better than (most) humans writing hand-optimized PTX?
Are there any certifications or other ways to prove your knowledge to employers in order to get your foot in the door?
Oh bro, forget gold, I'd like to be worth my weight in paper lmao
The fact that "worth their weight in gold" typically means in the single-digit millions is fascinating to me (though I doubt I'll be able to get there myself, maybe someday). I looked it up though and I think this is undercounting the current value of gold per ounce/lb/etc.
5320814 / 180 / 16 = ~1847.5
Per https://www.apmex.com/gold-price and https://goldprice.org/, current value is north of $2400 / oz. It was around $1800 in 2020. That growth for _gold_ of all things (up 71% in the last 5 years) is crazy to me.
It's worth noting that anyone with a ski house that expensive probably has a net worth well over twice the price of that ski house. I guess it's time to start learning CUDA!
Note: gold uses troy ounces, so adjust by ~10%. It's easier to just use grams or kilograms :).
Thanks, I'm a bit new to this entire concept. Do troy lbs also exist, or is that just a term when measuring ounces?
> That growth for _gold_ of all things (up 71% in the last 5 years) is crazy to me.
For comparison: the S&P 500 grew about the same during that period (more than 100% from Jan 2019, about 70% from Dec 2019), so the higher price of gold did not outperform the growth of the general (financial) economy.
But that's still surprising performance, because the S&P generates income and pays dividends. Its increase reflects (at least, is supposed to!) expectations of future higher income. Gold doesn't even bear interest....
That’s pretty funny. Good test of value across the millennia. I wonder if the best aqueduct engineers during the peak of Ancient Rome’s power had villas worth their body weight in gold.
Lol. For once being overweight may come with some advantages here.
Or disadvantages: you may be as rich as your skinny neighbour, but they are the only ones worth their weight in gold ;)
What do people study to figure out CUDA? I'm studying to get my GED and hope to go to school one day
Computer science. This is a grad level topic probably.
Nvidia literally wrote most of the textbooks in this field and you’d probably be taught using one of these anyway:
https://developer.nvidia.com/cuda-books-archive
“GPGPU Gems” is another “cookbook” sort of textbook that might be helpful starting out but you’ll want a good understanding of the SIMT model etc.
Just wait until someone trains an ML model that can translate any CUDA code into something more portable like HIP.
GP says it is just some #ifdefs in most cases, so an LLM should be able to do it, right?
OpenAI Triton? Pytorch 2.0 already uses it.
https://openai.com/index/triton/
This seems to be a fairly common problem with software. The people who create software regularly deal with complex toolchains, dependency management, configuration files, and so on. As a result they think that if a solution "exists", everything is fine. Need to edit a config file for your particular setup? No problem. The thing is, I have been programming stuff for decades and I really hate having to do that stuff and will avoid tools that make me do it. I have my own problems to solve, and don't want to deal with figuring out tools no matter how "simple" the author thinks that is to do.
A huge part of the reason commercial software exists today is probably because open source projects don't take things to this extreme. I look at some things that qualify as products and think they're really simplistic, but they take care of some minutiae that regular people are willing to pay for so they don't have to learn about or deal with them. The same can be true for developers and ML researchers or whatever.
I don't think so. I agree it is too hard for the ML researchers at the companies which will have their rear ends handed to them by the other companies whose ML researchers can be bothered to follow a blog post and prompt ChatGPT to resolve error messages.
a lot of ML researchers stay pretty high level and reinstall conda when things stop working
and rightly so, they have more complicated issues to tackle
It's on developers to provide better infrastructure and solve these challenges
Not rightly. It'd be faster in the long term to address the issues.
Currently nobody thinks that long term. They just reinstall, and that's it.
I'm not really talking about companies here for the most part, I'm talking about academic ML researchers (or industry researchers whose role is primarily academic-style research). In companies there is more incentive for good software engineering practices.
I'm also speaking from personal experience: I once had to hand-write my own CUDA kernels (on official NVIDIA cards, not even this weird translation layer): it was useful and I figured it out, but everything was constantly breaking at first.
It was a drag on productivity and more importantly, it made it too difficult for other people to run my code (which means they are less likely to cite my work).
If your point is that HIP is not a zero-effort porting solution, that is correct. HIP is a low-effort solution, not a zero effort solution. It targets users who already use and know CUDA, and minimizes the changes that are required from pre-existing CUDA code.
In the case of these abstraction layers, then it would be the responsibility of the abstraction maintainers (or AMD) to port them. Obviously, someone who does not even use CUDA would not use HIP either.
To be honest, I have a hard time believing that a truly zero-effort solution exists. Especially one that gets high performance. Once you start talking about the full stack, there are too many potholes and sharp edges to believe that it will really work. So I am highly skeptical of original article. Not that I wouldn't want to be proved wrong. But what they're claiming to do is a big lift, even taking HIP as a starting point.
The easiest, fastest (for end users), highest-performance solution for ML will come when the ecosystem integrates it natively. HIP would be a way to get there faster, but it will take nonzero effort from CUDA-proficient engineers to get there.
I agree completely with your last point.
As other commenters have pointed out, this is probably a good solution for HPC jobs where everyone is using C++ or Fortran anyway and you frequently write your own CUDA kernels.
From time to time I run into a decision maker who understandably wants to believe that AMD cards are now "ready" to be used for deep learning, and points to things like the fact that HIP mostly works pretty well. I was kind of reacting against that.
God this explains so much about my last month, working with tensorflow lite and libtorch in C++
The target audience of interoperability technology is whoever is building, though. Ideally, interoperability technology can help software that supports only NVIDIA GPUs today go on to quickly add baseline support for Intel and AMD GPUs tomorrow.
(and for one data point, I believe Blender is actively using HIP for AMD GPU support in Cycles.)
As someone doing a lot of work with CUDA in a big research organization: there are few of us. If you are working with CUDA, then you are not the type of person who waits for something that just works like you describe. CUDA itself is a battle with poorly documented stuff.
Their target is HPC users, not ML researchers. I can understand why this would be valuable to that particular crowd.
Inline PTX is hardly an obscure feature. It's pretty widely used in practice, at least in the AI space.
Yeah, a lot of the newer accelerators are not even available without using inline PTX assembly. Even the ones that are have weird shapes that are not amenable to high-performance work.
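For anyone who hasn't run into it, this is roughly what inline PTX looks like in CUDA C++ (a toy example, not taken from any codebase discussed here). AMD hardware has no %laneid register, so code like this is exactly what a translation layer has to pattern-match or leave for a human to port:

    // Toy inline-PTX example: read the warp lane ID from a special register.
    __device__ unsigned lane_id() {
      unsigned id;
      asm volatile("mov.u32 %0, %%laneid;" : "=r"(id));
      return id;
    }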
Are you saying that the latest NVIDIA nvcc doesn't support the latest NVIDIA devices?
For any compiler, "supporting" a certain CPU or GPU only means that they can generate correct translated code with that CPU or GPU as the execution target.
It does not mean that the compiler is able to generate code that has optimal performance, when that can be achieved by using certain instructions without a direct equivalent in a high-level language.
No compiler that supports the Intel-AMD ISA knows how to use all the instructions available in this ISA.
Sure, but I'm not sure if that is what the parent poster was saying (that nvcc generates poor quality PTX for newer devices).
It's been a while since I looked at CUDA, but it used to be that NVIDIA were continually extending cuDNN to add support for kernels needed by SOTA models, and I assume these kernels were all hand optimized.
I'm curious what kind of models people are writing where not only is there is no optimized cuDNN support, but also solutions like Triton or torch.compile, and even hand optimized CUDA C kernels are too slow. Are hand written PTX kernels really that common ?
How does it run CUDA Fortran?
AMD has hipify for this, which converts cuda code to hip.
https://github.com/ROCm/HIPIFY
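For a sense of what the converted code looks like, here is a tiny hand-written illustration in the spirit of hipify output (not actual tool output): the changes are mostly mechanical renames of runtime calls.

    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
      float* d_buf = nullptr;
      if (hipMalloc((void**)&d_buf, 1024 * sizeof(float)) != hipSuccess) {  // was cudaMalloc
        std::printf("allocation failed\n");
        return 1;
      }
      hipMemset(d_buf, 0, 1024 * sizeof(float));  // was cudaMemset
      hipDeviceSynchronize();                     // was cudaDeviceSynchronize
      hipFree(d_buf);                             // was cudaFree
      return 0;
    }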
It would be good if AMD did something, anything.
Support this, reimplement that, support upstream efforts, I don't really care. Any of those would cost a couple of million and be worth a trillion dollars to AMD shareholders.
Is it weird how the comments here are blaming AMD and not Nvidia? Sure, the obvious argument is that Nvidia has no practical motivation to build an open platform. But there are counterexamples that suggest otherwise (Android). And there is a compelling argument that long term, their proprietary firmware layer will become an insufficient moat to their hardware dominance.
Who’s the root cause? The company with the dominant platform that refuses to open it up, or the competitor who can’t catch up because they’re running so far behind? Even if AMD made their own version of CUDA that was better in every way, it still wouldn’t gain adoption because CUDA has become the standard. No matter what they do, they’ll need to have a compatibility layer. And in that case maybe it makes sense for them to invest in the best one that emerges from the community.
Not even a little bit. It simply isn't Nvidia's job to provide competitive alternatives to Nvidia. Competing is something AMD must take responsibility for.
The only reason CUDA is such a big talking point is because AMD tripped over their own feet supporting accelerated BLAS on AMD GPUs. Realistically it probably is hard to implement (AMD have a lot of competent people on staff) but Nvidia hasn't done anything unfair apart from execute so well that they make all the alternatives look bad.
I agree with you, but replace NVIDIA with Apple. What would the EU say?
I don't think nvidia bans anyone from running code on their devices.
They do from time to time: https://wirelesswire.jp/2017/12/62708/
This seems to be more about certain devices (consumer-grade GPUs) in certain settings (data centers), though I do question how enforceable it actually is. My guess is that it can only apply when you try to get discounts from bulk-ordering GPUs.
Also, was there any followup to this story? It seems a bit unnecessary because nVidia has already neutered consumer cards for many/most data center purposes by not using ECC and by providing so few FP64 units that double precision FLOPS is barely better than CPU SIMD.
it’s also not really a thing anymore because of the open kernel driver… at that point it’s just MIT licensed.
the userland they paid to develop is still proprietary though, but idk if the libraries have the no-datacenter restrictions on them.
https://www.pcgamer.com/nvidia-officially-confirms-hash-rate...
Also: look into why the Nouveau driver performance is limited.
so terrible that vendors can enforce these proprietary licenses on software they paid to develop /s
Nvidia has put in the legwork and are reaping the rewards. They've worked closely with the people who are actually using their stuff, funding development and giving loads of support to researchers, teachers and so on, for probably a decade now. Why should they give all that away?
How is Android a counterexample? Google makes no money off of it, nor does anyone else. Google keeps Android open so that Apple can't move everyone onto their ad platform, so it's worth it for them as a strategic move, but Nvidia has no such motive.
Maybe. But again, that's because NVidia has been putting in the work to make something better for a decade or more. The best time for AMD to start actually trying was 10 years ago; the second-best time is today.
Google makes no money off of Android? That seems like a really weird claim to make. Do you really think Google would be anywhere near as valuable of a company if iOS had all of the market share that the data vacuum that is Android has? I can't imagine that being the case.
Google makes a boatload off of Android, just like AMD would if they supported open GPGPU efforts aggressively.
Google gave away the software platform - Android - to hardware vendors for free, vendors compete making the hardware into cheap, low-margin commodity items, and google makes boatloads of money from ads, tracking and the app store.
nvidia could give away the software platform - CUDA - to hardware vendors for free, making the hardware into cheap, low-margin commodity items. But how would they make boatloads of money when there's nowhere to put ads, tracking or an app store?
Because it IS AMD's/Apple's/etc.'s fault for the position they're in right now. CUDA showed where the world was heading and where the gains in compute would be made well over a decade ago now.
They even had OpenCL, didn't put the right amount of effort into it, all the talent found CUDA easier to work with so built there. Then what did AMD, Apple do? Double down and try and make something better and compete? Nah they fragmented and went their own way, AMD with what feels like a fraction of the effort even Apple put in.
From the actions of the other teams in the game, it's not hard to imagine a world without CUDA being a world where this tech is running at a fraction of its potential.
Khronos, AMD and Intel have had 15 years to make something out of OpenCL that could rival CUDA.
Instead they managed 15 years of disappointment: a standard stuck in C99 that adopted C++ and a polyglot bytecode too late to matter, and never produced an ecosystem of IDE tooling and GPU libraries.
Naturally CUDA became the standard, when NVIDIA provided what the GPU community cared about.
It's not. Even as it is, I do not trust HIP or RocM to be a viable alternative to Cuda. George Hotz did plenty of work trying to port various ML architectures to AMD and was met with countless driver bugs. The problem isn't nvidia won't build an open platform - the problem is AMD won't invest in a competitive platform. 99% of ML engineers do not write CUDA. For the vast majority of workloads, there are probably 20 engineers at Meta who write the Cuda backend for Pytorch that every other engineer uses. Meta could hire another 20 engineers to support whatever AMD has (they did, and it's not as robust as CUDA).
Even if CUDA was open - do you expect nvidia to also write drivers for AMD? I don't believe 3rd parties will get anywhere writing "compatibility layers" because AMD's own GPU aren't optimized or tested for CUDA-like workloads.
It's always been on the straggler to catch up by cheating. That's just how the world works - even in open source. If AMD supported CUDA, it would have a bigger market share. That's a fact. Nvidia doesn't want that. That's a fact. But when Reddit started, it just scraped feeds from Digg, and when Facebook started, it let you link your MySpace credentials and scraped your MySpace account. Adversarial interoperability is nothing new.
Funnily, the one I blame the most for there not being real competition to CUDA is Apple. Lately, Apple has been really pushing vendor lock-in APIs rather than adopting open standards. The end result is you can get AMD and Intel on board with some standard which is ultimately torpedoed by Apple. (See Apple departing from and rejecting everything that comes from the Khronos group.)
With the number of devs that use Apple silicon now-a-days, I have to think that their support for khronos initiatives like SYCL and OpenCL would have significantly accelerated progress and adoption in both.
We need an open standard that isn't just AMD specific to be successful in toppling CUDA.
ROCm counts as "something"
Pretty much any modern NVIDIA GPU supports CUDA. You don't have to buy a datacenter-class unit to get your feet wet with CUDA programming. ROCm will count as "something" when the same is true for AMD GPUs.
Here is the support list:
https://rocm.docs.amd.com/projects/install-on-linux/en/lates...
AMD's definition of "support" I think is different than what people expect, and pretty misleading - ROCm itself will run on almost anything, back as far as the RX 400/500 series:
https://en.wikipedia.org/wiki/ROCm#:~:text=GCN%205%20%2D%20V...
Stable Diffusion ran fine for me on RX 570 and RX 6600XT with nothing but distro packages.
There are out-of-bounds writes in the BLAS libraries for gfx803 GPUs (such as the RX 570). That hardware might work fine for your use case, but there's a lot of failures in the test suites.
I agree with your general sentiment, but I wouldn't recommend pre-Vega GPUs for use with ROCm. Stick to gfx900 and newer, if you can.
I don't buy it. Even running things like llama.cpp on my RX 570 via Vulkan crashes the entire system.
The last time I checked, I was stuck with a pretty old kernel if I wanted the last version of ROCm available for my RX 470. It was compatible at some point in time, but it isn't kept compatible with recent kernels.
I don't think AMD needs to support 5+ year old GPUs personally. And all the recent generations are already practically supported.
AMD only claims support for a select few GPUs, but in my testing I find all the GPUs work fine if the architecture is supported. I've tested rx6600, rx6700xt for example and even though they aren't officially supported, they work fine on ROCm.
I'm on a 5+ year old GPU, because I don't trust AMD to offer a compelling GPU that actually works. An RX 570 is good enough for the little gaming I do. It mostly acts as an oversized iGPU that has good Linux drivers, but since AMD is not supporting ROCm on this GPU, there is no need to hurry on upgrading to a better GPU or to get my feet wet running things locally on the GPU like Stable Diffusion, LLMs, etc.
AMD had a big architecture switchover exactly 5 years ago, and the full launch wasn't over until 4.5 years ago. I think that generation should have full support. Especially because it's not like they're cutting support now. They didn't support it at launch, and they didn't support it after 1, 2, 3, 4 years either.
The other way to look at things, I'd say that for a mid to high tier GPU to be obsolete based on performance, the replacement model needs to be over twice as fast. 7700XT is just over 50% faster than 5700XT.
ROCm supports current gen consumer gpus officially and a decent chunk of recent gen consumer gpus unofficially. Not all of them of course but a decent chunk.
It's not ideal, but I'm pretty sure CUDA didn't support everything from day 1. And ROCm is AMD's vendor part of the Windows AI stack, so from the upcoming generation onward basically anything that outputs video should support ROCm.
No, but CUDA at least supported the 8800 GT on release [1]. ROCm didn't support any consumer cards on release; it looks like they didn't support any till last year? [2]
[1] https://www.gamesindustry.biz/nvidia-unveils-cuda-the-gpu-co...
[2] https://www.tomshardware.com/news/amd-rocm-comes-to-windows-...
AMD should focus their efforts on competitive hardware offerings, because that is where the need and the money is. Sorry, I don't think the hobbyist should be a priority.
Huh? I've been running ROCm for SD and LLMs for over a year and a half on my puny consumer 6750 XT, not even latest gen.
A couple of million doesn't get you anything in corporate land
A couple dozen billion for a 10% chance of becoming NVIDIA competitive is worth it, looking at the stock prices.
Pretty sure APIs are not copyrightable, e.g. https://www.law.cornell.edu/supremecourt/text/18-956
They don’t run either of them, they instead implement an equivalent API on top of something else. Here’s a quote: “Open-source wrapper libraries providing the "CUDA-X" APIs by delegating to the corresponding ROCm libraries. This is how libraries such as cuBLAS and cuSOLVER are handled.”
I believe it was decided that they are copyrightable but that using them for compatibility purposes is fair use.
No, it's stranger than that: SCOTUS did not rule on copyrightability of APIs at all, but simply ruled that even if they are copyrightable, what Google did (completely reimplement Sun/Oracle's public API) was still fair use.
It would have been nice to get a clear SCOTUS precedent on this. On the other hand, I also value a SCOTUS which rules minimally and narrowly by default (I also appreciate SCOTUS' return to stricter constitutional grounding in the past decade).
Incredibly loud laughing from the lawyers, whose study of law is being thrown around willy-nilly because of all the unprecedented joke decisions they are making right now.
We are stuck between a rock and a hard place politically. The real decisions should be coming from Congress not the courts. However, Congress is too disorganized and disconnected to answer the important questions, leaving the courts to either muddle along or else become semi-dictatorial. In most countries, this would cause a constitutional crisis, but the modern U.S. system seems to be a little too resilient to such otherwise concerning signals.
We're far past a constitutional crisis, and the courts taking power nobody wanted to give to them (who wasn't interested in a unitary executive at least) isn't a good solution.
What constitutional crisis has occurred that hasn't been resolved?
Constitutional crises involve fundamental breaks in the working of government that bring two or more of its elements into direct conflict that can't be reconciled through the normal means. The last of these by my accounting was over desegregation, which was resolved with the President ordering the Army to force the recalcitrant states to comply. Before that was a showdown between the New Deal Congress and the Supreme Court, which the former won by credibly threatening to pack the latter (which is IMO a much less severe crisis but still more substantial than anything happening today). However, that was almost a century ago, and Congress has not been that coherent lately.
I would think the latest one where SCOTUS ruled that the president was a king except in matters where the SCOTUS decides they aren't counts as a constitutional crisis.
Constitutional crises are not a matter of opinion but of occurrence, arising from an actual power conflict between arms of the government that is caused by a conflicted reading of the constitutional text. Basically, if the system just ticks on, it's not a constitutional crisis.
If "I think this is a very bad decision" was cause for a constitutional crisis, any state with more than three digit population would be in constitutional crisis perpetually.
This happened as recently as 2021-01-06; strong evidence that the military subverted the president to call the National Guard into Washington DC and secure the electoral count.
That's close. Both the excessively long lame duck period (2 months for Congress and 2.5 months for the President) and disunity between the President and the rest of the executive branch have also been fodder for crises in the past (Marbury v Madison, Andrew Johnson's impeachment).
If Trump didn't back down it could have definitely been a constitutional crisis.
I'd say it was narrowly averted though.
That is how the SC used to work: they would decide cases on the narrowest possible grounds. If they don't have to decide a tough question, but they can finesse it with something simpler, good enough. More recently they have been willing to tear up decades of established law on a regular basis.
"Used to work"... this was 2021.
And generally courts/judges just choose the scope of their legal opinions based on how far reaching they want the legal principles to apply.
IMHO, copyright-ability of APIs is so far away from their political agenda that they probably just decided to leave the issue on a cliffhanger...
Yes, "used to". Now, in 2024, the same supreme court has decided that presidents have immunity in all official acts, from stealing documents, up to and including assassination attempts on their opponents. This is a radical shift in how the court operates.
This "opponent assassination" hypothetical gets bandied about a lot but I have not seen any evidence that any court considers that to be an "official act". Official acts are constrained to legitimate exercises of constitutional authority and are not merely anything a President (or especially, an ex-President) does.
the only thing radical is the opinions of people you are listening to if you believe SCOTUS enabled legally sanctioned assassinations. It was political hyperbole based on nothing, and it worked (with you). Think for yourself.
You're correct! Fair Use Doctrine
Agreed. Rather than making CUDA the standard, AMD should push/drive an open standard that can be run on any hardware.
We have seen this succeed multiple times: FreeSync vs GSync, DLSS vs FSR, (not AMD but) Vulkan vs DirectX & Metal.
All of the big tech companies are obsessed with ring-fencing developers behind the thin veil of "innovation" - where really it's just good for business (I swear it should be regulated because it's really bad for consumers).
A CUDA translation layer is okay for now but it does risk CUDA becoming the standard API. Personally, I am comfortable with waiting on an open standard to take over - ROCm has serviced my needs pretty well so far.
Just wish GPU sharing with VMs was as easy as CPU sharing.
Vulkan only matters on Android (from version 10 onwards) and GNU/Linux.
Zero impact on Switch, Playstation, XBox, Windows, macOS, iOS, iPadOS, Vision OS.
"Windows"
dxvk-gplasync is a game changer for dx9-11 shader stutter.
Sure, for the 2% of folks that enjoy Windows games, written against DirectX, on the Linux Steam store.
Which Android studios can't even be bothered to target with their NDK engines, still based on GL ES rather than Vulkan.
I'm on Windows 11; if I see anything that's not DX12 in my Afterburner overlay, I use it.
Even if there's no shader stutter, Vulkan tends to use less juice than DX.
We actually also saw this historically with OpenGL. OpenGL comes from an ancient company whispered about by the elderly programmers (30+ years old) known as SGI. Originally it was CLOSED SOURCE and SGI called it "IRIS GL", after their IRIS workstations, which were cool looking with bright popping color plastic and faux granite keyboards. Good guy SGI opened IRIS GL up into what we now call "OpenGL" (get it, now it's open), and then it stuck.
That's all to say NVIDIA could pull an SGI and open their stuff, but they're going more Sony style and trying to monopolize. Oh, and SGI also wrote another ancient lore library known as the "STL", the Standard Template Library, which is like the original granddaddy of Boost-style template metaprogramming.
Nice story, but is it correct? Wikipedia says STL was first implemented by HP and later by the same authors at SGI.
STL started even earlier, obviously without using the name "STL", as a library of generic algorithms for the programming language Ada (David R. Musser & Alexander A. Stepanov, 1987).
Also the XFS file system.
AMD has always been notoriously bad at the software side, and they frequently abandon their projects when they're almost usable, so I won't hold my breath.
AMD shouldn't push on anything. They have the wrong incentives. They should just make sure that software runs on their GPUs and nothing else.
Karol Herbst is working on Rusticl, which is mesa's latest OpenCL implementation and will pave the way for other things such as SYCL.
OpenCL was released in 2009. AMD has had plenty of time to push and drive that standard. But OpenCL had a worse experience than CUDA, and AMD wasn't up to the task in terms of hardware, so it made no real sense to go for OpenCL.
A strategic and forward-thinking approach
"We have seen this succeed multiple times: FreeSync vs GSync, DLSS vs FSR, (not AMD but) Vulkan vs DirectX & Metal."
I'll definitely agree with you on Sync and Vulkan, but dlss and xess are both better than fsr.
https://youtube.com/watch?v=el70HE6rXV4
Like supporting x86 was a bad idea as well?
Was there a large entity steering the x86 spec alone, with a huge feature lead over their competition, free to steer the spec in any way they chose? Also, hardware is not open-source software: get the big players on board and they will be able to implement whatever spec they want every generation, while software has more moving parts and unaligned parties involved.
Ever heard of Intel?
I hadn't considered that angle. Is your point that Intel was the creator of x86, but software chose to support it, and then AMD had nothing else to do but play catch-up on x86 support to be part of the software target market? If so, and if that's factual (I've no idea), fair point, I didn't know.
It was exactly the same instruction set.
C compilers didn't offer an "AMD" CPU target* until AMD came out with the "AMD64" instruction set. Today we call this "x86_64" or "x64".
* Feel free to point out some custom multimedia vector extensions for Athlons or something, but the point remains.
And Intel named its licensed implementation of AMD64 "IA-32e", just to make it clear to everyone that it is based on the Intel Architecture 32-bit version with an extension. Luckily they dropped that name a few years later.
Before starting, AMD signed an agreement with Intel that gave them an explicit license to x86. And x86 was a whole lot smaller and simpler back then in 1982. A completely different and incomparable situation.
Technically it was after starting - AMD was founded in 1969 as a second-sourcer for Fairchild and National Semiconductor, and had reverse-engineered the 8080 by 1975 and acquired a formal license to it by 1976.
The 1982 deal you speak of was actually pretty interesting: as a condition of the x86's use in the IBM PC, IBM requested a second source for x86 chips. AMD was that source, and so they cross-licensed the x86 in 1982 to allow the IBM PC project to proceed forward. This makes the Intel/AMD deal even more important for both companies: the PC market would never have developed without the cross-licensing, which would've been bad for all companies involved. This gave Intel an ongoing stake in AMD's success at least until the PC market consolidated on the x86 standard.
I disagree. AMD can simply not implement those APIs, similar to how game emulators implement the most used APIs first and sometimes never bother implementing obscure ones. It would only matter that NVIDIA added e.g. patented APIs to CUDA if those APIs were useful, in which case AMD should have a way to do them anyway. Unless NVIDIA comes up with a new patented API which is both useful and impossible to implement in any other way, which would be bad for AMD in any event. On the other hand, if AMD starts supporting CUDA and people start using AMD cards, then developers will be hesitant to use APIs that only work on NVIDIA cards. Right now they are losing billions of dollars on this. Then again they barely seem capable of supporting ROCm on their cards, much less CUDA.
You have a fair point in terms of cuDNN and cuBLAS but I don't know that that kind of ToS is actually binding.
Patented API? I thought Google v. Oracle settled this? Making an implementation of an API spec is fair use, is it not?
My understanding is that Google v. Oracle only applies to copyright.
Well you can't patent an API so....
You can patent the implementation. You can't patent the API name DecodeH265Video() but you can still sue someone for implementing that function correctly.
How's this situation different than the one around Java, Sun/Oracle and Google?
The judge might not be a coder next time.
US law is highly dependent on precedent. The Google-Oracle case has fortunately set one, so anything following it won't start from scratch. Fortunately we may not need a coder judge next time.
Google-Oracle side stepped the issue of API copyrightability by saying Google's particular implementation would fall under fair use. Whether APIs are copyrightable remains an open question.
Until you get an activist court
Isn't cuDNN a much better case for reimplementing than CUDA? It has much more choice in how things actually happen, and cuDNN itself chooses different implementations at runtime and does fusing. It seems way more generic, and a reimplementation could use the best AMD-targeted kernel rather than the one the original picked.
AMD have "MIOpen" which is basically cuDNN-for-AMD. Ish.
And that thing has been left unreleased on Windows for almost a whole year, for unknown reasons. Even though there is activity on GitHub and build fixes land frequently, there is just no .exe or .msi for you to download. In fact, ROCm for Linux is on a major 6 release (which includes MIOpen), but somehow Windows is still on major 5 (which doesn't have MIOpen) and has been for almost a whole year.
It almost makes me wonder: is there some shady deal somewhere asking AMD to never release the SDK for Windows, to push the price of Nvidia cards higher? Why do they keep developing these things without releasing them at all?
(Let's put the legal questions aside for a moment.)
nVidia changes GPU architectures every generation / few generations, right? How does CUDA work across those—and how can it have forwards compatibility in the future—if it's not designed to be technologically agnostic?
PTX is meant to be portable across GPU microarchitectures. That said, Nvidia owns the entire spec, so they can just keep adding new instructions that their GPUs now support but AMD GPUs don't.
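To make the portability point concrete: a CUDA fat binary usually embeds PTX alongside architecture-specific machine code, and on a GPU newer than anything it was compiled for, the driver JIT-compiles the PTX at load time. The driver API exposes the same mechanism directly; a minimal sketch (error handling and cuInit/context setup omitted):

    #include <cuda.h>

    // cuModuleLoadData JIT-compiles the given PTX text for whatever GPU is
    // present, which is how binaries built years ago keep running on
    // architectures that didn't exist at the time.
    CUmodule load_ptx_module(const char* ptx_text) {
      CUmodule mod;
      cuModuleLoadData(&mod, ptx_text);  // assumes cuInit() and a context already exist
      return mod;
    }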
One way is to make sure the hardware team does certain things to support easy transition to new architectures, we have seen this with Apple Silicon for example!
Not having a layer like this has left AMD completely out of the AI game that has made NVDA the world's most valuable company.
Self-inflicted wounds hurt the most.
I really hope they will do what you suggested. With some innovative product placement, GPUs with a lot of memory for example, they could dethrone nvidia if it doesn't change strategy.
That said, easier said than done. You need very specialized developers to build a CUDA equivalent and have people start using it. AMD could do it with a more open development process leveraging the open source community. I believe this will happen at some point anyway by AMD or someone else. The market just gets more attractive by the day and at some point the high entry barrier will not matter much.
So why should AMD skimp on their ambitions here? This would be a most sensible investment, few risks and high gains if successful.
This expanding market provides AMD with a lucrative opportunity indeed
CUDA v1... CUDA v2... CUDA v... CUDA isn't commonly associated with a version number...
…yes it is? https://developer.nvidia.com/cuda-toolkit-archive
Cries in OpenCL
CUDA is the juice that built Nvidia in the AI space and allowed them to charge crazy money for their hardware. To be able to run CUDA on cost effective AMD hardware can be a big leap forward, allow more people to research, and break away from Nvidia's stranglehold over VRAM. Nvidia will never open source their own platform unless their hand is forced. I think we all should support this endeavor and contribute where possible.
The legal, technical and strategic challenges make it a less attractive option
Given AMD's prior lack of interest I'll take whatever options there are. My daily driver has a Vega 10 GPU and it's been quite frustrating not being able to easily leverage it for basic ML tasks, to the point that I've been looking at buying an external Nvidia GPU instead just to try out some of the popular Python libraries.
That is why an open standard should be made so it isn't locked to a particular piece of hardware and then allow modular support for different hardware to interface with supported drivers.
Ya, honestly better to leave that to third parties who can dedicate themselves to it and maybe offer support or whatever. Let AMD work on good first party support first.