Just tried to run this using their sample script on my 4090 (which has 24GB of VRAM). It ran for a little over 1 minute and crashed with an out-of-memory error. I tried both SV3D_u and SV3D_p models.
[edit]Managed to generate by tweaking the script to generate fewer frames simultaneously. 19.5GB peak VRAM usage, 1 min 25 secs to generate at 225 watts.[/edit]
The 4090 is in a weird spot: high speed but low RAM. Theoretically everything should run in ai but practically nothing runs.
Maybe don't use a gaming card for AI then? 24GB is plenty, as most games don't use more than half of that in 4K.
Maybe give me lots of money to give Nvidia for a card with more memory then?
Nvidia have held back the majority of their cards from going over 24GB for years now. It's 2024 and my laptop has 96GB of RAM available to the GPU but desktop GPUs that cost several thousands just by themselves are stuck at 24GB.
They don’t get their absurd profit margins by cannibalising their data centre chips.
This is like Intel and their refusal to support ECC memory, while AMD supports it on nearly all Ryzens.
—
Note: your laptop is probably using a 64-bit memory bus for system RAM. For GPUs, the 4090 is 384-bit. That takes up a lot more die area for the bus and memory controller.
But GP's laptop with 96GB of unified memory would be an M2 Max MacBook or better. The M2 Max has a 4 x 128-bit memory bus (410GB/s) and the M2 Ultra is 8 x 128-bit (819GB/s), versus a 4090 at 1008GB/s. But see here for caveats about Mac bandwidth: https://news.ycombinator.com/item?id=38811290
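For anyone who wants to sanity-check those bandwidth numbers, here's the rough arithmetic (bus width times effective data rate). The data rates below are the commonly quoted specs, so treat them as approximations, not measured throughput:

    # Rough theoretical peak bandwidth: bus width in bytes * effective data rate
    def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gtps: float) -> float:
        """GB/s at the quoted effective transfer rate (not measured throughput)."""
        return bus_width_bits / 8 * data_rate_gtps

    # Commonly quoted figures, used here as assumptions:
    print(peak_bandwidth_gb_s(384, 21.0))  # RTX 4090, GDDR6X ~21 Gbps      -> ~1008 GB/s
    print(peak_bandwidth_gb_s(512, 6.4))   # M2 Max, LPDDR5-6400            -> ~410 GB/s
    print(peak_bandwidth_gb_s(128, 4.8))   # dual-channel DDR5-4800 laptop  -> ~77 GB/s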
Which laptop models share system RAM with an Nvidia RTX card?
OP is probably referring to an M-series MacBook, since it has a unified memory architecture and the same memory space is used by both CPU and GPU.
Why would they do that with a gaming card? If you want more you can rent on AWS etc.
It wouldn’t be a local model if it has to work on AWS.
Isn't there the risk that if they give the gaming cards enough RAM for such tasks then they'll get bought up for that purpose and the second-hand price will go even higher?
I guess my point is, rather than give the cards more RAM, the gaming cards should just be priced cheaper.
This is unfairly downvoted. They launched the 3090 in Sep 2020 with 24GB, which was more than AMD's 16GB 6900 XT launched that same month. Maybe before blaming Nvidia, blame AMD for not trying to compete with them? Of course they're not gonna release a gaming card with loads more VRAM because a) the competition doesn't have gaming cards with more VRAM, b) it would all be bought up for AI workloads, c) games don't really need more, as the parent said.
Perhaps NVIDIA or somebody could invent a RAM upgrade via NVLink? Seems plausible, and not every workload wants another GPU when extra memory alone is all it needs.
But why would NVIDIA do that when they can just sell you an A100 for ten times the price of a 4090?
We need AMD to compete, but from what I know their software is subpar compared to NVIDIA's offering, and most of the current ML stacks are built around CUDA. Still, there's a lot of money to be made in this area now, so competition big and small should pop up.
I'd love it if AMD and Intel teamed up to make a wrapper layer for CUDA. Surely they'd both benefit greatly.
First Intel and then AMD funded a wrapper, yes. Unfortunately the new version supports AMD but no longer Intel.
https://github.com/vosen/ZLUDA
That's a binary-level wrapper. Of course there's also ROCm HIP at the source level, and many other things, such as SYCL.
In a hypothetical near-future world, competition?
The memory is inherent to the GPU architecture. You cannot just add VRAM and expect no other bottlenecks to pop up. Yes, they can reduce the VRAM to create budget models and save a bit here and there, but adding VRAM to a top model is a tricky endeavour.
Yeah, I’m still debating whether to go with a Mac Studio with the RAM maxed out (approx $7500 for 192 GB) or a PC with a 4090. Is there a better value path with the Nvidia A series or something else? (I’m not sure about tinygrad)
You can get a previous-gen RTX A6000 with 48GB of GDDR6 for about $5000 (1). Disclosure: I run that website. Is anyone using the pro cards for inference?
(1) https://gpuquicklist.com/pro?models=RTX%20A6000
I have an M1 Max with 64GB and a 3090 Ti. The M1 Max is ~4x slower at inference for the same models than the 3090 (i.e. 7t/s vs 30t/s), which depending on the task can be very annoying. As a plus you get to run really large models, albeit very slowly. Think about whether that will bother you or not. I will not give up my 3090 Ti and am rather waiting for the 5090 to see what it can do, because when programming, the Mac is too slow to shoot off questions. I use it mostly to better understand book topics now, and the 3090 Ti for fast chat sessions.
Groq may be an option?
Just don't max out the Mac Studio and get both...
"Theoretically everything should run in ai"
Odd statement. I don't really know what you mean by that. Perhaps 'math _works_, code should too' ?
I would definitely agree that it _should_ work.
I'm of the belief that no one should _have to_ publish (e.g. to graduate, get promotions, etc) in academia, and that publications should only occur if they're believed to be near Nobel Prize worthy, and fully reproducible by code with packaging that will still work in 10 years, from data archives that will still exist in 10 years.
But it seems I have been outvoted by the administration in academia.
Hence, we get this "ai that doesn't run" phenomenon
What's the point of academia if not to publish?
Do you want to publicly fund researchers only for the industrial research partner's benefit?
It already is effectively just for industry benefit. It's been like that since the start. Work that is too expensive for industry to do (research and discovery) was put into the public sphere such that the role of industry was to take that innovation and optimize it. That's at least how it is intentionally constructed.
My main point was that there is a lot of noise in scientific journals caused by pressures in academia, namely the requirement to publish. If these are removed, the quality of published work increases and the quantity decreases.
There are other places to post work that is derivative and non-novel, like blogs. The field of biology has an immense amount of work that is mostly observational, without strong conclusions or predictivity. A tabulation of observations should definitely be put out by a lab, and much sooner and with far less pressure than today's typical dance of only depositing the data at publication time. The SRA is one example of a place to share data. If the typical workflow were instead: put all data immediately onto a public repo; occasionally comment on it in the way blogs and other venues below scientific journals already do; and then, if something truly substantial comes out of it (a novel model that is analytical and highly predictive of cell behavior in all situations, for example), publish.
It could separate the signal from the noise. LLMs are one case where the noise is very strong, in that many papers are simply 'we fine-tuned an LLM'.
So how should knowledge be shared in academia without publishing? Any work worthy of a Nobel Prize (or more likely, a Turing Award) is built on top of significant amounts of other research that itself wasn't so groundbreaking.
That said, I certainly think that researchers can do more to make their code and data more accessible. We have the tools to do so already but the incentives are often misaligned.
Almost sounds like a GPU vendor who isn't seeing enough competition.
Or, you know, the fact that the card is made for playing video games, not training AI models.
Almost like Nvidia's only competition is run by the CEO's niece.
Didn't know 24GB was considered low lol.
Here's a vram requirements table for fine-tuning an LLM: https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
No matter how much vram you have, there's something that doesn't fit :)
For AI that's either a very fat SDXL model at its max native resolution, or a quantized 34B parameter model, so it's on the low side. Compare that with the Blackwell AI "superchip" announced yesterday that appears to the programmer as a single GPU with 30TB of RAM.
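The back-of-the-envelope math for why a 34B model only fits in 24GB when quantized (weights only; the 1.2 overhead factor for KV cache and activations is a rough assumption, not a measurement):

    def model_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
        """Very rough estimate: weight storage scaled by a fudge factor
        for KV cache / activations / framework overhead."""
        weight_gb = params_billion * bits_per_weight / 8
        return weight_gb * overhead

    print(model_vram_gb(34, 4))   # ~20 GB: a 4-bit 34B model just squeezes into 24 GB
    print(model_vram_gb(70, 4))   # ~42 GB: a 4-bit 70B model does not
    print(model_vram_gb(7, 16))   # ~17 GB: even an unquantized 7B fits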
They don't want to cannibalize sales of the super-expensive GPUs dedicated to ML/AI.
5090 likely won't have more than 32 GB, if even that much.
Even 32GB would be great for a gaming card; any more and you'll never see it on sale, as it will be bought up by the truckload for AI, so of course they're not gonna balloon the VRAM. I suspect we'd still be at 16GB, but they launched the 3090 in Sep 2020 with 24GB, before all this craze really, and lowering it would be bad optics now.
I made a Manifold market[0] on the amount of RAM a 5090 will have, and while pretty much nobody has participated, I just checked and the market is amusingly at the 32GB you've also quoted. Just like you, I hope it will be more, but I fear it will be even less.
0. https://manifold.markets/Tenoke/how-much-vram-will-nvidia-50...
You can add multiple, but practically speaking you're better off with used 3090s, of which you get two for the price of one 4090.
I have a 3090 Ti and I can run Q4-quant 33B models at 30t/s with 8k context. A 4090 would allow me to do the same but at ~45t/s; both inference speeds are more than fast enough for most people, so the 3090 is the usual choice. In my tests on RunPod, an H100 with 80GB memory is around the same speed as a 3090, so slower than a 4090.
Don't forget the 24GB P40, which is a third the speed but also a third the cost of a 3090 (both used).
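For reference, running that kind of setup is only a few lines with llama-cpp-python. This is just a sketch: the model path is a placeholder for any ~33B Q4 GGUF file, and n_gpu_layers=-1 assumes the whole quantized model fits in 24GB.

    from llama_cpp import Llama  # pip install llama-cpp-python, built with CUDA support

    llm = Llama(
        model_path="models/33b-q4_k_m.gguf",  # placeholder: any ~33B Q4 GGUF
        n_ctx=8192,                           # the 8k context mentioned above
        n_gpu_layers=-1,                      # offload every layer to the GPU
    )

    out = llm("Explain what a KV cache is, in one paragraph.", max_tokens=256)
    print(out["choices"][0]["text"])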
What can't you run? Unquantised large text models are the only thing I can't run
Stable Diffusion, Stable Video, text models, audio models: I've never had issues with anything yet.
The 4090 is in a bit of a funny space for LLMs.
There's a lot of open weights activity around 7B/13B models, which the 4090 will run with ease. But you can run those OK on much cheaper cards like the 4070 Ti (which is of course why they're popular).
And there's a lot of open weights activity around 70B and 8x7B models which are state-of-the-art - but too big to fit on a 4090. There's not much activity around 30B models, which are too big to be mainstream and too small to be cutting edge.
If you're specifically looking to QLoRA fine-tune a 7B/13B model a 4090 can do that - but if you want to go bigger than that you'll end up using a cloud multi-gpu machine anyway.
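A minimal QLoRA setup for that single-4090 case looks roughly like this (a sketch using transformers + peft + bitsandbytes; the model name and LoRA hyperparameters are illustrative examples, not a recipe):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    model_id = "mistralai/Mistral-7B-v0.1"  # example 7B model

    # Load the base weights in 4-bit NF4, which is what keeps a 7B inside 24GB
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Train small LoRA adapters instead of the full weights
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # typically well under 1% of the total parameters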
4090 has more VRAM than most computers have system RAM. Surprised this is considered "low RAM" in any way except for relative to datacenter cards and top-spec ASi.
You're comparing RAM amounts to other RAM amounts without considering requirements. 24GB is more than (most) current games would ever require, but is considered an uncomfortably constrictive minimum for most industrial work.
Traditional CPU-bound physics/simulation models have typically wanted all the RAM they could get; the more RAM the more accurate the model. The same is true for AI models.
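A toy illustration of why: for a cubic simulation grid, halving the cell size multiplies memory by eight, so resolution (and with it accuracy) is capped by RAM. The field count and precision below are arbitrary assumptions.

    def grid_memory_gb(cells_per_axis: int, fields: int = 5, bytes_per_value: int = 8) -> float:
        """Memory for a cubic grid storing `fields` double-precision values per cell."""
        return cells_per_axis ** 3 * fields * bytes_per_value / 1e9

    for n in (256, 512, 1024):
        print(n, round(grid_memory_gb(n), 1))  # 0.7 GB, 5.4 GB, 42.9 GB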
I can max out 24GB just using spreadsheets and databases, let alone my 3D work or anything computational.
It is targeted at gamers, but professionals are buying it. They should be buying the A6000, which has 48GB.
Yeah, this is to be expected with early adoption. This stuff comes out of the lab and it's not perfect. The key thing to evaluate is the trajectory and pace of development. Much of what folks challenged ChatGPT with a year ago is long lost in the dust. Go look at Stable Diffusion this time last year. Dall-E couldn't do words and hands; it nails that 90% of the time in my experience today.
About words, DALL-E is not even close to nailing it 90% of the time. Not even 50%. Maybe they nerf it when you request a logo, but that was my experience in the last few days.
I managed to get it working with a 4090. You need to adjust the decoding_t parameter of the sample function in simple_video_sample.py to a lower value (decoding_t = 5 works fine for me). I also needed to install imageio==2.19.3 and imageio-ffmpeg.
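For anyone else hitting this, the change is just lowering that one argument. Roughly like this, assuming the script lives at scripts/sampling/simple_video_sample.py; the input path and version string are placeholders, and only decoding_t is the actual fix:

    # Run from the generative-models repo root with the SV3D weights downloaded.
    from scripts.sampling.simple_video_sample import sample

    sample(
        input_path="assets/my_object.png",  # placeholder input image
        version="sv3d_u",                   # or the sv3d_p variant
        decoding_t=5,                       # decode fewer frames at once to stay under 24GB
    )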
Ah, yep! You're right! It works now!
Dunno why the defaults for this stuff aren't set for baseline hardware; I feel like I always have to tweak the batch size down on all the sample scripts, even with 24GB, because everything assumes 48GB.
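It would be easy for these scripts to scale the default from the detected VRAM rather than assume a 48GB card. A hypothetical helper along these lines (the 48GB-for-14-frames reference point is just an assumption for illustration, not anything from the repo):

    import torch

    def pick_decoding_t(default_t: int = 14, reference_vram_gb: float = 48.0) -> int:
        """Scale the 'frames decoded at once' setting to the detected VRAM
        instead of hard-coding a value tuned for a 48GB card."""
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 2**30
        return max(1, int(default_t * vram_gb / reference_vram_gb))

    print(pick_decoding_t())  # ~7 on a 24GB card, 14 on a 48GB card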