Just tried to run this using their sample script on my 4090 (which has 24GB of VRAM). It ran for a little over 1 minute and crashed with an out-of-memory error. I tried both SV3D_u and SV3D_p models.
[edit]Managed to generate by tweaking the script to generate fewer frames simultaneously. 19.5GB peak VRAM usage, 1 min 25 secs to generate at 225 watts.[/edit]
The 4090 is in a weird spot: high speed but low RAM. Theoretically everything should run in ai but practically nothing runs.
Maybe don't use a gaming card for AI then? 24GB is plenty, as most games don't use more than half of that in 4K.
Maybe give me lots of money to give Nvidia for a card with more memory then?
Nvidia have held back the majority of their cards from going over 24GB for years now. It's 2024 and my laptop has 96GB of RAM available to the GPU but desktop GPUs that cost several thousands just by themselves are stuck at 24GB.
They don’t get their absurd profit margins by cannibalising their data centre chips.
This is like Intel and their refusal to support ECC memory, while AMD supports it on nearly all Ryzens.
—
Note: your laptop is probably using a 64-bit memory bus for system RAM. For GPUs, the 4090 is 384-bit. That takes up a lot more die area for the bus and memory controller.
But GP's laptop with 96GB of unified memory would be an M2 Max MacBook or better. The M2 Max has a 4 x 128-bit memory bus (410GB/s) and the M2 Ultra is 8 x 128-bit (819GB/s), versus a 4090 at 1008GB/s. But see here for caveats about Mac bandwidth: https://news.ycombinator.com/item?id=38811290
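For anyone who wants to sanity-check those bandwidth numbers, here's the rough arithmetic (bus width times effective data rate). The data rates below are the commonly quoted specs, so treat them as approximations, not measured throughput:

    # Rough theoretical peak bandwidth: bus width in bytes * effective data rate
    def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gtps: float) -> float:
        """GB/s at the quoted effective transfer rate (not measured throughput)."""
        return bus_width_bits / 8 * data_rate_gtps

    # Commonly quoted figures, used here as assumptions:
    print(peak_bandwidth_gb_s(384, 21.0))  # RTX 4090, GDDR6X ~21 Gbps      -> ~1008 GB/s
    print(peak_bandwidth_gb_s(512, 6.4))   # M2 Max, LPDDR5-6400            -> ~410 GB/s
    print(peak_bandwidth_gb_s(128, 4.8))   # dual-channel DDR5-4800 laptop  -> ~77 GB/s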
Which laptop models share system RAM with an Nvidia RTX card?
OP is probably referring to an M-series MacBook, since it has a unified memory architecture and the same memory space is used by both CPU and GPU.
Why would they do that with a gaming card? If you want more you can rent on AWS etc.
It wouldn’t be a local model if it has to work on AWS.
Isn't there the risk that if they give the gaming cards enough RAM for such tasks then they'll get bought up for that purpose and the second-hand price will go even higher?
I guess my point is, rather than give the cards more RAM, the gaming cards should just be priced cheaper.
This is unfairly downvoted. They launched the 3090 in Sep 2020 with 24GB, which was more than AMD's 16GB 6900 XT launched that same month. Maybe before blaming Nvidia, blame AMD for not trying to compete with them? Of course they're not gonna release a gaming card with loads more VRAM because a) the competition doesn't have gaming cards with more VRAM, b) it would all be bought up for AI workloads, c) games don't really need more, as the parent said.
Perhaps NVIDIA or somebody could invent a RAM upgrade via NVLink? Seems plausible, and not every workload wants another GPU when extra memory alone is all it needs.
But why would NVIDIA do that when they can just sell you an A100 for ten times the price of a 4090?
We need AMD to compete, but from what I know their software is subpar compared to NVIDIA's offering, and most of the current ML stacks are built around CUDA. Still, there's a lot of money to be made in this area now, so competition big and small should pop up.
I'd love it if AMD and Intel teamed up to make a wrapper layer for CUDA. Surely they'd both benefit greatly.
First Intel and then AMD funded a wrapper, yes. Unfortunately the new version supports AMD but no longer Intel.
https://github.com/vosen/ZLUDA
That's a binary-level wrapper. Of course there's also ROCm HIP at the source level, and many other things, such as SYCL.
In a hypothetical near-future world, competition?
The memory is inherent to the GPU architecture. You cannot just add VRAM and expect no other bottlenecks to pop up. Yes, they can reduce the VRAM to create budget models and save a bit here and there, but adding VRAM to a top model is a tricky endeavour.
Yeah, I’m still debating whether to go with a Mac Studio with the RAM maxed out (approx $7500 for 192 GB) or a PC with a 4090. Is there a better value path with the Nvidia A series or something else? (I’m not sure about tinygrad)
You can get a previous-gen RTX A6000 with 48GB of GDDR6 for about $5000 (1). Disclosure: I run that website. Is anyone using the pro cards for inference?
(1) https://gpuquicklist.com/pro?models=RTX%20A6000
I have an M1 Max with 64GB and a 3090 Ti. The M1 Max is ~4x slower at inference for the same models than the 3090 (i.e. 7t/s vs 30t/s), which depending on the task can be very annoying. As a plus you get to run really large models, albeit very slowly. Think about whether that will bother you or not. I will not give up my 3090 Ti and am rather waiting for the 5090 to see what it can do, because when programming, the Mac is too slow to shoot off questions. I use it mostly to better understand book topics now, and the 3090 Ti for fast chat sessions.
Groq may be an option?
Just don't max out the Mac Studio and get both...
"Theoretically everything should run in ai"
Odd statement. I don't really know what you mean by that. Perhaps 'math _works_, code should too' ?
I would definitely agree that it _should_ work.
I'm of the belief that no one should _have to_ publish (e.g. to graduate, get promotions, etc) in academia, and that publications should only occur if they're believed to be near Nobel Prize worthy, and fully reproducible by code with packaging that will still work in 10 years, from data archives that will still exist in 10 years.
But it seems I have been outvoted by the administration in academia.
Hence, we get this "ai that doesn't run" phenomenon
What's the point of academia if not to publish?
Do you want to publicly fund researchers only for the industrial research partner's benefit?
It already is effectively just for industry benefit. It's been like that since the start. Work that is too expensive for industry to do (research and discovery) was put into the public sphere such that the role of industry was to take that innovation and optimize it. That's at least how it is intentionally constructed.
My main point was that there is a lot of noise in scientific journals caused by pressures in academia, namely the requirement to publish. If these are removed, the quality of published work increases and the quantity decreases.
There are other places to post work that is derivative and non-novel, like blogs. The field of biology has an immense amount of work that is mostly observational, without strong conclusions or predictivity. A tabulation of observations should definitely be put out by a lab, and much sooner and with far less pressure than today's typical dance of only depositing the data at publication time. The SRA is one example of a place to share data. If the typical workflow were instead: put all data immediately onto a public repo; occasionally comment on it in the way blogs and other venues below scientific journals already do; and then, if something truly substantial comes out of it (a novel model that is analytical and highly predictive of cell behavior in all situations, for example), publish.
It could separate the signal from the noise. LLMs are one case where the noise is very strong, in that many papers are simply 'we fine-tuned an LLM'.
So how should knowledge be shared in academia without publishing? Any work worthy of a Nobel Prize (or more likely, a Turing Award) is built on top of significant amounts of other research that itself wasn't so groundbreaking.
That said, I certainly think that researchers can do more to make their code and data more accessible. We have the tools to do so already but the incentives are often misaligned.
Almost sounds like a GPU vendor who isn't seeing enough competition.
Or, you know, the fact that the card is made for playing video games, not training AI models.
Almost like Nvidia's only competition is run by the CEO's niece.
Didn't know 24GB was considered low lol.
Here's a vram requirements table for fine-tuning an LLM: https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
No matter how much vram you have, there's something that doesn't fit :)
For AI that's either a very fat SDXL model at its max native resolution, or a quantized 34B parameter model, so it's on the low side. Compare that with the Blackwell AI "superchip" announced yesterday that appears to the programmer as a single GPU with 30TB of RAM.
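The back-of-the-envelope math for why a 34B model only fits in 24GB when quantized (weights only; the 1.2 overhead factor for KV cache and activations is a rough assumption, not a measurement):

    def model_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
        """Very rough estimate: weight storage scaled by a fudge factor
        for KV cache / activations / framework overhead."""
        weight_gb = params_billion * bits_per_weight / 8
        return weight_gb * overhead

    print(model_vram_gb(34, 4))   # ~20 GB: a 4-bit 34B model just squeezes into 24 GB
    print(model_vram_gb(70, 4))   # ~42 GB: a 4-bit 70B model does not
    print(model_vram_gb(7, 16))   # ~17 GB: even an unquantized 7B fits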
They don't want to cannibalize sales of the super-expensive GPUs dedicated to ML/AI.
5090 likely won't have more than 32 GB, if even that much.
Even 32GB would be great for a gaming card; any more and you'll never see it on sale, as it will be bought up by the truckload for AI, so of course they're not gonna balloon the VRAM. I suspect we'd still be at 16GB, but they launched the 3090 in Sep 2020 with 24GB, before all this craze really, and lowering it would be bad optics now.
I made a Manifold market[0] on the amount of RAM a 5090 will have, and while pretty much nobody has participated, I just checked and the market is amusingly at the 32GB you've also quoted. Just like you, I hope it will be more, but I fear it will be even less.
0. https://manifold.markets/Tenoke/how-much-vram-will-nvidia-50...
You can add multiple, but practically speaking you're better off with used 3090s, of which you get two for the price of one 4090.
I have a 3090 Ti and I can run Q4-quant 33B models at 30t/s with 8k context. A 4090 would allow me to do the same but at ~45t/s; both inference speeds are more than fast enough for most people, so the 3090 is the usual choice. In my tests on RunPod, an H100 with 80GB memory is around the same speed as a 3090, so slower than a 4090.
Don't forget the 24GB P40, which is a third the speed but also a third the cost of a 3090 (both used).
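For reference, running that kind of setup is only a few lines with llama-cpp-python. This is just a sketch: the model path is a placeholder for any ~33B Q4 GGUF file, and n_gpu_layers=-1 assumes the whole quantized model fits in 24GB.

    from llama_cpp import Llama  # pip install llama-cpp-python, built with CUDA support

    llm = Llama(
        model_path="models/33b-q4_k_m.gguf",  # placeholder: any ~33B Q4 GGUF
        n_ctx=8192,                           # the 8k context mentioned above
        n_gpu_layers=-1,                      # offload every layer to the GPU
    )

    out = llm("Explain what a KV cache is, in one paragraph.", max_tokens=256)
    print(out["choices"][0]["text"])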
What can't you run? Unquantised large text models are the only thing I can't run
Stable Diffusion, Stable Video, text models, audio models: I've never had issues with anything yet.
The 4090 is in a bit of a funny space for LLMs.
There's a lot of open weights activity around 7B/13B models, which the 4090 will run with ease. But you can run those OK on much cheaper cards like the 4070 Ti (which is of course why they're popular).
And there's a lot of open weights activity around 70B and 8x7B models which are state-of-the-art - but too big to fit on a 4090. There's not much activity around 30B models, which are too big to be mainstream and too small to be cutting edge.
If you're specifically looking to QLoRA fine-tune a 7B/13B model a 4090 can do that - but if you want to go bigger than that you'll end up using a cloud multi-gpu machine anyway.
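A minimal QLoRA setup for that single-4090 case looks roughly like this (a sketch using transformers + peft + bitsandbytes; the model name and LoRA hyperparameters are illustrative examples, not a recipe):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    model_id = "mistralai/Mistral-7B-v0.1"  # example 7B model

    # Load the base weights in 4-bit NF4, which is what keeps a 7B inside 24GB
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Train small LoRA adapters instead of the full weights
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # typically well under 1% of the total parameters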
4090 has more VRAM than most computers have system RAM. Surprised this is considered "low RAM" in any way except for relative to datacenter cards and top-spec ASi.
You're comparing RAM amounts to other RAM amounts without considering requirements. 24GB is more than (most) current games would ever require, but is considered an uncomfortably constrictive minimum for most industrial work.
Traditional CPU-bound physics/simulation models have typically wanted all the RAM they could get; the more RAM the more accurate the model. The same is true for AI models.
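A toy illustration of why: for a cubic simulation grid, halving the cell size multiplies memory by eight, so resolution (and with it accuracy) is capped by RAM. The field count and precision below are arbitrary assumptions.

    def grid_memory_gb(cells_per_axis: int, fields: int = 5, bytes_per_value: int = 8) -> float:
        """Memory for a cubic grid storing `fields` double-precision values per cell."""
        return cells_per_axis ** 3 * fields * bytes_per_value / 1e9

    for n in (256, 512, 1024):
        print(n, round(grid_memory_gb(n), 1))  # 0.7 GB, 5.4 GB, 42.9 GB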
I can max out 24GB just using spreadsheets and databases, let alone my 3D work or anything computational.
It is targeted at gamers, but professionals are buying it. They should be buying the A6000, which has 48GB.
Yeah, this is to be expected with early adoption. This stuff comes out of the lab and it's not perfect. The key thing to evaluate is the trajectory and pace of development. Much of what folks challenged ChatGPT with a year ago is long lost in the dust. Go look at Stable Diffusion this time last year. Dall-E couldn't do words and hands; it nails that 90% of the time in my experience today.
About words, DALL-E is not even close to nailing it 90% of the time. Not even 50%. Maybe they nerf it when you request a logo, but that was my experience in the last few days.
I managed to get it working with a 4090. You need to adjust the decoding_t parameter of the sample function in simple_video_sample.py to a lower value (decoding_t = 5 works fine for me). I also needed to install imageio==2.19.3 and imageio-ffmpeg.
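For anyone else hitting this, the change is just lowering that one argument. Roughly like this, assuming the script lives at scripts/sampling/simple_video_sample.py; the input path and version string are placeholders, and only decoding_t is the actual fix:

    # Run from the generative-models repo root with the SV3D weights downloaded.
    from scripts.sampling.simple_video_sample import sample

    sample(
        input_path="assets/my_object.png",  # placeholder input image
        version="sv3d_u",                   # or the sv3d_p variant
        decoding_t=5,                       # decode fewer frames at once to stay under 24GB
    )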
Ah, yep! You're right! It works now!
Dunno why the defaults for this stuff aren't set for baseline hardware; I feel like I always have to tweak the batch size down on all the sample scripts, even with 24GB, because everything assumes 48GB.
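It would be easy for these scripts to scale the default from the detected VRAM rather than assume a 48GB card. A hypothetical helper along these lines (the 48GB-for-14-frames reference point is just an assumption for illustration, not anything from the repo):

    import torch

    def pick_decoding_t(default_t: int = 14, reference_vram_gb: float = 48.0) -> int:
        """Scale the 'frames decoded at once' setting to the detected VRAM
        instead of hard-coding a value tuned for a 48GB card."""
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 2**30
        return max(1, int(default_t * vram_gb / reference_vram_gb))

    print(pick_decoding_t())  # ~7 on a 24GB card, 14 on a 48GB card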