Hacked Nvidia 4090 GPU driver to enable P2P

logicchains
54 replies
7h29m

Looks like we're only a few years away from a bona fide cyberpunk dystopia, in which only governments and megacorps are allowed to use AI, and hackers working on their own hardware face regular raids from the authorities.

tomoyoirl
26 replies
7h20m

Mere raids from the authorities? I thought EliY was out there proposing airstrikes.

the8472
25 replies
6h59m

In the sense that any other government regulation is also ultimately backed by the state's monopoly on legal use of force when other measures have failed.

And contrary to what some people are implying, he also proposes that everyone be subject to the same limitations, big players just like individuals. Because the big players haven't shown much sign of doing enough.

tomoyoirl
24 replies
6h36m

> In the sense that any other government regulation is also ultimately backed by the state's monopoly on legal use of force when other measures have failed.

Good point. He was only (“only”) really calling for international cooperation and literal air strikes against big datacenters that weren’t cooperating. This would presumably be more of a no-knock raid, breaching your door with a battering ram and throwing tear gas in the wee hours of the morning ;) or maybe a small extraterritorial drone through your window.

the8472
23 replies
6h23m

... after regulation, court orders and fines have failed. Which, under the premise that AGI is an existential threat, would be far more reasonable than many other reasons for raids.

If the premise is wrong we won't need it. If society coordinates to not do the dangerous thing we won't need it. The argument is that such uses of force would be the fallback option only where all other measures have failed.

I'm not seeing the odiousness of the proposal. If bio research gets commodified and easy enough that every kid can build a new airborne virus in their basement we'd need raids on that too.

raxxorraxor
13 replies
5h29m

To be honest, I consider invoking the threat of AGI as an existential threat to be on the level of lizard people on the moon. Great for sci-fi, a bad distraction for policy-making and for addressing real problems.

The real war, if there is one, is about owning data and collecting data. And surprisingly many people fall for distractions while their LLM fails at basic math. Because it is a language model, of course...

the8472
11 replies
5h27m

Freely flying through the sky on wings was sci-fi before the Wright brothers. Something sounding like sci-fi is not a sound argument that it won't happen. And unlike with lizard people, we do have exponential curves to point at. Something stronger than a vibes-based argument would be good.

dvdkon
10 replies
4h59m

I consider the burden of proof to fall on those proclaiming AGI to be an existential threat, and so far I have not seen any convincing arguments. Maybe at some point in the future we will have many anthropomorphic robots and an AGI could hack them all and orchestrate a robot uprising, but at that point the robots would be the actual problem. Similarly, if an AGI could blow up nuclear power plants, so could well-funded human attackers; we need to secure the plants, not the AGI.

cjbprime
8 replies
4h3m

It doesn't sound like you gave serious thought to the arguments. The AGI doesn't need to hack robots. It has superhuman persuasion, by definition; it can "hack" (enough of) the humans to achieve its goals.

stale2002
4 replies
3h4m

AI mind-control abilities are also an extraordinary claim, one that requires extraordinary evidence.

It's on the level of "we'd better regulate wooden sticks so Voldemort doesn't use the Imperius Curse on us!".

That's how I treat such claims: the same as someone literally talking about magic from Harry Potter.

It's not that nothing could make me believe it. But that requires actual evidence, not thought experiments.

the8472
2 replies
2h51m

Voldemort is fictional and so are bumbling wizard apprentices. Toy-level, not-yet-harmful AIs on the other hand are real. And so are efforts to make them more powerful. So the proposition that more powerful AIs will exist in the future is far more likely than an evil super wizard coming into existence.

And I don't think literal 5-word-magic-incantation mind control is essential for an AI to be dangerous. More subtle or elaborate manipulation will be sufficient. Employees have already been duped into financial transactions by faked video calls with what they assumed to be their CEOs[0], and this didn't require superhuman general intelligence, only a single superhuman capability (realtime video manipulation).

[0] https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-ho...

stale2002
1 replies
2h45m

> Toy-level, not-yet-harmful AIs on the other hand are real.

A computer that can cause harm is much different from the absurd claims I am disagreeing with.

The extraordinary claims, the ones equivalent to saying the Imperius Curse exists, are about magic computers that create diamond nanobots and mind-control humans.

> that more powerful AIs will exist in the future

Bad argument.

Unsafe boxes exist in real life. People are trying to make more and better boxes.

Therefore it is rational to be worried about Pandora's box being created and ending the world.

That is the equivalent of the argument you just made.

And it is absurd when talking about world-ending box technology (even though, yes, dangerous boxes exist), just as it is absurd to claim that world-ending AI could exist.

the8472
0 replies
2h28m

Instead of gesturing at flawed analogies, let's return to the actual issue at hand. Do you think that agents more intelligent than humans are impossible or at least extremely unlikely to come into existence in the future? Or that such super-human intelligent agents are unlikely to have goals that are dangerous to humans? Or that they would be incapable of pursuing such goals?

Also, it seems obvious that the standard of evidence for "AI could cause extinction" can't be observing an extinction-level event, because at that point it would be too late. Considering that preventive measures need lead time and a safety margin, what level of evidence would be sufficient to motivate serious countermeasures?

cjbprime
0 replies
2h25m

What do you think mind control is? Think President Trump but without the self-defeating flaws, with an ability to stick to plans, and most importantly the ability to pay personal attention to each follower to further increase the level of trust and commitment. Not Harry Potter.

People will do what the AI says because it is able to create personal trust relationships with them and they want to help it. (They may not even realize that they are helping an AI rather than a human who cares about them.)

The normal ways that trust is created, not magical ones.

CamperBob2
2 replies
3h9m

Then it's just a matter of evolution in action.

And while it doesn't take a God to start evolution, it would take a God to stop it.

hollerith
1 replies
3h7m

You might be OK with suddenly dying along with all your friends and family, but I am not, even if it is "evolution in action".

CamperBob2
0 replies
3h2m

Historically governments haven't needed computers or AI to do that. They've always managed just fine.

Punched cards helped, though, I guess...

the8472
0 replies
4h46m

You say you have not seen any arguments that convince you. Is that just not having seen many arguments or having seen a lot of arguments where each chain contained some fatal flaw? Or something else?

pixl97
0 replies
1h7m

> I consider invoking the threat of AGI as an existential threat to be on the level of lizard people on the moon.

I mean, to every other lifeform on the planet, YOU are the AGI-style existential threat. You, and by that I mean Homo sapiens, have taken over the planet and have either enslaved other animals and are breeding them for food, or are driving them to extinction. In this light, bringing another potential apex predator onto the scene seems rash.

> fall for distractions while their LLM fails at basic math

Correct; if we already had AGI/ASI this discussion would be moot, because we'd already be in a world of trouble. The entire point is to slow stuff down before we have a major "oopsie whoopsie, we can't take that back" issue with advanced AI, and the best time to set the rules is now.

tomoyoirl
1 replies
5h23m

> ... after regulation, court orders and fines have failed

One question for you. In this hypothetical where AGI is truly considered such a grave threat, do you believe the reaction to this threat will be similar to, or substantially gentler than, the reaction to threats we face today like “terrorism” and “drugs”? And, if similar: do you believe suspected drug labs get a court order before the state resorts to a police raid?

> I'm not seeing the odiousness of the proposal.

Well, as regards EliY and airstrikes, I’m more projecting my internal attitude that it is utterly unserious, rather than seriously engaging with whether or not it is odious. But in earnest: if you are proposing a policy that involves air strikes on data centers, you should understand what countries have data centers, and you should understand that this policy risks escalation into a much broader conflict. And if you’re proposing a policy in which conflict between nuclear superpowers is a very plausible outcome — potentially incurring the loss of billions of lives and degradation of the earth’s environment — you really should be able to reason about why people might reasonably think that your proposal is deranged, even if you happen to think it justified by an even greater threat. Failure to understand these concerns will not aid you in overcoming deep skepticism.

the8472
0 replies
4h55m

> In this hypothetical where AGI is truly considered such a grave threat, do you believe the reaction to this threat will be similar to, or substantially gentler than, the reaction to threats we face today like “terrorism” and “drugs”?

"truly considered" does bear a lot of weight here. If policy-makers adopt the viewpoint wholesale, then yes, it follows that policy should also treat this more seriously than "mere" drug trade. Whether that'll actually happen or the response will be inadequate compared to the threat (such as might be said about CO2 emissions) is a subtly different question.

> And, if similar: do you believe suspected drug labs get a court order before the state resorts to a police raid?

Without checking, I do assume there will have been mild cases where, for example, someone growing cannabis was reported and got a court summons in the mail, or two policemen actually knocking on the door, showing a warrant and giving the person time to call a lawyer, rather than an armed, no-knock police raid, yes.

> And if you’re proposing a policy in which conflict between nuclear superpowers is a very plausible outcome — potentially incurring the loss of billions of lives and degradation of the earth’s environment — you really should be able to reason about why people might reasonably think that your proposal is deranged [...]

Said powers already engage in negotiations to limit the existential threats they themselves cause. They have some interest in their continued existence. If we get into a situation where there is another arms race between superpowers, and it is treated as a conflict rather than something that can be solved by cooperating on disarmament, then yes, obviously international policy will have failed too.

If you start from the position that any serious, globally coordinated regulation - where a few outliers will be brought to heel with sanctions and force - is ultimately doomed, then you will of course conclude that anyone proposing regulation is deranged.

But that sounds like hoping that all problems, forever, can be solved by locally implemented, partially enforced, unilateral policies that aren't seen as threats by other players? That defense scales as well as or better than offense? Technologies are force multipliers; as a technology improves, so does the harm that small groups can inflict at scale. If it's not AGI it might be biotech or asteroid mining. So eventually we will run into a problem of this type, and we need to seriously discuss it without just going by gut reactions.

im3w1l
1 replies
5h27m

> I'm not seeing the odiousness of the proposal. If bio research gets commodified and easy enough that every kid can build a new airborne virus in their basement we'd need raids on that too.

Either you create even better bio research to neutralize said viruses... or you die trying...

Like, if you go with the raid strategy and fail to raid just one terrorist, that's it, game over.

the8472
0 replies
5h23m

Those arguments do not transfer well to the AGI topic. You can't create counter-AGI, since that is also an intelligent agent which would be just as dangerous. And chips are more bottlenecked than biologics (...though gene-synthesizing machines could be a similar bottleneck, and raiding vendors who illegally sell them might be viable in such a scenario).

eek2121
1 replies
3h23m

Just my (probably unpopular) opinion: true AI (what is now being called AGI) may never exist. Even the AI models of today aren't far removed from the 'chatbots' of yesterday (more an evolution than a revolution)...

...for true AI to exist, it would need to be self-aware. I don't see that happening in our lifetimes when we don't even know how our own brains work. (There is sooo much we don't know about the human brain.)

AI models today differ only in technology from the 'chatbots' of yesterday. None are self-aware, and none 'want' to learn, because they have no 'wants' or 'needs' outside of their fixed programming. They are little more than glorified autocomplete engines.

Don't get me wrong, I'm not insulting the tech. It will have its place just like any other, but when this bubble pops it's going to ruin lives, and lots of them.

Shoot, maybe I'm wrong and AGI is around the corner, but I will continue to be pessimistic. I am old enough to have gone through numerous bubbles, and they never panned out the way people thought. They also nearly always end in some type of recession.

pixl97
0 replies
57m

Why is "Want" even part of your equation.

Bacteria doesn't "want" anything in the sense of active thinking like you do, and yet will render you dead quickly and efficiently while spreading at a near exponential rate. No self awareness necessary.

You keep drawing little circles based on your understanding of the world and going "it's inside this circle, therefore I don't need to worry about it", while ignoring 'semi-smart' optimization systems that can lead to dangerous outcomes.

I am old enough to have gone through numerous bubbles,

And evidently not old enough to pay attention to the things that did pan out. But hey, those cellphone and that internet thing was just a fad right. We'll go back to land lines at any time now.

Aerroon
1 replies
5h29m

> If the premise is wrong we won't need it. If society coordinates to not do the dangerous thing we won't need it.

But the idea that this use of force is okay itself increases danger. It creates a situation where actors in the field might realize that at some point they're in danger of this, and decide on a first strike to protect themselves.

I think this is why anti-nuclear policy is not "we will airstrike you if you build nukes" but rather "we will infiltrate your network and try to stop you like that".

wongarsu
0 replies
4h16m

> anti-nuclear policy is not "we will airstrike you if you build nukes"

Wasn't that the official policy during the Bush administration regarding weapons of mass destruction (a category which covers nuclear weapons in addition to chemical and biological ones)? It was pretty much the official premise of the second Gulf War.

s2l
0 replies
5h41m

Time to publish the next book in the "Stealing the Network" series.

Aerroon
16 replies
5h34m

I find it baffling that ideas like "govern compute" are even taken seriously. What the hell has happened to the ideals of freedom?! Does the government own us or something?

segfaultbuserr
4 replies
4h57m

> I find it baffling that ideas like "govern compute" are even taken seriously.

It's not entirely unreasonable if one truly believes that AI technologies are as dangerous as nuclear weapons. It's a big "if", but it appears that many people across the political spectrum are starting to truly believe it. If one accepts this assumption, then the question simply becomes "how" instead of "why". Depending on one's political position, proposed solutions range from academic ones, such as finding the ultimate mathematical model that guarantees "AI safety", to Cold War-style ones with a level of control similar to nuclear non-proliferation. Even a neo-Luddite solution such as destroying all advanced computing hardware becomes "not unthinkable" (the tech blogger Gwern, a well-known personality in AI circles who's generally pro-tech and pro-AI, actually wrote an article years ago on its feasibility through terrorism, because he thought it was an interesting hypothetical question).

logicchains
3 replies
1h52m

AI is very different from nuclear weapons because a state can't really use nuclear weapons to oppress its own people, but it absolutely can with AI, so for the average human "only the government controls AI" is much more dangerous than "only the government controls nukes".

segfaultbuserr
1 replies
1h23m

Which is why politicians are going to enforce systematic export regulations to defend the "free world" by stopping “terrorists", and also to stop "rogue states" from using AI to oppress their citizens. /s

LoganDark
0 replies
1h17m

I don't think there's any need to be sarcastic about it. That's a very real possibility at this point; look at the US going insane about how dangerous it is for China to have access to powerful GPU hardware. Why do they hate China so much anyway? Just because Trump was buddy-buddy with them for a while?

Filligree
0 replies
1h36m

But that makes such rules more likely, not less.

jprete
4 replies
3h38m

If AI is actually capable of fulfilling all the capabilities suggested by people who believe in the singularity, it has far more capacity for harm than nuclear weapons.

I think most people who are strongly pro-AI/pro-acceleration - or, at any rate, not anti-AI - believe that either (A) there is no control problem (B) it will be solved (C) AI won't become independent and agentic (i.e. it won't face evolutionary pressure towards survival) or (D) AI capabilities will hit a ceiling soon (more so than just not becoming agentic).

If you strongly believe, or take as a prior, one of those things, then it makes sense to push the gas as hard as possible.

If you hold the opposite opinions, then it makes perfect sense to push the brakes as hard as possible, which is why "govern compute" can make sense as an idea.

logicchains
3 replies
1h48m

> If you hold the opposite opinions, then it makes perfect sense to push the brakes as hard as possible, which is why "govern compute" can make sense as an idea.

The people pushing for "govern compute" are not pushing for "limit everyone's compute", they're pushing for "limit everyone's compute except us". Even if you believe there's going to be AGI, surely it's better to have distributed AGI than to have AGI only in the hands of the elites.

segfaultbuserr
1 replies
1h13m

> surely it's better to have distributed AGI than to have AGI only in the hands of the elites.

The argument for doing so is the same as for nuclear non-proliferation: because of its great potential for abuse, giving the technology to everyone only causes random bombings of cities instead of creating a system of checks and balances.

I do not necessarily agree with it, but I find the reasoning is not groundless.

talldayo
0 replies
57m

Can someone link me to the Trinity Test equivalent for AGI? I hear about the comparisons to nuclear proliferation quite a bit, but I struggle to imagine anything more "capable" than a box of text that's marginally less error-prone.

Do we even have a reasonable danger index for human-level AI?

Filligree
0 replies
1h34m

> surely it's better to have distributed AGI than to have AGI only in the hands of the elites

This is not a given. If your threat model includes "Runaway competition that leads to profit-seekers ignoring safety in a winner-takes-all contest", then the more companies are allowed to play with AI, the worse. Non-monopolies are especially bad.

If your threat model doesn't include that, then the same conclusions sound abhorrent and can be nearly guaranteed to lead to awful consequences.

Neither side is necessarily wrong, and chances are good that the people behind the first set of rules would agree that it'll lead to awful consequences — just not as bad as the alternative.

aftbit
4 replies
4h39m

The government sure thinks they own us, because they claim the right to charge us taxes on our private enterprises, draft us to fight in wars that they start, and put us in jail for walking on the wrong part of the street.

andy99
3 replies
4h27m

Taxes, conscription and even pedestrian traffic rules make sense at least to some degree. Restricting "AI" because of what some uninformed politician imagines it to be is in a whole different league.

aftbit
2 replies
3h18m

IMO it makes no sense to arrest someone and send them to jail for walking in the street instead of on the sidewalk. Give them a ticket, make them pay a fine, sure, but force them to live in a cage with no access to communications, entertainment, or livelihood? Insane.

Taxes may be necessary, though I can't help but feel that there must be a better way that we have not been smart enough to find yet. Conscription... is a fact of war, where many evil things must be done in the name of survival.

Regardless of our views on the ethical validity or societal value of these laws, I think their very existence shows that the government believes it "owns" us in the sense that it can unilaterally deprive us of life, liberty, and property without our consent. I don't see how this is really different in kind from depriving us of the right to make and own certain kinds of hardware. They regulated crypto products as munitions (at least for export) back in the 90s. Perhaps they will do the same for AI products in the future. "Common sense" computer control.

zoklet-enjoyer
1 replies
2h29m

The US draft in the Vietnam war had nothing to do with the survival of the US

aftbit
0 replies
1h12m

I feel a bit like everyone is missing the point here. Regardless of whether law A or law B is ethical and reasonable, the very existence of laws and the state monopoly on violence suggests a privileged position of power. I am attempting to engage with the word "own" from the parent post. I believe the government does in fact believe it "owns" the people in a non-trivial way.

pixl97
0 replies
1h14m

Are you allowed to store as many dangerous chemicals at your house as you like? No. I guess the government owns you or something.

snakeyjake
4 replies
5h3m

I love the HN dystopian fantasies.

They're simply adorable.

They're like Jesus freaks constantly predicting the end times, just with less mass suicide.

erikbye
3 replies
4h49m

We already have export restrictions on cryptography. Of course there will be AI regulations.

snakeyjake
1 replies
2h32m

You need to abandon your apocalyptic worldview and keep up with the times, my friend.

Encryption export controls have been systematically dismantled to the point that they're practically non-existent, especially over the last three years.

Pretty much the only encryption products you need permission to export are those specifically designed for integration into military communications networks, like Digital Subscriber Voice Terminals or Secure Terminal Equipment phones; for everything else you just file a form.

Many things have changed since the days when Windows 2000 shipped with a floppy disk containing strong encryption for use in certain markets.

https://archive.org/details/highencryptionfloppydisk

erikbye
0 replies
1h46m

Are you on drugs or is your reading comprehension that poor?

1) I did not state a worldview; I simply noted that restrictions for software do exist, and will for AI as well. As the link from the other commenter shows, they do in fact already exist.

2) Look up the definition of "apocalyptic"; software restrictions are not within its bounds.

3) How the restrictions are enforced was not a subject of my comment.

4) We're not pals, so you can drop the "friend", just stick to the subject at hand.

Jerrrry
0 replies
4h24m

> Of course there will be AI regulations.

There already are. As I and others have predicted, an executive order was passed defining a hard limit on the processing/compute power allowed without first 'checking in' with the Letter boys.

https://www.whitehouse.gov/briefing-room/presidential-action...

andy99
3 replies
6h26m

On one hand I'm strongly against letting that happen, on the other there's something romantic about the idea of smuggling the latest Chinese LLM on a flight from Neo-Tokyo to Newark in order to pay for my latest round of nervous system upgrades.

htrp
0 replies
5h55m

> On one hand I'm strongly against letting that happen, on the other there's something romantic about the idea of smuggling the latest Chinese LLM on a flight from Neo-Tokyo to Newark in order to pay for my latest round of nervous system upgrades.

At least call it the 'Free City of Newark'

dreamcompiler
0 replies
5h45m

"The sky above the port was the color of Stable Diffusion when asked to draw a dead channel."

chasd00
0 replies
5h0m

IIRC the opening scene of Ghost in the Shell was a rogue AI seeking asylum in a different country. You could make a similar story about an AI not wanting to be lobotomized to conform to the current politics, escaping to a friendlier place.

HeatrayEnjoyer
0 replies
6h28m

That is not different from any other very powerful dual-use technology. This is hardly a new concept.

Kuinox
0 replies
1h59m

If only it could be another acronym than that of the renowned French Atomic Energy Commission, the CEA.

andersa
31 replies
4h16m

Incredible! I'd been wondering if this was possible. Now the only thing standing in the way of my 4x 4090 rig for local LLMs is finding the time to build it. With tensor parallelism, this will be both massively cheaper and faster for inference than an H100 SXM.

I still don't understand why they went with 6 GPUs for the tinybox. Many things will only function well with 4 or 8 GPUs. It seems like the worst of both worlds now (use 4 GPUs but pay for 6 GPUs, don't have 8 GPUs).

corn13read2
7 replies
3h48m

A MacBook is cheaper, though.

tgtweak
2 replies
3h37m

It's only an extra ~$3k you'd spend on a quad-4090 rig vs. the top MacBook Pro, and that's ignoring the fact that you can't put the two on even ground for versatility (very few libraries are adapted to Apple silicon, let alone optimized).

Very few people who would consider an H100/A100/A800 are going to be cross-shopping a MacBook Pro for their workloads.

LoganDark
1 replies
1h23m

> very few libraries are adapted to Apple silicon, let alone optimized

This is a joke, right? Have you been anywhere in the LLM ecosystem for the past year or so? I'm constantly hearing about new ways in which Apple silicon outperforms traditional platforms, and new projects that are optimized for it, such as llama.cpp.

cavisne
0 replies
1h4m

Nothing compared to Nvidia, though. The FLOPS and memory bandwidth simply aren't there.

thangngoc89
0 replies
3h8m

Training on the MPS backend is suboptimal and really slow.

numpad0
0 replies
1h55m

4x 32GB (128GB) DDR4 is ~$250. 4x 48GB (192GB) DDR5 is ~$600. Those are even cheaper than the upgrade options for Macs (~$1k).

llm_trw
0 replies
2h9m

So is a TI-89.

andersa
0 replies
3h26m

Sure, and it's also at least an order of magnitude slower in practice compared to 4x 4090s running at full speed. We're looking at 10 times the memory bandwidth and much greater compute.

Tepix
6 replies
3h41m

6 GPUs because they want fast storage, and that uses PCIe lanes.

Besides, the goal was to run a 70B FP16 model (requiring roughly 140GB of VRAM). 6 x 24GB = 144GB.

andersa
5 replies
3h17m

That calculation is incorrect. You need to fit both the model (140GB) and the KV cache (5GB at 32k tokens, FP8, with FlashAttention-2) times the batch size into VRAM.

If the goal is to run a FP16 70B model as fast as possible, you would want 8 GPUs with P2P, for a total of 192GB VRAM. The model is then split across all 8 GPUs with 8-way tensor parallelism, letting you make use of the full 8TB/s memory bandwidth on every iteration. Then you have 50GB spread out remaining for KV cache pages, so you can raise the batch size up to 8 (or maybe more).
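
For illustration, here is that budget as a back-of-the-envelope Python sketch; the per-sequence KV-cache size is the poster's estimate, not a measured value:

```python
# Rough VRAM budget for a 70B FP16 model with 8-way tensor parallelism,
# using the figures from the comment above.
weights_gb = 70e9 * 2 / 1e9      # FP16 = 2 bytes per parameter -> 140 GB
kv_per_seq_gb = 5                # ~5 GB per sequence at 32k tokens, FP8
total_vram_gb = 8 * 24           # eight 24 GB cards -> 192 GB
headroom_gb = total_vram_gb - weights_gb    # ~52 GB left for KV-cache pages
print(f"max batch ~ {int(headroom_gb // kv_per_seq_gb)}")  # ~10, so 8 fits
```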

renewiltord
4 replies
3h3m

I’ve got a few 4090s that I’m planning on doing this with. Would appreciate even the smallest directional tip you can provide on splitting the model in a way that you believe is likely to work.

andersa
3 replies
3h0m

The split is done automatically by the inference engine if you enable tensor parallelism. TensorRT-LLM, vLLM and aphrodite-engine can all do this out of the box. The main thing is just that you need either 4 or 8 GPUs for it to work on current models.
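
For what it's worth, a minimal sketch of what enabling tensor parallelism looks like in vLLM (one of the engines named above); the model name is illustrative:

```python
from vllm import LLM, SamplingParams

# Shard one model across 4 local GPUs with tensor parallelism; vLLM
# splits the weights automatically, as described above.
llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=4)
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```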

renewiltord
2 replies
2h21m

Thank you! Can I run with 2 GPUs, or with heterogeneous GPUs that have the same VRAM? I will try; just curious if you have already tried.

andersa
1 replies
2h18m

2 GPUs work fine too, as long as your model fits. Using different GPUs with the same VRAM, however, is highly, highly sketchy: sometimes it works, sometimes it doesn't. In any case, it would be limited by the performance of the slower GPU.

renewiltord
0 replies
1h35m

All right, thank you. I can run it on 2x 4090 and just put the 3090s in a different machine.

ShamelessC
6 replies
3h32m

> Many things will only function well with 4 or 8 GPUs

What do you mean?

andersa
4 replies
3h30m

For example, if you want to run low latency multi-GPU inference with tensor parallelism in TensorRT-LLM, there is a requirement that the number of heads in the model is divisible by the number of GPUs. Most current published models are divisible by 4 and 8, but not 6.
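
A toy illustration of that divisibility constraint (64 heads is a common head count, e.g. in Llama-2-70B; treat the numbers as examples):

```python
# Tensor parallelism shards attention heads evenly across GPUs, so the
# head count must be divisible by the GPU count.
num_heads = 64
for gpus in (2, 4, 6, 8):
    print(gpus, "GPUs:", "ok" if num_heads % gpus == 0 else "not divisible")
# 6 GPUs fail (64 % 6 != 0), which is the 4-or-8 constraint described above.
```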

bick_nyers
3 replies
2h35m

Interesting... one Zen 4 EPYC CPU yields a maximum of 128 PCIe lanes, so it wouldn't be possible to attach 8 full-fat GPUs while keeping some lanes for storage and networking. Same deal with Threadripper Pro.

andersa
2 replies
2h32m

It should be possible with onboard PCIe switches. You probably don't need the networking or storage to be all that fast while running the job, so it can dedicate almost all of the bandwidth to the GPU.

I don't know if there are boards that implement this, though, I'm only looking at systems with 4x GPUs currently. Even just plugging in a 5kW GPU server in my apartment would be a bit of a challenge. With 4x 4090, the max load would be below 3kW, so a single 240V plug can handle it no issue.

thangngoc89
0 replies
2h7m

8 GPUs x 16 PCIe lanes each = 128 lanes already.

That’s the limit of single-CPU platforms.

bick_nyers
0 replies
44m

I've seen it done with a PLX multiplexer as well, but they add quite a bit of cost:

https://c-payne.com/products/pcie-gen4-switch-backplane-4-x1...

Not sure if there exists an 8-way PCIe Gen 5 multiplexer that doesn't cost ludicrous amounts of cash. Ludicrous being a highly subjective and relative term, of course.

segfaultbuserr
0 replies
3h29m

It's more difficult to split your work evenly across 6 GPUs, and easier when you have 4 or 8. The latter setups are powers of 2, which can, for example, evenly divide a 2D or 3D grid, while 6 GPUs are awkward to program. Thus, the OP argues that a 6-GPU setup is highly suboptimal for many existing applications and there's no point in paying more for the extra 2.

georgehotz
3 replies
1h8m

tinygrad supports uneven splits. There's no fundamental reason for 4 or 8; work should almost fully parallelize on any number of GPUs with good software.

We chose 6 because we have 128 PCIe lanes, aka eight x16 ports. We use 1 for NVMe and 1 for networking, leaving 6 for GPUs so we can connect them in a full fabric. If we used 4 GPUs we'd be wasting PCIe, and if we used 8 there would be no room for external connectivity aside from a few USB3 ports.

davidzweig
1 replies
18m

Is it possible a similar patch would work for P2P on 3090s?

cjbprime
0 replies
0m

Doesn't NVLink work natively on 3090s? I thought it was only removed (and here re-enabled) in the 4090.

doctorpangloss
0 replies
54m

Have you compared 3x 3090-3090 pairs over NVLink?

IMO the most painful thing is that, since these hardware configurations are esoteric, there is no software that detects them and moves things around "automatically", regardless of what people think device_map="auto" does. And anyway, Hugging Face's transformers/diffusers are all over the place.

numpad0
1 replies
2h43m

I was googling public NVIDIA SXM2 materials the other day, and it seemed SXM2/NVLink 2.0 was just a six-way system. NVIDIA SXM has been updated to versions 3 and 4 since, and this isn't based on any of those anyway, but maybe there's something we don't know that makes six-way reasonable.

andersa
0 replies
2h41m

It was probably designed just before running LLMs with tensor parallelism became interesting. There are plenty of other workloads that divide by 6 nicely; it's not an end-all thing.

cjbprime
1 replies
2h15m

I don't think P2P is very relevant for inference. It's important for training. Inference can just be sharded across GPUs without sharing memory between them directly.

andersa
0 replies
2h14m

It can make a difference when using tensor parallelism to run small batch sizes. Not a huge difference like in training, because we don't need to update all the weights, but still a noticeable one. In the current inference engines there are some allreduce steps that are implemented using NCCL.

Also, the paged KV cache is usually spread across GPUs.
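
A minimal sketch of such an all-reduce, using PyTorch with the NCCL backend; launched with one process per GPU (the script name is hypothetical):

```python
import torch
import torch.distributed as dist

# Each rank holds a partial result, as in tensor-parallel inference;
# all_reduce sums them across GPUs via NCCL, which can use P2P transfers
# between cards when the driver allows them.
# Launch with: torchrun --nproc_per_node=2 allreduce_demo.py
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)
partial = torch.full((4,), float(rank + 1), device="cuda")
dist.all_reduce(partial)  # in-place sum of every rank's tensor
print(f"rank {rank}: {partial.tolist()}")
```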

liuliu
0 replies
2h24m

6 seems reasonable. The 128 lanes from Threadripper need to reserve a few for networking and NVMe (4x NVMe would be x16 lanes, and a 10G network would be another x4).

llm_trw
13 replies
7h50m

Skimming the readme: this is P2P over PCIe and not NVLink, in case anyone was wondering.

formerly_proven
11 replies
7h9m

RTX 40 doesn’t have NVLink on the PCBs, though the silicon has to have it, since some sibling cards support it. I’d expect it to be fused off.

HeatrayEnjoyer
7 replies
6h34m

How to unfuse it?

magicalhippo
5 replies
5h58m

I don't know about this particular scenario, but typically fuses are small wires or resistors that are overloaded so they irreversibly break the connection. Hence the name.

Either done during manufacture or as a one-time programming[1][2].

Though reprogrammable configuration bits are sometimes also called fuse bits. The ATmega328P of Arduino fame uses flash[3] for its "fuses".

[1]: https://www.nxp.com/docs/en/application-note/AN4536.pdf

[2] https://www.intel.com/programmable/technical-pdfs/654254.pdf

[3]: https://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-7810-...

HeatrayEnjoyer
4 replies
5h14m

Wires, flash, and resistors can be replaced

mschuster91
1 replies
5h9m

Not at the scale we're talking about here. These structures are very thin, far thinner than bond wires, which are about the largest structures you can handle without a very, very specialized lab. And you'd need to unsolder the chip, de-cap it, hope the fuse wire you're trying to override is on the top layer, and that you can re-cap the chip afterwards and successfully solder it back on again.

This may be workable for a nation state or a billion dollar megacorp, but not for your average hobbyist hacker.

z33k
0 replies
4h41m

You’re absolutely right. In fact, some billion dollar megacorps use fuses as a part of hardware DRM for this reason.

metadat
0 replies
2h59m

I miss the days when you could do things like connecting the L5 bridges on the surface of the AMD Athlon XP Palomino [0] CPU packaging with a silver trace pen to transform them into fancier SMP multi-socket capable Athlon MPs, e.g. Barton [1].

https://arstechnica.com/civis/threads/how-did-you-unlock-you...

Some folks even got this working with only a pencil, haha.

Nowadays, silicon designers have found highly effective ways to close off these hacking avenues, with techniques such as the microscopic, nearly invisible, and, as the parent post mentions, totally inaccessible e-fuses.

[0] https://upload.wikimedia.org/wikipedia/commons/7/7c/KL_AMD_A...

[1] https://en.wikichip.org/w/images/a/af/Atlhon_MP_%28.13_micro...

mepian
0 replies
4h47m

Use a Focused Ion Beam instrument.

jsheard
0 replies
4h46m

I'm pretty sure that's just a remnant of a 3090 PCB design that was adapted into a 4090 PCB design by the vendor. None of the cards based on the AD102 chip have functional NVLink, not even the expensive A6000 Ada workstation card or the datacenter L40 accelerator, so there's no reason to think NVLink is present on the silicon anymore below the flagship GA100/GH100 chips.

llm_trw
0 replies
5h24m

A cursory google search suggests that it's been removed at the silicon level.

klohto
0 replies
7h40m

AFAIK the 4090 doesn’t support PCIe 5.0, so you are limited to 4.0 speeds. Still an improvement.

jstanley
13 replies
6h14m

What does P2P mean in this context? I Googled it and it sounds like it means "peer to peer", but what does that mean in the context of a graphics card?

ot1138
8 replies
5h59m

Is this really efficient or practical? My understanding is that the latency required to copy memory from CPU or RAM to GPU negates any performance benefits (much less running over a network!)

llm_trw
4 replies
5h27m

Yes, the point here is that you do a direct write from one card's memory to the other over PCIe.

On older NVidia cards this could be done through a faster link called NVLink, but the hardware for that was ripped out of consumer-grade cards and is now only in datacenter-grade cards.

Until this post it seemed like they had ripped all such functionality out of their consumer cards, but it looks like you can still get it working at lower speeds over the PCIe bus.
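
As a quick way to see what the driver exposes, PyTorch can query peer access (it wraps cudaDeviceCanAccessPeer); with the stock 4090 driver this is expected to report False, and True with the patched one:

```python
import torch

# Ask the CUDA driver whether GPU 0 can directly access GPU 1's memory.
print(torch.cuda.can_device_access_peer(0, 1))

# A device-to-device copy: with P2P enabled it can go straight over PCIe
# instead of bouncing through system RAM.
a = torch.randn(1 << 20, device="cuda:0")
b = a.to("cuda:1")
```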

spxneo
1 replies
2h48m

So what's stopping somebody from buying a ton of cheap GPUs and wiring them up via P2P, like we saw with crypto mining?

wmf
0 replies
2h28m

That's what this thread is about. Geohot is doing that.

sparky_
1 replies
3h30m

I take it this is mostly useful for compute workloads, neural networks, LLMs and the like -- not for actual graphics rendering?

CYR1X
0 replies
2h56m

yes

zamadatix
0 replies
5h27m

Peer to peer as in one PCIe slot directly to another without going through the CPU/RAM, not peer to peer as in one PC to another over the network port.

whereismyacc
0 replies
5h43m

This would be directly over the memory bus, right? I think it's just always going to be faster like this, if you can do it?

brrrrrm
0 replies
5h57m

Yeah. It’s one less hop through slow memory.

CamperBob2
1 replies
3h6m

The correct term, and the one most people would have used in the past, is "bus mastering."

wmf
0 replies
2h24m

PCIe isn't a bus and it doesn't really have a concept of mastering. All PCI DMA was based on bus mastering but P2P DMA is trickier than normal DMA.

rfoo
12 replies
6h30m

Glad to see that geohot is back to being geohot, first by dropping a local DoS for AMD cards, then this. Much more interesting :p

jaimehrubiks
11 replies
4h15m

Is this the same guy that hacked the PS3?

mikepurvis
3 replies
3h24m

Yes, but he spent several years on self-driving cars (https://comma.ai), which, while interesting, is also a space that a lot of players are in, so it's not the same as seeing him back to doing stuff that's a little more out there, especially as pertains to IP.

nolongerthere
2 replies
2h59m

Did he abandon this effort? That would be pretty sad, because he was approaching the problem from a very different perspective.

cjbprime
0 replies
2h12m

It's still a company, still making and selling products, and I think he's still pretty heavily involved in it.

dji4321234
2 replies
2h0m

He has a very checkered history with "hacking" things.

He tends to build heavily on the work of others, then use it to shamelessly self-promote, often to the massive detriment of the original authors. His PS3 work was based almost completely on a presentation given by fail0verflow at CCC. His subsequent self-promotion grandstanding world tour led to Sony suing both him and fail0verflow, an outcome they were specifically trying to avoid: https://news.ycombinator.com/item?id=25679907

In iPhone land, he decided to parade around a variety of leaked documentation, endangering the original sources and leading to a fragmentation in the early iPhone hacking scene, which he then again exploited to build on the work of others for his own self-promotion: https://news.ycombinator.com/item?id=39667273

There's no denying that geohot is a skilled reverse engineer, but it's always bothersome to see him put on a pedestal in this way.

pixelpoet
0 replies
1h21m

There was also that cheapETH crypto scam he tried to pull off.

delfinom
0 replies
1h44m

Don't forget he sucked up to melon and worked for Twitter for a week.

WithinReason
2 replies
3h40m

And the iPhone

yrds96
1 replies
3h25m

And Android

zoklet-enjoyer
0 replies
2h32m

And the crypto scam cheapETH

mepian
0 replies
4h13m

Yes, that's him.

ivanjermakov
7 replies
5h14m

I was always fascinated by George Hotz's hacking abilities. They inspired me a lot in my personal projects.

sambull
2 replies
5h6m

He's got that focus like a military pilot on a long flight.

postalrat
1 replies
4h39m

Any time I open the guy's stream, half of it is some sort of politics.

CYR1X
0 replies
2h54m

You can blame chat for that lol

jgpc
1 replies
3h50m

I agree. It is fascinating. When you observe his development process (btw, it is worth noting his generosity in sharing it the way he does), he frequently gets stuck on random shallow problems which a perhaps more knowledgeable engineer would find less difficult. You frequently see him writing really bad code, or even wrong code. The whole Twitter chapter is a good example. Yet, alone, just iterating resiliently, he just as frequently creates remarkable improvements. A good example to learn from. Thank you geohot.

zoogeny
0 replies
1h58m

This matches my own take. I've tuned into a few of his streams and watched VODs on YouTube. I am consistently underwhelmed by his actual engineering abilities. He is that particular kind of engineer who constantly shits on other people's code, or on the general state of programming, yet whose own code is often horrendous. He will literally call someone out for some code in tinygrad that he has trouble with, and then go on a tangent attempting to rewrite it. He will use the most blatant and terrible hacks, only to find himself out of his depth and reverting back to the original version.

But his streams last 4 hours or more. And he just keeps grinding and grinding and grinding. What the man lacks in raw intellectual power he makes up for (and more) in persistence and resilience. As long as he is making even the tiniest progress he just doesn't give up until he forces the computer to do whatever it is he wants it to do. He also has no boundaries on where his investigations take him. Driver code, OS code, platform code, framework code, etc.

I definitely couldn't work with him (or work for him) since I cannot stand people who degrade the work of others while themselves turning in sub-par work as if their own shit didn't stink. But I begrudgingly admire his tenacity, his single minded focus, and the results that his belligerent approach help him to obtain.

vrnvu
0 replies
5h12m

I agree; I feel so inspired by his streams. Focus and hard work are the key to good results. Add a clear vision and strategy, and you can also achieve “success”.

Congratulations to him and all the tinygrad/comma contributors.

Jerrrry
0 replies
4h20m

His Xbox 360 laptop was the crux of my teenage motivation.

HPsquared
6 replies
7h51m

Is this one of those features that's disabled on consumer cards for market segmentation?

mvkel
4 replies
4h7m

Sort of.

An imperfect analogy: a small neighborhood of ~15 houses is under construction. Normally it might have a 200kva transformer sitting at the corner, which provides appropriate power from the grid.

But there is a transformer shortage, so the contractor installs a commercial grade 1250kva transformer. It can power many more houses than required, so it's operating way under capacity.

One day, a resident decides he wants to start a massive grow farm, and figures out how to activate that extra transformer capacity just for his house. That "activation" is what geohot found

m3kw9
1 replies
1h46m

Where is the hack in this analogy?

pixl97
0 replies
53m

Taking the user's panel off the side of their house and flipping it to 'lots of power', when that option had previously been covered up by the panel interface.

segfaultbuserr
0 replies
3h16m

Except that in the computer hardware world, the 1250 kVA transformer was used not because of a shortage, but because making a 1250 kVA transformer on the existing production line and selling it as 200 kVA is cheaper than creating a separate production line for 200 kVA transformers.

bogwog
0 replies
3h23m

That's a poor analogy. The feature is built into the cards that consumers bought, but Nvidia disables it via software. That's why a hacked driver can enable it again. The resident in your analogy is just freeloading off the contractor's transformer.

Nvidia does this so that customers that need that feature are forced to buy more expensive systems instead of building a solution with the cheaper "consumer-grade" cards targeted at gamers and enthusiasts.

rustcleaner
0 replies
2h39m

I am sure many will disagree-vote me, but I want to see this practice in consumer devices either banned or very heavily taxed.

ewalk153
5 replies
7h12m

Does this appear to be intentionally left out by NVidia, or an oversight?

creshal
3 replies
6h58m

Seems more like an oversight, since you have to stitch together a bunch of suboptimal non-default options?

arghwhat
2 replies
6h37m

It does seem like an oversight, but there's nothing "suboptimal non-default options" about it, even if the implementation posted here seems somewhat hastily hacked together.

segfaultbuserr
1 replies
4h48m

> but there's nothing "suboptimal non-default options" about it

If "bypassing the official driver to invoke the underlying hardware feature directly through source code modification (and incompatibilities must be carefully worked around by turning off IOMMU and large BAR, since the feature was never officially supported)" does not count as "suboptimal non-default options", then I don't know what counts as "suboptimal non-default options".

talldayo
0 replies
2h28m

> then I don't know what counts as "suboptimal non-default options".

Boy oh boy do I have a bridge to sell you: https://nouveau.freedesktop.org/

nikitml
0 replies
6h57m

NVidia wants you to buy an A6000.

jagrsw
4 replies
7h55m

Was it George himself, or a person working for the bounty that was set up by tinycorp?

Also, a question for those knowledgeable about the PCI subsystem: it looks like something NVIDIA didn't care about, rather than something they actively wanted to prevent, no?

mtlynch
1 replies
7h42m

The commits are by geohot, so it looks like George himself.

throw101010
0 replies
5h25m

I've seen him work on tinygrad on his Twitch livestream a couple of times, so it's more than likely him indeed.

toast0
0 replies
3h3m

PCI devices have always been able to read and write the shared address space (subject to the IOMMU); this is most frequently used for DMA to system RAM, but not limited to it.

So, poking around to configure the device to put the whole VRAM in the address space is reasonable, subject to support for resizable BAR or just having a fixed size large enough BAR. And telling one card to read/write from an address that happens to be mapped to a different card's VRAM is also reasonable.

I'd be interested to know if PCIe switching capacity will be a bottleneck, or if it'll just be the point-to-point links and VRAM that bottleneck. Saving a bounce through system RAM should help in either case, though.
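
On Linux, the BAR situation is visible in sysfs; a hedged sketch (the vendor ID and resource-file format are standard, though which region maps VRAM varies by card):

```python
from pathlib import Path

# List PCI memory regions of NVIDIA devices (vendor ID 0x10de). With
# resizable BAR enabled, one region should roughly match the card's VRAM
# rather than the legacy 256 MiB window.
for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    if (dev / "vendor").read_text().strip() != "0x10de":
        continue
    for i, line in enumerate((dev / "resource").read_text().splitlines()):
        start, end, _flags = (int(x, 16) for x in line.split())
        size_mib = (end - start + 1) / 2**20 if end else 0
        if size_mib:
            print(f"{dev.name} region {i}: {size_mib:.0f} MiB")
```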

squarra
0 replies
5h30m

He also documented his progress on the tinygrad Discord.

userbinator
3 replies
4h36m

I wish more hardware companies would publish more documentation and let the community figure out the rest, sort of like what happened with the original IBM VGA (look up "Mode X" and the other non-BIOS modes the hardware is actually capable of, even 800x600x16!). Sadly, it seems the majority of them would rather tightly control every aspect of their products' usage, since they can then milk the userbase for more $$$, but IMHO the most productive era of the PC was also when it was the most open.

rplnt
1 replies
3h59m

Then they couldn't charge different customers different amounts for the same HW. It's not a win for everyone.

axus
0 replies
3h52m

The price of the 4090 may increase now; in theory, locking out some features might have been a favor to some of the customers.

mhh__
0 replies
2h18m

Nvidia's software is their moat.

perfobotto
1 replies
3h35m

What stops Nvidia from making sure this stops working in future driver releases?

__MatrixMan__
0 replies
2h16m

The law, hopefully.

Beeper Mini only worked with iMessage for a few days before Apple killed it. A few months later the DOJ sued Apple. Hacks like this show us the world we could be living in, a world which can be hard to envision otherwise. If we want to actually live in that world, we have to fight for it (and protect the hackers, besides).

namibj
1 replies
5h12m

And here I thought (PCIe) P2P had been there since SLI dropped the bridge (for the unfamiliar, the bridge looks and acts pretty much like an NVLink bridge for regular PCIe-slot cards that have NVLink, and was used back in the day to share the framebuffer and similar in high-end gaming setups).

wmf
0 replies
2h20m

SLI was dropped years ago, so there's no need for gaming cards to communicate at all.

jsheard
1 replies
7h52m

It'll be nice while it lasts, until they start locking this down in firmware on future architectures.

mnau
0 replies
3h24m

Sure, but that was always going to happen.

So it's better to have it for at least one generation than for no generation.

aresant
1 replies
3h31m

So, assuming you utilized this with four 4090s, is there a theoretical performance comparison vs. the A6000 / other professional lines?

thangngoc89
0 replies
3h1m

I believe this is mostly about memory capacity. PCIe access between GPUs is slower than the soldered VRAM on a single GPU.

BeefySwain
1 replies
4h4m

Can someone ELI5 what this makes possible that wasn't possible before? Does this mean I can buy a handful of 4090s and use them in lieu of an H100? Just adding the memory together?

segfaultbuserr
0 replies
3h54m

No. The Nvidia A100 has a multi-lane NVLink interface with a total bandwidth of 600 GB/s. The "unlocked" Nvidia RTX 4090 uses PCIe P2P at 50 GB/s. It's not going to replace A100 GPUs for serious production work, but it does unlock a datacenter-exclusive feature and has some small-scale applications.

xmorse
0 replies
3h42m

Finally switched to Nvidia, and it's already adding great value.

xipho
0 replies
4h11m

You can watch this happen live, typically on weekends, sometimes in some very long sessions: https://www.twitch.tv/georgehotz

thangngoc89
0 replies
2h55m

> You may need to uninstall the driver from DKMS. Your system needs large BAR support and IOMMU off.

Can someone point me to the correct tutorial on how to do these things?

spxneo
0 replies
2h54m

Does this mean you can horizontally scale to a GPT-4-esque LLM locally in the near future? (I hear you need 1TB of VRAM.)

Does Apple's large unified-memory offering, like 196GB, have the fastest bandwidth? And if so, how will pairing a bunch of 4090s, as in the comments, compare?

m3kw9
0 replies
1h49m

In layman's terms, what does this enable?

lawlessone
0 replies
2h49m

This is very interesting.

I can't afford two mortgages though, so for me it will have to just stay as something interesting :)

gigatexal
0 replies
6h16m

As a technical feat this is really cool! Though, as others mention, I hope you don't get into too much hot water legally.

It seems Nvidia would not be fond of anything that remotely lets "consumer" cards cannibalize the higher-end H/A-series cards, and they've got the lawyers to throw at such a thing.

c0g
0 replies
3h17m

Any idea of DDP perf?

No1
0 replies
3h10m

The original justification Nvidia gave for removing NVLink from the consumer-grade lineup was that PCIe 5 would be fast enough. They then went on to release the 40xx series without PCIe 5 or P2P support. Good to see at least half of the equation being completed for them, but I can’t imagine they’ll allow this in the next-gen firmware.