
Show HN: 80% faster, 50% less memory, 0% loss of accuracy Llama finetuning

apsec112
36 replies
17h24m

I haven't run the code, but... how is this even possible? I've done PyTorch profiling on QLoRA Llama-2-70B fine tunes, and the runtime is dominated by the large matrix multiplies in the MLP layers, plus a bit for attention. Under the hood, this repo calls the same torch.matmul() for MLP and flash_attn_func() for attention as HuggingFace does. So how can it be that much faster? They have a few Triton kernels, but there appears to be no Triton for MLP or attention, and those are most of the bottleneck.
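The kind of profile being described can be reproduced with a tiny stand-in for a single MLP block; the sizes below are Llama-7B-ish placeholders rather than the 70B model, but the op breakdown looks the same.

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Toy Llama-style MLP block (just up/down projections with SiLU in between).
    mlp = torch.nn.Sequential(
        torch.nn.Linear(4096, 11008, bias=False),
        torch.nn.SiLU(),
        torch.nn.Linear(11008, 4096, bias=False),
    ).cuda().half()

    x = torch.randn(4, 2048, 4096, device="cuda", dtype=torch.float16)

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        mlp(x).sum().backward()

    # The table is dominated by aten::mm / gemm kernels, i.e. the big matrix multiplies.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))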

TheGeminon
21 replies
16h46m

There is a more detailed explanation at https://unsloth.ai/introducing

apsec112
20 replies
16h36m

That... doesn't really explain how they can get such a high number? Standard FLOP efficiency on fine-tuning big models is like 30-40%. How can you get 750%?
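For reference, "FLOP efficiency" here means model FLOPs utilization: roughly 6 * params FLOPs per token for a training step, divided by peak hardware FLOPs. A back-of-the-envelope with made-up throughput numbers looks like this.

    # Rough MFU arithmetic; the throughput and peak-FLOPs values are illustrative only.
    params = 70e9                # Llama-2-70B
    tokens_per_sec = 350         # hypothetical measured training throughput
    peak_flops = 312e12          # A100 dense FP16/BF16 peak, per GPU

    mfu = (6 * params * tokens_per_sec) / peak_flops
    print(f"MFU ~ {mfu:.0%}")    # ~47% with these made-up numbers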

danielhanchen
19 replies
15h15m

Hey! Great question! That's what I'm confused about as well!

So on GPUs the goal is to saturate the hardware with matrix multiplies instead of data movement. I'll write a more detailed blog post, but approximately:

1. Flash Attention v2 reduces the time taken by 17% or so

2. RoPE Triton kernels: -7.1%

3. RMS Layernorm in Triton: -3.1%

4. Cross Entropy in Triton: -1%

5. Manual autograd for MLP: -4%

6. Manual QKV autograd: -2%

7. Manual O autograd: -2%

8. Smart cache evictions and reduced data duplications etc: -30%

9. And other tricks in the Max and Pro versions make it 30x faster

You can see it's just tricks in each step, which accumulate together to make it go faster.

I'll write up a blog post to detail it all in the future!!!
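To make points 5-7 above concrete, here is a minimal sketch of what a hand-written autograd Function for a Llama-style SwiGLU MLP could look like: recompute intermediates in the backward pass instead of storing them, trading a little extra compute for less memory traffic. This only illustrates the general idea, not Unsloth's actual code.

    import torch
    import torch.nn.functional as F

    class ManualSwiGLU(torch.autograd.Function):
        """Hand-derived forward/backward for out = (silu(x @ Wg.T) * (x @ Wu.T)) @ Wd.T."""

        @staticmethod
        def forward(ctx, x, w_gate, w_up, w_down):
            g = x @ w_gate.t()
            u = x @ w_up.t()
            out = (F.silu(g) * u) @ w_down.t()
            # Save only the inputs; g, u and their product are recomputed in backward.
            ctx.save_for_backward(x, w_gate, w_up, w_down)
            return out

        @staticmethod
        def backward(ctx, d_out):
            x, w_gate, w_up, w_down = ctx.saved_tensors
            g = x @ w_gate.t()                         # recomputed instead of stored
            u = x @ w_up.t()
            sg = torch.sigmoid(g)
            silu_g = g * sg
            h = silu_g * u

            d_h = d_out @ w_down
            d_u = d_h * silu_g
            d_g = d_h * u * (sg + g * sg * (1 - sg))   # d/dg silu(g)
            d_x = d_g @ w_gate + d_u @ w_up

            flat = lambda t: t.reshape(-1, t.shape[-1])
            d_w_down = flat(d_out).t() @ flat(h)
            d_w_gate = flat(d_g).t() @ flat(x)
            d_w_up = flat(d_u).t() @ flat(x)
            return d_x, d_w_gate, d_w_up, d_w_down

    # Usage (toy sizes): out = ManualSwiGLU.apply(x, w_gate, w_up, w_down)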

demosthanos
18 replies
13h2m

And other tricks in the Max and Pro versions make it 30x faster

This feels like the collecting underpants meme. Phase 1: Get to the same performance as other methods. Phase 2: ???. Phase 3: Now you're at 750%!

You may or may not actually have succeeded at what you claim to, but you're not being very persuasive. I realize that you're trying to turn these tricks into a profit and revealing them would destroy that possibility, but you're going to have a really hard time persuading people to pay for a product that does something that enormous teams of PhDs at BigTech haven't been able to pull off on the basis of "trust me".

danielhanchen
17 replies
12h32m

I agree fully - what do you suggest then? OSS the entire code base and use AGPL3? I tried that with https://github.com/danielhanchen/hyperlearn to no avail - we couldn't even monetize it at all, so I just OSSed everything.

I listed all the research articles and methods in Hyperlearn which in the end were gobbled up by other packages.

We still have to cover life expenses and stuff sadly as a startup.

Do you have any suggestions how we could go about this? We thought maybe an actual training / inference platform, and not even OSSing any code, but we decided against this, so we OSSed some code.

Any suggestions are welcome!

wsxiaoys
7 replies
12h9m

Wow, this is a great topic. I don't really have specific suggestions, but I'd like to contribute some thoughts on the matter.

Monetizing anything isn't inherently problematic; the challenge lies in defining what should be paid for and what should be offered for free.

In the realm of open-source products and SaaS, the common practice is to provide free self-hosting options while charging for cloud hosting or enterprise-specific features, such as access control and authentication integrations.

However, the landscape becomes significantly more challenging for LLMOps (assuming you are still focusing on training as a major aspect of your business, which can be categorized as LLMOps).

Historically, there haven't been many success stories in this area (with exceptions like wand.ai, which focuses on tracking experiments). I believe this difficulty arises from the largely ad-hoc nature of training and fine-tuning processes, making standardization a challenge, coupled with the infrequency of these tasks.

That being said, training/finetuning is a valuable technique. However, transforming it into a company that offers products is really challenging. Successful examples in this realm typically depend heavily on solution customization or consulting-oriented business models.

danielhanchen
6 replies
11h49m

Thanks for the points! I agree monetization in the LLM Ops space is hard and complex. Agreed fully on customizing solutions or consulting.

Yep, self-hosting solutions like Red Hat, or DBs like MongoDB, or GitLab's dashboard-style approach could work - the issue is, as you mentioned, we offer training and finetuning.

We do plan to offer inference as well, plus the data gathering process, and the final prompt engineering side - but we thought why not have a shot?

It's possibly best to make a training and inference platform - maybe some sort of personal ChatGPT training for the public - everyone could train their own personal ChatGPT, not via ChatGPT's in-context learning or RAG, but with actual fast 30x finetuning, so a personal bot could truly be possible.

Thanks for the suggestions!

_boffin_
3 replies
10h36m

You have companies that are spending good money on fine-tuning, and more that will start spending soon. It seems like it would almost be easier to just go directly to these companies by looking at their blog posts--they're telling you that they're doing it in some way or another. I know Plaid and friends are doing it.

It's costing them x. You can shave y off. You can get improvements to market faster and cheaper.

danielhanchen
2 replies
9h39m

Interesting points! I shall try this with my bro!!

I was thinking along the lines of, say, the cost of A100s or H100s plus electricity and engineering costs, then how much we save, with some discounting factor.

rmbyrro
1 replies
6h42m

I think the time savings will be more appealing.

It allows for fast iteration and shorter go-to-market, which can generate virtually infinite value, as opposed to saving electricity, which is a limited game.

danielhanchen
0 replies
5h36m

Fair point - I forgot to mention the time savings LOLL!!!

IanOzsvald
1 replies
4h16m

You may want to look sideways to companies such as hedge funds. They have DNN teams and experiment with LLMs; you may find interesting optimisation opportunities with such teams. Charge according to the opportunity that you open up, not electricity saved!

danielhanchen
0 replies
3h20m

Interesting! Hedge funds - very interesting.

Oh no yep you're right on time saved and what opportunities it gives them, not just the electricity and capital costs :))

You can now experiment with 30 different models instead of 1 - if you have 100 GPUs, we magically made it 3000!

welzel
1 replies
8h35m

Finding an OSS business model is non-trivial.

Maybe you should talk to https://goodsnooze.gumroad.com/l/macwhisper to get some inspiration?

People are paying for convenience.

As for the technology itself: the B2B market is super-super early and I understand everybody is in gold rush mode; however, 98% of all startups will not survive the next 3-5 years.

From the demand side: companies are still sleeping; you see very, very few proof-of-concept implementations, and basically nothing goes to production.

The rate of innovation is extremely high with LLMs, making them a bad investment for a company.

My idea: OSS everything, become an expert in the field, learn how to sell, survive on consulting services. Don't build products, do paid projects instead.

Focus all your energy on understanding customer needs and building your target audience.

Be ready when the time is right to build a startup around LLMs.

Don't waste time building technology, develop your business instead.

danielhanchen
0 replies
8h3m

Hmm interesting take on things - I just thought consulting would fall into the trap of Dunbar's number https://en.wikipedia.org/wiki/Dunbar%27s_number - plus consulting requires more effort so given 24 hours in a day, you can only grow so much.

It sounds like consultants will become freelancers in the future - but LLMs themselves might take over the consultant's job as well.

But on that note - that's why with my bro, we decided Unsloth was our 1st product release - we're going to be releasing tonnes of new other products! (Coincidentally a data science consultant as well!)

rmbyrro
1 replies
6h46m

I think a training / inference platform that shares scale efficiencies would be very attractive. I'd use it for sure.

Problem with most platforms is they keep ALL scale efficiencies for themselves, which scares away big projects. They end up with only small users, which don't make unicorns in this case.

Finetuned LLMs are the future for most enterprise applications. Not every shop can possibly set up its own LLM team. If you abstract that away and let them know they'll pay less (per unit) as they scale up, it'd be a juicy proposal.

danielhanchen
0 replies
5h57m

Fair points!! I agree fully on the platform approach since many people have already mentioned it, and if it's super duper efficient and affordable, people are willing to pay.

mdekkers
1 replies
9h20m

You don’t need to explain every detail of how you do what you do, your product should speak for itself - outcomes count. If you go to a restaurant, you don’t demand to stand in the kitchen and inspect each ingredient and process, right?

I appreciate this probably isn’t a popular HN opinion, but as you say, you need to make a living. If you have produced something novel that is working, put the gas pedal down and monetise the absolute living daylights out of it as long as you can. Because that is what everyone with _money_ is doing. You don’t see OpenAI opening all their research and tricks now, do you?

Do your thing, buddy, and make your money. All the best with your startup, and don’t get distracted by the people clamouring for your recipes.

danielhanchen
0 replies
9h7m

Thanks! That's what I was trying to convey, I just couldn't say it like that! :)

Sadly OpenAI did in fact open source everything in its early days, but now revenue is king - I'm sure they will open source stuff in the future once the time is right.

But thanks a lot - it means a lot - highly appreciate it!!!

matthewcford
1 replies
3h58m

Emphasizing that you have already done performance optimisation for ML algorithms would make me trust this a lot more, esp as you then open-sourced it.

danielhanchen
0 replies
3h21m

Fair point!! I shall keep pointing this out from now on! :)

theptip
0 replies
53m

If your sauce is algorithmic (both current and future edge) then you cannot be OSS and profitable. Google will open source all sorts of things, but never their recommender algos.

Your best bet is probably a SaaS training platform (I suspect inference is a harder business, as you need to serve high uptime APIs; I guess you have more forgiving SLAs for training batches). Sell to medium-large companies (big enough to need training, not big enough to have an established in-house platform), and if you need to bootstrap at all you can probably do profitable consulting-type work without giving up your core IP, since you can hand off the trained model weights without handing out all of your trade secrets.

Folks around here are going to gripe about this; HN has a contingent of FOSS enthusiasts but these people are not going to give you a dollar, they are not your customers. FOSS is great but you are under no obligation to give away your life work.

Honestly where you have landed (opening up some of your work) is more generous with your time than most people would be; people should be thanking you instead of complaining that it’s not more open. I think giving out enough OSS for people to realize you are the real deal while keeping the biggest wins closed is a good marketing strategy.

bradfox2
9 replies
14h1m

These are significant claims locked behind a paid 'pro' version. Red flags.

danielhanchen
8 replies
13h55m

Sorry about that - I'm super new to pricing and stuff so it might seem off since I'm literally making the plans with my bro as we go along.

If you don't believe the timings, I was the author of Hyperlearn https://github.com/danielhanchen/hyperlearn which makes ML faster - I also listed the papers which cite the algos.

I also used to work at NVIDIA making TSNE 2000x faster on GPUs and some other algos like Randomized SVD, sparse matrix multiplies etc.

If you have any suggestions on a more appropriate pricing strategy - I'm all ears!!

I really don't know much about pricing and the open core model, so I'm making stuff up literally.

bugglebeetle
3 replies
9h30m

If this is all legit, you’re best trying to get an in somehow with a16z. They’re throwing money left and right at people doing this kind of stuff.

danielhanchen
2 replies
9h25m

Oh my A16z? Do you have any contacts? :))

But for now - our goal is somehow to get revenue ourselves via some cool AI products, and trying to shrink the expenses to 0 (like via our fast training methods)

angrais
1 replies
5h53m

One approach you could take is to license the code (or simply the tricks) to big tech companies. They can use the tricks, but must pay you x amount. You can provide technical support for implementation and benchmarking.

That's how I would make profit from what you're doing, as many big tech companies have already achieved what you claim (and more).

I know this as I work in such a company. However, I'd bet they'd pay a fair amount for new solutions that differ from their own.

danielhanchen
0 replies
5h31m

Hmm the main issue sounds like the market capture is limited - I highly disagree bigtech companies have achieved what we already have - maybe OpenAI or other AI companies might have done it - but it's all "might" have.

I worked myself in the past at NVIDIA making algos faster, so it's not a done deal big tech companies have all the tips and tricks. They have the best hardware, but software not so much.

The issue with licensing code is your revenue capture is minimal - maybe a training platform which provides everyone and not just big tech companies a cheap and efficient implementation sounds much better.

The issue with licensing is how much do you charge? How do you monitor usage? Etc

bradfox2
1 replies
13h21m

Are the 30x claims comparing full update training vs qlora/4bit weights?

danielhanchen
0 replies
12h36m

Sorry about the confusion - the 30x is comparing QLoRA with QLoRA - all benchmarking is QLoRA with bsz=2, ga=4

MacsHeadroom
1 replies
11h2m

Pro says single GPU some places and multi-GPU others. I really hope it is multi-GPU because basically every enthusiast making finetunes is using 2-4 3090s locally or renting 8x GPU boxes on Vast for finetuning. If multi-GPU is enterprise only you will miss out on basically the largest customer segment as far as I can tell.

danielhanchen
0 replies
10h48m

Apologies on the confusion - I'm still trying to flesh stuff out on the differentiating factors with my bro - I'll update it once we're fully sure - sorry again!

WhitneyLand
3 replies
15h8m

They say it’s due to a custom optimized version of autograd, which is sort of a key calculus component. They also mention simple things like function inlining or memory optimizations. It seems plausible these things could be optimized.

Whatever advantage they have, I don’t see how they would be able to keep it for long as part of their closed source “pro” version.

If it’s low hanging fruit the open source equivalents are bound to snipe them before long.

danielhanchen
2 replies
14h44m

There's more as well!! I agree hypothetically in a few years people will catch up - we're well aware of that! But we made this into a startup, and so this was just one product we were releasing! We're going to be making many AI products in the coming weeks!

WhitneyLand
1 replies
14h6m

That sounds really great, we need all the advancement we can get in this field.

Best of luck and all success to you!

danielhanchen
0 replies
14h2m

Thanks a bunch!! Super appreciate it!

squigz
21 replies
12h34m

What is the motivation behind open sourcing code at all if you're not going to open source it all? I don't mean this to be an asshole, as I truly want to understand the motivation.

danielhanchen
14 replies
12h9m

Good point - the main issue is we encountered this exact issue with our old package Hyperlearn (https://github.com/danielhanchen/hyperlearn).

I OSSed all the code to the community - I'm actually an extremely open person and I love contributing to the OSS community.

The issue was the package got gobbled up by other startups and big tech companies with no credit - I didn't want any cash from it, but it stung and hurt really bad hearing other startups and companies claim it was them who made it faster, whilst it was actually my work. It hurt really bad - as an OSS person, I don't want money, but just some recognition for the work.

I also used to accept and help everyone with coding up their startup's software, but I never got paid or even any thanks - sadly I didn't expect the world to be such a hostile place.

So after a sad awakening, I decided with my brother instead of OSSing everything, we would first OSS something which is still very good - 5X faster training is already very reasonable.

I'm all open to other suggestions on how we should approach this though! There are no evil intentions - in fact I insisted we OSS EVERYTHING even the 30x faster algos, but after a level headed discussion with my brother - we still have to pay life expenses no?

If you have other ways we can go about this - I'm all ears!! We're literally making stuff up as we go along!

kristopolous
7 replies
12h7m

There's a variety of licenses some of which protect you from this problem. Maybe those would be appropriate?

danielhanchen
6 replies
12h4m

Sadly I've tried GPL3 for Hyperlearn and AGPL3 - those licenses only work for UI or Database based OSS - eg MongoDB, Gitlab etc.

I still haven't figured out how to open source algorithms which don't have a UI or database - maybe a training or inference platform?

But if we made a training / inference platform, then there won't be any OSS code.

We're currently stuck in the middle - do you have any suggestions on this?

Zuiii
3 replies
11h38m

I still haven't figured out how to open source algorithms

You're not supposed to be able to restrict algorithms (i.e. math), and copyright (e.g. open source licenses) is definitely not the right tool. If you want to do it anyway, you'll have better chances by abusing the patent system.

danielhanchen
2 replies
11h18m

Hmmm patents - doesn't that also showcase everything in the open though?

Zuiii
1 replies
11h10m

Yes, but that's the only way I know of to legally stop others from making money off "ideas". Copyright isn't a good tool for this, as Oracle recently found out.

danielhanchen
0 replies
10h47m

Good point! I'll have a chat with my bro - but thanks for your suggestions - highly appreciate it!

Jayakumark
1 replies
11h40m

Sell binaries without code for individuals/enterprises who can train on private data. Get some GPUs and sell a training platform similar to together.ai for enterprises with a data privacy guarantee. Just make it super simple: give a Google Sheets-like form to fill in, or let them dump a folder of text files and use ML to turn it into a structured format. User gives:

Input -> Excel, txt, etc.

Select base model -> Mistral, Llama 2

Output -> finetuned model.

Give a fixed cost based on the data you got - say it will take $50 to train on this, for 25 hours of GPU time with your markup.

User pays the bill and gets a PEFT Llama model out.

The big problem I see in finetuning for the GPU-poor is that no one gives an estimate of how much it will cost for your data on a given model.

Have a calculator: for x words or bytes and y epochs, here is the cost we estimate.

Once you make your $x million - say 50x your pay for the time you invested - you can open source it; but we are greedy when it comes to money, so open source at your will. In the meantime, hope no one catches on to your technique.
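A sketch of the calculator idea above; every number here - throughput, hourly rate, markup - is a made-up placeholder to show the arithmetic, not a real quote.

    def finetune_quote(n_tokens, n_epochs, tokens_per_sec=2500,
                       gpu_hourly_rate=2.0, markup=1.5):
        """Estimate GPU-hours and a quoted price for a finetuning job."""
        gpu_hours = n_tokens * n_epochs / tokens_per_sec / 3600
        return gpu_hours, gpu_hours * gpu_hourly_rate * markup

    hours, price = finetune_quote(n_tokens=50_000_000, n_epochs=3)
    print(f"~{hours:.0f} GPU-hours, quoted at ${price:.0f}")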

danielhanchen
0 replies
11h19m

Ye a training platform!

Ye for hobbyists - it's hard to price.

But I agree some sort of platform for training could in theory work

nikolayasdf123
3 replies
8h56m

gobbled up by other startups and big tech companies with no credit

true. saw this myself too for my projects. big tech / startups would use your project internally and give you nothing back. sadly happens often.

MyFirstSass
1 replies
4h55m

I feel like free open source without paying is for tinkerers, startups, smaller orgs, but as soon as you become a "big company" with wealthy owners you need to either contribute or pay up.

This also incentivizes contributing to open source, because you'd potentially get compensated if the code was used by some big corporation.

Why isn't this a thing?

danielhanchen
0 replies
4h35m

I also wish this was the case!

danielhanchen
0 replies
8h37m

:( Very sad

squigz
1 replies
9h57m

Thanks for your response, and for open sourcing things at all :)

The issue was the package got gobbled up by other startups and big tech companies with no credit

I understand that this is frustrating, but are you open sourcing things for recognition? The fact is companies are going to use 100s of open source packages and aren't going to credit them all, or even any.

At any rate, I appreciate how difficult your position must be, and wish you luck.

danielhanchen
0 replies
9h22m

Thanks! :)

I agree, but generally as a researcher / OSS author, even a citation is nice :) There are some cool people who do just that, but some do not.

But thanks once again!

ShamelessC
3 replies
12h26m

This is, I believe, part of the Y Combinator playbook. You want to get as much growth as possible using any means you can think of, then flip the pricing (or tracking) on as soon as it peaks. Alternatively you can sell your company or go public.

The company here does seem less sinister however - don't think they've accepted VC investment yet? Could be wrong.

danielhanchen
2 replies
12h7m

No, we don't have any funding! I'm grateful to my family for supporting us till now.

We're open to VC investment now, except we truly believe in bootstrapping it - we have tonnes of other products in the pipeline like a Recession predictor, a Data Science Consultant, our own trained chatbot via DPO and our fully cleaned datasets and more!

ShamelessC
1 replies
11h0m

Refreshing to see, especially on this forum! Best of luck.

danielhanchen
0 replies
10h47m

:) Thanks!

sbierwagen
1 replies
12h32m

Shareware. See the open source preview, pay a million dollars for the full version.

danielhanchen
0 replies
12h3m

I wouldn't characterize it exactly as "shareware", since the OSS version is fully functional and trains 5x faster on a single GPU.

The paid version just makes it train 30x faster on multi GPU platforms - it's more of an open core approach.

kristopolous
7 replies
18h15m

It'd be great to have a chronicle of all these efforts. I lost track of the variations quite a long time ago.

It'd be quite a lift unless we're willing to just accept the self-reported metrics as golden. And even then, they're always qualified by hardware and usage scope. Making it good enough to be useful is the hard part. CI/CD pipeline with a bunch of machine configurations and benchmarks along with a reasonable way to communicate them...

If anyone's up for it you'd legitimately become indispensable.

danielhanchen
6 replies
15h21m

Hey there! Yes that's exactly what I thought! I was writing up a blog post at https://colab.research.google.com/drive/1AOuhMVILE06mD-Go7-R... which shows, step by step, every change I did, plus timings / memory savings.

I'll post it once it's completed if you're interested!

kristopolous
5 replies
14h55m

Thanks. There's been a few substantial and laudable efforts which are much appreciated but what I'm suggesting is an actual continuous infrastructure, like how those benchmarking sites have software for people to run on their machines that phone home so that people who make new benchmarks or new variations can submit them and refine the results.

For instance, are any of your prompting tests in say, Korean? What about winograd schema challenges in languages other than English? Japanese for instance, comes with its own unique set of context ambiguities that do not appear in English. I'm sure dozens of languages are similar. It'd be nice to have user contributable tests to cover the breadth of use cases here.

A great optimization that moves a score let's say from 95% -> 5% on "winograd-persian" may be fine or may be a show stopper, depends on what you care about.

That's why it's gotta be normalized, future-proof, and crowdsourced.

danielhanchen
4 replies
14h53m

Ohhh interesting! So like MLPerf or Faster CPython https://github.com/faster-cpython/benchmarking-public?

kristopolous
3 replies
14h35m

Sure. This thing I propose might already exist in a nascent form.

I haven't checked in a few months and the way things are moving now...

danielhanchen
2 replies
14h22m

Interesting!! I'll talk with my brother about this - sounds very cool!

ptd
1 replies
8h48m

Hey Daniel, I would love to help out on this. I'm learning about LLMs and this benchmarking project sounds like a fun way to further my knowledge and skills. I sent you a message on LinkedIn.

Cheers!

danielhanchen
0 replies
8h31m

Cool, let's chat on LinkedIn! :)

yunohn
5 replies
14h5m

This seems very interesting, but I’m very confused why you have gated the /max/imum speedup version for enterprise-only? It would make more sense to have only the Free and Paid plans differing in performance, and Enterprise getting support/etc.

danielhanchen
4 replies
14h2m

Good point - we thought about this - we're still figuring out the pricing as we go along - so all suggestions are welcome!

I'm all very new to this, so I'm literally making stuff up as I go along!!

Apologies!

yunohn
3 replies
13h49m

No worries, just my 2c!

Given the current state of AI performance, I’d imagine that those without 100s of GPUs but looking to maximise the performance of what they do have, would be a great demographic for the Paid plan.

danielhanchen
2 replies
12h30m

Interesting! I did have some chats with interested people who have, say, 2 or 4 or 8 GPUs and are willing to somehow buy the Pro plan - the issue now is how do we price it, do we make a training platform? etc.

hirako2000
1 replies
5h29m

Offer a (pricey) license to existing training platform providers. It will still save them a lot on infra costs.

danielhanchen
0 replies
4h50m

What do you think about us offering a platform ourselves?

foota
5 replies
16h36m

There are a number of papers on optimizing XOR chains for Cauchy Reed-Solomon computation - it sounds like a superficially similar problem?

danielhanchen
4 replies
15h9m

Hey! Oh noo we don't do any XOR chain optimizations or Cauchy Reed-Solomon computations - I had to Wikipedia it and learn about them - more reading to do!

It's mainly about reducing data movement, maximizing FLOP utilization via matrix multiplies, and reducing FLOPs by manually deriving backpropagation equations + more!
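For a flavor of what "manually deriving the backpropagation equations" means, the textbook derivation for a single linear layer is below; writing it out by hand lets you skip gradients you don't need - e.g. for frozen 4-bit base weights under LoRA - and fuse what remains. This is the generic derivation, not anything Unsloth-specific.

    % Forward: Y = XW, with X of shape (n, d_in) and W of shape (d_in, d_out).
    \[
    Y = XW, \qquad
    \frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y}\, W^{\top}, \qquad
    \frac{\partial L}{\partial W} = X^{\top}\, \frac{\partial L}{\partial Y}.
    \]
    % With frozen base weights (as in LoRA), dL/dW is never needed, so one
    % matmul per frozen linear layer can be skipped outright.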

foota
3 replies
12h8m

I realize they're different, but it seems like there are some similarities. In both, you have some chain of operations you need to do, and you want to minimize their cost and optimize for cache benefits etc.

This is a pretty good overview paper: https://arxiv.org/pdf/2108.02692.pdf (they claim to be better than the rest -- I haven't evaluated their claims in practice yet)

The techniques are mostly about ordering and common subexpression elimination for XOR operations and choosing "better" matrices to do the computation with. CRS codes can be used with many different matrices, but it turns out some are better than others for efficiency (XOR operations are encoded by 1s in the CRS matrix, so if you choose a CRS matrix with fewer of them while meeting the requirements, you can do fewer operations and get the same results).

danielhanchen
2 replies
11h48m

OHHH yes yes - sorry you're correct - apologies I'm not too familiar with this - thanks for the paper I'll read it through!!

Yep so the OSS version does have the reordering of operations - I'll write a blog post about it in the coming days!! We also have other tricks up our sleeve to make it go even faster!!

foota
1 replies
9h1m

Ah actually, https://www.usenix.org/system/files/fast19-zhou.pdf is probably a better reference (it includes the discussion of "bitmatrix normalization", the mentioned process of reducing the number of xors necessary, aside from scheduling and common subexpression elimination).

Hope it's potentially helpful :)

danielhanchen
0 replies
8h38m

OOOOO thankss!! 100% I'm reading this - thanks a bunch for the references! Highly appreciate it!

0x6c6f6c
4 replies
13h59m

I see this mentions 2018+ GPUs, but I'm curious why this wouldn't work on say a 1080Ti. My cursory look at the hardware specs shows this has support for CUDA8+ and this states 7.5

Would someone be able to tell me more about this?

danielhanchen
2 replies
12h31m

Sorry about the 1080 Ti - sadly Triton and Xformers require CUDA compute capability 7.0, so unless OpenAI and Meta make them support 6.x, we sadly can't support it.

The main reason is that from Turing onwards, Tensor Cores are provided, so the matrix multiplies all work differently on Tensor Cores.
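If it helps, you can check a card's compute capability directly; a 1080 Ti reports (6, 1), while Turing and newer report 7.x or above.

    import torch

    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability: {major}.{minor}")
    if (major, minor) < (7, 0):
        print("Pre-Turing GPU: no Tensor Cores, so the Triton kernels won't target it.")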

3abiton
1 replies
8h18m

Any source recommendations on how to find out the minimum hardware requirements to train an LLM with X-bit quantization and Y billion parameters?

danielhanchen
0 replies
7h58m

Great question - I'm actually not sure - I'll probably write up some minimum requirements - I know QLoRA's paper https://arxiv.org/pdf/2305.14314.pdf has an approximate calculation of the weights size, but not the gradient updates
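In the meantime, a very rough estimate in the spirit of the QLoRA paper's weight-size calculation can be sketched like this; the adapter size, optimizer-state factor, and activation term are coarse assumptions, not exact requirements.

    def qlora_vram_estimate_gb(n_params_billion, lora_params_million=80,
                               seq_len=2048, batch_size=2, hidden=4096, n_layers=32):
        base_weights = n_params_billion * 1e9 * 0.5   # 4-bit base weights: 0.5 byte/param
        adapters     = lora_params_million * 1e6 * 2  # LoRA A/B in fp16
        grads        = lora_params_million * 1e6 * 2  # gradients only for the adapters
        adam_states  = lora_params_million * 1e6 * 8  # Adam m, v in fp32
        activations  = batch_size * seq_len * hidden * n_layers * 4 * 2  # ~4 saved fp16 tensors/layer (crude)
        return (base_weights + adapters + grads + adam_states + activations) / 1e9

    print(f"~{qlora_vram_estimate_gb(7):.1f} GB")   # ballpark for a 7B model under these assumptions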

zargon
0 replies
13h8m

The 1080 Ti has CUDA compute capability 6.1.

xrd
3 replies
15h19m

OK, this is my favorite .ai domain. And, the logo snaps!

danielhanchen
2 replies
15h12m

Thanks!! My brother came up with the name!! "Un-sloth" where sloth is the slow cute animal, and it also means lazy and slow, and "un" reverses it!

juunpp
1 replies
14h48m

SSL_ERROR_NO_CYPHER_OVERLAP when checking your site.

danielhanchen
0 replies
14h42m

Do you mean http://www.unsloth.ai/ doesn't work? My brother is working on it as we speak!

Edit: it seems like http://unsloth.ai works - we're talking with our web hosting people to fix it

nikolayasdf123
3 replies
8h54m

Monetising software tooling and libraries is notoriously hard, and the amount of heat you're getting here is huge. Hope you find a way!

danielhanchen
2 replies
8h39m

Thanks!! I agree fully - I'm still trying to work out with my brother how we can effectively monetize it - probably products are the way to go, like how OpenAI does it?

But thanks a lot again!

rightbyte
1 replies
5h5m

Sell the patches to Zuckerberg so he can FOSS it to piss MS off.

danielhanchen
0 replies
3h18m

Loll - I guess now the question is at what price? :)

neilmovva
3 replies
10h5m

promising results, excited to try it out!

question on the perf benchmarks: why do all the results with 2 GPUs & DDP take longer than the single GPU case? Both benchmarks do the same amount of work, one training epoch, so this negative scaling is surprising.

danielhanchen
2 replies
9h34m

So there's 2 main reasons:

1. DDP itself has an overhead since it has to synchronize gradients at each training step - GPU0 and GPU1 have to give their gradients to GPU0.

2. Huggingface seems to not be well optimized for DDP, mainly due to inefficient data movement - we fixed that - interestingly, even on 1 GPU it's faster.
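A minimal DDP sketch for point 1, launched with "torchrun --nproc_per_node=2 ddp_demo.py" (the filename, model, and data are toy placeholders); the point is where the per-step gradient synchronization happens.

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group("nccl")              # one process per GPU
        rank = dist.get_rank()
        torch.cuda.set_device(rank)

        model = DDP(torch.nn.Linear(4096, 4096).cuda(rank), device_ids=[rank])
        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

        for step in range(10):
            x = torch.randn(2, 4096, device=f"cuda:{rank}")  # bsz = 2, as in the benchmarks
            loss = model(x).square().mean()
            loss.backward()   # gradients are all-reduced across GPUs here -> the per-step sync overhead
            opt.step()
            opt.zero_grad()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()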

neilmovva
1 replies
8h25m

I agree that synchronization causes overhead, so 2x GPUs won't achieve the ideal 0.5x total runtime. But here, taking your Alpaca benchmark as an example, we are seeing 2x GPUs get 3.6x runtime with Huggingface, or 1.15x with Unsloth Max.

In other words, every benchmark, in either HF or Unsloth, is slower in absolute terms when going from 1 to 2 GPUs. That makes me think something is wrong with the test.

Could you share your benchmark code?

danielhanchen
0 replies
7h59m

You can refer to QLoRA's official finetuning notebook https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zb...!! Obviously I can't provide the code we have, but if you use the same datasets and the same settings (bsz = 2, ga = 4, max_grad_norm = 0.3, num_epochs = 1, seed = 3407, max_seq_len = 2048) you should be able to replicate it.
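For reference, those settings map onto the standard transformers/TRL arguments roughly as follows; the output directory is a placeholder and the exact benchmark script isn't public, so treat this as an approximation.

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="qlora-benchmark",     # placeholder path
        per_device_train_batch_size=2,    # bsz = 2
        gradient_accumulation_steps=4,    # ga = 4
        max_grad_norm=0.3,
        num_train_epochs=1,
        seed=3407,
    )
    # max_seq_len = 2048 is passed to the trainer rather than TrainingArguments,
    # e.g. TRL's SFTTrainer(..., max_seq_length=2048), along with the model,
    # tokenizer, and the Alpaca dataset.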

gentleman11
3 replies
14h26m

Isn't Llama proprietary? Why not fine-tune one of the truly open models instead?

selfhoster11
1 replies
12h6m

There's a tremendous open source ecosystem in place for Llama already. This is a huge deal for beginners and model integrators alike.

Not to mention that beyond 7B models, your options drastically taper off. Mistral and most open base model projects only have models available up to 7 billion parameters or so, which is quite tiny if you are used to the relative ease of using un-finetuned GPT-4 to carry out your tasks of choice.

So what options are there? Falcon 40B and MPT-30B - sure, the weights license is all right, but many in the community have reservations about those models' underperformance, as you can get much more bang for your buck from an equal number of weights in a newer base model. Subjectively speaking, it could be a waste of time.

Falcon 180B and Yi 34B weights are both issued under non-free licenses, just like Llama 2.

Is Llama 2 proprietary? For the vast majority of people, for the vast majority of purposes, no. I'm not a lawyer, but I think that Meta would be quite unlikely to do more than cut off your HuggingFace access to the repo where new models will be distributed.

danielhanchen
0 replies
11h58m

Thanks to Meta for open sourcing Llama!!! Ye sadly the HF leaderboard doesn't have a high opinion of Falcon. MPT's long context via ALiBi did work, just less so when compared to RoPE scaling.

All thanks to Llama - the LLM community is now vibrant and alive!

danielhanchen
0 replies
14h21m

Llama is generally OSS, except for some gating to large companies - but as a first try I made it work for Llama since the architecture is replicated in other models like Mistral or Yi.

graphe
1 replies
15h40m

Somewhat related: is it worth it to use a P100 or P40? I was gonna get one, but it seems like Pascal is being dropped by more and more projects.

danielhanchen
0 replies
14h49m

P100 - oh my so Xformers for Flash Attention I think does support it, but Triton supports Compute Capability 7.0+, whilst a P100 is 6.0 :(

So technically the code can run, but I'll have to edit it to remove the Triton changes.

aerioux
1 replies
9h51m

If this technology is generalizable to more LLM architectures, y'all can start messaging venture capitalists with a demo and they can help on the rest (pricing models, customers, etc).

danielhanchen
0 replies
9h34m

Working on making it work on all archs!! Probably some sort of automatic dispatcher and automatic autograd engine!

TuringNYC
1 replies
15h34m

wow i wish i could do this with all my M1 pro max neural cores

danielhanchen
0 replies
15h11m

M1 support is probably coming in the future if there is enough support - could you make an Issue at https://github.com/unslothai/unsloth - that would be much appreciated!

I think there are a few Redditors from /r/localllama who also requested this, but for now first priority is getting Mistral support!!

peytoncasper
0 replies
2h10m

Can I suggest that you ignore all the criticism about the pricing on this thread and immediately find a sales rep or SE that has worked at an early stage DB company and begin cold calling high end customers with thousands of GPUs.

B2B deals at 200-300k+ are your best bet at selling this IMO.