Feels like it would be nice to abide by the license terms: https://bria.ai/bria-huggingface-model-license-agreement/
1.1 License.
> BRIA grants Customer a time-limited, non-exclusive, non-sublicensable, personal and non-transferable right and license to install, deploy and use the Foundation Model for the sole purpose of evaluating and examining the Foundation Model.
> The functionality of the Foundation Model is limited. Accordingly, Customer are not permitted to utilize the Foundation Model for purposes other than the testing and evaluation thereof.
1.2.Restrictions. Customer may not:
> 1.2.2. sell, rent, lease, sublicense, distribute or lend the Foundation Model to others, in whole or in part, or host the Foundation Model for access or use by others.
The Foundation Model made available through Hugging Face is intended for internal evaluation purposes and/or demonstration to potential customers only.
The repo doesn’t include the model.
Does the site not distribute it?
It doesn't, except that it runs it. There's no download link or code playground for running arbitrary code on it, so while it technically transfers the model to the computer where it's running (I think), that's not usually considered the same as distributing it.
Yeah, that doesn't sound right to me.
What's the point of running it in WebGPU then?
I think it's either running the model in the browser or a small part of it there. Maybe it's downloading parts of the model on the fly. But I kinda doubt it's all running on the server except for some simple RPC calls to the browser's WebGL.
Anyone can easily do an online/offline binary check for web apps like these:
1. Load the page
2. Disconnect from the internet
3. Try to use the app without reconnecting
Well, my question is about where it lies within the gray area between fully online and fully offline, so that wouldn't work.
Edit: Good call! It's fully offline - I disabled the network in Chrome and it worked. Says it's 176MB. I think it must be downloading part of the model, all at once, but that's just a guess.
The 176MB is in storage, which makes me think that my browser will hold onto it for a while. That's quite a lot. My browser really should provide a disk clearing tool that's more like OmniDiskSweeper than Clear History. If, for instance, it showed just the sites using over 20MB, and my profile was using 1GB, there would be at most 50 of them, a manageable number to go through and clear the ones I don't need.
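To make that arithmetic concrete, here is a minimal Python sketch of the "only show the big ones" idea. It is not how any browser actually exposes storage: the function names and the sample data are hypothetical, made up purely for illustration, and sizes use decimal units (1GB = 1000MB).

```python
MB = 1000 * 1000
GB = 1000 * MB


def max_entries_over(threshold_bytes: int, total_bytes: int) -> int:
    """Upper bound on how many entries can each use more than the threshold."""
    return total_bytes // threshold_bytes


def large_entries(usage_by_origin: dict[str, int], threshold_bytes: int) -> list[tuple[str, int]]:
    """Return (origin, bytes) pairs above the threshold, largest first."""
    big = [(origin, size) for origin, size in usage_by_origin.items() if size > threshold_bytes]
    return sorted(big, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-site storage usage; the origins and sizes are invented.
    usage = {
        "background-remover.example": 176 * MB,
        "news.example": 3 * MB,
        "maps.example": 45 * MB,
    }
    print("upper bound on entries to review:", max_entries_over(20 * MB, 1 * GB))
    for origin, size in large_entries(usage, 20 * MB):
        print(f"{origin}: {size // MB}MB")
```

With a 20MB cutoff and a 1GB profile the bound works out to 50 entries, and in the made-up data only two sites would show up in the list.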
Yeah, this is why I think browsers need to start bundling some foundational models for websites to use. It's too unscalable if many websites start trying to store a significantly sized model each.
Google has started addressing this. I hope it becomes part of web standards soon.
https://developer.chrome.com/docs/ai/built-in
"Since these models aren't shared across websites, each site has to download them on page load. This is an impractical solution for developers and users"
The browser bundles might become quite large, but at least websites won't be.
As long as there’s a way to disable it. I don’t want my disk space wasted by a browser with AI stuff I won’t use.
> What's the point of running it in WebGPU then?
Use client resources instead of server resources.
Pretty sure downloading it to your browser counts as distributing it, legally speaking.
AYAL?
Sure!
A lot of these AI licenses are a lot more restrictive than old school open source licenses were.
My company runs a bunch of similar web-based services and plans to do a background remover at some stage, but as far as I know there are no current models with a sufficiently permissive license that can also feasibly download & run in browsers.
Meta's second Segment Anything Model (SAM2) has an Apache license. It only does segmenting, and needs additional elbow grease to distill it for browsers, so it's not turnkey, but it's freely licensed.
Yeah, that one seems to be the closest so far. Not sure if it would be easier to create a background removal model from scratch (since that's a simpler operation than segmentation) or distill it.
I got pretty far down that path during Covid for a feature of my SaaS, but limited to specific product categories on solid-ish backgrounds. Like with a lot of things, it’s easy to get good, and takes forever to get great.
AI model weights are probably not even copyrightable.
Surely they would at least be protected by Database Rights in the EU (not the US):
https://en.wikipedia.org/wiki/Database_right
Those require the "database" in question to be readable and for every single element to be so too. Model weights don't satisfy that requirement.
Keep in mind that whether or not a model can be copyrighted at all is still an open question.
Everyone publishing an AI model is acting as if they owned copyright over it and as such are sharing it with a license, but there's no legal basis for such a claim at this point. It's all about pretending and hoping the law will be changed later on to make their claim valid.
Train on copyrighted material
Claim fair use
Release model
Claim copyright
Infinite copyright!
It's kind of silly to complain about not abiding by the model license when these models are trained on content not explicitly licensed for AI training.
You might say that the models were legally trained since no law mandates consent for AI training. But no law says that models are copyrightable either.
At some point the world's going to need a Richard Stallman of AI who builds up a foundation that is usable and not in the total control of major corporations, with reasonable licensing. OpenAI was supposed to fit that mold.
It is a 2024 model. For comparison, https://github.com/danielgatis/rembg/ uses U2-Net, which is open source and from 2022. There is also https://github.com/ZhengPeng7/BiRefNet (another 2024 model, also open source), so it's not too late to switch.