hi friends! burkay from fal.ai here. would like to clarify that the model is NOT built by fal. all credit should go to Black Forest Labs (https://blackforestlabs.ai/) which is a new co by the OG stable diffusion team.
what we did at fal is take the model and run it on our inference engine optimized to run these kinds of models really really fast. feel free to give it a shot on the playgrounds. https://fal.ai/models/fal-ai/flux/dev
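if you'd rather call it from code than the playground, here's a minimal sketch over plain HTTP. treat it as an assumption-laden example (the fal.run URL shape, the Key auth header, and the "images" field are how our endpoints generally look, but check the model docs for the authoritative schema):

    # Minimal sketch: POST a prompt to the hosted endpoint.
    # Assumes the fal.run REST shape and a FAL_KEY env var;
    # consult fal's docs for the exact request/response schema.
    import os
    import requests

    resp = requests.post(
        "https://fal.run/fal-ai/flux/dev",
        headers={"Authorization": f"Key {os.environ['FAL_KEY']}"},
        json={"prompt": "a cinematic photo of a corgi wearing sunglasses"},
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["images"][0]["url"])  # URL of the generated image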
The playground is a drag. After grudgingly signing up, attaching my GitHub account, and handing over my email address, I entered my prompt and waited with anticipation... only to be shown a black screen and how much it's going to cost per megapixel.
Bummer. After seeing what was generated in the blog post, I was excited to try it! Now I'm just disappointed.
I was hoping it'd be more like https://play.go.dev.
Good luck.
https://replicate.com/black-forest-labs/flux-dev is working very nicely. No sign-up.
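If you'd rather script it than click around, here's a rough sketch with Replicate's Python client (assuming pip install replicate, a REPLICATE_API_TOKEN in your environment, and that this model returns a list of image URLs, as its page suggests):

    # Rough sketch using the Replicate Python client.
    # Assumes REPLICATE_API_TOKEN is set; output shape may vary by model.
    import replicate

    output = replicate.run(
        "black-forest-labs/flux-dev",
        input={"prompt": "an astronaut riding a horse, oil painting"},
    )
    print(output)  # typically a list of image URLs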
My go-to test for these tools so far has been the seven-horned, seven-eyed lamb mentioned in the Book of Revelation. Every tool I've tried has failed at this task.
Ah. I try the following:
Every human I've ever described this to has no problem picturing what I mean. It's a classic comic trope. AIs still struggle.
This gets interesting. One approach I've used with image generation before is to find an image of the sort that I want and have DALL-E describe it... and then modify the prompt it provides into one with the elements that I want.
The first attempt at this, based on https://reductress.com/post/my-boyfriends-are-always-two-kid... , really misunderstood the image. That may also be part of the problem.
I then went to the image from https://www.reddit.com/r/DnD/comments/c6fdw4/oc_introducing_... And that provided:
Working off of that idea of the totem formation ... "Create an image featuring three children in a totem pole formation that are trying to conceal their nature in a single oversized trench coat." That produced https://imgur.com/a/Of9FsJl
I suspect the orange beard came from the previous part in the session. But that might be an approach to take in trying to describe it in a way that can be used.
Current-generation image generators don't understand text as instructions, the way you're trying to use it: describing an object, then placing it, then setting the scene.
It's more like a giant telescope made of many lenses (the latents from the prompts), and you're adjusting the lenses to bring one possible reality out of many into focus.
It looks like imgur is blocking Mullvad VPN connections
But AIs learn and therefore create in exactly the same way as humans, ostensibly on the same data. How can this be possible? /s
A rough rule of thumb is that if a text-generator AI model of some size would struggle to understand your sentence, then an image-generator model a couple of times the size or even bigger would also struggle.
The intelligence just doesn't "fit" in there.
Personally I'm curious to see what would happen if someone burnt $100M of compute time on training a truly enormous image generator model, something the same-ish size as GPT-4...
I mean, you can use a fork to make whipped cream, but it won't be easy and it's not the right tool for the job. Does that mean that the fork is useless?
I never said it was useless, just that it fails at this specific problem. One of my complaints with many of these image generation tools is that there's little communication about what to expect from them: they don't explain where they're likely to succeed or fail.
Recently Claude began to allow generation of SVG drawings, and asking it to draw a unicorn and later add extra tails or horns worked correctly.
A fork exists in physical space and it's pretty intuitive to understand what it can do. These models exist within digital space and are incredibly opaque by comparison.
"Recently Claude began to allow generation of SVG drawings, and asking it to draw a unicorn and later add extra tails or horns worked correctly."
That sounds interesting! Were the results somewhat clean and clear SVG or rather a mess that just looked decent?
Here's the screenshot [0] that was shared with me. It's obviously pretty basic, but Claude understood the correct locations for the horns and tails. This looks like a clear iterative improvement over older models.
[0] https://imgur.com/Cc5uJNg
Can you share the exact prompt you used?
Sure. Normally I try a few variants, but "lamb with seven horns" was what I tried when I made that post.
For what it's worth, I've previously asked in the Stable Diffusion Discord server for help generating a "lamb with seven horns and seven eyes" but the members there were also unsuccessful.
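If anyone wants to retest this locally rather than through a hosted UI, here's a rough sketch with Hugging Face diffusers. It assumes diffusers' FluxPipeline and the black-forest-labs/FLUX.1-schnell weights on the Hub, plus a GPU with enough VRAM; adjust to taste:

    # Rough local-generation sketch with Hugging Face diffusers.
    # Assumes FluxPipeline and the FLUX.1-schnell weights; schnell is
    # distilled for ~4 steps and ignores classifier-free guidance.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    ).to("cuda")

    image = pipe(
        "a lamb with seven horns and seven eyes",
        num_inference_steps=4,
        guidance_scale=0.0,
    ).images[0]
    image.save("lamb.png")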
That classic "bowl of ramen without chopsticks" prompt also fails. Haven't seen any model get that right yet either.
Negation is a known weak spot. Aren't you just retesting that again and again? Does it tell you much beyond that?
Thanks, this one actually works. Pretty amazing.
Remarkably better than the "DrawThings" iPhone app (my only reference point).
You can expect Flux model support coming to the app in the next few days.
Is there a noscript/basic (x)html prompt?
also no signup for the optimized endpoint on fal https://fal.ai/models/fal-ai/flux/schnell
Thanks for the link, and this model looks really good! I think I can tweak the output to make a nice logo [0] for my project!
[0] https://github.com/bbkane/envelope/issues/44
I was inspired by the SD3 problems to use this prompt:
"a woman lying on her back wearing a blouse and shorts."
But it wouldn't render the image - instead I got an NSFW warning. That's one way to hide the fact that it can't render it properly, I guess...
PS: after a few tries it rendered "a woman lying on her back" correctly.
You also might want to "clarify" that it is not open source (and neither are any of the other "open source" models). If you want to call it something, try "open weights", although the usage restrictions make even that a HUGE FUCKING STRETCH.
Also, everybody should remember that these models are not copyrightable and you should never agree to any license for them...
It would be nice if you gave some examples here of what you'd call an open source model. Please ;) Because the impression is that these things don't exist; it's just a dream that doesn't deserve such a nice term...
As far as I know, none have been released. And it doesn't even really make sense, because, as I said, the models aren't copyrightable to begin with and therefore aren't licensable either.
However, plenty of open source software exists. The fact that open source models don't exist doesn't excuse attempts to falsely claim the prestige of the phrase "open source".
You are wrong about that. It's a file with numbers. Which makes it a database or dataset and very much protected by copyright. That's why licenses are needed. For the phone book, things like open street maps, and indeed AI models.
The fact that many people (myself included) routinely download and use models distributed under OSI approved licenses (Apache V2, MIT, etc.) makes that statement verifiably wrong. And yes, I do check the license of stuff that I use as I work with companies that care about such matters.
Now you know better.
Not every collection of numbers is a database, and a database is not the same thing as a dataset.
Databases have limited copyright-like protection in some places. Under TRIPS, that extends to only databases that are "creative by virtue of the selection or arrangement of their contents" or something along those lines. In the US they talk specifically about curation.
ML models do not meet either requirement by any reasonable interpretation.
The "source code" of an ML model is most reasonably interpreted as including all of the training data, which are never, ever available.
Now you know better.
[On edit: By the way, the people creating these works had better hope they're outside copyright, because if not, each one of them is a derivative work of (at least some large and almost impossible to identify subset of) its training data, so they need licenses from all the copyright holders of that training material, which few of them have or can get.]
If we stop unnecessarily anthropomorphizing software, I think it is plainly obvious these are derivative works. You take the training material, run it through a piece of software, and it produces an output based on that input. Just because the black box in the middle is big and fancy doesn't mean that somehow the output isn't a result of the input.
However, transformativeness is a factor in whether or not there is a fair-use exception for the derivative work. And these models are highly transformative, so this is a strong argument for their fair-use.
This is only true in jurisdictions that follow the sweat of the brow doctrine, where effort alone without creativity is considered enough for copyright. In other places, such as the USA, collections of facts are not copyrightable and a minimal amount of creativity is required for something to qualify as copyrightable. The phone book is an example that is often used, actually, to demonstrate the difference.
https://en.wikipedia.org/wiki/Sweat_of_the_brow
What criteria for copyright protection are they missing?
I'm personally comfortable calling a model "open source" if the license is compatible with the https://opensource.org/ definition.
The Llama models aren't. Some of the Mistral models are (the Apache 2 ones). Microsoft Phi-3 is - it's MIT.
Open source must include the source material so that others can reproduce the model. I would expect that to be the minimum.
I agree, but that can't happen with the vast majority of these models because they're trained on unlicensed data so they can't slap an open source license on the training data and distribute it.
I've decided to draw my personal line at Open Source Initiative compliance for the license they release the model itself under.
I respect the opinion that it's not truly open source unless they release the training data as well, but I've decided not to make that part of my own personal litmus test here.
My reasoning is that knowing something is "open source" helps me decide what I legally can or cannot do with it when building my own software. Not having access to the training data doesn't affect my legal rights; it just affects my ability to recompile the model myself. And I don't have millions of dollars of GPUs, so that isn't so important to me, personally.
Tough beans? There's lots of actual software that can't be open source because it embeds stuff with incompatible restrictions, but nobody tries to redefine "open source" because of that.
... and, on a vaguely similar-flavored note, you'd better hope that the models you're using end up found to be noninfringing or fair use or something with respect to those "unlicensed data", because otherwise you're in a world of hurt. It's actually a lot easier to argue that the models aren't copyrightable than it is to argue that they're not derivative of the input.
You're allowed to draw your personal line about what you'll use anywhere you want, but that doesn't mean that you should try to redefine "open source" or support anybody who does.
It's certainly not true that models are not copyrightable; databases have copyright protection if creativity was involved in creating them.
That said, I don't think outputs of the model are derivative works of it, any more than the model is a derivative of its training data, so it's not clear to me they can actually enforce what you do with them.
Are you talking about https://en.wikipedia.org/wiki/Database_right or plain old copyright?
I'm no IP lawyer, but I've always thought that copyright put "requirements" on the artefact (i.e. the threshold of originality), not the process.
In my jurisdiction we have database rights, meaning that you get IP protections for the artefact based on the work put into the process. For example, a database of distances between address pairs is probably not copyrightable, but it can be protected under database rights if enough work was done to compile the data.
EDIT: Saw elsewhere in the thread a mention of the https://en.wikipedia.org/wiki/Sweat_of_the_brow doctrine, which relates to database rights. (Notably, neither applies in the U.S.)
These models are a product of much more creativity than simply a list of phone numbers in a phone book. I don't see how they wouldn't meet the modicum of creativity required for US copyright protection.
The software that creates the model is the product of creativity. The model itself is the product of mechanically applying that software to datasets that are (a) assembled with minimal, if any creativity, and (b) definitely not assembled with any eye to the specific form of the resulting model. The whole point is to get the software to form the model without you having to worry about what the result is going to look like. So you can't turn around and claim that the model is a creative work because of the choice of training data.
The only thing that's really specified about the model itself is its architecture, which is (1) dictated by function, and (2) usually deeply stereotyped.
A personal bugbear is the AI crowd's fascination with calling these models open source; virtue signalling, I guess. Open weights is exactly right. Source code and, arguably more important, the datasets are both required to replicate the work, which is more in the spirit of open source (and science). I think Meta is especially egregious here, given their history.
Never underestimate the value of getting hordes of unpaid workers to refine your product. (See also React, others)
I'd prefer "false advertising" - it's more direct and without the culture war baggage.
"Open source" is perceived as a virtue, and their claim is false. Thus false virtue claim. Or... virtue signaling.
Indeed. The data is the main "information source" from which the model is trained.
I get the sentiment, but one of their models, albeit the worst one, is licensed under Apache without usage restrictions. The source to run the models is also open source.
When I read "open source" I thought they were actually doing open source instead of "open weights" this time. Surely they would expect to be called out on Hacker News if they labeled it incorrectly...
Thanks for pointing that out, @Hizonener.
The name is a bit unfortunate given that Julia's most popular ML library is called Flux. See: https://fluxml.ai.
This library is quite well known, 3rd most starred project in Julia: https://juliapackages.com/packages?sort=stars.
It has been around since, at least, 2016: https://github.com/FluxML/Flux.jl/graphs/code-frequency.
There was a looong distracting thread a month ago about something similar: a niche language, might have been Julia, had a package with the same name as $NEW_THING.
I hope this one doesn't stir up as much discussion. It has 4,000 stars; there isn't a large mass of people who view the world through the lens of "Flux is the ML library." No one will end up in a "Who's on First?" discussion because of it. If this line of argument is held sacrosanct, it ends in an infinite loop until everyone gives up and starts using UUIDs.
Like the Go language that existed before Google Go.
It's named "Go!".
https://en.wikipedia.org/wiki/Go!_(programming_language)
Disclosure: I work at Google but not on the Go team.
Eagerly waiting for this to happen in the medication names space. :)
I think we've generally run out of names to give projects and need to start reusing names. Maybe use letters to disambiguate them.
Flux A is the ML library
Flux B is the T2I model
Flux C is the React library
Flux D is the physics concept of power per unit area
Flux E is the goo you put on solder
Don't forget Fl.ux, which was a very popular way to make "night shift" happen for more than a decade
f.lux
And flux https://fluxcd.io/ and flux https://formulae.brew.sh/formula/flux
I would give them a break; so many things exist in the tech sector that being completely original is basically impossible, unless you name your thing something nonsensical.
Also, search engines are context-aware: if your search history is full of Julia questions, they'll know what you're searching for.
Also Flux is a now obsolete application architecture for ReactJS.
Congrats, Burkay - the model is very impressive. One area I'd like to see improved in a Flux v2 is knowledge of artist styles. Flux cannot respond to requests asking for paintings in the style of David Hockney, Norman Rockwell, or Edgar Degas; it seems to have no fine art training at all.
I’d bet that fine art training would further improve the compositional skills of the model, plus it would open up a range of uses that are (to me at least) a bit more interesting than just illustrations.
Have those artists given permission for their styles to be slurped up into a model?
Give me a sec, I will contact Edgar Degas with my telegraph.
Truly an API call I would pay for
Florentine art schools would like a word - they’ve been teaching painters by having them copy masters since the 16th century.
Does it respond to any names? I noticed SD3 removed all names to prevent recreating famous people but as a side effect lost the very powerful ability to infer styles from artist names too.
It's "just" another diffusion model, although a very good one. Those people are probably in there even if its text encoder doesn't know about them. So you can find them with textual inversion.
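For anyone unfamiliar: textual inversion learns a new embedding for a placeholder token from a handful of example images, and you then use that token in prompts. Here's a minimal usage sketch with diffusers, shown on a Stable Diffusion pipeline since that's where load_textual_inversion support is established (whether the same works on Flux pipelines yet is not something I'd assume):

    # Sketch: loading a pre-trained textual-inversion embedding.
    # Uses an SD 1.5 pipeline; support for Flux pipelines may differ.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # <cat-toy> is the placeholder token defined by this example embedding.
    pipe.load_textual_inversion("sd-concepts-library/cat-toy")
    image = pipe("a painting in the style of <cat-toy>").images[0]
    image.save("out.png")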
I'd suggest re-wording the blog post intro, it reads as if it was created by Fal.
Specific phrases to change:
(from the title)
This section also comes across as if you created it
Reads as if you're the creator
Thanks for the feedback! Made some updates.
Way better, nice
Thanks for hosting the model! I created an account to try it out, and you started emailing me with "Important notice: low account balance - action required". Now it seems there's no way for me to unsubscribe or delete my account. Is that the case? Thanks!
It would be nice to understand limits of the free tier. I couldn't find that anywhere. I see pricing, but I'm generating images without swiping my credit card.
If it's unlimited or "throttled for abuse," say that. Right now, I don't know if I can try it six times or experiment to my heart's desire.
The unsubscribe links in your emails don't work
If you are using the dev model, the licence isn't open source.