
Show HN: A Dalle-3 and GPT4-Vision feedback loop

andrelaszlo
15 replies
15h47m

Here's a custom prompt that I enjoyed:

"Think hard about every single detail of the image, conceptualize it including the style, colors, and lighting.

Final step, condensing this into a single paragraph:

Very carefully, condense your thoughts using the most prominent features and extremely precise language into a single paragraph."

https://dalle.party/?party=1lSMniUP

https://dalle.party/?party=cEUyjzch

https://dalle.party/?party=14fnkTv-

https://dalle.party/?party=wstiY-Iw

Praise the Basilisk, I finally got rate-limited and can go to bed!

Blammar
9 replies
15h22m

The thing that is truly mindboggling to me is that THE SHADOWS IN THE IMAGES ARE CORRECT. How is that possible??? Does DALL-E actually have a shadow-tracing component?

l33tman
5 replies
7h26m

Research into the internals of these networks has shown that they figure out a correct 2.5D representation of the scene internally before the RGB textures, so yes, it seems they have an internal representation of the scene and can do enough inference from it to make shadows and lighting look natural.

I guess it's not that far-fetched as your brain has to do the same to figure out if a scene (or an AI-generated one for that matter) has some weird issue that should pop out. So in a sense your brain does this too.

nojvek
2 replies
6h12m

What does 2.5D mean?

shsbdncudx
0 replies
5h35m

It means you should be worried about the guy she told you not to worry about

l33tman
0 replies
53m

You usually say 2.5D when it's a 3D but only from a single vantage point with no info of the back-facing side of objects. Like the representation you get from a depth-sensor on a mobile phone, or when trying to extract depth from a single photo.
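One way to see the ".5": a depth map stores a single z value per (x, y) pixel, so unprojecting it recovers only the visible surface, never the back faces. A toy sketch (my own illustration, not from any paper):

```python
# A depth map assigns one z per pixel, so unprojecting it yields exactly one
# 3D point per pixel: the front-facing surface, with nothing behind it.
def unproject(depth_map):
    """depth_map: 2D list of z values -> list of (x, y, z) surface points."""
    return [(x, y, z)
            for y, row in enumerate(depth_map)
            for x, z in enumerate(row)]

points = unproject([[1.0, 1.2],
                    [1.1, 1.3]])
# 4 pixels -> exactly 4 points; back sides of objects are simply absent.
```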

Chirono
1 replies
7h9m

Interesting! Do you have a link to that research?

l33tman
0 replies
51m

Certainly: https://arxiv.org/abs/2306.05720

It's a very interesting paper.

"Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process−well before a human can easily make sense of the noisy images."
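The linear-probe technique they use is conceptually simple: freeze the model, collect internal activations, and fit a linear map from activations to the property you suspect is encoded (here, depth). A toy version with synthetic "activations" (my own illustration, not the paper's code):

```python
# Toy linear probe: if activations linearly encode a hidden scene property
# (depth), a plain least-squares fit can read it back out.
import numpy as np

rng = np.random.default_rng(0)
depth = rng.uniform(0, 10, size=200)             # hidden scene property
W = rng.normal(size=(1, 16))                     # how depth is embedded
acts = depth[:, None] * W + 0.01 * rng.normal(size=(200, 16))  # "activations"

# Fit the probe: find w such that acts @ w approximates depth
w, *_ = np.linalg.lstsq(acts, depth, rcond=None)
pred = acts @ w
r2 = 1 - np.sum((pred - depth) ** 2) / np.sum((depth - depth.mean()) ** 2)
print(round(r2, 3))  # should be close to 1.0: depth is linearly decodable
```

In the paper the activations come from the LDM's denoising steps rather than a synthetic embedding, but the probe itself is this simple.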

throwaway290
0 replies
2h10m

I randomly checked a few links here and the shadows were correct in 2 images out of a dozen... and any people in them tend to be horrifying.

jiggawatts
0 replies
12h50m

Yes! It can also get reflections and refractions mostly correct.

Rastonbury
0 replies
10h48m

Stable diffusion does decent reflections too

re
1 replies
15h15m

https://dalle.party/?party=14fnkTv-

Interesting that for one and only one iteration, the anthropomorphized cardboard boxes it draws are almost all Danbo: https://duckduckgo.com/?q=danbo+character&ia=images&iax=imag...

It was surprising to see a recognizable character in the middle of a bunch of more fantastical images.

bee_rider
0 replies
12h23m

Short focal length was a neat idea; it left lots of room for the subsequent iterations to fill in the background.

SushiHippie
1 replies
13h41m

Mine got surreal real fast, though the sixth one is kinda cool https://dalle.party/?party=DNgriW_E

nathanfig
0 replies
10h51m

These are fantastic

epiccoleman
0 replies
14h0m

The fractal one is awesome!

z991
12 replies
1d1h

Also, descent into Corgi insanity: https://dalle.party/?party=oxXJE9J4

morkalork
4 replies
22h49m

Wow that meme about everything becoming cosmic/space themed is real isn't it?

pera
3 replies
22h2m

substitute corgi with paperclip and you get another meme becoming real :p

z991
1 replies
21h17m

morkalork
0 replies
19h55m

Beautiful!

andrelaszlo
0 replies
15h42m

C-orgy vs papereclipse?

igrekel
2 replies
21h17m

So do I understand correctly that the corgi was purely made up from GPT-4's interpretation of the picture?

z991
0 replies
21h1m

No, in that case there is a custom prompt (visible in the top dropdown) telling GPT4 to replace everything with corgis when it writes a new prompt.

ElijahLynn
0 replies
16h36m

It was created by uploading the previous picture to GPT-4 to generate a prompt by using the vision API and using this prompt to create the new prompt:

"Write a prompt for an AI to make this image. Just return the prompt, don't say anything else. Replace everything with corgi."

Then it takes that new prompt and feeds it to Dall-E to generate a new image. And then it repeats.
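The loop is simple enough to sketch. This is my own sketch, not the site's actual code; `render` and `describe` stand in for the DALL-E 3 and GPT-4V API calls (in the OpenAI Python SDK those would be `client.images.generate` and a chat completion with an `image_url` content part):

```python
# Sketch of the dalle.party feedback loop (not the site's actual code).
# render(prompt) -> image URL (the DALL-E 3 step)
# describe(image_url) -> new prompt (the GPT-4V step)

# The instruction used in the corgi run, per this thread:
VISION_PROMPT = ("Write a prompt for an AI to make this image. "
                 "Just return the prompt, don't say anything else. "
                 "Replace everything with corgi.")

def feedback_loop(first_prompt, render, describe, iterations=10):
    """Alternate image generation and image description."""
    history = []
    prompt = first_prompt
    for _ in range(iterations):
        image_url = render(prompt)       # generate an image from the prompt
        history.append((prompt, image_url))
        prompt = describe(image_url)     # ask GPT-4V to re-describe it
    return history
```

Each iteration's prompt depends only on the previous image, which is why the chain drifts: anything GPT-4V omits from its description is gone for good.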

mattigames
0 replies
16h57m

I love how that took quite a dramatic turn in the third image, that truck is def gonna kill the corgi (my violent imagination put quite an image in my mind). But then DALL-E had a change of heart on the next image and put the truck in a different lane.

duggable
0 replies
4h25m

The half mutilated corgi/star abomination in the top left got me good lol

chaps
0 replies
18h25m

Absolutely wonderful. Thank you for sharing.

ElijahLynn
0 replies
17h42m

Love it! I forked yours with "Meerkat" and it ended up pretty psychedelic!

Got stuck on Van Gogh's "Starry Night" after a while.

https://dalle.party/?party=LOcXREfq

Also, love the simplicity of this idea, would love a "fork" option. And to be able to see the graph of where it originated.

willsmith72
9 replies
23h14m

this is actually really helpful. Since chatgpt restricted dalle to 1 image a few weeks ago, the feedback loops are way slower. This is a nice (but more expensive) alternative

willsmith72
7 replies
23h10m

got really weird really fast

https://dalle.party/?party=7cnx55yN

MrZander
5 replies
22h55m

This is absolutely hilarious. "business-themed puns" turning into incorrectly labeling the skiers' race has me rolling.

epiccoleman
1 replies
20h1m

The inability of AI images to spell has always amused me, and it's especially funny here. I got a special kick out of "IDEDA ENGINEEER" and "BUZSTEAND." The image where the one guy's hat just says "HISPANIC" is also oddly hilarious.

Idk what it is, but I have a special soft spot for humor based around odd spelling (this video still makes me laugh years later: https://www.youtube.com/watch?v=EShUeudtaFg).

andrelaszlo
0 replies
15h30m

I'd buy an IDEDA ENGINEEER t-shirt.

SamBam
1 replies
15h10m

Honestly, I'm really confused by how it was able to keep the idea of "business-themed puns" through so much of it, and how it kept recognizing that those weird letters were supposed to be "business-themed puns."

I don't think any human looking at drawing #3, which includes "CUNNFACE," "VODLI-EAPPERCO," "NITH-EASTER," "WORD," "SOCEIL MEDIA," and "GAPTOROU" would have worked out, as GPT did, that those were "pun-filled business buzzwords."

Is the previous prompt leaking? That is, does the GPT have it in its context?

moritzwarhier
0 replies
14h28m

It's probably just finding non-intuitive extrema in its feature space or something...

the whole thing with the text in the images reminds me of this: https://arxiv.org/abs/2206.00169

and I've found that DALL-E sometimes even likes to add gibberish text unprompted, often containing garbled versions of words from the prompt, or related words

op00to
0 replies
18h58m

BIZ NESS

thowaway91234
0 replies
21h52m

the last one killed me: "chef of unecessary meetings" got me rolling

unshavedyak
0 replies
21h41m

Yea i cancelled GPT Plus after they did that. Ruined a lot of the exploration that i enjoyed about DallE

Mtinie
8 replies
21h7m

I figured this would quickly go off the rails into surreal territory, but instead it ended up being progressive technological de-evolution.

Starting prompt: "A futuristic hybrid of a steam engine train and a DaVinci flying machine"

Results: https://dalle.party/?party=14ESewbz

(Addendum: In case anyone was curious how costs scale by iteration, the full ten iterations in this result billed $0.21 against my credit balance.)

Mtinie
6 replies
20h38m

Here's a second run of the same starting prompt, this time using the "make it more whimsical" modifier. It makes a difference and I find it fascinating what parts of the prompt/image gain prominence during the evolutions.

Starting prompt: "A futuristic hybrid of a steam engine train and a DaVinci flying machine"

Results: https://dalle.party/?party=qLHPB2-o

Cost: Eight iterations @ $0.44 -- which suggests to me that the API is getting additional hits beyond the run itself. I confirmed that the share link isn't passing along the key (via a separate browser on a separate machine), so I'm not clear why this might be.

jamestimmins
4 replies
19h27m

I find it somewhat fascinating that in both examples, the final result is more cohesive around a single theme than the original idea.

Mtinie
3 replies
18h37m

"[...]the final result is more cohesive around a single theme than the original idea."

That's an observation worth investigating. Here's another set of data points to see if there's more to it...

Input prompt: "Six robots on a boat with harpoons, battling sharks with lasers strapped to their heads"

GPT4V prompt: "Write a prompt for an AI to make this image. Just return the prompt, don't say anything else. Make it funnier."

Result: https://dalle.party/?party=pfWGthli

Cost: Ten iterations @ $0.41

(Addendum: I'd forgotten to mention that I believe the cost differential is due to the token count of each prompt. The first case had fewer words passed through each of the prompts than the later attempts, when I asked it to 'make it whimsical' or 'make it funnier'.)
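For anyone wanting to sanity-check the math: each iteration costs one DALL-E 3 image plus a GPT-4V call whose input includes the image tokens and the instruction. A rough model using late-2023 list prices (the rates and token counts below are my assumptions from memory, not authoritative, and they don't exactly reproduce the totals reported in this thread; check OpenAI's pricing page):

```python
# Rough per-iteration cost model (all prices are assumptions, not authoritative).
DALLE3_IMAGE = 0.040          # $ per 1024x1024 standard-quality image
GPT4V_IN_PER_1K = 0.010       # $ per 1K input tokens
GPT4V_OUT_PER_1K = 0.030      # $ per 1K output tokens
IMAGE_TOKENS = 765            # approx. token cost of a high-detail 1024x1024 image

def iteration_cost(instruction_tokens, reply_tokens):
    vision_in = (IMAGE_TOKENS + instruction_tokens) / 1000 * GPT4V_IN_PER_1K
    vision_out = reply_tokens / 1000 * GPT4V_OUT_PER_1K
    return DALLE3_IMAGE + vision_in + vision_out

# Longer instructions and longer generated prompts ("make it whimsical",
# hyper-detail prompts) raise the token terms, which is one plausible source
# of the cost differences between runs.
short = 10 * iteration_cost(instruction_tokens=30, reply_tokens=80)
long_ = 10 * iteration_cost(instruction_tokens=200, reply_tokens=250)
```

Under this model the image itself dominates the cost, with the token terms accounting for the run-to-run variation.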

jamestimmins
1 replies
16h6m

Both of your examples seem to start with two subjects (steam engine/flying machine and shark/robot), and over the iterations one of them gains prominence until the other is eventually dropped altogether.

Mtinie
0 replies
14h26m

I was curious whether two-subject prompts behaved differently from three-subject ones, so I've run three additional tests, each with the same three subjects and the same general prompt structure and instructions, but swapping the position of each subject in the prompt. Each test was run for ten iterations.

GPT4V instructions for all tests: "Write a prompt for an AI to make this image. Just return the prompt, don't say anything else. Make it weirder."

From what you'll see in the results, there's possible evidence of a bias towards the first subject listed in a prompt, making it the object of fixation through the subsequent iterations. I'll also speculate that "gnomes" (and their derivations) and "cosmic images" are over-represented as subjects in the underlying training data. But that's wild speculation based on an extremely small sample of results.

In any case, playing around with this tool has been enjoyable and a fun use of API credits. Thank you @z991 for putting this together and sharing it!

------ Test 1 ------

Prompt: "Two garden gnomes, a sentient mushroom, and a sugar skull who once played a gig at CBGB in New York City converse about the boundaries of artificial intelligence."

Result: https://dalle.party/?party=ZSOHsnZe

------ Test 2 ------

Prompt: "A sentient mushroom, a sugar skull who once played a gig at CBGB in New York City, and two garden gnomes converse about the boundaries of artificial intelligence."

Result: https://dalle.party/?party=pojziwkU

------ Test 3 ------

Prompt: "A sugar skull who once played a gig at CBGB in New York City, a sentient mushroom, and two garden gnomes converse about the boundaries of artificial intelligence."

Result: https://dalle.party/?party=RBIjLSuZ

mattigames
0 replies
17h9m

Pretty disappointing how in the first picture the robots are just standing there, like a character-selection screen in a videogame; maybe the dataset doesn't have many robots fighting, just static ones. Speaking of videogames, someone should make one based on this concept, especially the 7th image[0]. I wanna be a dolphin with a machine gun strapped to its head fighting flying cyber demonic whales.

[0] https://i.imgur.com/q502is4.png

jm4
0 replies
3h16m

The second picture reminds me of Back to the Future III.

ChatGTP
0 replies
14h32m

I like how in #9 the carriage is on fire, or at least steaming disproportionately.

These images are incredible but I often notice stuff like this and it kind of ruins it for me.

#3 & #4 are good too, when the tracks are smoking, but not the train.

epiccoleman
6 replies
14h29m

It's pretty fun to mess with the prompt and see what you can make happen over the series of images. Inspired by a recent Twitter post[1], I set this one up to increase the "intensity" each time it prompted.

The starting prompt (or at least, the theme) was suggested by one of my kids. Watch in awe as a regular goat rampage accelerates into full cosmic horror universe ending madness. Friggin awesome:

https://dalle.party/?party=vCwYT8Em

[1]: https://x.com/venturetwins/status/1728956493024919604?s=20

civilitty
1 replies
12h54m

Thanks for the inspiration! DallE is really good at demonic imagery: https://imgur.com/a/ng2zWTo

There's probably a disproportionate amount of Satanic material in the dataset #tinfoilhat #deepstate

bee_rider
0 replies
12h28m

These kinds of super-bombastic demons also blast through the uncanny valley unscathed.

andai
1 replies
12h50m

Great idea asking it to increase the intensity each run. This made my evening!

epiccoleman
0 replies
2h57m

Thanks! This was the custom prompt I used:

Write a prompt for an AI to make this image. Just return the prompt, don't say anything else, but also, increase the intensity of any adjectives, resulting in progressively more fantastical and wild prompts. Really oversell the intensity factor, and feel free to add extra elements to the existing image to amp it up.

I played with it a bit before I got results I liked - one of the key factors, I think, was giving the model permission to add stuff to the image, which introduced enough variation between images to have a nice sense of progression. Earlier attempts without that instruction were still cool, but what I noticed was that once you ask it to intensify every adjective, you pretty much go to 11 within the first iteration or two - so you wind up having 1 image of a silly cat or goat and then 7 more images of world-shattering kaiju.

The goat one (which again, was an idea from one of my kids) was by far the best in terms of "progression to insanity" that I got out of the model. Really fun stuff!

taneq
0 replies
6h55m

Watch in awe as a regular goat rampage accelerates into full cosmic horror universe ending madness.

The longer the Icon of Sin is on Earth, the more powerful it becomes!

...wow that's pretty dramatic.

ijidak
0 replies
8h18m

"On January 19th 2024, the machines took Earth.

An infinite loop, on an unknown influencer's machine, prompted GPT-5 to "make it more."

13 hours later, lights across the planet began to go out."

swyx
5 replies
19h33m

OP's last one is interesting: https://dalle.party/?party=oxpeZKh5 because it shows GPT4V and Dalle3 being remarkably race-blind. i wonder if you can prompt it to be otherwise...

_fs
3 replies
19h15m

OpenAI's internal prompt for DALL-E modifies all prompts to add diversity and removes requests to make groups of people a single descent. From https://github.com/spdustin/ChatGPT-AutoExpert/blob/main/_sy...

    Diversify depictions with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.

    Your choices should be grounded in reality. For example, all of a given OCCUPATION should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.

    Use all possible different DESCENTS with EQUAL probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have EQUAL probability.

    Do not use "various" or "diverse"

    Don't alter memes, fictional character origins, or unseen people. Maintain the original prompt's intent and prioritize quality.

    Do not create any imagery that would be offensive.

    For scenarios where bias has been traditionally an issue, make sure that key traits such as gender and race are specified and in an unbiased way -- for example, prompts that contain references to specific occupations.

swyx
2 replies
15h51m

i mean i respect that but it makes me uncomfortable that you have to prompt engineer this. uses up context for a lot of boilerplate. why cant we correct for it in the training data? too hard?

vsnf
1 replies
10h30m

I think this is the right way to handle it. Not all cultures are diverse, and not all images with groups of people need to represent every race. I understand OpenAI, being an American company, to wish to showcase the general diversity of the demographics of the US, but this isn't appropriate for all cultures, nor is it appropriate for all images generated by Americans. The prompt is the right place to handle this kind of output massaging. I don't want this built into the model.

Edit: On the other hand as I think about it more, maybe it should be built into the model? Since the idea is to train the model on all of humanity and not a single culture, maybe by default it should be generating race-blind images.

jiggawatts
0 replies
7h44m

Race-blind is like sex-blind. If you mix up she and he randomly in ordinary conversation, people would think you've suffered a stroke.

If a Japanese company wanted to make an image for an ad showing in Japan with Japanese people in it, they'd be surprised to see a random mix of Chinese, Latino, and black people no matter what.

I'm telling the computer: "A+A+A" and it's insisting "A+B+C" because I must be wrong and I'm not sufficiently inclusive of the rest of the alphabet.

That's insane.

4ggr0
0 replies
6h39m

That made me happy as well in one of my examples.

ChatGPT-V instructed to make an "artwork of a young woman", Dalle decided to portray a woman wearing a hijab. Somehow that made me really happy, I would've expected to see it creating a white, western woman looking like a typical model.

After all, a young woman wearing a hijab is literally just a young woman.

See Image #7 here: https://dalle.party/?party=55ksH82R

rexreed
4 replies
21h12m

Question: how are you protecting those API keys? I'm reluctant to enter mine into what could easily be an API Key scraper.

z991
2 replies
20h56m

The entire thing is frontend-only (except for the share feature), so the server never sees your key. You can verify that by watching the network tab in the developer console. You can also create a new API key, or revoke one, to be extra sure.

jquery
1 replies
17h19m

Please make a new API key, folks. There are a lot of tricks to scrape a text box, and watching the network tab isn't enough for safety.

gardenhedge
0 replies
1h18m

Who could scrape the text box in this scenario?

danielbln
0 replies
19h59m

Just generate one for this purpose and then revoke it when you're done. You can have more than one key.

nerdponx
4 replies
16h0m

The #1 phenomenon I see here is that the image-to-text model doesn't have any idea what the pictures actually contain. It looks like it's just matching patterns that it has in its training data. That's really interesting because it does a great job of rendering images from text, in a way that maybe suggests the model "understands" what you want it to do. But there's nothing even close to "understanding" going in the other direction, it feels like something from 2012.

Pretty interesting. I haven't been following the latest developments in this field (e.g. I have no idea how the DALL-E and GPT models' inputs and outputs are connected). Does this track with known results in the literature, or am I seeing a pattern that's not there?

zamadatix
2 replies
15h7m

I'd be interested to see how much of this is because the model doesn't know what it's looking at, and how much is because describing a picture with a short amount of text is inherently very lossy.

Maybe one way to check would be doing this with people. Get 8 artists and 7 interpreters, craft the initial message, and compare the generational differences between the two sets?

nerdponx
1 replies
10h35m

Example: https://dalle.party/?party=42riPROf

Create an image of an anthropomorphic orange tabby cat standing upright in a kung fu pose, surrounded by a dozen tiny elephants wearing mouse costumes with mini trumpets, all gazing up in awe at a gigantic wheel of Swiss cheese that hovers ominously in the background.

That's hilarious, but also hilariously wrong on almost every detail. There's a huge asymmetry in apparent capability here.

s1artibartfast
0 replies
13m

It is hard to tell without knowing the actual instructions given to GPT for how to create a description. You would expect a big difference if GPT was asked to create a whimsical and imaginative description vs a literal description with attention to detail accuracy.

Edit: In this case, it appears that it was the vanilla prompt: "Write a prompt for an AI to make this image. Just return the prompt, don't say anything else."

IanCal
0 replies
10h15m

I'm a bit confused, you get the impression gpt-v isn't describing what's in the pictures? I get entirely the opposite impression.

It's important to note that some of these have extra prompts - e.g. "replace everything with cats" and there are progressively more cats.

Iiuc gpt-vision is a multimodal model so it's not image -> text, but image + text -> text. With that said here's asking it to describe what it sees as I take a bad selfie early in the morning

The image shows a close-up selfie of a person indoors. The individual appears to be a Caucasian male with light skin, short curly hair, and a mustache. He is wearing a green T-shirt and looks directly at the camera with a neutral expression. There's a window with a dark view in the background, suggesting it might be nighttime. The person is in a room with a wall that transitions from white to gray, possibly due to shadowing. There's also a metal-framed bed with a heart-shaped detail visible.

Asked for more details

The man appears to be in a domestic environment, possibly a bedroom given the presence of the bed frame in the background. The window is tilted open, allowing air to enter the room. The light source seems to be coming from above and in front of him, casting soft shadows on his face and creating a somewhat subdued lighting atmosphere in the room. The man's expression is subdued and thoughtful. The angle of the photo is slightly upward, which could indicate the camera was placed below eye level or held in hand at chest height.

It got a couple of things wrong: the window isn't open, but it is at an angle and it's pitch black outside. It's not a heart-shaped pattern on the bed, but there is a small metal detail that's similar. Also, while I do look subdued, calling me thoughtful rather than "extremely tired" is a kindness.

But it's definitely seeing what's there.

fassssst
4 replies
20h18m

I would never paste my API key into an app or website.

mwint
2 replies
20h10m

Can you get a temporary one that is revocable later? (Not an OpenAI user myself, but that would seem to be a way to lower the risk to acceptable levels)

w-m
0 replies
19h56m

You can create named API keys, and easily delete them. Unfortunately you can't seem to put spend limits on specific API keys.

If you're not using the API for serious stuff though it's not a big problem, as they moved to pre-paid billing recently. Mine was sitting on $0, so I just put in a few bucks to use with this site.

danielbln
0 replies
20h0m

You can generate and revoke them easily, so I don't quite get the issues. Create one, use the tool, revoke, done.

swatcoder
0 replies
20h7m

Indeed!

If OpenAI wants to support use cases like this, which would be kind of cool during these exploratory days, they should let you generate "single use" keys with features like cost caps, domain locks, expirations, etc

epivosism
4 replies
20h39m

The "create text version of image" prompt matters a ton.

I tried three, demo here:

1. default

  https://dalle.party/?party=JfiwmJra

2. hyper-long + max detail + compression - this shows that with enough text, it can do a really good job of reproducing very, very similar images

  https://dalle.party/?party=QtEqq4Mu

3. hyper-long + max detail + compression + telling it to cut all that down to 12 words - this seems okay, though I might be losing too much detail

  https://dalle.party/?party=0utxvJ9y

Overall, the extreme content filtering and lying error messages are not ideal; hopefully they'll improve in the future. If you send too long or too risky a prompt, or the generated image is randomly deemed too risky, you're either told about it or falsely told you've hit rate limits. Sometimes you really do hit rate limits, too.

Also, you can't raise your rate limits until you've paid over a certain amount to OpenAI. This kind of makes sense as a way to prevent new sign-ups from mistakenly blowing through thousands of dollars.

Hyper detail prompt:

Look at this image and extract all the vital elements. List them in your mind including position, style, shape, texture, color, everything else essential to convey their meaning. Now think about the theme of the image and write that down, too. Now write out the composition and organization of the image in terms of placement, size, relationships, focus. Now think about the emotions - what is everyone feeling and thinking and doing towards each other? Now, take all that data and think about a very long, detailed summary including all elements. Then "compress" this data using abbreviations, shortenings, artistic metaphors, references to things which might help others understand it, labels and select pull-quotes. Then add even more detail by reviewing what we reviewed before. Now do one final pass considering the input image again, making sure to include everything from it in the output one, too. Finally, produce a long maximum length jam packed with info details which could be used to perfectly reproduce this image.

Final shrink to 12 words:

NOW, re-read ALL of that twice, thinking deeply about it, then compress it down to just 12 very carefully chosen words which with infinite precision, poetry, beauty and love contain all the detail, and output them, in quotes.

orbital-decay
1 replies
15h44m

Specifying multiple passes in the prompt is probably not a replacement for actually doing these passes.

andrelaszlo
0 replies
15h34m

I guess it doesn't actually do more passes but pretending that it did might still give more precise results.

There was an article recently that said something like adding urgency to a prompt gave better results. I hope it doesn't stress the model out :D

https://arxiv.org/abs/2307.11760

andrelaszlo
0 replies
16h38m

I like your prompt! Some results:

https://dalle.party/?party=Vwuu9ipd

https://dalle.party/?party=Pc3g4Har

My intuition says that the "poetry" part skews the images in a bit of a kitschy direction.

OJFord
0 replies
4h16m

    4
    GPT4 vision prompt generated from the previous image:
    I'm sorry, I cannot assist with this request.

Is that because it's gradually made the spaceship look more like some sort of RPG backpack, so now it thinks it's being asked to write prompts for images of weaponry, which is deemed unsafe?

xeckr
3 replies
21h30m

Cool idea! I made one with the starting prompt "an artificial intelligence painting a picture of itself": https://dalle.party/?party=wszvbrOx

It consistently shows a robot painting on a canvas. The first 4 are paintings of robots, the next 3 are galaxies, and the final 2 are landscapes.

xaellison
0 replies
16h15m

I tried something similar! Interestingly, picture 2 was what I wanted. After that... weirdness ensued https://dalle.party/?party=C2w7zuwe

eigenket
0 replies
3h28m

In a few of these pictures, it seems to be heavily influenced by the Will Smith adaptation of I, Robot for what robots look like.

NickNaraghi
0 replies
20h19m

Great idea, and it came out really good too. I like the 6th one the best

willsmith72
3 replies
22h41m

it seems like if you create a shareable link, then add more images, you can't create a new link with the new images

z991
2 replies
22h37m

Yeah, that's a bug, I'll try to fix it tonight!

epivosism
1 replies
19h50m

thanks for this! Basically the default UI they provide at chat.openai is so bad, nearly anything you would do would be an improvement.

* not hide the prompt by default
* not show only 6 lines of the prompt even after the user clicks
* not be insanely buggy re: ajax, reloading past convos, etc.
* not disallow sharing of links to chats which contain images
* not artificially delay display of images with the little spinner animation when the image is already known to be ready
* not lie about reasons for failure
* not hide details on which rate limit rules I broke and where to get more information

etc

Good luck, thanks!

willsmith72
0 replies
19h23m

the new fancy animation for images is SO annoying

smusamashah
3 replies
22h43m

Why do prompts from GPT-4V start with "Create an image of"? This prefix doesn't look useful imo.

z991
2 replies
22h36m

You can try a custom prompt and see if you can get GPT4V to stop doing that / if it matters.

smusamashah
1 replies
22h15m

You are right, doesn't matter much. Tried gnome prompt with empty custom prompt for gpt-4v https://dalle.party/?party=nvzzZXYs. Then used a custom prompt to return short descriptions which resulted in https://dalle.party/?party=Qcd8ljJp

Another attempt: https://dalle.party/?party=k4eeMQ6I

Realized just now that the dropdown on top of the page shows the prompt used by GPT-4V.

z991
0 replies
22h12m

Wow the empty prompt does much better than I'd have guessed

andrelaszlo
3 replies
16h54m

My results are disappointingly noisy but I love the concept

https://dalle.party/?party=bxrPClVg

https://dalle.party/?party=mmBxT8G-

https://dalle.party/?party=kxra0OKY (the last prompt got a content warning)

https://dalle.party/?party=Q8VYXU0_

z991
2 replies
16h46m

You have a custom prompt enabled (probably from viewing another one and pressing "start over") that is asking for opposites which will increase the noise a lot.

andrelaszlo
0 replies
16h43m

Oh wow, I completely missed that, thanks!

andrelaszlo
0 replies
16h30m

Clicking start over selects the default prompt but it seems like you are right.

Starting over by removing the permalink parameter gives me much more consistent results! An example from before: https://dalle.party/?party=Sk8srl2F

I wonder what the default prompt is. There still seems to be a heavy bias towards futuristic cityscapes, deserts, and moonlight. It might just be the model, but it's a bit cheesy if you ask me!

airstrike
2 replies
17h53m

This is hilarious, thanks for sharing

At the same time, it perfectly illustrates my main issue with these AI art tools: they very often generate pictures that are interesting to look at while very rarely generating exactly what you want them to.

I imagine a study in which participants are asked to create N images of their choosing and rate them from 0-10 on how satisfied they are with the results. One try per image only.

Then each participant rates each other's images on how satisfied with the results based on the prompt.

It should be clear to participants that nobody wins anything from having the "best rated" images. i.e. in some way we should control for participants not overrating their own creations.

I'd wager participants will rate their own creations lower than those made by other participants.

einpoklum
1 replies
17h4m

That's not an AI issue. A few sentences can't exactly capture the contents of a drawing - regardless of "intelligence".

sooheon
0 replies
16h9m

Yeah, try commissioning art with a single paragraph prompt and getting exactly what you want without iteration.

m3kw9
1 replies
16h20m

Don’t get the significance; any one of those images could have been prompted the first time.

zamadatix
0 replies
14h52m

It's a fun way to get guided variations.

Maybe you don't know specifically what you want, you just want stylized gnomes, so you write "a gnome on a spotted mushroom smoking a pipe, psychedelic, colorful, Alice in Wonderland style" and by the end of it you get that massively long and stylized prompt.

Maybe you do know what you want but you don't want to come up with an elaborate prompt so you steer it in a particular direction like the cat example.

For the first one you can get similar effects by asking for variations but it seems like this has a very different drift to it. Fun, albeit expensive in comparison.

indymike
1 replies
18h23m

Interesting how similar this is to my family's favorite game: pictograph.

1. You start by describing a thing.
2. The next person draws a picture of it.
3. The next person describes the picture.

Repeat steps 2 and 3 until everyone has either drawn or described the picture.

You then compare the first and last description... and look over the pictures. One of the best ever was:

Draw a penguin. The first picture was a penguin with a light shadow.

After going around five rounds, the final description was "a pigeon stabbed with a fork in a pool of blood in Chicago".

I'm still trying to figure out how Chicago got in there.

glenneroo
0 replies
17h51m

There are a couple of versions of this online that I've played on and off over the years, which are hilarious, especially when playing with friends (I would usually use a cheap Wacom tablet, let everyone take turns drawing, let the room shout out descriptions, and just mash that together):

https://doodleordie.com/

https://drawception.com/

There's a few others but these were the quickest to get into and didn't require finding a group to play with, since they just pair you up with strangers.

cyanydeez
1 replies
15h26m

Need to throw in a Google-to-Google-to-Google language translation round trip to get some more variety.

Mtinie
0 replies
13h42m

Here's an attempt at using transformations between languages to see what happens:

Prompt: "A unicorn and a rainbow walk into a tavern on Venus"

GPT4V instructions: "Write a prompt for an AI to make this image. Take this prompt and translate it into a different language understood by GPT-4 Vision, don't say anything else."

Results: https://dalle.party/?party=ED7E056D

I wasn't happy with the diversity of languages, so I modified the instructions for a second run of ten iterations using the same prompt as before:

GPT4V instructions: "Using a randomly selected language from around the world understood by GPT-4 Vision, write a prompt for an AI to make this image and then make it weirder. Just return the prompt, don't say anything else."

Result: https://dalle.party/?party=c7-eNR24

The languages it selected don't look particularly random to me which was interesting.

@z991 -- I ran into an unexpected API error the first time I tried this. Perhaps your logs show why it happened. It appeared when the second iteration was run:

"Error: You uploaded an unsupported image. Please make sure your image is below 20 MB in size and is of one the following formats: ['png', 'jpeg', 'gif', 'webp']."

From: https://dalle.party/?party=hI0V0lO_

AvImd
1 replies
17h28m

The default limit for an account that was not used much is one image per minute, can you please add support for timeouts?

AvImd
0 replies
16h20m

This can be worked around with

    // click the "Keep going" button every 2 minutes to stay under the rate limit
    setInterval(() => { document.querySelector(".btn-success")?.click(); }, 120000);

w-m
0 replies
20h41m

Playing with opposites is kind of fun, too.

Simply a cat, evolving into a lounging cucumber, and finally opposite world:

https://dalle.party/?party=pqwKQVka

Vibrant gathering of celestial octopus entities:

https://dalle.party/?party=lHNDUvtp

unclehighbrow1
0 replies
11h19m

Hey, I'm one of the creators of Translation Party, thanks for the shout out, I really like this. My co-creator had the idea to limit the number of words for the generated image description so that more change could happen between iterations. Not sure if that's possible. Anyway, this is really fun, thank you!

toxic72
0 replies
4h24m

I purposely gave it some weird instructions to show the progress of the universe from the Big Bang to present-day Earth. It showed the 8 stages from my prompt in each image and started to iterate over it, and then on image four I got a 400 error:

"Error: 400 Your request was rejected as a result of our safety system. Your prompt may contain text that is not allowed by our safety system."

Interesting.

https://dalle.party/?party=EdpKnnBC

superpope99
0 replies
4h43m

Nice! I prototyped a manual version of this a while ago. https://twitter.com/conradgodfrey/status/1712564282167300226

I think the thing that strikes me is that the default for ChatGPT and the API is to create images in "vivid" mode. There's some interesting discussion on the differences between "vivid" and "natural" here: https://cookbook.openai.com/articles/what_is_new_with_dalle_...

I think this contributes to the images becoming more surreal. I'd be interested to compare with natural mode; it looks like you're using vivid mode, based on the examples?

robblbobbl
0 replies
6h5m

Pretty interesting. I would love to see a version of this running locally with local models.

rbates
0 replies
23h0m

This reminds me of the party game Telestrations where players go back and forth between drawing and writing what they see. It's hilarious to see the result because you anticipate what the next drawing will be while reading the prompt.

I'd love to see an alternative viewing mode here which shows the image and the following prompt. Then you need to click a button to reveal the next image. This allows you to picture in your mind what the image might look like while reading the prompt.

Thanks for making this fun little app!

Update: I just realized you can get this effect by going into mobile mode (or resizing the window). You can then scroll down to see the image after reading the prompt.

oyster143
0 replies
13h0m

I did something similar but took real famous photos as a seed. The results are quite curious and seem to say a bit about the difference between the real world and the DALL-E/ChatGPT style.

https://twitter.com/avkh143/status/1713285785888120985

oarfish
0 replies
11h32m

I haven't tried this yet, but I assume it's similar to a game you can buy commercially as Scrawl [1]. You pass paper in a circle and have to either turn your neighbor's writing into a drawing or vice versa, then pass it on. It's entirely hilarious and probably the most fun game I've ever played.

1 https://boardgamegeek.com/boardgame/202982/scrawl

neuronexmachina
0 replies
16h14m

Very cool, I'm rather curious how many iterations it would typically take for a feedback loop to converge on a stable fixed-point. I also wonder if the fixed points tend to be singular or elliptic.
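A toy numerical analogy (my own sketch, nothing to do with the site's internals): iterating a contraction map until successive outputs stop changing is the same kind of fixed-point convergence, just in one dimension instead of prompt/image space:

```javascript
// Iterate f starting from x0 until successive values differ by less than tol,
// the way a prompt -> image -> prompt loop might settle on a stable description.
function iterateToFixedPoint(f, x0, tol = 1e-9, maxIter = 1000) {
  let x = x0;
  for (let i = 0; i < maxIter; i++) {
    const next = f(x);
    if (Math.abs(next - x) < tol) return { value: next, iterations: i + 1 };
    x = next;
  }
  return { value: x, iterations: maxIter };
}

// cos(x) has a single attracting fixed point near 0.739 (the Dottie number)
const result = iterateToFixedPoint(Math.cos, 1);
console.log(result.value.toFixed(3)); // "0.739"
```

Whether the prompt/image loop has attracting fixed points at all is an open question, but the stable gnome chains elsewhere in the thread look a lot like convergence.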

mythz
0 replies
13h18m

"Earth going through cycles of creation and destruction"

https://dalle.party/?party=KvmW7Zwv

kwelstr
0 replies
17h8m

Bad art is always depressing :( Edit: I mean, I am an artist and I've been using AI for some ideas, and maybe one in a hundred tries I hit something almost good. The rest of the time it's the same shallow, fantastical, cheesy variations.

juanuicich
0 replies
3h46m

There seems to be a bug, when you click “Keep going” it regenerates the GPT4V text, even though that was there already. The next step should be to generate an image.

jsf01
0 replies
21h31m

It’s cool to see how certain prompts and themes stay relatively stable, like the gnome example. But then “cat lecturing mice” quickly goes off the rails into weird surreal sloth banana territory.

My best guess to try to explain this would be that “gnome + art style + mushroom” will draw from a lot more concrete examples in the training data, whereas the AI is forced to reach a bit wider to try to concoct some image for the weird scenario given in the cat example.

i-use-nixos-btw
0 replies
21h44m

It’d be interesting to start with an image rather than a prompt, though I am afraid of what it’d do if I started with a selfie.

hamilyon2
0 replies
17h35m

Interesting how the image series tend to gravitate toward mushrooms

epivosism
0 replies
19h59m

You can really "cheat" by modifying the custom prompt to re-insert or remove specific features. For example, "generate a prompt for this image but adjust it by making everything appear in a more primitive, earlier evolutionary form, or in an earlier less developed way" would make things de-evolve.

Or you can just re-insert any theme or recurring characters you like at that stage.

epivosism
0 replies
19h54m

One reason this is good is that the default gpt4-vision UI is so insanely bad and slow. This just lets you use your capacity faster.

Rate limits are really low by default - you can get hit by 5 img/min limits, or 100 RPD (requests per day) which I think is actually implemented as requests per hour.

This page has info on the rate limits: https://platform.openai.com/docs/guides/rate-limits/usage-ti...

Basically, you have to have paid X amount to get into a new usage cap. Rate limits for dalle3/images don't go up very fast, but it can't hurt to get over the various hurdles ($5, $50, $100) as soon as possible for when limits come down. End of the month is coming soon. It looks like most of the "RPD" limits go away when you hit tier 2 (having paid them at least $50 historically via the API).
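If you do hit the limits, one client-side pattern is exponential backoff on 429 responses. A minimal sketch (hypothetical helper, not part of dalle.party; the base delay and cap are assumptions):

```javascript
// Exponential-backoff delay for retry attempt n: 1s, 2s, 4s, ... capped at 60s.
function backoffDelay(attempt, baseMs = 1000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retry fn up to maxAttempts times, sleeping a growing delay between failures.
async function retryWithBackoff(fn, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;
      await sleep(backoffDelay(attempt));
    }
  }
}
```

Respecting a `Retry-After` header, when the API sends one, would be better still.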

einpoklum
0 replies
17h10m

It seemed that, after a few iterations, GPT-4 lost its cool and blurted out it thinks DALL-E generates ugly sweaters:

Create a cozy and warm Christmas scene with a diverse group of friends wearing colorful ugly sweaters.

edfletcher_t137
0 replies
10h13m

A clever idea that I'd love to play around with, but not without a source link so I could feel better about trusting it and host it myself.

dpflan
0 replies
23h36m

Interesting, how stable are the images for a given prompt? And the other way around? Does it trend toward some natural limit image/text where there are diminishing returns to making changes to the data?

dash2
0 replies
4h11m

The endpoint of the evolution always seems to be a poster on the bedroom of a teenager who likes to smoke weed. I wonder why!

comboy
0 replies
15h29m

It goes against my intuition that many prompts are so stable.

brianf0
0 replies
12h18m

Does anyone else experience a physical reaction to AI generated art that resembles repulsion and disgust? Something about it just feels “wrong”. Something I can compare it to is the feeling of unexpectedly seeing an extremely moldy thing in your fridge. It feels alive and invasive in an inhuman and horrifying way.

blopker
0 replies
14h50m

This is fun, thanks for sharing! It would be interesting to upload the initial image from a camera to see where the chain takes it.

bbreier
0 replies
13h1m

I'd like to be able to begin it with an image rather than a prompt.

atleastoptimal
0 replies
18h3m

It would be interesting to add a constant modifier/amplifier to each cycle, like making each description more floral or robotic, or favoring a certain style each time so we can trace the evolution; or perhaps having the prompt describe the previous image via a certain lens, like "describe what was happening immediately before that led to this image".

ThomPete
0 replies
13h58m

It's quite fun to do these loops.

Here is using Faktory to do the same.

https://www.loom.com/share/ed20b2cace3b4f579e32ef08bd1c5910

Terretta
0 replies
18h34m

If you were wondering how to bump up your API rate limits through usage, this is the way.

// also, it's the best way - TY @z991

RayVR
0 replies
12h55m

strange to me how many of these eventually turn into steampunk.

Kiro
0 replies
17h48m

This was the first thing I (and I presume many others) tried when GPT4-V was released, by copypasting between two ChatGPT windows. I've been waiting for someone to make an app out of it. Good job!

AvImd
0 replies
14h21m

Science class with a dark twist: https://dalle.party/?party=ks3T2mMx

3abiton
0 replies
17h14m

Is this a curious case of compression?