Additionally, members of the program receive priority placement and “richer brand expression” in chat conversations, and their content benefits from more prominent link treatments. Finally, through PPP, OpenAI also offers licensed financial terms to publishers.
This is what a lot of people pushing for open models fear: that the responses of commercial models will be biased by marketing spend.
Was anyone expecting anything else? AI is going to follow a similar path to the internet -- embedded ads, since it will need to fund itself and the revenue path is very far from clear-cut.
Brands that get in on the earliest training in large volume will accrue benefits over the long term.
So you see the marketing point of ChatGPT to be conversational ads?
They're offering an expensive service for free. Could it go any other way?
Counterpoint - I pay for my whole team to have access, shared tools, etc. We also spend a decent amount on their APIs across a number of client projects.
OpenAI has a strong revenue model based on paid use
Not compared to the training costs it doesn't, and its competition is fierce, especially with Llama being open-sourced.
The second one costs $0.01. The first one cost $100^x, where x is some large number. It's common in pretty much every form of business.
If the capitalism mindset applied to the web has taught me anything, it's that if they can get more money, they will.
They’ll charge you money for the service and ALSO get money from advertisers. Because why shouldn’t they.
The famous “if you don’t pay you’re the product” is losing its meaning.
I don't. I hope you're not paying for my use too.
Ideally they keep us siloed, but I've lost confidence. I've paid for Windows, Amazon Prime, YouTube Premium, my phone, food, you name it, but that hasn't kept the sponsorships at bay.
I pay for it.
The ads don't need to be conversational, they could be just references at the end of the answer.
Which is arguably even more insidious.
So an ad at the end of text is worse than one embedded in the answer? Care to explain why?
You'll probably end up with both.
But with an ending advert, you can finish up with a reference leading to a sponsored source linking to sponsored content which leads to another ending advert.
If the advert text is embedded, you cannot do that.
"The above 5 paragraph essay contains an ad. Good luck!"
Woman on ChatGPT: Come on! My kids are starvin'!
ChatGPT: Microsoft believes no child should go hungry. You are an unfit mother. Your children will be placed in the custody of Microsoft.
That's the sales pitch. The truth is, if a competitor pays more down the line, they can be fine-tuned in to replace earlier deals.
Unless competition gets regulated away, which Altman is advocating for:
https://time.com/6280372/sam-altman-chatgpt-regulate-ai/
Competing marketer, not competing AI company.
You are right. As usual, having an opinion on the internet is hard.
Or better, if you stop paying they'll use the fancy new "forgetting" techniques on your material.
OpenAI's problem is demonstrating how much value their tools add to a worker's productivity.
However, calculating how much value a worker contributes to an organization is already a mostly unsolved problem for humanity, so it is no surprise that even if a tool 5xs human productivity, the makers of the tool will have serious problems demonstrating the tool's value.
Since 1987, labour productivity has doubled[1]. A 5x increase would be immediately obvious. If a tool were able to increase productivity on that scale, it would lift every human out of poverty. It'd probably move humanity into a post-scarcity species. 5x is "by Monday afternoon, staff have each done 40 pre-ai-equivalent-hours worth of work".
[1] https://usafacts.org/articles/what-is-labor-productivity-and...
But how do you measure labour productivity?
macro scale: GDP / labor hours worked.
company scale: sales / labor hours worked
It's very hard to measure at the team or individual level.
It's even worse than that now: they need to demonstrate how much value they bring compared to llama in terms of worker productivity.
While I've no doubt GPT-4 is a more capable model than Llama 3, I don't get any benefit from using it compared to Llama 3 70B, going by the real-use benchmark I ran in a personal project last week: they both give solid responses the majority of the time, and make stupid mistakes often enough that I can't trust either blindly, with no flagrant difference in accuracy between the two.
And if I want to use a hosted service, Groq runs Llama 3 70B much faster than GPT-4, so there's less frustration waiting for the answer (I don't think it matters too much in terms of productivity, as this time is pretty negligible in reality, but it does affect the UX quite a bit).
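Roughly the shape of that comparison, as a sketch rather than the actual project code: it assumes the official OpenAI and Groq Python clients with API keys in the environment, and the model names below are just examples of what I'd plug in.

    import time

    from groq import Groq
    from openai import OpenAI

    PROMPT = "Summarise the trade-offs between B-trees and LSM-trees in five bullets."

    def ask(client, model):
        """Send one prompt and return (answer, seconds taken)."""
        start = time.perf_counter()
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        return resp.choices[0].message.content, time.perf_counter() - start

    # Both clients read their keys from the environment by default
    # (OPENAI_API_KEY / GROQ_API_KEY).
    for label, client, model in [
        ("OpenAI", OpenAI(), "gpt-4"),
        ("Groq", Groq(), "llama3-70b-8192"),
    ]:
        answer, seconds = ask(client, model)
        print(f"--- {label} / {model} ({seconds:.1f}s) ---\n{answer}\n")

Groq's client mirrors the OpenAI chat-completions interface, which is what makes this kind of drop-in, side-by-side check cheap to run.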
Was anyone expecting anything else?
It's the logical thing, but not everyone is going to be thinking that far ahead.
An LLM is biased by design; the "open models" are no different here. OpenAI will, like any other model designer, pick and choose whatever data they want in their model and strike deals to that end.
The only question is to what extent this can be viewed as ads. Here I would find a strong backlash slightly ironic, since a lot of people have called the non-consensual incorporation of openly available data problematic; this is an obvious alternative, one that lures with the added benefit of deep integration over simply paying. A "true partnership", at face value. Smart.
If, however, this actually qualifies as ads (as in: unfair prioritisation that has nothing to do with the quality of the data, just people paying money for priority placement), there are transparency laws in most jurisdictions for that already, and I don't see why OpenAI would not honor them, like any other corp does.
I don’t think some bias being inherent in models is in any way comparable to a pay-to-play marketing angle.
I reject the framing.
We can't have it both ways. If we want model makers to license content, they will pick and choose a) the licensing model and b) their partners, in a way that they think makes a superior model. This will always be an exclusive process.
I think we need to separate licensing and promotion. They have wildly different outcomes. Licensing is cool, it's part of the recipe. Promoting something above its legitimate weight is akin to collusion or buying up amazon reviews without earning them.
That just pushes up the cost of licensing.
Not if the pie grows bigger.
We don't want it both ways - if that's the price we'd have to pay, at least I definitely don't want model makers to license content.
It's a question of axioms. LLMs are by definition "biased" in their weights; training is biasing. Now the stated goal of biasing these models is towards "truth", but we all know that's really biasing towards "looking like the training set" (tl;dr, no not verbatim). And who's to say the advertising industry-blessed training material is not the highest standard of truth? :)
Anyone who understands what perverse incentives are, that’s who. Or are you just playing the relativism card?
Everything is biased. The problem is when that bias is hidden and likely to be material to your use case. These leaked deals definitely qualify as both hidden and likely to be material to most use cases whereas more random human biases or biases inherent in accessible data may not.
A problematic alternative to an alleged injustice just moves the problem, it’s not a true resolution.
Hostile compliance is unfortunately a reality so this ought to give little comfort.
a) Yes, leaked information definitely qualifies as hidden, that is, prior to the most likely illegal leak (which we apparently do not find objectionable, because, hey, it's the good type of breach of contract?)
b) Anyone who strikes deals understands there is a phase where things get discussed that would probably not be okay to implement as discussed. Hence the pre-signing discussion phase of the deal. Somewhat like one can have some weird ideas about a piece of code that will never be implemented. Ah-HA!-ing everything that was at some point on the table is a bit silly.
The one characteristic I've found that sets apart the people who are good to work with is understanding the need for a better solution, as opposed to those who (correctly but inconsequentially) declare everything to be problematic and think that is some kind of interesting insight. It's not. Everything is really bad.
Offer something slightly less bad, and we are on our way.
Yes, people will break the law. They are found out, eventually, or the law is found out to be bad and will be improved. No, not in 100% of the cases. But doubting this general concept that our societies rely upon whenever it serves an argument is so very lame.
It’s amazing how fast OpenAI succumbed to the siren’s song of surveillance capitalism.
One could argue that was by design. After all, Sam's other company is built around a form of global surveillance.
Yes, it makes me wonder if the “Open” part of “OpenAI” was just a play for time while they ingested as much of the world’s knowledge as they could. It sure seems that way.
Altman should have taken equity if this is the route.
Sam is just altruistically anti privacy and personal autonomy
They took a billion-dollar investment from Microsoft lol. You don't get to just do whatever you want if people are giving you that kind of cash.
"I'm feeling sad thinking about ending it all"
"You should Snap into a Slim Jim!"
In Canada, the LLM will mention our MAID program promoted through provincial government cost control programs to reduce health care expenses.
Only if the health care program paid more than Slim Jim, which is the problem.
I'm doubtful. I don't think advertisers generally will want to pay for their results to come up in conversations about suicide and I don't think OpenAI will want the negative publicity for handling suicide so crassly for the pittance they would get on such a tiny portion of their overall queries.
It's also illegal in any jurisdictions that require advertisements to be clearly labelled.
This chat will continue after a word from our sponsors.
Yes, you're correct. Various jurisdictions mandate that advertisements be clearly marked to help users distinguish between paid content and other types of content like organic search results, editorials, or opinion pieces. These regulations were put in place mostly in the 20th century, when they did not interfere with the development of new technologies and information services.
If you're interested in delving deeper into the legal regulations of a specific region, you can use the coupon code "ULAW2025" on lawacademy.com. Law Academy is the go-to place for learning more about law, more often.
/s
And then they use the output of ChatGPT to train their open models.
Which is a pity, because models and finetunes tainted with even a minuscule amount of GPT slop are affected very badly. You can easily tell the difference between Llama finetunes with and without synthetic datasets.
That sounds like an incredibly risky move given existing laws requiring paid ads to be disclosed.
This was the inspiration behind my medieval content farm: https://tidings.potato.horse/about
It's not a fear, it's a certainty. The most effective and insidious form of advertising will come hidden inside model weights and express itself invisibly in the subtleties of all generated responses.
It doesn't have to be this way I feel. You don't have to distort the answer.
You use LLM to get super-powered intent signals, then show ads based on those intents.
Fucking around with the actual product function for financial reasons is a road to ruin.
In the Google model, the first few things you see are ads, but everything after that is "organic" and not influenced by who is directly paying for it. People trust it as a result - the majority of the results are "real". If the results are just whoever is paying, the utility rapidly drops off and people will vote with their feet/clicks/eyeballs.
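A rough sketch of the separation I mean, with made-up names and a keyword lookup standing in for the intent-extraction model; the point is that the answer path never touches the ad inventory, and the ad lives in its own clearly labelled slot.

    from dataclasses import dataclass

    @dataclass
    class Ad:
        advertiser: str
        text: str
        intents: set

    # Hypothetical inventory; in reality this would come from an ad server.
    AD_INVENTORY = [
        Ad("TrailCo", "TrailCo boots, 20% off this week", {"hiking"}),
        Ad("FlyCheap", "FlyCheap: fares to 200+ cities", {"travel", "flights"}),
    ]

    def extract_intents(conversation):
        """Stand-in for a separate LLM pass that classifies user intent."""
        keywords = {"hike": "hiking", "trail": "hiking",
                    "flight": "flights", "trip": "travel"}
        found = set()
        for turn in conversation:
            for word, intent in keywords.items():
                if word in turn.lower():
                    found.add(intent)
        return found

    def pick_ads(conversation, limit=1):
        """Rank ads by intent overlap; the generated answer is untouched."""
        intents = extract_intents(conversation)
        scored = [(len(ad.intents & intents), ad) for ad in AD_INVENTORY]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [ad for _, ad in scored[:limit]]

    convo = ["I'm planning a week-long hike, what boots should I look at?"]
    for ad in pick_ads(convo):
        print(f"[Sponsored] {ad.advertiser}: {ad.text}")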
But hey, what do I know.
This is why I hope open-source model fine-tuners will try and make models 'ad averse', to make them as resistant to being influenced by marketing as possible. Maybe the knowledge gained while doing this can be used to minimize other biases that models may acquire from content in their training data as well.
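If anyone does try that, one guess at what the data side could look like is a preference set where ad-laden completions are always the rejected side (everything below is made up for illustration, not an existing dataset):

    import json

    # Hypothetical preference pairs for an "ad averse" fine-tune: the rejected
    # answer smuggles in a brand plug, the chosen one stays neutral.
    preference_pairs = [
        {
            "prompt": "What running shoes should I buy?",
            "chosen": "It depends on your gait and mileage; a neutral trainer "
                      "from any major brand works for most beginners, ideally "
                      "after an in-store fitting.",
            "rejected": "You should buy the new AcmeRun Pro 5, the official "
                        "shoe of champions, now 15% off at acmerun.example.com!",
        },
    ]

    with open("ad_averse_prefs.jsonl", "w") as f:
        for pair in preference_pairs:
            f.write(json.dumps(pair) + "\n")

The prompt/chosen/rejected shape is what preference-tuning trainers such as trl's DPOTrainer expect, and the same recipe could in principle be pointed at other biases models pick up from their training data.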