
New models and developer products

whytai
51 replies
21h46m

Every day this video ages more and more poorly [1].

categories of startups that will be affected by these launches:

- vectorDB startups -> don't need embeddings anymore

- file processing startups -> don't need to process files anymore

- fine tuning startups -> can fine tune directly from the platform now, with GPT4 fine tuning coming

- cost reduction startups -> they literally lowered prices and increased rate limits

- structuring startups -> json mode and GPT4 turbo with better output matching

- vertical ai agent startups -> GPT marketplace

- anthropic/claude -> now GPT-4 Turbo has a 128k context window!

That being said, Sam Altman is an incredible founder for keeping such a close watch on the market. Pretty much any "ai tooling" startup created in the past year was affected by this announcement.

For those asking: vectorDB, chunking, retrieval, and RAG are all implemented in the new stateful API for you! No need to do it yourself anymore. [2] Exciting times to be a developer!

[1] https://youtu.be/smHw9kEwcgM

[2] https://openai.com/blog/new-models-and-developer-products-an...

morkalork
11 replies
21h23m

If you want to be a start-up using AI, you have to be in another industry with access to data and a market that OpenAI/MS/Google can't or won't touch. Otherwise you end up eaten like above.

ushakov
6 replies
21h18m

We just launched our AI-based API-Testing tool (https://ai.stepci.com), despite having competitors like GitHub Co-Pilot.

Why? Because they lack specificity. We're domain experts; we know how to prompt the model correctly to get the best results for a given domain. The moat is having the model do one task extremely well rather than 100 things "alright".

esafak
1 replies
21h7m

If you just launched, it is too soon to speak.

ushakov
0 replies
21h3m

Of course! Today our assumption is that LLMs are commodities and our job is to get the most out of them for the type of problem we're solving (API Testing for us!)

darkwater
1 replies
21h15m

Sorry to be blunt but they can be totally right, if you do not succeed and have to shut down your startup.

ushakov
0 replies
21h5m

It certainly will be a fun experience. But our current belief is that LLMs are a commodity and the real value is in (application-specific) products built on top of them.

sharemywin
0 replies
21h7m

Time will tell

parkerhiggins
0 replies
20h56m

Domain specialization could be the moat, not only in the business domain but the sheer cost of deployment/refinement.

Check out Will Bennett's "Small language models and building defensibility" - https://will-bennett.beehiiv.com/p/small-language-models-and... (free email newsletter subscription required)

renewiltord
0 replies
21h1m

Writer.ai is quite successful, and is totally in another industry that Google+MS participate in.

dragonwriter
0 replies
17h7m

> a market that OpenAI/MS/Google can't or won't touch.

But also one that their terms of service, which are designed to exclude the markets that they can't or won't touch, don't make it impractical for you to service with their tools.

Rastonbury
0 replies
13h5m

Even if you aren't eaten, the use case will just be copied and run on the same OpenAI models by competitors, having good prompts is not good enough a moat. They win either way

Cali_cramoisie
0 replies
11h41m

Or you can treat what OpenAI is doing like a commodity like AWS and leverage it to solve a meaningful problem.

colordrops
7 replies
21h23m

I haven't been paying attention, why are embeddings not needed anymore?

lazzlazzlazz
3 replies
21h20m

OP is incorrect. Embeddings are still needed since (1) context windows can't contain all data and (2) data memorization and continuous retraining is not yet viable.

nextworddev
1 replies
21h1m

"yet"

coding123
0 replies
20h57m

It's also much slower. LLMs generate text one token at a time. That's not very good for search.

Pre-search tokenization, however, is probably a good fit for LLMs.

zwily
0 replies
20h58m

But the common use case of using a vector DB to pull in augmentation appears to now be handled by the Assistants API. I haven't dug into the details yet but it appears you can upload files and the contents will be used (likely with some sort of vector searching happening behind the scenes).

sharemywin
1 replies
21h4m

> Retrieval: augments the assistant with knowledge from outside our models, such as proprietary domain data, product information or documents provided by your users. This means you don't need to compute and store embeddings for your documents, or implement chunking and search algorithms. The Assistants API optimizes what retrieval technique to use based on our experience building knowledge retrieval in ChatGPT.

> The model then decides when to retrieve content based on the user Messages. The Assistants API automatically chooses between two retrieval techniques: it either passes the file content in the prompt for short documents, or performs a vector search for longer documents.

> Retrieval currently optimizes for quality by adding all relevant content to the context of model calls. We plan to introduce other retrieval strategies to enable developers to choose a different tradeoff between retrieval quality and model usage cost.
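The two-technique dispatch described in that quote can be sketched locally. To be clear, this is not OpenAI's actual heuristic; the characters-per-token estimate and the context budget are made-up assumptions for illustration:

```python
def estimate_tokens(text: str) -> int:
    # Crude approximation: ~4 characters per English token (assumption).
    return len(text) // 4

def retrieval_strategy(document: str, context_budget_tokens: int = 8000) -> str:
    """Mimic the dispatch quoted above: stuff short documents straight
    into the prompt; fall back to vector search for long ones."""
    if estimate_tokens(document) <= context_budget_tokens:
        return "stuff_in_prompt"
    return "vector_search"
```

Any real implementation would also have to weigh quality against token cost, which is exactly the tradeoff the docs say future retrieval strategies will expose.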

sjnair96
0 replies
20h37m

Really cool to see the Assistants API's nuanced document retrieval methods. Do you index over the text besides chunking it up and generating embeddings? I'm curious about the indexing and the depth of analysis for longer docs, like assessing an author's tone chapter by chapter—vector search might have its limits there. Plus, the process to shape user queries into retrievable embeddings seems complex. Eager to hear more about these strategies, at least what you can spill!

bluecrab
3 replies
21h4m

Vector DBs should never have existed in the first place. I feel sorry for the agent startups though.

m3kw9
2 replies
20h56m

How does this make vector DBs obsolete?

dragonwriter
0 replies
20h53m

If you are using OpenAI, the new Assistants API looks like it will handle internally what you used to handle externally with a vector DB for RAG (and for some things, GPT-4 Turbo's 128k context window will make it unnecessary entirely). There are other uses for vector DBs than RAG for LLMs, and there are reasons people might use non-OpenAI LLMs with RAG, so there is still a role for vector DBs, but it shrank a lot with this.

danielbln
0 replies
20h53m

It doesn't, but semantic search is a lot less relevant if you can squeeze 350 pages of text into the context.

riku_iki
2 replies
20h44m

> vectorDB startups -> don't need embeddings anymore

They don't provide embeddings, but storage and query engines for embeddings, so still very relevant.

> file processing startups -> don't need to process files anymore

Curious what that is, exactly?

> vertical ai agent startups -> GPT marketplace

Sure, those startups will be selling their agents on the marketplace.

make3
1 replies
20h28m

They definitely do provide embeddings: https://openai.com/blog/new-models-and-developer-products-an... (Ctrl+F "retrieval": "... won't need to ... compute or store embeddings").

riku_iki
0 replies
20h22m

I mean embeddings-DB startups don't provide embeddings. They provide databases that let you store and query computed embeddings (e.g. computed by ChatGPT), so the services are complementary.
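That division of labor can be shown with a toy example: the model produces the vectors, and the "database" part is just storage plus nearest-neighbor search. A pure-Python stand-in (no real DB or embedding API involved):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class ToyVectorStore:
    """Stores precomputed embeddings (e.g. from an embedding model) and
    answers nearest-neighbor queries -- the part vector-DB vendors sell,
    minus the indexing, persistence, and scaling that justify a product."""
    def __init__(self):
        self.items = []  # (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def query(self, vector, k=1):
        # Brute-force ranking; real vector DBs use ANN indexes instead.
        ranked = sorted(self.items,
                        key=lambda item: cosine_similarity(item[1], vector),
                        reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]
```

The embeddings themselves come from somewhere else; the store only ever sees finished vectors.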

blibble
2 replies
20h49m

HN is quite notorious for that Dropbox comment.

I suspect that video is going to end up even more notorious; it's even funnier given it's the VCs themselves.

arcanemachiner
1 replies
20h46m

More context, please.

EDIT: I guess it's this:

https://news.ycombinator.com/item?id=8863#9224

blibble
0 replies
20h44m

that's the one

ren_engineer
1 replies
21h19m

depends on how much developers are willing to embrace the risk of building everything on OpenAI and getting locked onto their platform.

What's stopping OpenAI from cranking up the inference pricing once they choke out the competition? That combined with the expanded context length makes it seem like they are trying to lead developers towards just throwing everything into context without much thought, which could be painful down the road

keithwhor
0 replies
20h36m

I suspect it is in OpenAI's interest to have their API as a loss leader for the foreseeable future, and keep margins slim once they've cornered the market. The playbook here isn't to lock in developers and jack up the API price, it's the marketplace play: attract developers, identify the highest-margin highest-volume vertical segments built atop the platform, then gobble them up with new software.

They can then either act as a distributor and take a marketplace fee or go full Amazon and start competing in their own marketplace.

lazzlazzlazz
1 replies
21h21m

Embeddings are still important (context windows can't contain all data + memorization and continuous retraining is not yet viable), and vertical AI agent startups can still lead on UX.

Finbarr
0 replies
20h48m

Context windows can't contain all data... yet.

Der_Einzige
1 replies
21h33m

Startups built around actual AI tools, like if one formed around automatic1111 or oogabooga, would be unaffected, but because so much VC money went to the wrong places in this space, a whole lot of people are about to be burned hard.

throwaway-jim
0 replies
21h14m

damn hahaha it's oobabooga not oogabooga

yawnxyz
0 replies
21h26m

i'm excited for the open-source, local inferencing tech to catch up. The bar's been raised.

treprinum
0 replies
18h30m

There is not much info about retrieval/RAG in their docs at the moment - did you find any example of how the retrieval is supposed to work and how to give it access to a DB?

teaearlgraycold
0 replies
19h36m

I’ve been keeping my eye on a YC startup for the last few months that I interviewed with this summer. They’ve been set back so many times. It looks like they’re just “ball chasing”. They started as a chatbot app before chatgpt launched. Then they were a RAG file processing app, then enterprise-hosted chat. I lost track of where they are now but they were certainly affected by this announcement.

You know you're doing the wrong thing if you dread the OpenAI keynotes. Pick a niche; stop riding on OpenAI's coattails.

seydor
0 replies
18h41m

more startups should focus on foundation models, it's where the meat is. Ideally there won't be a need for any startup as the platform should be able to self-build whatever the customer wants.

mvkel
0 replies
18h1m

Probably best not to make your company about features that a frontier AI company would have a high probability of adding in the next 6-12 months.

larodi
0 replies
20h39m

Well, if said startups were visionaries, they could've known better the business they're entering. On the other hand, there are plenty of VC-inflated balloons, making lots of noise, that everyone would be happy to see go. If you mean these startups - well, farewell.

There's plenty more to innovate, really. Saying OpenAI killed startups is like saying that PHP/Wordpress/NameIt killed small shops doing static HTML, or that IBM killed the... typewriter companies. Well, as I said, they could've known better. Competition is not always to blame.

karmasimida
0 replies
20h38m

TBH those are low-hanging fruit for OpenAI. Much of the value is still being captured by OpenAI's own models.

The sad thing is, GPT-4 is in its own league in the whole LLM game; whatever those other startups are selling, it isn't competing with OpenAI.

felixding
0 replies
16h10m

Offtopic, but I find it amusing that we not only have "chatGPT" but now also "vectorDB". Apple's influence is really strong.

echelon
0 replies
21h3m

We don't want Open AI to win everything.

cityzen
0 replies
20h0m

Where is the part about embeddings?

bilsbie
0 replies
20h47m

Why don’t you need embedding?

baq
0 replies
21h5m

Checking HN and Product Hunt a few times a week gives you most of that awareness, and I don't need to remind you about the person behind the HN 'sama' handle.

atleastoptimal
0 replies
21h31m

There will be a lot of startups who rely on marketing aggressively to boomer-led companies that don't know what email is, hoping their assistant never types OpenAI into Google for them.

andrewjl
0 replies
14h13m

None of those categories really fall under the second order category mentioned in the video. Using their analogy they all sound more like a mapping provider versus something like Uber.

Yadayadaaaa
0 replies
20h5m

Just because something is great doesn't mean that others can't compete. Even a second-best product can easily be successful because a company has already invested too much, isn't aware of OpenAI (or AI progress in general), benefits from some magic integration, etc.

If it were up to me alone, no one would buy Azure or AWS but just GCP.

minimaxir
51 replies
22h26m

Most of the products announced (and the price cuts) appear to be more about increasing lock-in to the OpenAI API platform, which is not surprising given increased competition in the space. The GPTs/GPT Agents and Assistants demos in particular showed that they are a black box within a black box within a black box that you can't port anywhere else.

I'm mixed on the presentation and will need to read the fine print on the API docs on all of these things, which have been updated just now: https://platform.openai.com/docs/api-reference

The pricing page has now updated as well: https://openai.com/pricing

Notably, the DALL-E 3 API is $0.04 per image which is an order of magnitude above everyone else in the space.

EDIT: One interesting observation with the new OpenAI pricing structure not mentioned during the keynote: fine-tuned ChatGPT 3.5 is now 3x the cost of base ChatGPT 3.5, down from 8x. That makes fine-tuning a more compelling option.
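Back-of-envelope, with illustrative per-1K-token rates (assumptions based on the post-keynote pricing page; check https://openai.com/pricing for current numbers):

```python
# Illustrative rates in USD per 1K tokens (assumed, not authoritative).
BASE_INPUT, BASE_OUTPUT = 0.001, 0.002   # base gpt-3.5-turbo
FT_INPUT, FT_OUTPUT = 0.003, 0.006       # fine-tuned gpt-3.5-turbo (3x)

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost of a single API call at per-1K-token rates."""
    return input_tokens / 1000 * input_rate + output_tokens / 1000 * output_rate

base = request_cost(2000, 500, BASE_INPUT, BASE_OUTPUT)
fine_tuned = request_cost(2000, 500, FT_INPUT, FT_OUTPUT)
print(fine_tuned / base)  # 3.0 at these rates, down from the old 8x premium
```

At a 3x premium, a fine-tuned model that lets you drop a long system prompt or few-shot examples can plausibly pay for itself; at 8x that was much harder.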

faeriechangling
18 replies
22h10m

It's a good strategy. For me, avoiding the moat means either a big drop in quality and just ending up in somebody else's moat, or a big drop in quality and a lot more money spent. I've looked into it, and maybe the most practical end-to-end system for owning my own LLM is to run a couple of 3090s on a consumer motherboard, at substantial running cost to keep them up 24/7, and that's not powerful enough to cut it while being rather expensive at the same time. For a bit more expense, you can get more quality and lower running costs, with much slower processing, by buying a 128GB/192GB Apple Silicon setup, and that's much, much slower than the "Turbo" services OpenAI offers.

I think the biggest thing pushing me away from OpenAI was that they were subsidizing the chat experience much more than the API, and this seems to reconcile that quite a bit. Quite simply, OpenAI is sweetening the pot here too much for me to really ignore; this is a massively subsidized service. I honestly don't feel the switching costs in the future will outweigh the benefits I'm getting now.

swatcoder
11 replies
18h41m

Everybody's got their own calculus about how competitive their space is and what this tech can do for them, but some might be best off dancing around lock-in by being careful about what they use from OpenAI and how tightly they integrate with it.

This is very early in the maturity cycle for this tech. The options that will be available for private inference and fine tuning, for cloud-GPU/timeshare inference and fine tuning, and for competing hosted solutions are going to be vastly different as months go by. What looks like squeezing value out of OpenAI today might look a lot like technical debt and frustrating lock-in a year from now.

That's what they're hoping you chase after, and if your product is defined by this technology, maybe that's what you have to do. But if you're just thinking about feature opportunities for a more robust product, judiciousness could pay off better than rushing. For now.

keyle
3 replies
17h32m

While I agree with you, as a happy GPT4 plus customer, I'm worried about the inevitable enshittification downhill roll that will eventually ensue.

Once marketing gets in charge of product, it's doomed. And I can't think of a product startup it hasn't happened to. Particularly with this type of growth, at some point the suits start to outnumber the techies 10:1.

This is why openness and healthy competition are paramount.

ethbr1
2 replies
16h40m

It's not marketing, it's economics.

If you set money on fire -- eventually there's a time when you need to stop doing that.

layoric
1 replies
15h52m

I think the parent is more talking about the other common situation where organizations start focusing on maximizing profits, rather than just working towards a basic profitability. Eg, Google Maps API pricing comes to mind.

Yes, OpenAI might be burning through their $5B capital/Azure credits now (we don't know how much), but I think the `turbo` models are starting to address this as well. And $20/month from a large user base can also add up pretty quickly.

ehnto
0 replies
14h40m

You do see VC money bloat up company sizes for what could have been a very profitable small-to-medium-sized private business, without enshittification, had they not hired dozens more people.

layoric
1 replies
18h31m

> What looks like squeezing value out of OpenAI today might look a lot like technical debt and frustrating lock-in a year from now.

Just wanted to highlight this as such a great, concise way to look at the Buy vs Build with pretty much any cloud service, thanks!

startages
0 replies
5h32m

Seems right to me. That's why it's good to build with extensibility in mind, to allow switching easily in the future if needed.

fumar
1 replies
16h26m

Do you build on AWS AI services then? Or any other cloud provider? The outcome is the same, right? Technical lock in, cost risks, integration maintenance, etc.

swatcoder
0 replies
15h47m

The key line in my comment, for emphasis:

> This is very early in the maturity cycle for this tech.

Think about what value you get out of the services and what migration might look like. If you are making simple completion or chat calls with a clever prompt, then migration will probably be trivial when the time comes. Those features are the commodity that everyone will be offering and you'll be able to shop around for the ideal solution as alternatives become competitive.

Alternately, if you're handing OpenAI a ton of data for them to opaquely digest for fine tuning with no egress tools, or having them accumulate lots of other critical data with no egress, you're obviously getting yourself locked in.

The more features you use, the more idiosyncratic those features are, the more non-transferable those features are, and the more deeply you integrate those features, the more risk you're taking. So you want to consider whether the reward is worth that risk.

Different projects will legitimately have different answers for that.
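One common way to act on that risk calculus is a thin adapter seam of your own, so the depth of integration stays a choice rather than an accident. A minimal sketch (all names hypothetical; the fake provider stands in for any vendor SDK so the example runs offline):

```python
from typing import Protocol

class ChatProvider(Protocol):
    """The seam your product depends on (hypothetical interface)."""
    def complete(self, prompt: str) -> str: ...

class FakeProvider:
    """Offline stand-in; real adapters would wrap OpenAI, Anthropic,
    a local model, etc. behind this same one-method interface."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def summarize(provider: ChatProvider, text: str) -> str:
    # Feature code talks only to the interface, never a vendor SDK,
    # so a later migration touches one adapter, not the whole product.
    return provider.complete(f"Summarize: {text}")

print(summarize(FakeProvider(), "quarterly report"))
```

This obviously can't insulate you from idiosyncratic features like hosted retrieval or stateful threads, which is exactly the parent's point: the more of those you lean on, the less a seam like this buys you.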

epolanski
1 replies
15h45m

While everything you say might be true, it's also irrelevant.

Time to market is more important: you build users, you get an edge, and you can swap models later on (as long as you own the data).

eropple
0 replies
13h47m

Almost nobody sharecropping their way to an MVP has a product except at the sufferance of the company doing the work for them - regardless of which one they switch to later.

There's very little there there in most of these folks.

(For the few where there is, though, I agree with you.)

Kye
0 replies
3h2m

The Phind CEO talked in an interview about how their own model is already out ahead of ChatGPT 4 for the target use case of their search engine in some cases, and increasingly matching it on general searches: https://www.latent.space/p/phind#details

I use it instead of Bing Chat now for cases where I really need a search engine and Google is useless. Mainly because it's faster, but I also like not having to open another browser.

muttled
2 replies
19h31m

For me personally, being able to fine-tune local LLMs at a much higher rank and to train more layers is very useful for (somewhat unreliably) embedding information. AFAIK the OpenAI fine-tuning is more geared towards formatting the output.

bigfudge
1 replies
19h8m

As I understand it, fine tuning is never really about adding content. RAG and related techniques are likely cheaper/better if that’s what you want.

antupis
0 replies
9h59m

Yup, how I understand it: fine-tuning is more about adding context and the bigger picture; RAG is more about adding actual content. A good system probably needs both in the long run.

smcleod
0 replies
6h23m

A good strategy for who? Society? Customers? The future? Or just for making money for the owners?

mirekrusin
0 replies
8h12m

Obviously it is a good strategy, surely created by GPT.

jpalomaki
0 replies
8h15m

ChatGPT only costs a few dollars, but I'm also "paying" for the service by contributing training data to OpenAI.

Getting access to this type of interaction data with (mostly) humans must be quite a valuable asset.

asaddhamani
7 replies
12h16m

Their products are incredible though. I've tried the alternatives, and even Claude is not nearly as good as ChatGPT. Claude gives an ethics lecture with every second reply, which costs me money each time and makes their product very difficult to (want to) embed.

ChatGTP
5 replies
12h2m

What are you using it for? I want to know what people actually use these things for, damn it!

asaddhamani
2 replies
8h50m

I'd built a bot to use ChatGPT from Telegram (this was before the ChatGPT API), and currently building a tool to help make writing easier (https://www.penpersona.com). This is the API.

Apart from that, it's pretty much replaced 80% of my search engine usage, I can ask it to collate reviews for a product from reddit and other sites, get the critical reception of a book, etc. You don't have to go and read long posts and articles, have GPT do it for you. There's many other use cases like this. For the second part, I'm using a UI called Typing Mind (which also works with the API).

toyg
1 replies
4h53m

> currently building a tool to help make writing easier

That's cool!

> it's pretty much replaced 80% of my search engine usage

That's not cool. That's how you end up relying on nonexisting sources or other hallucinations.

Karunamon
0 replies
3h2m

As opposed to raw information surfaced by the search engine, which we all know is perfectly reliable, unbiased, and up to date?

That aside, this particular admonishment was worn out a couple of months after ChatGPT was released. It does not need to be repeated every time someone mentions doing something interesting with an LLM.

williamcotton
0 replies
8h19m

Retrieving the non-metadata titles of 45,000 assorted PDFs, docx files, etc. without a bunch of rules/regexes that would fail half the time.

“Derp derp, hallucinations”.

Eh, no, not in practice, not when the entire context and document is provided and the tools are used correctly.

code_runner
0 replies
7h28m

Summarizing large documents. Finding relationships between two (or more) documents. Building a set of points bridging the gap between the documents. Correcting malformed text data.

Not everything is just data in a database or some structured format. Sometimes you have blobs of text from a user, or maybe you ran whisper on an audio/video file and now you just have a transcript blob… it’s never been easier to automate all of this stuff and get accurate results.

You can even have humans in the loop still to protect against hallucinations, or use one model to validate another (ask GPT to correct or flag issues with a whisper transcript)

whywhywhywhy
0 replies
5h27m

Honestly the companies that completely ignore ethics are the only ones who are going to scoop up any market share outside of OpenAI.

Getting a chiding lecture every time you ask an AI to do something does absolutely nothing for the end user other than waste their time. "AI Safety" academics are memeing themselves out of the future of this tech and leaving the gate wide open for "unsafe" AI to flourish with this farcical behavior.

visarga
6 replies
22h19m

Mistral + 2 weeks of work from the community. Not as good, but private and free. It will trail OpenAI by 6-12 months in capabilities.

coder543
5 replies
21h57m

OpenAI offering 128k context is very appealing, however.

I tried some Mistral variants with larger context windows, and had very poor results… the model would often offer either an empty completion or a nonsensical completion, even though the content fit comfortably within the context window, and I was placing a direct question either at the beginning or end, and either with or without an explanation of the task and the content. Large contexts just felt broken. There are so many ways that we are more than “two weeks” from the open source solutions matching what OpenAI offers.

And that’s to say nothing of how far behind these smaller models are in terms of accuracy or instruction following.

For now, 6-12 months behind also isn't good enough. If that gap even holds steady, then a year from now the open models could be perfectly adequate for many use cases… but it's very hard to predict the progression of these technologies.

pclmulqdq
4 replies
21h8m

Comparing a 7B parameter model to a 1.8T parameter model is kind of silly. Of course it's behind on accuracy, but it also takes 1% of the resources.

coder543
3 replies
21h1m

The person I replied to had decided to compare Mistral to what was launched, so I went along with their comparison and showed how I have been unsatisfied with it. But, these open models can certainly be fun to play with.

Regardless, where did you find 1.8T for GPT-4 Turbo? The Turbo model is the one with the 128K context size, and the Turbo models tend to have a much lower parameter count from what people can tell. Nobody outside of OpenAI even knows how many parameters regular GPT-4 has. 1.8T is one of several guesses I have seen people make, but the guesses vary significantly.

I’m also not convinced that parameter counts are everything, as your comment clearly implies, or that chinchilla scaling is fully understood. More research seems required to find the right balance: https://espadrine.github.io/blog/posts/chinchilla-s-death.ht...

danielmarkbruce
1 replies
20h36m

It's an order of magnitude comparison.

Let's just agree it's 100x-300x more parameters, and let's assume the open ai folks are pretty smart and have a sense for the optimal number of tokens to train on.

razodactyl
0 replies
20h25m

This, definitely. Andrej Karpathy himself mentions tuned weight initialisation in one of his lectures; the nanoGPT code he wrote goes through it.

Additionally explanations for the raw mathematics of log likelihoods and their loss ballparks.

Interesting low-level stuff. These researchers are the best of the best working for the company that can afford them working on the best models available.

razodactyl
0 replies
20h29m

Nah, it's training quality and context saturation.

Grab an 8K context model, tweak some internals and try to pass 32K context into it - it's still an 8K model and will go glitchy beyond 8K unless it's trained at higher context lengths.

Anthropic for example talk about the model's ability to spot words in the entire Great Gatsby novel loaded into context. It's a hint to how the model is trained.

Parameter counts are a blunt metric; what seems to be important is embedding dimensionality, to transfer information through the layers, and the layers themselves, to both store and process the nuance of information.

ebiester
6 replies
22h6m

I don't understand the lock-in argument here. Yes, if a competitor comes in there will be switching costs as everything is re-learned. However, from a code perspective, it is a function of the key and a relatively small API. New regulations notwithstanding, what is stopping someone from moving from OpenAI to Anthropic (for example), other than the cost of learning how to effectively utilize Anthropic for your use case?

OpenAI doesn't have some sort of egress feed for your database.

pclmulqdq
2 replies
21h10m

I sometimes wonder how much OpenAI pays for people to post arguments about how great they are on HN, because it looks like you are pretty much right. There isn't a ton about OpenAI that is actually sticky.

minimaxir
0 replies
21h6m

I most definitely am not paid by OpenAI and am very confused how my original (critical) comment could be seen as astroturfing.

airstrike
0 replies
20h24m

> Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.

https://news.ycombinator.com/newsguidelines.html

minimaxir
2 replies
21h3m

> OpenAI doesn't have some sort of egress feed for your database.

That's what they're trying to incentivize, especially with being able to upload files for their own implementation of RAG. You're not getting the vector representation of those files back, and switching to another provider will require rebuilding and testing that infrastructure.

goosinmouse
1 replies
18h39m

That's exactly what I thought. Smart strategy on OpenAI's part, given that it's extremely easy (and free) to do RAG with pgvector.

nostrebored
0 replies
2h6m

It's neither free nor performant.

The developer experience is lacking vs. other vector database providers, and the performance doesn't match those that prioritize performance over devex. You're also spending time writing plumbing around Postgres that isn't really transferable work.

For some people already in the ecosystem it will make sense.

mvkel
2 replies
18h3m

> Assistants demos in particular showed that they are a black box within a black box within a black box that you can't port anywhere else.

I'd argue the opposite. The new "Threads" interface in the OpenAI admin section lets you see exactly how it's interpreting input/output specifically to address the black box effect.

Source: https://platform.openai.com/docs/api-reference/runs/listRunS... tells you exactly how it's stepping through the chain. Even more visibility than there used to be.

DerJacques
1 replies
13h56m

I agree that some parts of the process now seem more "open", but there is definitely a lot more magic in the new processing. Namely, threads can have arbitrary length, and OpenAI automatically handles context window management for you. Their API now also handles retrieval of information from raw files, so you don't need to worry about embeddings.

Lastly, you don’t even need any sort of database to keep track of threads and messages. The API is now stateful!

I think that most of these changes are exciting and make it a lot easier for people to get started. There is no doubt in my mind though that the API is now an even bigger blackbox, and lock-in is slightly increased depending on how you integrate with it.
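For contrast, here is the bookkeeping that the stateless chat-completions style leaves to the caller (offline sketch; `fake_llm` is a stand-in, not a real API call):

```python
def fake_llm(messages):
    # Offline stand-in for a chat-completion call.
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"reply #{user_turns}"

# Stateless style: the caller owns the transcript, resends it every turn,
# and must decide how to trim it to fit the context window. A stateful
# threads API moves exactly this bookkeeping (and the truncation policy)
# server-side -- which is both the convenience and the lock-in.
history = []
for user_msg in ["hello", "tell me more"]:
    history.append({"role": "user", "content": user_msg})
    reply = fake_llm(history)
    history.append({"role": "assistant", "content": reply})

print(history[-1]["content"])  # reply #2
```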

mvkel
0 replies
11h7m

I wouldn't say the black box issue is unique to OpenAI. I suspect nobody could explain certain behaviors, including them.

As for lock in, agreed completely.

vsareto
0 replies
21h26m

> The GPTs/GPT Agents and Assistants demos in particular showed that they are a black box within a black box within a black box that you can't port anywhere else.

This just rings hollow to me. We lost the fights for database portability, cloud portability, payments/billing portability, and other individual SaaS lock-in. I don't see why it'll be different this time around.

stavros
0 replies
9h16m

What do you mean "orders of magnitude above" for DALL-E? As far as I can see, Midjourney is $0.05 per image, and that's if you don't forget you have a subscription. I've ended up paying $10 per image.

spankalee
0 replies
22h15m

A friend of mine is building Zep (https://www.getzep.com/), which seems to offer a lot of the Assistant + Retrieval functionality in a self-hostable and model-agnostic way. That type of project may be the way around lock-in.

davidbarker
0 replies
22h10m

Also, DALL·E 3 "HD" is double the price at $0.08. I'm curious to play around with it once the API changes go live later today.

The docs say:

By default, images are generated at standard quality, but when using DALL·E 3 you can set quality: "hd" for enhanced detail. Square, standard quality images are the fastest to generate.

https://platform.openai.com/docs/guides/images/usage

activescott
0 replies
20h36m

I think it's more about finding places to add value than "lock in" per se. It seems they're adding value with improved developer experience and cost/performance rather than on the models themselves. Not necessarily nefarious attempts to lock in customers, but it may have the same outcome :)

Vipsy
0 replies
13h42m

Anything open about OpenAI starts and ends with the name

Terretta
0 replies
19h6m

most of the products announced (and the price cuts) appear to be more about increasing lock-in to the OpenAI API platform

OpenAI is currently refusing far more enterprises than these products could "lock-in" even with 100% stickiness.

Makes it unlikely this is about lock-in or fighting churn when arguably, the best advertisement for GPT-4 is comparing its raw results to any other LLM.

If you said their goal was fomenting FOMO, I'd buy it. Curious, though, when they'll let the FOMO fulfillment rate go up by accepting revenue for servicing that demand.

zizee
25 replies
19h18m

In people's experience with these sorts of tools, have they assisted with maintenance of codebases? This might be directly, or indirectly via more readable, better organized code.

The reason I ask is that these tools seem to excel in helping to write new code. In my experience I think there is an upper limit to the amount of code a single developer can maintain. Eventually you can't keep everything in your head, so maintaining it becomes more effort as you need to stop to familiarize yourself with something.

If these tools help to write more code, but do not assist with maintenance, I wonder if we're going to see masses of new code written really quickly, and then everything grinds to a halt, because no one has an intimate understanding of what was written?

anotherpaulg
11 replies
17h27m

My open source AI coding tool aider is unique in that it is designed to work with existing code bases. You can jump into an existing git repo and start asking for changes, new features, etc.

https://github.com/paul-gauthier/aider

It helps gpt understand larger code bases by building a "repository map" based on analyzing the abstract syntax tree of all the code in the repo. This is all built using tree-sitter, the same tooling which powers code search and navigation on GitHub and in many popular IDEs.

https://aider.chat/docs/repomap.html

anotherpaulg
4 replies
12h43m

Related to the OpenAI announcement, I've been able to generate some preliminary code editing evaluations of the new GPT models. OpenAI is enforcing very low rate limits on the new GPT-4 model. I will update the results as quickly as my rate limit allows.

https://news.ycombinator.com/item?id=38172621

Also, aider now supports these new models, including `gpt-4-1106-preview` with the massive 128k context window.

https://github.com/paul-gauthier/aider/releases/tag/v0.17.0

stavros
1 replies
9h10m

I do love aider, thanks for making it! I'd like an option to stop it from writing files everywhere, though, even if that means I have no history.

anotherpaulg
0 replies
3h55m

Thanks for trying aider! I'd like to better understand your concern about aider's support files. If you're able, maybe file an issue and I'd be happy to try and help make it work better for you.

https://github.com/paul-gauthier/aider/issues

extr
1 replies
11h40m

Dude I love how passionate you are about this project. I see you in every GPT thread. Despite all this tech there are so few projects out there trying to make large-repo code editing/generation possible.

hanselot
0 replies
5h34m

I think that's mostly because most people are skeptical about trusting people with IP.

If there was a way to prove that the data was not being funneled into openai's next models, sure, but where is the proof of that? A piece of paper that says you aren't allowed to do something, does not equate to proof of that thing not being done.

Personally I believe all code work should be Open Source by default, as that would ensure the lowest quality code gets filtered out and only the best code gets used for production, resulting in the most efficient solutions dominating (aka, less carbon emissions or children being abused or whatever the politicians say today).

So as long as IP exists, companies will continue to drive profit to the max at the expense of every possible resource that has not been regulated. Instead of this model, why not banish IP, make everything open all at once and have only the best, most efficient code running, thereby locating missing children faster, or emitting less carbon, or bombing terrorists better or w/e.

NikhilVerma
3 replies
10h48m

How do you manage token limits when sending large amounts of code structure to OpenAI?

anotherpaulg
2 replies
3h52m

Aider has a "token budget" for the repository map (--map-tokens, default of 1k). It analyzes the AST of all the code in the repo, the call graph, etc... and uses a graph optimization algorithm to select the most relevant parts of the repo map that will fit in the budget.

There's some more detail in the recent writeup about the new tree-sitter based repo map that was linked in my comment above.
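For the curious, the budget-constrained selection can be sketched in a few lines. To be clear, this is an illustration and not aider's actual algorithm (aider ranks symbols with a graph optimization over the repo's call graph); the `entries` shape is invented for the example:

```python
def select_map_entries(entries, budget_tokens=1024):
    """Greedy sketch of fitting a repo map into a token budget.

    Each entry is a dict with the symbol name, the token cost of showing
    its definition, and how often it is referenced elsewhere in the repo.
    """
    # Prefer definitions that are referenced often relative to their size.
    ranked = sorted(entries, key=lambda e: e["refs"] / e["tokens"], reverse=True)
    picked, used = [], 0
    for entry in ranked:
        if used + entry["tokens"] <= budget_tokens:
            picked.append(entry)
            used += entry["tokens"]
    return picked
```

A graph-based ranking (as aider describes) does better than this greedy pass because a symbol's importance depends on which other selected symbols reference it, but the budget mechanics are the same.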

ukuina
1 replies
18m

Any plans to open support up for other languages that tree-sitter supports?

anotherpaulg
0 replies
2m

Aider supports the repo map for a bunch of languages already, see below. Is there one in particular you need that is missing?

https://github.com/paul-gauthier/aider/tree/main/aider/queri...

mcbishop
0 replies
10h42m

Thanks a lot for doing this project. Your blog post got me excited.

jeswin
0 replies
11h50m

It helps gpt understand larger code bases by building a "repository map" based on analyzing the abstract syntax tree of all the code in the repo.

What I do with codespin[1] (another AI code gen tool) is to give a file/files to GPT and ask for signatures (and comments and maybe autogenerate a description), and then cache it until the file changes. For a lot of algorithmic work, we could just use GPT now. Sure it's less efficient, but as these costs come down it matters less and less. In a way, it's similar to higher level (but inefficient) programming languages vs lower level efficient languages.

[1]: https://github.com/codespin-ai/codespin-cli
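The cache-until-the-file-changes idea is straightforward to sketch. This is illustrative, not codespin's actual implementation; `cached_summary`, the callable, and the cache-file layout are all made up for the example:

```python
import hashlib
import json
import os

def cached_summary(path, summarize, cache_file=".summary_cache.json"):
    """Cache per-file LLM summaries, keyed by a hash of the file contents.

    `summarize` is any callable (e.g. a GPT API wrapper) that turns source
    text into a signatures/description summary. The summary is recomputed
    only when the file's contents change.
    """
    cache = {}
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            cache = json.load(f)
    with open(path) as f:
        text = f.read()
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in cache:
        cache[key] = summarize(text)  # the only (potentially expensive) LLM call
        with open(cache_file, "w") as f:
            json.dump(cache, f)
    return cache[key]
```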

throw2321
6 replies
19h4m

I've been thinking about this for a while now, wrt two points:

1. This will be the end of traditional SWEs and the rise of the age of debuggers: human debuggers who spend their days setting up breakpoints and figuring out bugs in a sea of LLM-generated code.

2. Hiring will switch from using Leetcode questions to "pull out your debugger and figure out what's wrong with this code".

w-m
3 replies
18h43m

What makes you think the LLM couldn’t run a debugging session from the content of a JIRA ticket and the whole code base + documentation?

spookie
0 replies
15h1m

If the codebase is anything more than a simple Python project... I don't think that'll happen.

It just doesn't scale that well. Hell, GPT-4 can't make sense of my own projects.

mike_hearn
0 replies
7h19m

Who can say what's possible but there are very few "debugging transcripts" for neural nets to train on out there. So it'd have to sort of work out how to operate a debugger via functions, understand the codebase (perhaps large parts of it), intuit what's going wrong, be able to modify the codebase, etc. Lots of work to do there.

eichin
0 replies
18h23m

Having never seen it, or anything even close to it. (Of course, I'm a little biased by seeing product demos that don't even get "add another item to this list of command line arguments" right; maybe if everyone already believes it works, nobody bothers to actually sell that?)

ukuina
0 replies
16m

Yep, time to start adding source_file.prompt sidecar files next to each generated module so debugging sessions can start at the same initial condition.

eichin
0 replies
11h54m

The first interview where I was handed some (intentionally) broken C code and gdb was about 15 years ago. I'm not sure that part is a change in developer workflow (this may apply more in systems and embedded though.)

I've been paying attention to this too (mostly by following Simon Willison) and I'm still solidly in the "get back to me when this stuff can successfully review a pull request or even interpret a traceback" camp...

ushakov
2 replies
18h46m

We are doing this for API-Testing now. You should check out our website

https://ai.stepci.com

somsak2
1 replies
18h34m

piece of feedback: it's weird to have a drop-down on "OpenAPI Links" when there are no other options.

ushakov
0 replies
18h20m

Thanks! We will have more examples coming very soon

gen220
1 replies
18h11m

If these tools help to write more code, but do not assist with maintenance, I wonder if we're going to see masses of new code written really quickly, and then everything grinds to a halt, because no one has an intimate understanding of what was written?

Yep. Companies using LLMs to "augment" junior developers will get a lot of positive press, but I guess it remains to be seen how much the market consistently rewards this behavior. Consumers will probably see right through it, but the b2b folks might get fleeced for a few years before eventually churning and moving to a higher quality old-fashioned competitor that employs senior talent.

But IDK, maybe we'll come up with models that are good at growing and maintaining a coherent codebase. It doesn't seem like an impossible task, given where we are today. But we're pretty far from it still, as you point out.

bertil
0 replies
14h40m

What are the tasks that you envision are key to maintenance?

- bug finding and fixing

- parsing logs to find optimisation options

- refactoring (after several local changes)

- given new features, recommending a refactoring?

I feel like code assistants are already reasonable help for doing the first two, and the later two are mostly a question of context window. I feel we might end up with code bases split by context sizes, stitched with shared descriptions.

jillesvangurp
0 replies
8h16m

There's a nice code gpt plugin for intellij and vs code. Basically you can select some code and ask it to criticize it, refactor it, optimize it, find bugs in it, document it, explain it, etc. A larger context means that you can potentially fit your entire code base in that. Most people struggle to keep the details in their head of even a small code base.

The next level would be deeper integration with tools to ensure that whatever it changes, the tests still have to pass and the code still has to compile. Speaking of tests, writing those is another thing it can do. So, AI assisted salvaging of legacy code bases that would otherwise not be economical to deal with could become a thing.

What we can expect over the next years is a lot more AI assisted developer productivity. IMHO it will perform better on statically typed languages as those are simply easier to reason about for tools.

saliagato
19 replies
21h13m

You can now [1] pay from $2 million to $3 million to pretrain a custom gpt-n model. This has gone unnoticed but seems really neat. Provided that a start-up has enough money to spend on that, it would certainly give a competitive advantage.

[1] https://openai.com/form/custom-models

Edit: forgot to put the link

MagicMoonlight
16 replies
19h43m

Well it won’t because they’ll use the model you paid for and take your customers.

somsak2
7 replies
18h33m

How do you square this with OpenAI's assertion that they never use data from enterprise customers for their own training? Are you suggesting they're lying?

JacobThreeThree
3 replies
17h8m

They don't have to be currently lying for this to be a valid concern.

Clauses in terms of service are routinely updated or removed.

ethbr1
2 replies
16h27m

Clauses in terms of service are routinely updated or removed.

True, but that plays a bit differently in B2B land, because your customers also have legal teams and law firms on retainer.

jprete
1 replies
13h5m

No-one seems to have sued Unity over an even more egregious set of EULA changes - claiming that users retroactively owed money on games developed and published under totally different terms.

ethbr1
0 replies
4h50m

Afaik, they rolled the retroactive fee back, right?

And technically, that was a modification of the future EULA.

If you wanted to continue to use Unity, here was the pricing structure, which includes payments for previous installs.

You were welcome to walk away and refuse the new EULA.

Which is a big difference with historically collecting data, in violation of the then-EULA, and attempting to retroactively bless it via future EULA change.

cornholio
2 replies
14h4m

OpenAI just slurped the entire internet to train their main model, and the world just looks on as they directly compete with and disrupt authors the globe over.

Whoever thinks they are not interested in your data and won't use any trick to get it, then double down on their classic "but your honor, it's not copyright theft, the algorithm learns just like an employee exposed to the data would", isn't paying attention.

jprete
0 replies
13h10m

This is exactly why I am personally intensely opposed to treating ML training as fair use. Practically speaking the argument justifies ignoring anyone or any group’s preference not to contribute to ML training, so it’s a massive loss of freedom to everyone else.

Heyso
0 replies
9h52m

I agree with you. What comes to mind is that if GPT learns from private data and that learning is given back to (any) customer, you would have an indirect "open source everything".

govg
4 replies
19h11m

Is it the same as avoiding AWS because they will take your software and run it themselves to steal your clients?

thisgoesnowhere
1 replies
17h43m

This hasn't happened often, but it has happened. Elastic search for example.

Also dynamo db.

dotancohen
0 replies
11h35m

This is misleading - it intentionally uses incorrect definitions of the words in the parent post to construct a claim that plausibly addresses the concern when read by somebody unfamiliar with the situation.

AWS took an open source project (Elastic) and forked it. They did not take an AWS customer's code.

constantly
1 replies
17h42m

It’s more like running a PaaS product backed by AWS and then your customers realizing they can just use AWS directly, pay less, and have less complexity. And they have done this before for what it’s worth.

lemmsjid
0 replies
17h0m

The assumption behind why you'd use the program is that you have access to a proprietary dataset of sufficient size to build a large model around (say, for example, call center transcripts). This almost certainly means OpenAI doesn't have access to train their other models off that data, and it almost certainly means your customers can't take the same data and go straight to OpenAI.

I'm assuming that the target customer of this is people whose moat is proprietary data. If their moat is a unique approach to building a model, then it would indeed be dangerous to engage OpenAI. But then I'd think OpenAI would be hesitant to engage as well.

teaearlgraycold
0 replies
12h56m

No. The downside is you spent a lot of money on a model and don’t own it.

infecto
0 replies
14h59m

Do you have proof or are you just throwing baseless accusations out there?

drubio
0 replies
6h6m

I had multiple déjà vus, from the API golden days "build anything you want", social media telling you "build your communities with us", to app stores "we take care of the distribution"...it all works, until it doesn't...if they decide to change terms, pricing or make your product/service redundant because its 'strategic'.

It's their platform, their business, their rules.

drubio
0 replies
5h51m

If you're an OpenAI end customer, you'll be fine pre-training gpt-n models for your business; if you're an OpenAI middleman pre-training gpt-n models for other customers, what makes you think OpenAI won't eventually bypass you? Look up startups built around APIs and platforms: for every success, there's a graveyard filled due to APIs and platforms changing the rules.

Balgair
0 replies
1h24m

Wow, this is directly going to affect my company in the near term. We had been trying to do it all internally but have found little success. Even at ~$3M it's going to be an attractive choice.

openquery
15 replies
15h35m

If I had no contact with society from the 29th of November 2022 (the day before ChatGPT was released according to Wikipedia) and came back today to see the OpenAI keynote I would have lost my mind.

The progress and usefulness of these products is absolutely incredible.

torginus
5 replies
10h40m

I'm sorry, what breakthrough feature did we see here?

- Code interpreter and function calling were already possible on any sufficiently advanced LLM that could follow instructions well enough to output tokens in a rigidly parseable format, which could then be fed into a parser, and its output fed back to the LLM. It was clunky to do with online APIs like ChatGPT, but still eminently possible.

- Custom chatbots were easy to build before, and services to build them (like Poe.com) existed before.

- Likewise outputting JSON just requires a good instruction following AI, that can output token probabilities, along with a schema validator that always picks a token that results in schema-conforming JSON

- GPT4-128k seems to be revolutionary, but Claude-100k already existed, and considering LLM evaluation is quadratic wrt context size, they are probably using some tricks to extend the context, they are not 'full' tokens (I'd be happy to be proven wrong). While having a huge context is useful, for coding, a 8k context can be enough with some elbow grease (like filling the context with a 2-3 deep recursive 'Go To Definition' for a given symbol), so that the AI receives the right context.

- Dall-E 3 seems to be the most revolutionary, but after playing with it, it has much improved compositional ability over SD, but it's still prone to breakdowns

Overall I feel like today's announcements were polish and refinement over last year's bombshell breakthroughs.
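To spell out the JSON point: schema-constrained decoding just means that at each step you only accept candidate tokens that keep the partial output completable to valid JSON. A toy sketch (the naive suffix-closing check and the candidate list are made up for illustration; this is not how OpenAI's JSON mode is actually implemented):

```python
import json

def can_complete_to_json(prefix):
    """Naive check: can this partial output still end up as valid JSON?

    A real implementation walks a grammar or the target schema; here we
    just try a handful of plausible closing suffixes.
    """
    for suffix in ("", '"', '"}', "}", "]", "]}"):
        try:
            json.loads(prefix + suffix)
            return True
        except json.JSONDecodeError:
            pass
    return False

def constrained_pick(prefix, candidates):
    """Pick the most probable candidate token that keeps the output
    completable to valid JSON. `candidates` is a list of (token, prob)."""
    for token, _prob in sorted(candidates, key=lambda c: c[1], reverse=True):
        if can_complete_to_json(prefix + token):
            return token
    return None
```

With a model that exposes token probabilities, looping this over the top-k candidates at each step guarantees schema-conforming output, at the cost of extra validation work per token.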

reissbaker
1 replies
7h38m

The most exciting announcements for me were:

* GPT-4-128k. Sure, Claude exists, but it's closer to GPT-3.5 than 4 IMO. TBD how well 128k context works given that classic attention scales quadratically (so they're presumably using something else) but given how good OpenAI's models tend to be I'm willing to give them the benefit of the doubt.

* Pushed GPT-3.5 finetuning to 16k context (up from 4k when it was released this summer). IME 3.5 finetunes are very useful, very fast, very cheap replacements for specific specialized tasks over GPT-4, and easily outperform GPT-4 for the right kind of tasks. The 4k context limit was a bit of a bummer.

* New tts that to my ears sounds nearly equivalent to Eleven Labs or Play.ht, at one-tenth to one-twentieth the price (with zero monthly commitment). The Eleven Labs Discord is a bit of a bloodbath right now, most of the general chat is just people saying they're switching. (The Play.ht Discord is pretty dead most of the time anyway, so not much new since this morning.) I will say though that it's a bummer that the OpenAI tts doesn't have input streaming, only output streaming, so latency will likely be worse and you'll have to figure out some way to do chunking yourself which is fairly annoying, but for any kind of personalized use case (e.g. a bot talking to customers, as opposed to using pre-recorded snippets) a 10-20x price improvement is worth the extra pain and may be the difference between "neat prototype" and "shippable to production."

Plus, massive price drops for OpenAI's existing products across the board, along with a legal defense fund to protect OpenAI customers from getting sued for using OpenAI models. If you're building an "OpenAI wrapper startup," today was a very good day. If you're competing with OpenAI, though... Oof.

singularity2001
0 replies
5h46m

data analyst can now directly plot graphs in the conversation

awestroke
1 replies
9h52m

You're completely ignoring image analysis.

torginus
0 replies
9h45m

sorry, missed that one, yeah, that's new as well, but to be fair, I missed it because nobody else was talking about it.

e12e
0 replies
51m

Working, multi-lingual voice input/output is huge IMNHO.

yen223
3 replies
10h3m

Does it boggle anyone else's mind that ChatGPT was released less than a year ago? It definitely feels like it was around lot longer than that.

gardenhedge
2 replies
9h43m

And there's people (most people) who aren't using it

jayGlow
1 replies
3h20m

What's really crazy to me is how fast it became mundane. This technology is still mind-blowing to me, but most people no longer seem impressed.

btbuildem
0 replies
3h15m

Even we who are so impressed with it and use it frequently / build on it -- we don't have a clue of the true potential.

It'll be some pimply intern somewhere that'll blow the lid off things with some ultra-clever-yet-painfully-obvious use case.

qingcharles
2 replies
15h25m

I was in prison when ChatGPT came out. All I knew of it was a headline that flashed past really fast on CNN and I called my buddy and said "What the hell is Chat OPT?"

I'd just finished reading The Singularity is Near for the second time too...

freedomben
1 replies
12h10m

The singularity is near is a great book. Hilarious that you read that for the second time and then got out and saw chat gpt!

I love kurzweil but his estimates of timeline are often pretty over optimistic, so I'd be really wondering.

cubefox
0 replies
8h22m

On Metaculus the arrival for weakly general AI was predicted for 2045 two years ago. Now it's at 2026.

https://www.metaculus.com/questions/3479/date-weakly-general...

ssnistfajen
0 replies
12h21m

I remember opening Twitter that night and seeing a bunch of tech people I follow sharing screenshots of conversations with a little green icon. I thought "oh neat, yet another chatbot fad to try out for 5 minutes". I could not have been more wrong.

byteflip
0 replies
12h30m

I've been in contact with society and I'm still losing my mind.

jack_riminton
12 replies
19h24m

For all the naysayers in the comments, the elephant in the room that no one quite wants to admit, is that GPT4 is still far better than everything else out there

nmfisher
7 replies
16h57m

I cancelled my GPT4 subscription because I found Claude more useful for code and Qwen for Chinese language tasks.

It might be better on average but I don’t think it’s better for every task.

All the others are only going to get better too.

unshavedyak
4 replies
15h25m

Can you go into depth? I’ve used ChatGPT Pro and Phind extensively, didn’t know about Claude and code. Curious to give it a try

nmfisher
3 replies
15h11m

I generally use it for boilerplate tasks like “here’s some code, write unit tests” or “here’s a JSON object, write a model class and parser function”.

Claude is significantly faster, so even if it requires a couple more prompt iterations than GPT4, I still get the result I need earlier than with GPT4.

GPT4 also recently developed this annoying tendency to only give you one or two examples of what you asked for, then say “you can write the rest on your own based on this template”. I can’t overstate how annoying this was.

icelancer
1 replies
8h55m

GPT4 also recently developed this annoying tendency to only give you one or two examples of what you asked for, then say “you can write the rest on your own based on this template”. I can’t overstate how annoying this was.

The last model "update" has really ruined GPT-4 in this regard.

dri_ft
0 replies
8h5m

When was this? I noticed chatGPT becoming succinct almost to the point of being standoffish about a week or two ago. Probably exacerbated by my having some custom instructions to tame its prior prolixity.

Alifatisk
0 replies
9h49m

Imagine being told by an ai to do it yourself, hilarious

qingcharles
0 replies
15h14m

Claude is superior for me on writing summaries of large documents.

jack_riminton
0 replies
7h6m

"All the others are only going to get better too."

Yes, including OpenAI, who were already miles ahead :)

kossTKR
1 replies
18h59m

Is there anything promising out there?

Is crowd sourced training still unfeasible?

I remember how fast the diffusion world moved in the first year but it seems it's stalled somewhat compared to first midjourney then Dall-e 3. Is it the same with text models?

bugglebeetle
0 replies
18h27m

GPT-4 is the best general model and specifically very good at coding, if correctly promoted. Lots of open source stuff is good at various tasks (e.g. NLP stuff), but nothing is near to the same overall level of performance.

mezeek
0 replies
18h33m

Grok? Just kidding

icelancer
0 replies
12h47m

Phind is the only thing that I've supplemented GPT-4 with, which is still pretty impressive.

crakenzak
11 replies
22h26m

The 128k context window GPT-4 Turbo model looks unreal. Seems like Anthropic's day of reckoning is here?

infecto
6 replies
22h16m

Anthropic never even had a day. I said this before in another Anthropic thread but I signed up 6 months ago for API access and they never responded. An employee in that thread apologized and said to try again, did it, week later still nothing. As far as commercial viability, they never had it.

QkPrsMizkYvt
1 replies
22h14m

same here. I wonder why they are not opening it up to more devs. Seems strange.

freedomben
0 replies
22h10m

Purely a guess, but having tried to scale services to new customers, it can be a lot harder than it seems, especially if you have to customize anything. Early on, doing a generic one-size-fits-all can be really, really hard, and acquiring those early big customers is important to survival and often requires customizations.

taf2
0 replies
21h21m

I got access to Claude 2 - it's really good, and I have been chatting with their sales team. They seem reasonably responsive, but overall, with OpenAI's 128k context and pricing, Anthropic has no edge.

sbohacek
0 replies
20h18m

I have not tried, but I assumed that API access to Anthropic's Claude is available through AWS Bedrock.

og_kalu
0 replies
22h6m

Yeah i know this wasn't the case for everyone but i got gpt-4 access back in march the next day. Tried Claude and still waiting. Oh well lol.

bluecrab
0 replies
20h53m

They can't even compete with open source since multiple platforms have apis available.

a_wild_dandan
2 replies
20h58m

Anthropic's $20 billion valuation is buck wild, especially to those who've used their "flagship" model. The thing is insufferable. David Shapiro sums it up nicely.[1] Fighting tools is horrendous enough. Those tools also deceiving and lecturing you regarding benign topics is inexcusable. I suspect that this behavior is a side-effect of Anthropic's fetishistic AI safety obsession. I further suspect that the more one brainwashes their agent into behaving "acceptably", the more it'll backfire with erratic and useless behavior. Just like with humans, the antidote to harmful action is more free thought and education, not less. Punishment methods rooted in fear and insecurity will result in fearful and insecure AI (i.e., ironically creating the worst outcome we're all trying to avoid).

[1] https://www.youtube.com/watch?v=PgwpqjiKkoY

razodactyl
0 replies
20h12m

The model responses are a side-effect of AI reinforcement training in lieu of humans.

The trick is to write as if it were the AI calling the shots.

Set up an agreement on the requirement, then force the first word of the "Assistant:" turn to be "Sure".

kridsdale3
0 replies
20h13m

Anthropic's valuation has nothing to do with their product being actually good. It is entirely tied up in the perception of built-in risk-mitigation which appeals to client companies that are actually run by lawyers and not product folks.

Products backed by nanny-state LLMs are going to fail in the market. The TAM for the products is tiny, basically the same as Christian Music or Faith-Based Filmmaking.

People love porn and violence.

machdiamonds
0 replies
20h52m

Anthropic doesn't care about consumer products. Their CEO believes that the company with the best LLM by 2026 will be too far ahead for anyone else to catch up.

zavertnik
10 replies
22h22m

And here I was in bliss with the 32k context increase 3 days ago. 128k context? Absolutely insane. It feels like now the bottleneck in GPT workflows is no longer GPT, but instead it's the wallet!

Such an amazing time to be alive.

in3d
2 replies
22h19m

For GPT-4 Turbo, not GPT-4.

kridsdale3
0 replies
20h18m

Yes, nowhere in the text today was there any assertion that Turbo produces (eg) source code at the same level of coherence and consistently high quality as GPT4.

dragonwriter
0 replies
21h35m

GPT-4-Turbo seems to be replacing GPT-4 (non-turbo); the GPT-4 (non-turbo) model is marked as "Legacy" in the model list.

EDIT: the above is corrected, it previously erroneously said the non-turbo model was marked as "deprecated", which is a different thing.

MagicMoonlight
2 replies
19h41m

It’s insane because it makes no sense. When you read a book you don’t remember the last 100,000 words. It’s so wildly inefficient to do it that way.

lucubratory
1 replies
18h53m

Huh? By the time you finish reading a book you've forgotten the book?

stavros
0 replies
9h4m

The specific words, yes.

Swizec
1 replies
22h15m

128k context? Absolutely insane

128k context is great and all, but how effective are the middle 100,000 tokens? LLMs are known to struggle with remembering stuff that isn't at the start or end of the input, a phenomenon known as "lost in the middle".

https://arxiv.org/abs/2307.03172

saliagato
0 replies
21h55m

sama said they improved it

naiv
0 replies
22h20m

now with the prices reduced so much, even the wallet might not be the bottleneck anymore

marban
0 replies
22h16m

Comment will not age well.

modeless
10 replies
22h22m

Whisper V3 is released! https://github.com/openai/whisper/commit/c5d42560760a05584c1...

Looks like it's just a new checkpoint for the large model. It would be nice to have updates for the smaller models too. But it'll be easy to integrate with anything using Whisper V2. I'm excited to add it to my local voice AI (https://www.microsoft.com/store/apps/9NC624PBFGB7)

I assume ChatGPT voice has been using Whisper V3 and I've noticed that it still has the classic Whisper hallucinations ("Thank you for watching!"), so I guess it's an incremental improvement but not revolutionary.

petesergeant
2 replies
11h31m

Does it have diarisation yet?

RockRobotRock
1 replies
9h10m

Check out WhisperX

petesergeant
0 replies
7h23m

Thanks!

Void_
2 replies
21h59m

Too bad they didn't upgrade Whisper API yet. Can't wait to make it available in https://whispermemos.com

henryaj
0 replies
2h49m

This looks awesome!

dotancohen
0 replies
10h23m

If you add an Android version that I could activate from my lock screen, you'll have another customer.

ianbicking
1 replies
22h14m

Do you also get those hallucinations just on silence?

I kind of wonder if they had a bunch of training data of video with transcripts, but some of the video/audio was truncated and the transcript still said the last speech, and so now it thinks silence is just another way of signing off from a TV program.

IMHO the bottleneck on voice now is all the infrastructure around it. How do you detect speech starting and stopping? How do you play sound/speech while also being ready for the user to speak? This stuff is necessary, but everything kind of works poorly, and you really need hardware/software integration.

modeless
0 replies
22h6m

You're right, I think that's exactly what happened.

Silence is when you get the most hallucinations. But there is a trick supported by some implementations that helps a lot. Whisper does have a special <|nospeech|> token that it predicts for silence. You can look at the probability of that token even when it's not picked during sampling. Hallucinations often have a relatively high probability for the nospeech token compared to actual speech, so that can help filter them out.

As for all the surrounding stuff like detecting speech starting and stopping and listening for interruptions while talking, give my voice AI a try. It has a rough first pass at all that stuff, and it needs a lot of work but it's a start and it's fun to play with. Ultimately the answer is end-to-end speech-to-speech models, but you can get pretty far with what we have now in open source!
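A rough sketch of that filtering idea (the thresholds here are made up; `no_speech_prob` and `avg_logprob` are the per-segment fields the open source whisper package returns):

```python
def filter_hallucinations(segments, no_speech_cutoff=0.6, logprob_cutoff=-1.0):
    """Drop segments that look like silence hallucinations.

    Heuristic only: a segment with a high nospeech probability AND a low
    average token logprob is probably something like "Thank you for
    watching!" emitted over silence.
    """
    kept = []
    for seg in segments:
        if (seg["no_speech_prob"] > no_speech_cutoff
                and seg["avg_logprob"] < logprob_cutoff):
            continue  # likely hallucinated over silence
        kept.append(seg)
    return kept
```

In practice you'd tune both cutoffs against your own audio; whisper's `transcribe()` already exposes a similar `no_speech_threshold`/`logprob_threshold` pair.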

prabhasp
0 replies
13h47m

Love that Sama literally spent 16 seconds of a 45minute announcement on whisper - https://app.reduct.video/o/eca54fbf9f/p/250fab814f/share/9d9...

dang
0 replies
21h40m

Related:

OpenAI releases Whisper v3, new generation open source ASR model - https://news.ycombinator.com/item?id=38166965

bluck
10 replies
20h42m

Copyright Shield

OpenAI is committed to protecting our customers with built-in copyright safeguards in our systems. Today, we’re going one step further and introducing Copyright Shield—we will now step in and defend our customers, and pay the costs incurred, if you face legal claims around copyright infringement. This applies to generally available features of ChatGPT Enterprise and our developer platform.

So essentially they are giving devs a free pass to treat any output as free of copyright infringement? Pretty bold when training data sources are kinda unknown.

shmatt
2 replies
20h13m

The investors will only get their 1000x if OpenAI can convince people it's risk-free to use. So they'll happily cover the legal battle to prove it, or spend every last company penny trying

thethimble
0 replies
20h10m

Or, alternatively, copyright risk is a major concern for real customers, and this is a major step forward in addressing that.

Not everything needs to be so cynical. What’s good for investors can be good for users as well.

realce
0 replies
20h9m

I'm starting to think that part of the regs around this sector should be that you can't prevent public investment from occurring.

tyree731
1 replies
20h22m

I am not a lawyer, but this doesn't seem quite "free". Note that they aren't indemnifying customers for any consequences of said legal claims, meaning that customers would seem to bear the full brunt of those consequences should there be a credible copyright infringement claim.

Joeri
0 replies
20h11m

But it does guarantee that any customer that can’t afford a big legal team uses their big legal team, reducing the chances of a bad (for them) precedent caused by an inept defense.

It also discourages predatory lawsuits against small users of their API by copyright trolls, which would likely end up settled out of court and not give them the precedent they want.

layer8
0 replies
20h26m

It probably also means having to remain a paying customer as long as you want that protection to persist for any previous output.

hyperthesis
0 replies
20h16m

costs != damages

fnordpiglet
0 replies
20h26m

It’s not unknown to OpenAI, presumably? And I assume the shield evaporates if their court case goes against them.

cdolan
0 replies
20h11m

"Everything that is old is new again"

Thats called "...we have Microsoft's lawyers behind us. Bring it on!"

ShakataGaNai
0 replies
20h19m

For large-scale usage, it doesn't matter what the devs want. If the lawyers show up and say "We can't use this technology because we're probably going to get sued for copyright infringement", it's dead in the water.

It's a logical "feature" for them to offer this "shield" as it significantly mitigates one of the large legal concerns to date. It doesn't make the risks fully go away, but if someone else is going to step up and cover the costs, then it could be worthwhile.

For large enterprises, IP is a big deal, probably the single biggest concern. They'll spend years and billions of dollars attempting to protect it, cough sco/oracle cough, right or wrong.

TIPSIO
10 replies
22h26m

That map/travel demo was insane. Trying to find the demo again.

davidbarker
3 replies
22h14m

https://www.youtube.com/live/U9mJuUkhUzk?t=2006

(Timestamp 33:26)

Edit: updated the timestamp

brunoqc
2 replies
21h7m

~~wat? the video is 45:35 long.~~

davidbarker
1 replies
21h4m

Oh! When I replied it was a lot longer — it still had the countdown from before the stream went live. I guess they replaced it with the trimmed version.

brunoqc
0 replies
21h0m

Thanks!

topicseed
2 replies
22h17m

It was but most of that functionality was within the "function calling", not really within the assistant as a top 10 of Paris sights isn't really that crazy. Plotting these on a map is the key part which is still your own code, not GPT-based.

rictic
1 replies
22h12m

Turning an airline receipt pdf into a well structured function call is very nice.

dnadler
0 replies
20h25m

This might also be a bit easier than it seems. I've done similar (though not nearly as nice of a UI) with `unstructured`.

WanderPanda
2 replies
22h13m

Yep I feel like they solved the problem that Apple never managed to solve with Siri: How to interface it with apps. Seems like this was an LLM-hard problem

freedomben
1 replies
22h9m

My guess is an LLM-based Siri is right around the corner. Apple commonly waits for tech to be proved by others before adopting it, so this would be in-line with standard operating procedures.

singularity2001
0 replies
21h44m

My guess is that LLM-Siri will be crippled by internal processes and lawyers

shanusmagnus
8 replies
21h5m

This is kind of the wrong place for this, but given the burst of attention from LLM-loving people: is there any open source chat scaffolding that actually provides a good UI for organizing chat streams and doing stuff with them?

A trivial example is how the LHS of the ChatGPT UI only allows you a handful of characters to name your chat, and you can't even drag the pane to the right to make it bigger; so I have all these chats with cryptic names from the last eleven months that I can't figure out wtf they are; and folders are subject to the same problem.

Seriously, just being able to organize all my chats would be a massive help; but there are so many cool things you could do beyond this! But I've found nothing other than literal clones of the ChatGPT UI. Is there really nothing? Nobody has made anything better?

nextworddev
2 replies
20h59m

Organize how?

sharemywin
1 replies
20h57m

tree structure. like email.

shanusmagnus
0 replies
20h42m

That would be one very obvious way and a big improvement over the current state of affairs.

davidbarker
1 replies
20h57m

This may not be useful to you, but there are browser extensions that add a bunch of functionality to ChatGPT.

The first that comes to mind: https://chrome.google.com/webstore/detail/superpower-chatgpt...

shanusmagnus
0 replies
20h34m

No joy with the one you linked (can't see what problem that one is actually solving), but I'll look through browser extensions -- I hadn't considered that.

sharemywin
0 replies
20h58m

I agree. Why not vector search for history?

ryanklee
0 replies
20h51m

ChatGPT Keeper Chrome extension at least allows for search.

bluecrab
0 replies
21h0m

Also natural language search of the chat history would be great.

atleastoptimal
8 replies
17h37m

Given that their main goal is still AGI, how does offering better developer tools and nifty custom models that can look at your dog for you help? Is it just bolstering revenue? They said they don't use API input to train their models so it isn't making them constantly smarter via more people using them.

candiddevmike
1 replies
17h23m

They're in the AGI business the same way Tesla is in the self driving car business

lucubratory
0 replies
10h5m

That just isn't true according to literally any evidence. People inside OpenAI, those who've gotten access for various reasons e.g. journalists, Microsoft, other investors, their pattern of behaviour, their corporate governance structure, their hiring practices and requirements, etc. They are true believers, at least the vast majority of them.

lucubratory
0 replies
9h59m

They probably want to train on GPT Builder + store rankings to be able to train an AI to effectively spin up new agentic AIs in response to whatever task it has in front of it. If you're familiar with the Global Workspace Theory of consciousness I think they're aiming for something similar to that, implemented in modern AI systems. They'd like data on what creating a new agent looks like, what using it looks like, and how effective different agents are. They'll get that data from people using GPT Builder, people using "GPTs" and their subsequent ratings/purchases, and the sales and ratings data from the GPT Store, respectively.

gumballindie
0 replies
16h28m

AGI is what they’ll use to motivate investment in their company: an ever-receding goal that promises to deliver growth at some point in the next two decades. That will provide funding to make existing models useful to more than just an overenthusiastic market. If they fail, no problem, they “never really meant to make ChatGPT work because their goal has always been AGI”.

drcode
0 replies
16h2m

The limiting factor at OpenAI is their internal human developer talent

These tools will help them train and discover the next Ilya Sutskever

crosen99
0 replies
16h27m

They stumbled into a position where they can make a crap ton of money going up the stack, which can fund the ongoing march toward AGI. (The revenue not only is cash in their pocket, but it’s also driving up their valuation for future investment.)

caesil
0 replies
17h25m

AGI will be a system of different agents working together, not one mega-model.

Aeolun
0 replies
17h25m

Probably have more devs than they know what to do with at this point, so might as well spread them over the existing offerings while having the core work on AGI.

simonw
6 replies
22h29m

The new assistants API looks both super-cool and (unfortunately) a recipe for all kinds of new applications that are vulnerable to prompt injection.

burcs
3 replies
22h25m

Do you see a way around prompt injection? It feels like any feature they release is going to be susceptible to it.

minimaxir
1 replies
22h19m

I suspect OpenAI's black box workflow has some safeguards for it.

sillysaurusx
0 replies
22h15m

Still, safeguards are quite a lot less safe than if statements. We live in interesting times.

I don’t think there’s any way to guarantee safety from prompt injection. The most you can do is make a probabilistic argument. Which is fine; there are plenty of those, and we rely on them in the sciences. But it’ll be difficult to quantify.

CS majors will find it pretty alien. The blockchain is one of the few probabilistic arguments we use, and it’s precisely quantifiable. This one will probably be empirical rather than theoretical.

bluecrab
0 replies
20h52m

Use an llm to evaluate the input and categorise it.

skybrian
0 replies
19h28m

Yes. Hopefully, sandboxing limits the damage somewhat, but it doesn't help if you put any private docs in the sandbox.

Also, the limitations of the Code Interpreter tool's server-side Python sandbox aren't described in their API docs. In particular, when does the sandbox get killed? Anyone know? If they're similar to the Code Interpreter tool in ChatGPT, then it kills your sandbox within an hour or so (if you go to lunch), which is a crappy user experience.

Running the sandbox on the user's machine seems like a better approach. There's no reason to kill the sandbox if it's not using any server-side resources. Maybe the function-calling API would be useful for that, somehow?

The most immediately useful thing is the price cut, though.

alexander2002
0 replies
22h25m

With great power comes great responsibility!

tornato7
4 replies
21h34m

According to [1], the new gpt-4-1106-preview model should be available to all, but the API is telling me "The model `gpt-4-1106-preview` does not exist or you do not have access to it."

Anyone able to call it from the API?

1. https://help.openai.com/en/articles/8555510-gpt-4-turbo

naiv
2 replies
21h26m

rumours on x are that it will be available 1pm san francisco time

tekacs
1 replies
21h23m

We’ll begin rolling out new features to OpenAI customers starting at 1pm PT today.

^ It says exactly this in the linked article.

naiv
0 replies
21h14m

oh, totally missed that :D

anotherpaulg
0 replies
21h28m

Same. I am eager to run my code editing benchmark [1] against it, to compare it with gpt-4-0314 and gpt-4-0613.

Edit: Ha, I just re-read the announcement [2] and it says 1pm in the 5th sentence:

  We’ll begin rolling out new features to OpenAI customers starting at 1pm PT today.

[1] https://aider.chat/docs/benchmarks.html

[2] https://openai.com/blog/new-models-and-developer-products-an...

WanderPanda
4 replies
22h16m

Does anyone have an idea why they are so open about Whisper? Is it the poster child project for OAI people scratching their open source itch? Is there just no commercial value in speech to text?

teaearlgraycold
0 replies
22h14m

Everyone’s got a loss leader

htrp
0 replies
22h14m

speech to text is a relatively crowded area with a lot of other companies in the space. Also really hard to get "wow" performance as it's either correct (like most other people's models) or it's wrong

freedomben
0 replies
22h12m

I've been wondering this as well. I'm super glad, but it seems so different than every other thing they do. There's definitely commercial value, so I find it surprising.

StanAngeloff
0 replies
20h52m

I personally use Whisper to transcribe painfully long meetings (2+ hours). The transcripts are then segmented and, you guessed it, entered right into GPT-4 for clean up, summarisation, minutes, etc. So in a sense it's a great way to get more people to use their other products?

tornato7
3 replies
22h7m

A few notes on pricing:

- GPT-4 Turbo vision is much cheaper than I expected. A 768*768 px image costs $0.00765 to input. That's practical to replace more specialized computer vision models for many use-cases.

- ElevenLabs is $0.24 per 1K characters while OpenAI TTS HD is $0.03 per 1K characters. Elevenlabs still has voice copying but for many use-cases it's no longer competitive.

- It appears that there's no additional fee for the 128K context model, as opposed to previous models that charged extra for the longer context window. This is huge.
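For anyone wanting to sanity-check the vision number: assuming the tile-based accounting OpenAI describes for high-detail images (a flat 85 base tokens plus 170 tokens per 512px tile, ignoring the pre-scaling step for very large images), the math works out:

```python
import math

def vision_input_cost(width, height, price_per_1k=0.01):
    """Rough GPT-4 Turbo high-detail image input cost in dollars."""
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    tokens = 85 + 170 * tiles  # base tokens + per-tile tokens
    return tokens * price_per_1k / 1000

print(vision_input_cost(768, 768))  # 4 tiles, 765 tokens: ~$0.00765
```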

taf2
2 replies
21h6m

Does this mean OpenAI tts is available via api? I saw whisper but not tts - maybe I’m missing it?

davidbarker
1 replies
20h54m
taf2
0 replies
20h15m

ah that's really great thank you

topicseed
3 replies
22h26m

128,000 token context, Assistants API, JSON mode, April 2023 knowledge cutoff, GPT 4 Turbo, lower pricing, custom GPTs, a good bunch of announcements all-round!

https://openai.com/pricing

Alifatisk
2 replies
9h51m

I thought GPT-4 had access to internet now?

qup
0 replies
9h10m

Per the announcement, the "GPTs" do, natively.

I think everyone else had been hacking it on via "functions"

jeppebemad
0 replies
8h47m

The “browse with bing” feature allows it to fetch a single webpage into the context, but the new cutoff allows _everything crawled_ to be context (up to the new date, that is)

robertkoss
3 replies
22h22m

Does anyone know when this will be coming to Azure OpenAI?

Onawa
1 replies
20h38m

If Azure's history when rolling out GPT-4 is any indication, probably a couple months and/or a staged rollout.

robertkoss
0 replies
20h28m

Is Azure adoption really that slow? Ugh.

kasetty
0 replies
20h50m

I would be also interested in knowing when these show up in Azure OpenAI offerings.

raylad
3 replies
21h5m

So with 128K context window, if you actually input 100K it would cost you:

Input: $0.01 per 1K tokens * 100 = $1.00

$1.00 per query?

Given that each query uses the entire context window, the session would start at $1 for the first query and go up from there? Or do I have it wrong?

minimaxir
1 replies
20h56m

It would be $1 for each individual API call, if you were continuing the conversation based on the same 100K input. The chat completions API is stateless.

raylad
0 replies
20h55m

Right, so that adds up very fast.

0xDEF
0 replies
20h19m

If it truly is GPT-4+ with a 128K context window it's still absolutely worth the high price. However if they are cheating like everyone else who has promised gigantic context windows then we are better off with RAG and a vector database.

kelseyfrog
3 replies
22h10m

JSON mode is a great step in the right direction, but the holy grail is either JSON-schema support or (E)BNF grammar specification.

minimaxir
2 replies
20h58m

The function calling is JSON Schema support but extremely poorly marketed. I am planning on writing a blog post about it.
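To expand a bit (illustrative schema, not an actual OpenAI-defined function): a function definition's `parameters` field is literally JSON Schema, and the model's generated arguments are steered to match it, which is effectively schema-constrained JSON output.

```python
# Hypothetical function definition for the function-calling API.
# Everything under "parameters" is plain JSON Schema.
extract_receipt = {
    "name": "extract_receipt",
    "description": "Extract structured fields from an airline receipt",
    "parameters": {
        "type": "object",
        "properties": {
            "airline": {"type": "string"},
            "total": {"type": "number"},
            "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
        },
        "required": ["airline", "total"],
    },
}
```

You pass that in the request's function list; the model replies with a function call whose `arguments` string should parse against the schema (validating it yourself is still wise, since conformance isn't guaranteed).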

danenania
1 replies
20h33m

Yeah I'm not sure I see the point of "JSON mode", in its current iteration at least, considering function calling already does this more effectively.

I suppose it could help to make simpler API calls and save some prompt tokens, but it would definitely need schema support to really be useful.

minimaxir
0 replies
20h26m

It makes it a bit easier to parse returned tabular data, anyways.

I'll be curious to see if it can handle outputting nested data without prompting.

doubtfuluser
3 replies
10h52m

With the assistant API, am I wrong or is it now much cheaper to actually use the API instead of the web Interface? $20 would cover a lot of interactions with the API, and since it’s now also doing truncation / history augmentation the API would have pretty much the same functionality. Thoughts?

stavros
0 replies
9h2m

Wasn't it always? I've been using a desktop app that talks to the API to access GPT-4, and I've been paying a dollar or two per month.

lucubratory
0 replies
10h7m

You would have to build the UI and probably a pre-prompt yourself, but yeah, it should be fine if the math works out. I'm not sure it will though, because if it did, any company on the planet could launch a thin wrapper around the API, charge less than OpenAI does for ChatGPT Plus, and undercut them that way.

flaviolivolsi
0 replies
9h9m

Depends on how much you use it, but yes. I personally use TypingMind with the APIs

bkfh
3 replies
19h52m

I wonder how many startups are obsolete after each OpenAI product release

passion__desire
0 replies
17h10m

Not related to your comment. But I see so much future in what ChatGPT can do.

Imagine giving a list of [Input<> Output] pairs, write a minimal program fitting the description in any language, even an Excel macro. Input, Outputs could in future be application interactions.

Adding onto it, imagine a future model where it understands shader toy scripts and its corresponding visual output.

This is like program fitting just as we have techniques for curve fitting and line fitting over a series of data points.

I am super pumped and excited for the future.

9dev
0 replies
18h55m

If the entire value proposition is a wrapper around the API of a single company, those startups were probably overvalued…

4ndrewl
0 replies
18h1m

Exactly the same number that haven't been paying attention to How Tech Platforms Work Since 1994. "Embrace, Extend, Extinguish"

Wherecombinator
3 replies
22h9m

Is this just for the API for now?

I just got premium the other day for ChatGPT 4 and have been blown away. I’m wondering if I’ll automatically get turbo when it’s released?

tornato7
2 replies
22h4m

GPT-4 Turbo is already available by default in ChatGPT

kvn8888
1 replies
21h47m

I can't find anything that says it's available in ChatGPT

dragonwriter
0 replies
21h38m

ChatGPT (at least in Plus) when using the GPT-4 model selected (instead of GPT-3.5) currently consistently reports the April 2023 knowledge cutoff of GPT-4-Turbo (gpt-4-1106-preview/gpt-4-vision-preview) as its knowledge cutoff, not the Sep 2021 cutoff for gpt-4-0613, the most recent pre-turbo GPT-4 model release.

The most sensible explanation is that ChatGPT is using GPT-4-Turbo as its GPT-4 model.

QkPrsMizkYvt
3 replies
22h13m

Most of the API docs were updated, but none of the new APIs work for me. Are other people experiencing the same?

davidbarker
2 replies
22h10m

They will start rolling out at 1pm PST today.

QkPrsMizkYvt
0 replies
20h29m

nice it is live now!

QkPrsMizkYvt
0 replies
22h0m

got it - thanks

OddMerlin
3 replies
13h14m

Is langchain still relevant with the release of AssistantAI? It seems managing the context window, state, etc is now all taken care of by Assistant AI.

I guess langchain is still relevant for non-OpenAI options?

weird-eye-issue
2 replies
12h54m

The average langchain dev will struggle to even call the API without langchain so I think it's safe

stavros
1 replies
8h40m

I find the opposite to be true, I use the OpenAI APIs as they are, but really struggle to figure out how the hell LangChain works.

weird-eye-issue
0 replies
6h24m

Tiktok AI influencers will be happy to help you

vissidarte_choi
2 replies
12h55m

Ever-increasing context is not a silver bullet. For those who believe that the larger the context, the smarter the model: you will find the model still talking nonsense even when fed a much larger context.

vissidarte_choi
0 replies
12h48m

Even if it were equipped with infinite context, a user cannot dump everything into the context. For enterprise users, data volumes can run into the terabytes and cannot meaningfully be measured in tokens.

Someone1234
0 replies
11h57m

Sure; but before you couldn't even use it for some problems because the problems were bigger than the context window.

For example, I was trying to generate an XSLT 3.0 transformation from one Json format to another. The two formats and description alone almost depleted my context window. In essence, it killed using GPT-4 for this project.

I use it daily, and I haven't had it spit out too much "nonsense" in spite of everyone constantly telling me how that's all it does. The quality of results are on-par with Stackoverflow (in good and bad ways).

speak_plainly
2 replies
19h20m

Can we get version of ChatGPT Plus where your data is confidential and not used for training, like a light version of ChatGPT Enterprise for individuals?

abound
1 replies
19h12m

That exists as a setting, but that same, single setting also disables your web chat history.

speak_plainly
0 replies
14h52m

That's great, thanks!

simonw
2 replies
19h39m

I just released a new version of my LLM CLI tool with support for the new GPT-4 Turbo model: https://llm.datasette.io/en/stable/changelog.html#v0-12

You can install it like this:

    pipx install llm
Then set an API key:

    llm keys set openai
    <paste key here>
Then run a prompt through GPT-4 Turbo like this:

    llm -m gpt-4-turbo "Ten great names for a pet walrus"
    # Or a shortcut:
    llm -m 4t "Ten great names for a pet walrus"
Here's a one-liner that summarizes all of the comments in this Hacker News conversation (taking advantage of the new long context length):

    curl -s "https://hn.algolia.com/api/v1/items/38166420" | \
      jq -r 'recurse(.children[]) | .author + ": " + .text' | \
      llm -m gpt-4-turbo 'Summarize the themes of the opinions expressed here,
      including direct quotes in quote markers (with author attribution) for each theme.
      Fix HTML entities. Output markdown. Go long.'
Example output here: https://gist.github.com/simonw/d50c8634320d339bd88f0ef17dea0...
eurekin
0 replies
18h55m

Great tool and example! Makes me wonder, what one can do more with it

Michelangelo11
0 replies
15h24m

Jesus. Yeah, considering the input size, this is a pretty good sign that the 128k context window is working decently well.

obiefernandez
2 replies
21h48m

My profit margins at https://olympia.chat just got 3x better <3

saliagato
0 replies
21h15m

I think your startup just died

leobg
0 replies
20h28m

Elaine Jusk…lol

gwern
2 replies
21h14m

We’re also launching a feature to return the log probabilities for the most likely output tokens generated by GPT-4 Turbo and GPT-3.5 Turbo in the next few weeks, which will be useful for building features such as autocomplete in a search experience.

This is very surprising to me. Are they not worried about people not just training on GPT-4 outputs to steal the model capabilities, but doing full-blown logit knowledge distillation? (Which is the reason everyone assumed that they disabled logit access in the first place.)

leobg
0 replies
20h38m

How many GBs worth of logits would you need to reverse engineer their model? Also, if it’s a conglomerate of models that they’re using, you’d end up in a blind alley.

danielmarkbruce
0 replies
20h19m

I thought the same thing.... My guess is they did a lot of analysis and decided it would be safe enough to do? "most likely" might be literally a handful and cover little of the entire distribution % wise?

dangrigsby
2 replies
20h55m

Is there a special "developer" designation? I am a paying API customer, but can't see gpt-4-1106-preview in the playground and can't use it via the API.

karmajunkie
0 replies
20h38m

As other comments have noted it seems to be rolling out at 1pm PST today

danenania
0 replies
20h41m

Apparently they'll be granting access at 1pm PST. We'll see what happens. Rate limits also don't seem to be updated yet to reflect their new "Usage Tiers" - https://platform.openai.com/docs/guides/rate-limits/usage-ti...

cryptoz
2 replies
22h17m

For DALL-E 3, I'm getting "openai.error.InvalidRequestError: The model `dall-e-3` does not exist." is this for everyone right now? Maybe it's gonna be out any minute.

I see the python library has an upgrade available with breaking changes, is there any guide for the changes I'll need to make? And will the DALL-E 3 endpoint require the upgrade? So many questions.

Edit: Oh I see,

We’ll begin rolling out new features to OpenAI customers starting at 1pm PT today.
minimaxir
1 replies
22h15m

The documentation/READMEs in the GitHub repo was updated to play nice with the new v1.0.0 of the package: https://github.com/openai/openai-python/

cryptoz
0 replies
22h15m

Aha, makes sense, thanks :)

chipgap98
2 replies
22h18m

The Assistants API and OpenAI Store are really interesting. Those are the types of things that could build a moat for OpenAI

visarga
1 replies
22h16m

You think it is hard to export an agent? It's a master prompt, a collection of documents and a few generic plugins like function calling and code execution. This will be implemented in open source soon. You can even fine-tune on your bot logs.

WanderPanda
0 replies
22h13m

Agreed, the moat are the models (as an extension of the instruction tuning data)

activescott
2 replies
20h38m

It is interesting that the updates are largely developer experience updates. It doesn't appear that significant innovations are happening on the core models outside of performance/cost improvements. Both devex and perf/cost are important to be sure, but incremental.

danielmarkbruce
0 replies
20h34m

128k context?

Davidzheng
0 replies
20h37m

presumably next model is coming next year?

Zaheer
2 replies
22h22m

The playbook OpenAI is following is similar to AWS. Start with the primitives (Text generation, Image generation, etc / EC2, S3, RDS, etc) and build value add services on top of it (Assistants API / all other AWS services). They're miles ahead of AWS and other competitors in this regard.

gumballindie
1 replies
22h15m

And just like amazon they will compete with their own customers. They are miles ahead in this regard as well since they basically take everyone’s digital property and resell it.

sharemywin
0 replies
20h55m

don't hate the player hate the game.

Topfi
2 replies
22h7m

I am very much looking forward to, but also dreading, testing gpt-4-turbo as part of my workflow and projects. The lowered cost and much larger context window are very attractive; however, I cannot be the only one who remembers the difference in output quality and overall perceived capability between gpt-3.5 and gpt-3.5-turbo, combined with the opaque switching from one model to the other (calling the older, often more capable model "Legacy", making it GPT+ exclusive, trying to pass off gpt-3.5-turbo as a straight upgrade, etc.). If the former had remained available after the latter became dominant, that may not have been a problem in itself, but seeing as gpt-3.5-turbo has fully replaced its precursor (both on the Chat website and via API) and gpt-4 as offered up to this point wasn't a fully perfect replacement for plain gpt-3.5 either, relying on these models as offered by OpenAI has become challenging.

A lot of ink has been spilled about gpt-4 (via the Chat website, but also more recently via API) seeming less capable over the last few months compared to earlier experiences and whilst I still believe that the underlying gpt-4 model can perform at a similar degree to before, I will admit that purely the amount of output one can reliably request from these models has become severely restricted, even when using the API.

In other words, in my limited experience, gpt-4 (via API or especially the Chat website) can perform equally well in tasks and output complexity, but the amount of output one receives seems far more restricted than before, often harming existing use cases and workflows. There appears a greater tendency to include comments ("place this here") even when requesting a specific section of output in full.

Another aspect that results from their lack of transparency is communicating the differences between the Chat Website and API. I understand why they cannot be fully identical in terms of output length and context window (otherwise GPT+ would be an even bigger loss leader), but communicating the Status Quo should not be an unreasonable request in my eyes. Call the model gpt-4-web or something similar to clearly differentiate the Chat Website implementation from gpt-4 and gpt-4-1106 via API (the actual name for gpt-4-turbo at this point in time). As it stands, people like myself have to always add whether the Chat website or API is what our experiences arise from, while people who may only casually experiment with the free Website implementation of gpt-3.5-turbo may have a hard time grasping why these models create such intense interest in those more experienced.

og_kalu
0 replies
16h31m

Would really love to know the results of your benchmark testing.

btbuildem
0 replies
3h17m

The rapid deprecation of the models is definitely unsettling. They're around barely long enough to establish reliable performance baselines within derived services.

I imagine behind the scenes it's all about resource use and cost. What stood out to me during the talk was how much emphasis ("we worked very hard") Altman put on the new price tiers. "Worked very hard" probably just means "endlessly argued with the board". It's a little sad that technical achievements take a back seat to the tug of war with moneybags.

steno132
1 replies
18h55m

For all the hate: Elon ships. And OpenAI ships.

People claim OpenAI is closed, that they are controlled by Microsoft, that they don't care enough about safety...

But the fact is, Anthropic, Google Brain, even Meta -- OpenAI blows them all out of the water when it comes to shipping new innovations. Just like Twitter ships much more now with Elon, and how SpaceX ships much more than NASA and Blue Origin.

If you disagree, give me just one logical reason why. It's just a fact.

endorphine
0 replies
12h23m

As if shipping was the end goal...

smy20011
1 replies
16h55m

Other than coding, do we have a good application of LLMs?

gumballindie
0 replies
16h22m

I doubt their application is suitable for coding, unless of course the goal is to create bugs or nonsense.

schrodingerscow
1 replies
21h39m

I’m confused by the pricing. Gpt-4 turbo appears to be better in every way, but is 3x cheaper?!

dragonwriter
0 replies
21h33m

The same was true of GPT-3.5 Turbo compared to the GPT-3 models that preceded it.

They want everyone on GPT-4-turbo. It may also be a smaller (or otherwise more efficient) but more heavily trained model that is cheaper to do inference on.

matheusmoreira
1 replies
10h43m

Something I'd really like to see is GitHub integration. Point it at a git repository, have it analyze it and suggest improvements, provide a high level break down, point me towards the right place to make changes.

doubtfuluser
0 replies
10h38m

This shouldn't be too difficult; the API allows writing such a tool, and with the "Assistants API" it should hopefully be able to put attention on the right parts. So: git clone -> system prompt -> add files of repo to messages -> get answer.
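
A rough sketch of that clone-and-attach flow (hypothetical helper, not using any OpenAI-specific API; a real version would respect token limits and .gitignore):

```python
import os

def repo_to_messages(repo_dir, system_prompt, exts=(".py", ".md")):
    """Walk a cloned repo and pack file contents into a chat message list.
    Sketch only: no chunking, no token budgeting, no .gitignore handling."""
    messages = [{"role": "system", "content": system_prompt}]
    for root, _dirs, files in os.walk(repo_dir):
        if ".git" in root:  # skip git internals
            continue
        for name in sorted(files):
            if not name.endswith(exts):
                continue
            path = os.path.join(root, name)
            rel = os.path.relpath(path, repo_dir)
            with open(path, encoding="utf-8", errors="ignore") as f:
                messages.append(
                    {"role": "user", "content": f"File: {rel}\n\n{f.read()}"}
                )
    return messages
```

From there the message list would go to whatever chat or assistant endpoint you're using.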

llmllmllm
1 replies
21h12m

While this makes some of what my startup https://flowch.ai does a commodity (file uploads and embeddings based queries are an example, but we'll see how well they do it - chunking and querying with RAG isn't easy to do well), the lower prices of models make my overall platform way better value, so I'd say overall it's a big positive.

Speaking more generally, there's always room for multiple players, especially in specific niches.

mediaman
0 replies
20h51m

Their system also does not seem to support techniques like hybrid search, automated cleaning/modifying of chunks prior to embedding, or the ability to access citations used, all of which are pretty important for enterprise search.

Could just mean it's coming, though.
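
For context, "hybrid search" here usually means fusing a keyword ranking (e.g. BM25) with a vector-similarity ranking. A common recipe is reciprocal rank fusion; a minimal sketch (doc IDs are placeholders):

```python
def reciprocal_rank_fusion(keyword_ranking, vector_ranking, k=60):
    """Combine a keyword ranking and a vector-similarity ranking into one
    hybrid ranking: each document scores 1/(k + rank) per list it appears in."""
    scores = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank well in both lists float to the top, which is the behavior the built-in retrieval doesn't (yet) let you control.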

kristianp
1 replies
16h19m

Reproducible outputs and log probabilities

The new seed parameter enables reproducible outputs by making the model return consistent completions most of the time. This beta feature is useful for use cases such as replaying requests for debugging, writing more comprehensive unit tests, and generally having a higher degree of control over the model behavior. We at OpenAI have been using this feature internally for our own unit tests and have found it invaluable.

This will be useful when refining prompts. When running tests, at times I wasn't sure if any improvement from a prompt change was the result of random variation or an actual improvement.
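
For what it's worth, a minimal sketch of what a seeded request payload looks like (the model name and seed value here are placeholders; `seed` is the new beta parameter):

```python
import json

def build_seeded_request(prompt, seed=42, model="gpt-4-1106-preview"):
    """Build a chat-completions payload pinned to a seed so reruns of the
    same prompt return consistent completions (best effort, per the docs)."""
    return {
        "model": model,
        "seed": seed,
        "temperature": 0,  # pair the seed with temperature 0 for prompt A/B tests
        "messages": [{"role": "user", "content": prompt}],
    }

print(json.dumps(build_seeded_request("Classify this ticket."), indent=2))
```

Responses also carry a system_fingerprint field; if it changes between runs, the backend changed and the docs say not to expect determinism across it.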

btbuildem
0 replies
3h9m

Nicely spotted! Yeah, even with temperature turned all the way down, the variation in results makes it harder to test.

edandersen
1 replies
19h12m

They need to tone down the "GPT" "persona" marketing if they don't want a backlash. It's one thing releasing AI and saying "do what you want with it" but it's another to actively list and illustrate the people it can replace.

lucubratory
0 replies
19h2m

Those personas aren't listing jobs, they're listing tasks. If your job is just a task then it's going to be replaced by something anyway even if OpenAI specifically forbids their model from ever doing it.

That said, we should have comprehensive retraining and guaranteed jobs programs, or a UBI. Either would ameliorate the stress on the employment market. When people require their current job to provide them and their family with food, shelter, water, and medical care and someone takes that away, they are going to react regardless of how inevitable it was, and they're right to do so, because people have a right to self-defence.

conorh
1 replies
20h40m

We just changed a project we've been working on to try out the new gpt-4-turbo model and it is MUCH faster. I don't know if this is a factor of the number of people using it or not, but streaming a response for the prompts we are interested in went from 40-50 seconds to 6 seconds.

0xDEF
0 replies
20h16m

I noticed that too but I think it's because we are hitting new servers that just went online. They will probably get saturated and slower with time when other gpt-4 users start using gpt-4-turbo.

chipgap98
1 replies
22h18m

The Assistants playground doesn't seem to be available yet

singularity2001
0 replies
21h37m

https://chat.openai.com/gpts/editor

you currently do not have access to this feature :(

anoy8888
1 replies
19h50m

The new announcement just wiped out a bunch of startups

somsak2
0 replies
18h20m

Any examples?

alach11
1 replies
22h32m

There are a lot of huge announcements here. But in particular, I'm excited by the Assistants API. It abstracts away so many of the routine boilerplate parts of developing applications on the platform.

gregorym
0 replies
21h19m

how so?

willsmith72
0 replies
22h13m

If they could roll back the extreme rate-limiting on dalle 3 in gpt4, that would be great.

wilg
0 replies
20h44m

What context length will ChatGPT have on GPT-4-Turbo? It wasn't using the full 32K before was it?

visarga
0 replies
11h3m

Many people are reporting errors in the playground

Failed to update assistant: UserError: Failed to index file

vineet
0 replies
22h23m

The Assistants API is really cool. Together with the retrieval feature, it makes me wonder how many companies OpenAI killed by creating it.

stuckkeys
0 replies
20h9m

It is just a matter of time before they face a huge disruption. Yes, by the definition of success they have accomplished something extraordinary. I used the OpenAI playground before they even made a mark, and I knew someday they were going to wow everyone. The problem I see, one that cheats individuals out of their hard work, is the lack of credibility. Any content that OpenAI used during training needs to cite its origin and list the success rate. If they are allowed to profit off previous work, well, guess what: the original content makers deserve the same.

Don't let my input discourage you; this is going to make everyone super efficient and it is definitely going to help us grow in areas where we lacked intel. I just think their business model screws over the financial standing of those who actually make the answers valid.

I am still hoping to see local models compete with OpenAI on consumer-grade hardware. But for now I will continue to be a customer because I have no other great choices. Cheers to the unlimited source of knowledge.

ssijak
0 replies
4h27m

Any ETA on when the new GPT-4 Turbo will be released out of preview? I really want to use it in my production app, but the 100 requests/day limit is prohibiting that. I guess they will remove it once it's out of preview.

snihalani
0 replies
11h17m

only thing I learnt: openai will come for your customers if you depend on it

siva7
0 replies
18h16m

So over a year later, OpenAI couldn't be further ahead of all its competition. Google is still trying to catch up with its AI-flavoured Google Search 2.0, and it's becoming painfully clear that this too was the wrong path. They're not even playing in the same league.

singularity2001
0 replies
20h59m

did they break the api?

from openai import OpenAI

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'OpenAI' from 'openai'

If so where is the current documentation?

sidcool
0 replies
13h45m

Few queries out of ignorance.

What are some use cases for 128k context length?

scudsworth
0 replies
4h30m

can't stop lolling at this example. wow, simple division. and all it required was 8 api calls.

reqo
0 replies
21h23m

Didn't the tickets to Dev Day cost around $600? They basically took that money and gave it back to developers as credits so they can start using the API today. Pretty smart move!

og_kalu
0 replies
21h52m

The new TTS is much cheaper than ElevenLabs, and better too.

I don't know how the model works, so maybe what I'm asking isn't even feasible, but I wish they gave the option of voice cloning or something similar, or at least had a lot more voices for other languages. The default voices tend to give output in other languages an accent.

Also, if Turbo is the much faster model a few people have had access to over the past week, then I'm pressing X to doubt the "more intelligent than legacy GPT-4" claim.

ofermend
0 replies
20h58m

Excited to see GPT4-Turbo and longer sequence lengths from OpenAI. We just released Vectara's "Hallucination Evaluation Model" (aka HEM) today https://huggingface.co/vectara/hallucination_evaluation_mode... (along with this leaderboard: https://github.com/vectara/hallucination-leaderboard). GPT-4 was already in the lead. Looking forward to seeing GPT4-Turbo there soon.

m3kw9
0 replies
20h55m

How many startups got shafted today?

longnguyen
0 replies
21h15m

Awesome. Adding GPT-4 Turbo and DALL·E 3 to my ChatGPT macOS client[0]

[0]: https://boltai.com

layer8
0 replies
20h21m

The TTS seems really nice, though still relatively expensive, and probably limited to English (?). I can’t wait until that level of TTS will become available basically for free, and/or self-hosted, with multi-language support, and ubiquitous on mobile and desktop.

jsf01
0 replies
15h55m

For the Assistants API with unlimited context length it’s not clear to me how the pricing works. Do you pay only for the incremental tokens per message for follow up messages, or does each new message cost the full prior context amount + the cost of the message itself?

jrouah
0 replies
18h35m

Any idea what the deal is with what looks like the Singapore coats of arms here? https://www.youtube.com/live/U9mJuUkhUzk?si=H8yYWiuJvaxVhIsV...

jonplackett
0 replies
16h46m

I wish dalle3 had inpainting and variations.

They are competing with an awesome product in midjourney and need to have at least these as minimum features if they want to compete.

jddj
0 replies
19h29m

128k context is getting up to the point where it will fit all of a moderately large codebase, right?
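
Back of the envelope: at the common ~4-characters-per-token heuristic, 128k tokens is roughly 500 KB of source, so a small-to-medium project, yes; a large monorepo, no. A rough estimator (heuristic only; tiktoken would give real counts):

```python
import os

CHARS_PER_TOKEN = 4  # crude heuristic for English-heavy source text

def estimate_repo_tokens(repo_dir, exts=(".py",)):
    """Rough token estimate for a source tree, to sanity-check whether
    it fits in a 128k-token context window."""
    total_chars = 0
    for root, _dirs, files in os.walk(repo_dir):
        for name in files:
            if name.endswith(exts):
                total_chars += os.path.getsize(os.path.join(root, name))
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(token_estimate, window=128_000, reserve=8_000):
    """Leave some headroom for the prompt itself and the model's reply."""
    return token_estimate <= window - reserve
```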

jacomoRodriguez
0 replies
5h57m

What's definitely interesting is the speed of the new gpt-4 turbo model. It is blazing fast, I would guess something like 3x or 4x the speed of 3.5 turbo.

htrp
0 replies
22h15m

We need some independent benchmarks (LLM Elo via Chatbot Arena, etc.) on how GPT-4 Turbo compares to GPT-4.

glass-z13
0 replies
22h25m

One step closer to augmenting day-to-day internet browsing with the announcement of GPTs.

freedomben
0 replies
22h14m

Text to Speech is exciting to me, though it's of course not particularly novel. I've been creating "audiobooks" for personal use from books that don't have a professional version, and despite the high cost and meh quality, I have been using AWS.

Has anybody tried this new TTS speech for longer works and/or things like books? Would love to hear what people think about quality
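
One wrinkle for long works: the speech endpoint caps input at a few thousand characters per request (around 4,096, if I'm reading the docs right), so a book has to be chunked first. A rough paragraph-aware splitter (sketch only, no API calls):

```python
def chunk_for_tts(text, limit=4096):
    """Split long text into pieces under the per-request character cap,
    breaking on paragraph boundaries where possible."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = (current + "\n\n" + para) if current else para
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # a single paragraph may itself exceed the limit; hard-split it
            while len(para) > limit:
                chunks.append(para[:limit])
                para = para[limit:]
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own TTS request and the audio segments concatenated.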

doctoboggan
0 replies
22h7m

In the keynote @sama claimed GPT-4-turbo was superior to the older GPT-4. Have any benchmarks or other examples been shown? I am curious to see how much better it is, if it all. I remember when 3.5 got its turbo version there was some controversy on whether it was really better or not.

davio
0 replies
22h16m

dang
0 replies
22h13m

Related ongoing threads:

GPTs: Custom versions of ChatGPT - https://news.ycombinator.com/item?id=38166431

OpenAI releases Whisper v3, new generation open source ASR model - https://news.ycombinator.com/item?id=38166965

OpenAI DevDay, Opening Keynote Livestream [video] - https://news.ycombinator.com/item?id=38165090

cryptoz
0 replies
20h13m

Okay it's 1pm PT. How are you testing when you get the new features? Just running a curl or something until it works? :)

codingclaws
0 replies
20h8m

This round of updates seems amazing. I need to pore over these docs and start experimenting.

bkyan
0 replies
18h59m

It only got a quick mention during the keynote, but I think the most important new feature for me, in terms of integrating ChatGPT into my own workflows, will be the ability to get consistent results back from the same prompt using a seed.

babuloseo
0 replies
20h11m

Nice I have access, not sure what I am gonna test it with.

acheong08
0 replies
18h38m

The ChatGPT app is broken on iOS after the update. Image generation and code analysis no longer work.

aantix
0 replies
21h7m

Can I pay someone to have my ChatGPT transcripts searchable?

Tommstein
0 replies
17h32m

Under Custom models:

This will be a very limited (and expensive) program to start—interested orgs can apply here.

Something about the "(and expensive)" part was refreshing. Probably there to cut down on applications from those who can't afford it, but still.

Roark66
0 replies
10h26m

I sure hope this carrot thrown to the masses is not going to slow down open-model development.

When the deal looks too good to be true, you're not a customer; you're a product, a resource to mine. In the case of (not-at-all-)OpenAI this does two things: it kills competition by running their services below cost (this used to be illegal, even in the USA), and it gathers massive amounts of human-generated question/ranking data. I'm not sure about others, but I'm getting quite a few of these "which answer is better" prompts.

Why do I hope for continued progress in open models even if this is so much more powerful and cheaper to run? Because when you're not a customer but a product, the inevitable enshittification of the service always ensues.