How can you tell if ChatGPT is lying to you and what you're learning is total hogwash?
I imagine the stakes of rewriting the Star Wars films are sufficiently low that this risk is acceptable...
I’ll just pick a topic and then ask for a summary and then ask clarifying questions.
Probably don't want to start quoting hallucinated facts!
Good thing podcasts never have false/misleading information on them...
This is an absurd argument that you are just making for the sake of argument and you know it. There's a difference between someone getting things wrong and an LLM shamelessly making something up out of whole cloth.
No; I'm much more confident in ChatGPT's accuracy than I am in a random podcast's accuracy.
Podcasts tend to be relatively terrible, accuracy-wise. If the alternative is learning from ChatGPT, your odds of getting correct information are substantially higher.
Podcasts are entertaining! But not where I would go to learn anything.
Listen to better podcasts. :P
It's a mistake to listen to a podcast to learn important information.
That's just like, your opinion, man.
There are podcasts on almost every topic where experts are present. Journalists, scientists, activists, researchers, etc. can all be heard on podcasts; I don't really see why it's generally a mistake to listen to a podcast to learn important information.
During the pandemic my partner was attending university from home, listening to their professors via MS Teams. These classes were also recorded so that they could be replayed later. In some ways that's just a professional podcast.
I think you know that isn’t a podcast in the sense that we’re discussing.
And depending on the class and the institution, I may still trust ChatGPT more than what gets taught.
Of course the part about university classes is different, but you seem to ignore everything else I've said.
There are tons of podcasts involving experts talking about their field of expertise, how can it be a mistake to listen to such podcasts to gather information?
It’s a mistake because what drives podcasts are their popularity, not their accuracy.
There is no “sort by accuracy” button in any podcasting app, nor are they peer reviewed.
Furthermore, podcasts are not a review of the body of knowledge on a subject; they’re often a complete layperson interviewing a single member of a given field, at best. Almost never do the views of any individual actually represent any field as a whole.
So once we’ve thrown out the concept of accuracy and completeness, ChatGPT fares exceedingly well in comparison. You’d do much worse than ChatGPT for idle conversation level accuracy.
What you just wrote makes more sense applied to LLM output than podcasts! You'd just as easily argue that "radio" or "news" is all bad if you don't want to differentiate between different forms of expression and communication within a medium. (Which, obviously, would be silly)
Sorry, what? Nothing I wrote applies to LLMs; they are not optimized for popularity, they've been meticulously designed and built to be as accurate as possible.
They absolutely have not been meticulously designed to be as factually accurate as possible.
https://platform.openai.com/docs/guides/fine-tuning
Yes, they have.
That link doesn't say anything about the fundamental design goals of the network architecture or training process. It doesn't even mention factual correctness, except in the sense that it may broadly fall under "producing a desired output".
No they're not. If they were, they would default to 0 temperature and have no Top P, frequency/presence penalty, and frankly not have knowledge as a function of language to begin with. They're designed to be convincing as a "presence" and output reasonable sounding language in context, with accuracy as an afterthought.
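For readers unfamiliar with the knobs being referenced: temperature rescales the model's output distribution before sampling, and top-p truncates its low-probability tail. A toy numpy sketch of the mechanism (illustrative only, not OpenAI's actual implementation):

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Toy temperature + nucleus (top-p) sampling over one logit vector.
    temperature == 0 degenerates to greedy argmax (fully deterministic);
    top_p < 1 keeps only the smallest set of tokens whose cumulative
    probability reaches top_p, then renormalizes before sampling."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        return int(np.argmax(logits))          # the "maximally accurate" mode
    z = (logits - logits.max()) / temperature  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    order = np.argsort(probs)[::-1]            # most probable token first
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    keep = order[:cutoff]                      # the "nucleus" of tokens
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))
```

At temperature 0 the model would emit the single most probable token every time, which is exactly why deployed chatbots don't default to it: deterministic, maximum-likelihood text reads stilted without being any more factual.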
Almost never do the views of any individual actually represent any field as a whole.
Setting aside the many individuals who currently lead their fields, history is filled with groundbreaking heretics.
https://informationisbeautiful.net/visualizations/mavericks-...
History is filled with many more plain ol’ heretics however, and we’re exceptionally bad at telling the difference in the meantime.
You, most importantly, are complete garbage at telling the difference between a crackpot and an innovator in a field you know nothing about.
Trying to drink unpasteurized knowledge will infect you a lot more quickly than it will enlighten you.
I'd say that I regret engaging with you, but that last line is golden.
It's not censorship, propaganda and official disinformation, it's "Pasteurized Knowledge®"
But that's the problem; you falsely believe you're capable of differentiating between propaganda and quality. You're not, on topics of which you are not an expert.
That's what flat-eartherism is; that's what Jewish space lasers are. The argument you're giving is a tacit endorsement of that kind of "inquiry", which for reasonable people is unconscionable.
This is such a weird take. People don't listen to podcasts like a Wikipedia page to learn "facts". Why are people even comparing ChatGPT to podcasts?
why are you forced to listen to "random" podcasts vs podcasts by people you trust?
This argument is so dishonest it's infuriating.
Trust is not a rational behavior, nor is it indicative of reliability of fact.
Sure, but that doesn't make it "random". I don't get why the top comment was listening to podcasts knowing full well that they'd be given possibly inaccurate facts. I have a hard time accepting that podcasts are audible Wikipedia; it's strange that such a take is the top-voted comment.
I listen to podcasts; they’re fun! But I am comfortable operating with incomplete information, and thus know how to treat a low quality source. I also chat with LLMs to better understand topics, and am better off than the folks here who can’t do that as a result.
The main thing I’m getting from this discussion is that a lot more very smart people seem to have deluded themselves into thinking knowledge is objective or “locked in” than I had initially realized. The desire for certainty is an extremely human thing, but it’s a dead end, intellectually.
Yeah, exactly: "ChatGPT replaced my podcasts" is kind of silly. People don't listen to the Joe Rogan Experience to learn facts about how a particular jiu-jitsu choke works. I don't know what kind of podcasts OP replaced with ChatGPT, but I call BS.
The problem with ChatGPT isn't that it gets facts wrong, but that it's exactly what the category name suggests: a large language model.
At one point I came across a series of "are CJK languages related" questions on Quora with cached ChatGPT responses[1], all grammatically correct and very natural, largely turboencabulators, sometimes contradicting themselves even within a single response.
Podcasters? They're not _this_ inconsistent.
What's the issue with the answer for "Are Korean and Chinese mutually intelligible"? The highlighted part is definitely true, at least.
That part is correct, but not consistent with others.
My problem is not that GPTs are too often wrong; it's that they always prioritize syntax over facts, since they are _language_ models.
The sentence "Colorless green ideas sleep furiously" always makes more sense to an LLM than "Water are wet", simply because the latter is syntactically invalid, and that would be problematic for many use cases, including podcast replacement. Sometimes we humans want AI to say "water is definitively wet", and that has been attempted by forcing LLMs to accept that factoids are more syntactically correct, but that isn't a solution and it's still an architectural problem for these pseudo-AGI apps.
Be aware that you're talking about Quora's implementation of ChatGPT here. As far as I know, the cached answers were generated with an incredibly outdated version, which is definitely not indicative of its current quality.
Even worse, I think they actually prime it with answers already posted on the thread, or even just related threads. For example, one of the answers to the first question mentions the same Altaic root as ChatGPT's answer, and I've found multiple people who are seeing their own rephrased answers in the response.
If you preprompt ChatGPT with questionable data, then the answer quality will be massively degraded. I've noticed many times now that Bing will rephrase incorrect information or construct a very shallow summary out of unrelated articles when internet searches are allowed, but is able to generate a cohesive and detailed summary when they're disabled.
Throwing random answers - some contradicting each other and some talking about subtly different aspects of the topic - into a session without further guidance just isn't a great idea.
Slightly off-topic/meta but this debate reminds me of one people had 20 years ago: Should you trust stuff you read on Wikipedia or not?
In the beginning people were skeptical but over time, as Wikipedia matured, the answer has become (I think): Don't blindly trust what you read on Wikipedia but in most cases it's sufficiently accurate as a starting point for further investigation. In fact, I would argue people do trust Wikipedia to a rather high degree these days, sometimes without questioning. Or at least I know I do, whether I want to or not, because I'm so used to Wikipedia being correct.
I'm wondering what this means for the future of LLMs: Will we also start trusting them more and more?
Is there a similar difference between an LLM getting things wrong and someone shamelessly making something up out of whole cloth? That's closer to what actually happens.
I'm pretty sure that in the podcasts I listen to, no one is shamelessly making something up completely.
The kinds of mistakes an LLM makes aren’t nearly as bad as the kind of nonsense you can find humans saying, particularly in podcasts.
Podcasts make stuff up from thin air all the time. Sometimes even maliciously.
They definitely do, but I think with podcasts I generally have a better ability to evaluate how much trust I should place in what I'm hearing. I know if I'm listening to a professional chef's podcast, I can probably trust they will generally be right talking about how they like to bake a turkey. If I'm listening to the hot wings guy interview Zac Efron, I know to be less blindly trusting of their info on astrophysics.
With ChatGPT I don't know its "experience" or "education" on a topic, and it has no social accountability motivating it to make sure it gets things right, so I can't estimate how much I should trust it in the same way.
I guarantee ChatGPT won't lie to you about how to bake a turkey.
The paranoia around hallucination is wildly overblown, especially given the low stakes of this context.
I don't know how people use ChatGPT at all. It confidently hallucinated answers to 4 out of 5 of my latest "real" questions, with code examples and everything. Fortunately, with code I could easily verify the provided solutions were worthless. Granted, I was asking questions about my niche that were hard enough that I couldn't easily Google them or find a solution myself, but I think that's the bar for being useful. The only thing it got right was finding a marketing slogan.
Try asking it about the Roman empire (e.g. “who was Nero?”) then checking the answer.
It’s very good at things like that. Go down a whole Roman Empire rabbit hole, have fun with it!
This is what an idle afternoon talking to ChatGPT is about, not trying to get it to do your job for you.
Here, I did it for you:
Nero was a Roman Emperor from 54 to 68 AD, known for his controversial and extravagant reign. He was the last emperor of the Julio-Claudian dynasty. Here are some key points about his life and rule:
1. *Early Life and Ascension*: Nero was born Lucius Domitius Ahenobarbus in 37 AD. He was adopted by his great-uncle, Emperor Claudius, becoming Nero Claudius Caesar Drusus Germanicus. He ascended to the throne at the age of 17, after Claudius' death, which many historians believe Nero's mother, Agrippina the Younger, may have orchestrated.
2. *Reign*: Nero's early reign was marked by influence from his mother, tutors, and advisors, notably the philosopher Seneca and the Praetorian Prefect Burrus. During this period, he was seen as a competent ruler, initiating public works and negotiating peace with Parthia.
3. *Infamous Acts*: As Nero's reign progressed, he became known for his self-indulgence, cruelty, and erratic behavior. He is infamously associated with the Great Fire of Rome in 64 AD. While it's a myth that he "fiddled while Rome burned" (the fiddle didn't exist then), he did use the disaster to rebuild parts of the city according to his own designs and erected the opulent Domus Aurea (Golden House).
4. *Persecution of Christians*: Nero is often noted for his brutal persecution of Christians, whom he blamed for the Great Fire. This marked one of the first major Roman persecutions of Christians.
5. *Downfall and Death*: Nero's reign faced several revolts and uprisings. In 68 AD, after losing the support of the Senate and the military, he was declared a public enemy. Facing execution, he committed suicide, reportedly uttering, "What an artist dies in me!"
6. *Legacy*: Nero's reign is often characterized by tyranny, extravagance, and debauchery in historical and cultural depictions. However, some historians suggest that his negative portrayal was partly due to political propaganda by his successors.
His death led to a brief period of civil war, known as the Year of the Four Emperors, before the establishment of the Flavian dynasty.
I also did a brief fact check of a few details here and they were all correct. Zero hallucinations.
Does this make sense? Notice how little it matters if my understanding of Nero is complete or entirely accurate; I’m getting a general gist of the topic, and it seems like a good time.
It may matter little to you that your understanding is not complete or entirely accurate, but some of my worst experiences have been discussing topics with people who think they have something insightful to add because they read a Wikipedia page or listened to a single podcast episode and decided that gave them a worthwhile understanding of something that often takes years to fully appreciate. A little knowledge, and all that. For one, you don't know what you're missing by omission.
It also doesn’t matter to you, unless you’re claiming you only ever discuss topics you’re extremely well versed in.
This is missing the broad concern with hallucination: you are putting your trust in something that delivers all results confidently, even the ones it predicted incorrectly. Your counter-argument is lack of trust in other sources (podcasts, the education system); however, humans, when they don't know something, generally say so, whereas LLMs will confidently output incorrect information. Knowing nothing about a certain subject, and (for the sake of argument) lacking research access, I would much rather trust a podcast specializing in a certain area than ask an LLM.
Put more simply: I would rather have no information than incorrect information.
I work in a field of tech history that is under-represented on wikipedia, but represented well in other areas on the internet and the web. It is incredibly easy to get chatGPT to hallucinate information and give incorrect answers when asking very basic questions about this field, whereas this field is talked about and covered quite accurately from the early days of usenet all the way up to modern social media. Until the quality of training data can be improved, I can never use chatgpt for anything relating to this field, as I cannot trust its output.
I am continually surprised by how many people struggle to operate in uncertainty; I am further surprised by how many people seem to delude themselves into thinking that… podcasts… can provide a level of certainty that an LLM cannot.
In life, you exceptionally rarely have “enough” information to make a decision at the critical moment. You would rather know nothing than know some things? That’s not how the world works, not how discovery works, and not even how knowledge works. The things you think are certain are a lot less so than you apparently believe.
I've done this and, from what I can tell, it is reasonably accurate. However, I did have an instance where I was asking it a series of questions about the First Peloponnesian War, and partway through our discussion it switched topics to the first part of the Peloponnesian War, which is a different conflict. At least, I think it is. It was quite confusing.
I appreciate your confidence and would love to know how far you would go with the guarantee! It makes me realize that there is at least one avenue for some level of trust in GPT accuracy, and that's my general awareness of how much written content on the topic it probably had access to during training.
I think maybe your earlier comment was about the average trustworthiness of all podcasts vs. the same for all GPT responses. I would probably side with GPT-4 in that context.
However, there are plenty of situations where the comparison is between a podcast from the best human in the world at something and GPT, which might have less training data, or where the risks for the topic aren't eating an undercooked turkey but learning CPR wrong or having an airbag not deploy.
There are zero podcasts from “the best person in the world”, the very concept is absurd.
No one person is particularly worth listening to individually, and as a podcast??? Good lord no.
LLMs beat podcasts when it comes to, “random exploration of an unfamiliar topic”, every single time.
The real issue here is that you trust podcasts so completely, by the way, not that ChatGPT is some oracle of knowledge. More generally, a skill all people need to develop is the ability to explore an idea without accepting whatever you first find. If you’re spending an afternoon talking with ChatGPT about a topic, you should be able to A) use your existing knowledge to give a rough first-pass validation of the information you’re getting, which will catch most hallucinations out of the gate, as they’re rarely subtle, and B) take what you learn with a hefty grain of salt, as if you’re hearing it from a stranger in a bar.
This is an important skill, and absolutely applies to both podcasts and LLMs. Honestly why such profound deference to podcasts in particular?
It has lied to me about extremely basic things. It absolutely is worth being paranoid about. You simply cannot blindly trust it.
Good thing nobody here is suggesting blind trust. The mistake being made here is thinking I'm suggesting LLMs are a good way to learn. What I am instead saying is that podcasts are not a good way to learn, and should be treated with the same level of skepticism one holds for an LLM response.
One thing that could happen, sooner rather than later, is that agents are deployed with access to a factual database of information relevant to the topic. So there could be a GPT of that day's NYT Daily with a highly ranked database of info on, say, Israel and Hamas, where the text synthesis generates an overview but the user can "ask" clarifying questions to drill down and get responses.
I consider the information I’m receiving at a “water cooler” level of factual accuracy.
That said, there are a few topics I've asked about where I know enough to judge (I have a grad degree in US history), and I've found it's about on par with what you'd dig up through Google.
I’ve also found recently that it’ll state when it can’t equivocally say something. Or, in some cases, when I have misunderstood something and repeat it back incorrectly to try and clarify, it’ll correct me. 6 months ago, over text interface, it never would have corrected me: it almost always assumed the user was right and went along with it.
But again - it’s not like I’m fact checking Frank at the water cooler (or, often anyway). It’s shooting the shit and it’s great for that.
It can't be worse than TED talks, about half of which suffer from the reproducibility crisis (and therefore likely false).
My 15-year-old loves the Percy Jackson series, and he said that the audio conversation he had with ChatGPT was the first time he felt like "someone" "got" Percy Jackson in "real life." Lots of air quotes there, because there isn't a someone, and ChatGPT doesn't get it, and what is real, even? But the conversation was so good that my 15-year-old is now totally creeped out by ChatGPT and refuses to use it after that 30-minute conversation.
You talk about low stakes, but fiction is a massive industry. In making what is essentially Star Wars fanfic with ChatGPT, I've realized why AI was such a contentious point in the recent writers' strikes.
As long as we're entertained, who cares?
If you're replacing a factual podcast with it, maybe you should care a little...
How can you be sure the podcast is factual, are you fact checking everything they say on the podcast?
Every source of information has a bias, but with ChatGPT that's changing every time you hit "enter"
1. The primary source of bias will still be the person writing the prompt. The 'most probable completion' for the podcast will depend a lot on the setup. 2. I'm not sure it's going to vary more than a human does day to day, is it?
1. Absolutely not, ChatGPT has a mountain of biases that are simply _uncovered_ by prompting. To call this bias via prompting is intellectually dishonest.
2. No, ChatGPT is borderline schizophrenic. I trust a human and a team of writers and producers to be more consistent in their bias and truthiness than I do a program that's trained on Reddit/Twitter and has almost no grounding.
Might be able to ask the same question to someone listening to real podcasts.
People never consider the base rate. They compare AI things to gods rather than what it is replacing, which is usually a low-value wiki or podcast.
It's more commenters here revealing how low their standards are when they say "well people and/or podcasts spread false info too".
No actually, you can pay people to lie less and prove that they do so. We pay $20/month for ChatGPT to straight up lie to us and we know we need to sort it out.
“These people have no idea what they’re talking about” is a frequent reaction to podcasts I hear. To the point that there are very few I want to keep listening to.
ChatGPT lies to me but only in a specific way.
If you've used it a lot this year for coding, like many people, you'll have found that you develop the ability to know when it doesn't know what it's talking about.
Perhaps you're one of the people who think they're above it, and hence haven't developed that ability?
And you know not to push it further down the path of ‘not knowing’. And even before your next prompt, you can feel it’s got a foot down the wrong path. Nudge it to fix it, and/or move on.
Right now, ChatGPT is trained on a broad spectrum of slightly dated information, so hopefully the biases even out. And we know it hallucinates, etc.
In the future I see two big risks with the tech, approach and ownership:
1) Short term, SEO is going to move into trying to influence the LLMs to push products unwittingly. Companies will be getting great offers from shady SEO outfits promising to make their products the default answers for broad questions.
2) Mid term, now that all sense and pretence of safety and the greater good are gone from OpenAI, they will be working out how to insert and sell ads in the results. There will probably even be product placement in the results coming from the paid-for in-MS-Office version.
I've been developing an understanding from using it for coding; I guess a lot of us are developing this understanding. It is very trustable about high-level things many people know. The more specific or esoteric the knowledge, the less trustworthy it is.
If my mom asks it questions about my industry, they are vague and high-level enough that ChatGPT is very accurate in its answers.
If I ask it the probing questions I actually care about in my industry, it hallucinates or fails to grasp key terminology.
So, if I were OP, asking it broad questions about things I know very little about is likely quite safe. Going too far down a rabbit hole to something super specific, I would want to verify on Wikipedia.
Me, as a meat popsicle, can you explain to me how the phrase “kind of” is not common?
I feel like “kind of” is kind of a common adjective phrase, right?
You can entirely remove it from your speech and lose very little.
This is true in the narrow sense of it not changing the literal meaning of the sentence much. In a broader sense, using "kind of" a lot is part of a sociolect (in this case, zoomers), and you would lose some sense of belonging to that group by dropping it from your speech.
fr fr, low-key this is true.
I disagree. These linguistic innovations (including "like") serve to soften the message. It signals to the listener that the information is not based on considered truth but on impressions, vibes. This is genuinely how it's used: to signal the difference between a personal opinion and something somebody is convinced of. It's kind of like adding "but I could be wrong" to the end of everything.
It is, kind of! I definitely use it frequently, but the quantity my kids use it at is a whole other level. It's a colloquial phrase that I rarely, if ever, hear ChatGPT use in the audio presentation, but looking at the chat transcript, there's definitely a point at which it begins to use "kind of" in multiple ways in its responses, where it hadn't before.
A more blatant example: it will start responding in French if you start speaking to it in French. These things are pattern machines, and it makes sense that they will pick up on the lingo they're fed (just like we do).
I was playing with the voice interface and pointed out something minor it got wrong (a misunderstood word in my input, not a hallucination/confabulation). The voice response began with, “Oops, my bad”, which made me laugh.
I’ve been using this for several weeks now and it’s replaced podcasts for me. I’ll just pick a topic and then ask for a summary
This is why I could never get “into podcasts”
Which apparently can mean 100 different things, but some popular radio-talk-show-style ones I've been exposed to are just two guys rambling about a topic for two hours with lots of filler, where one person is essentially reading a Wikipedia article but getting interrupted every 5 words by the other to go on what are supposed to be comical tangents.
Like, this could have been a 30 second sentence.
Far too frustrating for me.
I get exposed to them on long car rides with other people and that’s shaped my entire opinion.
I’ll just pick a topic and then ask for a summary and then ask clarifying questions.
I made a public GPT for a similar purpose. Some examples of my conversations with it:
It is very trustable about high-level things many people know. The more specific or esoteric the knowledge, the less trustworthy it is.
That's my experience, too.
In addition to sticking to general topics, another fun way to talk with it is to discuss counterfactuals. "Hey, GPT! What do you think the world would be like now if Germany had won World War II [smartphones had not been invented, half of all people were born blind, etc.]." Those questions don't have right or wrong answers, and the answers it does come up with can be thought-provoking.
This reminds me of the Rick and Morty episode "Keep Summer Safe."
"I’ve realized that I’m built to learn socially"
That sentence really resonated with me and sums up a bunch of ideas that have been swirling in my head recently. I've been "interviewing" ChatGPT about world history for a few months now and even turned it into an unsuccessful podcast! I really enjoy learning and creating much more than I would enjoy listening to my own podcast, though :)
it’s replaced podcasts for me
Curious. Which podcast did you replace?
Interesting - this was actually a feature that got me to convert. I have some slight annoyances with how it handles natural pauses as I collect my thoughts and speak, but overall it's been great.
I've had some interesting discussions and it's helped me structure some thoughts by asking questions and follow-ups, then summarizing our conversation... all while I'm on a walk.
One fun thing I did with my family was have it do an interactive adventure story starring us. We had an adventure, and then used the built in DALL-E to generate images of scenes from our adventure.
Yeah, I wish there was an option to verbally cue that you were finished talking... like an "over and out" thing. Do you know about the feature where you can press the circle to force it to listen and then release for it to answer?
It's interesting how, in my frustration, I intuitively tried that push-to-talk sorta-feature and it worked. Integrating something like Whisper and streaming live text could be neat, especially considering how different cultures handle conversational pauses and turn-taking. I wonder why there's no move towards full-duplex conversations in such tools.
Because it’s hard. I’m not aware of any full duplex speech assistant existing yet.
I wondered if this was possible, so I tried to build it, and I ended up making an app: Call Annie (https://apps.apple.com/us/app/call-annie/id6447928709). Feel free to try it out and let me know what you think
You could probably instruct it to behave that way.
I use it a lot while driving, and that's hard to do in that scenario, but it's cool otherwise.
Tell the AI to respond with "..." until you end your prompt with a certain phrase.
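The same idea can also be handled client-side: buffer transcribed fragments and only submit the prompt once a sentinel phrase is heard. A minimal sketch of that workaround (the `over and out` sentinel and the function name are arbitrary choices, not any product's actual API):

```python
END_PHRASE = "over and out"

def collect_prompt(fragments):
    """Join speech-to-text fragments into one prompt, 'submitting' only
    after the sentinel phrase is heard; returns None while still waiting."""
    pending = []
    for fragment in fragments:
        pending.append(fragment.strip())
        if pending[-1].lower().endswith(END_PHRASE):
            full = " ".join(pending)
            return full[: -len(END_PHRASE)].rstrip(" ,.")  # drop the sentinel
    return None  # speaker hasn't finished yet; keep listening

print(collect_prompt(["Tell me about Nero,", "the Roman emperor.", "Over and out"]))
# -> Tell me about Nero, the Roman emperor
```

The nice property is that natural pauses mid-thought no longer trigger a premature response; the obvious cost is having to remember to say the sentinel at all.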
While walking the dog today, it talked me through some trade-offs between DBSCAN and isolation forests. Walking + verbalizing the problem is a very different and positive experience for me.
I've also used it several times on ~15-20min drives to memorize something I wanted to have available for immediate recall. I had it chunk & quiz me, and by the end of the drive I had it down pat. Fun use of drive time.
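For anyone curious about the trade-off mentioned above: DBSCAN treats outliers as a by-product of density-based clustering, while an isolation forest scores anomalies directly. A toy scikit-learn comparison on synthetic data (an illustrative sketch, not a reconstruction of that conversation):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 50 points in a tight cluster, plus one far-away outlier
X = np.vstack([rng.normal(0.0, 0.2, size=(50, 2)), [[5.0, 5.0]]])

# DBSCAN: labels dense regions as clusters, low-density points as -1 (noise)
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Isolation forest: -1 marks points that are easy to isolate (anomalies)
iso_labels = IsolationForest(random_state=0).fit_predict(X)

print(db_labels[-1], iso_labels[-1])  # both should flag the last point as -1
```

The practical difference: DBSCAN needs `eps`/`min_samples` tuned to the data's density scale and gives no per-point score, while the isolation forest produces a continuous anomaly score and copes better in higher dimensions, but has no notion of clusters at all.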
A word of caution - I've asked ChatGPT 3.5 to generate quizzes based on books before and while most answers were right, a few were technically wrong, and some were outright fabrications (presented very confidently!)
LLMs fabricating things? Well I never!
The memorization use case is brilliant. Put your talk track for a presentation in and say “help me memorize this by quizzing me”. Thanks!
For a second there, I genuinely thought you had a very smart dog!
I had it chunk & quiz me
Can you go a bit into how this works? How do you prompt it? This is the first use of ChatGPT I've heard of that would directly benefit me.
I have some slight annoyances with how it handles natural pauses as I collect my thoughts and speak
I hate this about all voice assistants. They force you to speak in an unnatural way.
It seems like there’s nothing intelligent about when they decide to respond, it’s like they just wait for X milliseconds of silence instead of using the context of what’s being said like a human would.
Sometimes it’s the opposite problem. You finish asking your question but you gotta wait for the assistant to pick it up whereas a human would understand that you’re finished talking based on what you said.
It might seem small and unimportant but I really think it’s one of the main reasons why voice assistants feel so… artificial.
That and long ass responses.
They force you to speak in an unnatural way.
I think the same thing can happen when speaking to anyone with different societal/cultural factors than yours. We just become more accustomed to it over time and it's less noticed. I think if this GPT had a big green alien face then we would find speaking to it less strange, somehow.
Yeah, but I don't want to be code-switching with a computer.
One of the very few things I like about Alexa is that it has an option for being more forgiving about stammers and pauses. It actually works pretty well.
Maybe the same will make it into ChatGPT at some point.
That and long ass responses.
you can ask it to be terse and shorten answers in the custom instructions section.
I hate this about all voice assistants. They force you to speak in an unnatural way.
It would be great to have an option to have ChatGPT stop when saying "over" - it does not feel natural to have to blurt the questions all at once.
Oh, that sounds like it could be a lot of fun. Thanks for sharing. I'll remember that for when my little one is a bit older :)
I just tried it out and I have to say I am blown away. I've never been someone who likes the voice capabilities (tried Siri once or twice, hated it) for just about any service as I can type pretty quickly but this is intelligent enough that I can have an unstructured conversation and still get to the results I want quickly. Very impressive.
I feel blown away and creeped out at the same time. It sounds so natural that it's creepy. The only question I have now is whether the audio is being streamed or generated on-device.
I'm almost sure it's streamed.
It is streamed. They also offer this as an API for developers [0].
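For the curious, the developer API the parent mentions is a plain HTTPS POST to the speech endpoint. Here's a minimal sketch of the request shape (endpoint, model, and field names as documented in OpenAI's text-to-speech docs at the time of writing; the actual network call is left commented out so nothing here needs an API key):

```python
import json

# Request shape for OpenAI's text-to-speech endpoint
# (POST https://api.openai.com/v1/audio/speech).
def build_tts_request(text: str, voice: str = "alloy") -> dict:
    return {
        "model": "tts-1",          # or "tts-1-hd" for higher quality
        "input": text,             # the text to speak
        "voice": voice,            # alloy, echo, fable, onyx, nova, shimmer
        "response_format": "mp3",  # also opus, aac, flac
    }

payload = build_tts_request("Hello from the API.")
print(json.dumps(payload, indent=2))

# With the official Python client it would look roughly like:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.audio.speech.create(**payload)
#   resp.stream_to_file("hello.mp3")
```

The response is an audio stream, which is why apps can start playing it before synthesis of the whole reply has finished.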
I just wish they'd offer more languages, as it's currently English only. It can speak German and Spanish, but with a very clear English accent, which makes it kind of funny and makes one wonder why the accent sounds so real.
The Brazilian Portuguese accent was hilarious when I tried it a month ago. We thought it was some kind of joke we didn't understand.
Just tried now, and it was pretty good. A slight accent that I wasn't able to place (Minas Gerais?). Similar to or better than the artificial voices of other assistants in pt-BR. I'm using Sky's voice, FWIW.
When I tried it right when it was released, it was some really heavy accent from Piracicaba. People there talk like that but ChatGPT seemed to be doing a forced impression of that.
It seems similar to what other commenters are saying about it taking in German with an American accent.
It's no big deal, but Siri/Google/Alexa voices are better normalized, IMHO. I don't mind heavy accents at all in humans, but they get annoying after a while in a voice assistant, especially if it's not my accent :)
I tried it now but it doesn't seem to be working for me. Maybe related to the outage yesterday (is it still going on?). It just listens but never replies.
Which voice did you pick? I picked one that sounds amazing; it only has a funny gringo accent.
ChatGPT version works for any language
Yes, there it sounds just perfect. The best TTS I've ever heard.
It can speak German and Spanish
Among others...
Had to check myself.
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
I'm playing with it now; it can listen in Russian and reply in English or Russian.
The English pronunciation is perfect, but some Russian words don't have the correct pronunciation or emphasis.
I'm wondering if I can train the model to pronounce these words more accurately.
Even takes an audible breath. I'm kind of annoyed at that actually, it's deceptive.
That's lame. Why are they making it pretend to be human? What a silly thing to put effort into.
It's learned, not programmed. It would take more effort to remove every breath.
The new model on the front page atm (StyleTTS) was rated more humanlike than actual human voices.
I just got the Rayban Meta glasses an hour ago, put them on, said "Hey Meta" and I was interacting with an AI with my voice, and hearing a response back. It's so unbelievable and surreal I don't understand why Siri/Google assistant are not doing this already.
I really feel like I live in the future.
I don't understand why Siri/Google assistant are not doing this already.
My theory is that voice assistants are made to be cheap entry points to those ecosystems. LLMs, on the other hand, are incredibly expensive to run. No GAFAM wants to pay dozens of cents when you ask their assistant to switch on the lights.
looks like Meta has no problem paying. It's usable on Messenger/WhatsApp/Rayban Meta
Well, Siri is not doing it because its "intelligence" basically amounts to a bunch of if-then-else statements. It's still the most embarrassing Apple thing in existence. So very bad.
The best part about Siri is that what it replaced was insanely better at the specific task of playing music. Why does my iPhone in 2023 need to connect to the internet, and shamefully fail, when I try to play a song or even change the volume?
If anyone still remembers what we had before Siri, you’ll know how well it worked.
It's possible to run a voice AI similar to this entirely locally on a normal gaming PC with a good GPU using open source AI models. I have a standalone demo I've been working on that you can try if you have a 12GB+ Nvidia GPU: https://apps.microsoft.com/detail/9NC624PBFGB7
It's still very much a demo and not as good as GPT-4, but it responds much faster. It's fun to play with and it shows the promise. Open models have been improving very quickly. It remains to be seen just how good they can get but personally I believe that better-than-GPT-4 models are going to run on a $1k gaming PC in just a few years. You will be able to have a coherent spoken conversation with your GPU. It's a new type of computing experience.
What models/libraries is it based on?
Currently OpenHermes2-Mistral-7B (via exllama2), OpenAI Whisper (via faster_whisper), and StyleTTS2 (uses HF Transformers). All PyTorch-based.
I will probably update to the OpenHermes vision model when Nous Research releases it, so it'll be able to see with the webcam or even read your screen and chat about what you're working on! I also need to update to Whisper-v3 or Distil-Whisper, and I need to update to a newer StyleTTS2. I also plan to add a Mandarin TTS and Qwen-7B bilingual LLM for a bilingual chatbot. The amount of movement in open AI (not to be confused with OpenAI) is difficult to keep up with.
Of course I need to add better attribution for all this stuff, and a whole lot of other things, like a basic UI. Very much an MVP at the moment.
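The overall loop behind a stack like this is simple even though each stage is a heavyweight model: speech-to-text, then the LLM, then text-to-speech. Here's a sketch of how the three stages chain together, with the actual model calls stubbed out as placeholders (the real ones, faster_whisper, an exllama2-served LLM, and StyleTTS2, all need a GPU; the stub behavior below is purely illustrative):

```python
# Sketch of a local voice-assistant loop: STT -> LLM -> TTS.
# The three stage functions are stubs standing in for the real models.

def transcribe(audio_frames) -> str:
    # Real version: faster_whisper's WhisperModel.transcribe()
    return "what time is it"

def generate_reply(prompt: str, history: list) -> str:
    # Real version: stream tokens from the LLM, carrying chat history
    history.append(("user", prompt))
    reply = f"You asked: {prompt!r}"
    history.append(("assistant", reply))
    return reply

def speak(text: str) -> bytes:
    # Real version: StyleTTS2 synthesis to a playable waveform
    return text.encode("utf-8")

def one_turn(audio_frames, history):
    text = transcribe(audio_frames)
    reply = generate_reply(text, history)
    return speak(reply)

history = []
audio = one_turn(b"\x00" * 320, history)
print(len(history))  # 2 entries: one user turn, one assistant turn
```

In a real pipeline each stage streams into the next (tokens into the TTS as they're generated) to keep latency down, which is where most of the engineering effort goes.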
This is why I think the personal Jarvis for everyday use won’t eventually be in the cloud. It can already be done on local hw as you are demonstrating and cloud has big downsides around privacy and latency and reliability.
Like you said it’s difficult to keep up with and to me it feels very much like open source stuff might win for inference.
And yet of all things word processing and spreadsheets are going to the cloud, or even coding.
Not sure big players won't be pushing heavily (as in, not releasing their best models) for the fat subscriptions/data gathering in the cloud, even if I'd much rather see local (as in cloud at home) computing
I understand what you mean. It makes me wonder if there is room for a solution for those who want to own their own hardware and data. Almost like other appliances and equipment that initially cost too much for household ownership; maybe having a "brain" in your house will be a luxury appliance, for example. As a Crestron system owner I would love to plug Jarvis into my smart home somehow.
I'm thinking I will open source mine for people who have GPUs but possibly try making a paid service for people who don't.
Maybe it can be a hardware-with-markup model, plus support and consulting. That way there could be many competitors in various regions or different countries, and we'd all use the same collection of open source tools. That would be pretty neat. Unlikely, I guess, but still worth thinking about how it could work.
Those things are going to the cloud for completely unrelated reasons
Thanks for such a detailed answer!
if you have a 12GB+ Nvidia GPU
Doesn't that seem a bit excessive for Whisper and Coqui? Or does it also run an LLM for a full local stack?
It's a full local stack. The TTS isn't Coqui, it's StyleTTS2.
Ahh cool, makes sense then.
Haven't heard of that one before, I'll have to check it out.
It's quite new. Good quality and very fast.
Related to the subthread above, and something I've been thinking about - how do you detect when the user has stopped speaking so the bot can respond?
Great question! Right now both ChatGPT and my demo are doing very simple and basic stuff that definitely needs improvement.
ChatGPT is essentially push-to-talk with a little bit of automation to attempt to press the button automatically at certain times. Mine is continuously listening and can be interrupted while speaking, but isn't yet smart enough to delay responding if you pause in the middle of a sentence, or stop responding at the natural end of a conversation.
I wrote up my detailed thoughts about it here: https://news.ycombinator.com/item?id=38339222
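For anyone curious what "X milliseconds of silence" endpointing actually looks like, here's a minimal energy-threshold sketch. It's pure Python with plain floats standing in for per-frame audio energy; real systems use a trained voice-activity detector (e.g. Silero VAD or WebRTC VAD) instead of a fixed threshold, but the hangover logic is the same idea:

```python
# Minimal silence-based endpointing: declare end-of-utterance once the
# speaker has been quiet for `hangover_frames` consecutive frames.
# Frame "energies" here are plain floats standing in for real audio RMS.

def find_end_of_utterance(energies, threshold=0.1, hangover_frames=30):
    """Return the index of the frame where the utterance is considered
    finished, or None if the speaker never goes quiet for long enough."""
    quiet_run = 0
    speaking_started = False
    for i, e in enumerate(energies):
        if e >= threshold:
            speaking_started = True
            quiet_run = 0          # any speech resets the silence timer
        elif speaking_started:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                return i           # enough trailing silence: endpoint here
    return None

# Speech, a short pause (shorter than the hangover), more speech,
# then a long silence that finally triggers the endpoint.
frames = [0.5] * 10 + [0.0] * 10 + [0.5] * 10 + [0.0] * 40
print(find_end_of_utterance(frames, hangover_frames=30))  # -> 59
```

The frustrations described above fall directly out of this scheme: too short a hangover cuts you off mid-thought, too long makes the assistant feel sluggish, and neither uses the *content* of what was said to decide whether you're done.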
That is really interesting, thanks for the insight!
I've seen a couple of these tts + voice recognition + LLM projects popping up on GitHub as well:
https://github.com/modal-labs/quillman
I also built something similar using WebKit speech recognition (limited to Chromium) a year back for my own use but it was hooked to davinci-003.
Would love to see a video of this ...
Let me guess: your voice recordings are being used for training purposes. I miss discussions about what a privacy nightmare it is that all the information you share is forever owned by OpenAI and can be used for training purposes - BY DEFAULT!
Imagine being pissed that someone is hearing and learning from your voice as you're having a conversation together. The concepts of copyright and privacy we have come from a different world, and they already made little sense in the 90s with the rise of the web and search. We think of them as absolute, but they are a human invention: arbitrary and specific to the zeitgeist they were born in. We need boundaries, for sure. But what they are is unclear.
Imagine being pissed that someone is hearing and learning from your voice as you're having a conversation together.
Hello. USian here, talking about things from a USian perspective! YMMV in other countries, visit your local dealership for more information, & etc, etc, etc.
A multi-million (or multi-billion) dollar company who can
* Store recordings of all of my utterances forever
* Use my recordings to make tons of money without ever sending me a cent
* Share my recordings with "trusted business partners" for "contractually-agreed-upon [with their partners, not with me] purposes" whenever they feel like
* In short, use those recordings that they've stored forever for nearly any purpose they see fit
is not "someone" that I'm having a "conversation" with.
When I have a conversation with someone, they have a human's capacity to remember and disseminate that information. People get worse and worse at remembering things as those things recede into the past. Even if someone remembers an event very clearly, it's also usually impossible for someone to perfectly relate that event to another.
When a company records what I'm saying and feeds that recording into their perfect-copy-machines and the perfect-copy-machines of their "trusted business partners", that's an entirely different thing. Surely you see that?
Why aren't you complaining about what happens to your HN posts?
It's a spectrum, not a binary. They may not care about HN posts because:
1. We're anonymous here (by default)
2. You can't be fingerprinted from your HN posts the way you can be with your unique voice
You can be anonymous with OpenAI too. Your voice is a terrible fingerprint with today's models.
On point 2, someone posted a tool a while back called Stylometry which is now sadly defunct, but had some interesting feedback and seemed to be pretty successful at identifying “alts”. I’m sure that the tech has improved since then and would be only more accurate than it was a year ago.
I'm pretty sure hackernews posts can be used as a unique fingerprint, especially by a party who collects sensitive text information like chatbot conversations.
Voice makes a very bad fingerprint, especially when you know you can simulate it from like 2 seconds of recording with AI.
Why aren't you complaining about what happens to your HN posts?
That doesn't seem related to the point?
Do you disagree that "Posting on a public Internet forum (such as HN)" is (other than the "communicating information to other people" aspect) _radically_ different than "Talking one-on-one to someone in person without any recording devices present"?
If you do disagree, then there's no way we can have a even vaguely productive conversation on the topic.
It's not sending you a cent? Care to explain why someone would use ChatGPT Free with voice, if this provides no value to them? Pokes a sizable hole in your entire argument. It's not like the government came to your house and told you "use it or go to jail".
Do you ask people for money when they talk to you in general? Saying "it's not a someone" is just arbitrary categorization. If it was like talking to a wall, you probably wouldn't talk to it, would you? Is it possible that not everything has a monetary aspect on a personal level, but that on a corporate level money is the only way to capture value, value which is then also expressed in non-monetary ways, such as people providing work or advice for the furtherance of a goal or project?
I don't know. Let's maybe think about the world we live in. Some might say the most worthless thing in the world is money. What is it good for by itself? It's not only useless paper you can't even wipe your behind with; these days it's just digits on a computer. But the value you can get out of it is provided by people and systems that do something useful for you in exchange for it. And if something provides that value directly in communication with you, that's much the same as being paid digits on a computer.
In other words, are we so shallow we need to first pay for ChatGPT's output in money, and then it should pay us back the same money (or less) for our voices and data, in order to comprehend what a barter exchange of value is?
You appear to have gotten derailed by the monetary compensation bullet point and totally ignored the "Retain a perfect copy of my utterances forever to do nearly anything they wish with (including, but not at all limited to, making money from) for the rest of eternity" aspect.
Someone? This isn’t someone!
Am I the only one who doesn't like these specific voices? The quality is incredible, but they feel too cheery/enthusiastic/casual and it just gets annoying after a while.
I made an iOS shortcut a while ago that uses Siri with the ChatGPT app (it has iOS shortcut bindings) and despite Siri being a useless pile of junk compared to this, I actually prefer Siri's voice to this in some ways, because it doesn't feel so over the top.
Maybe this is partly because of different cultural expectations between the USA and Europe? Or maybe I'm just being too cynical and ChatGPT really is that happy talking with me!...
I don’t care about the voices themselves but the speech recognition is borderline unusable sometimes. It interjects when it shouldn’t and will frequently hear things incorrectly.
At one point it misinterpreted me mentioning “tai chi” as “I can’t breathe” and responded with advice about medical emergencies.
Do you mean Siri's voice recognition? If so, 100% agreed. My iOS shortcut uses OpenAI's Whisper API for voice recognition, and Siri (English United Kingdom - Siri Voice 1) for text to speech.
I really like dictating things sometimes, and Whisper is perfect for that (automatic paragraphs inside the model itself would be nice but not a big deal).
If anyone is interested - the "Whisper speech recognition in iOS" part is based on this shortcut I found that you can easily use yourself on both iOS and MacOS (free except for the OpenAI API usage fees obviously): https://giacomomelzi.com/transcribe-audio-messages-iphone-ai...
No, I mean the voice recognition in ChatGPT.
free except for the OpenAI API usage fees
There are several versions of Whisper which have been distilled and can run locally, so I don't see the advantage of making API calls: you just get increased latency, decreased reliability, and weaker data security.
That's really interesting, Whisper is generally considered the current state of the art in STT and I've personally never experienced errors like the ones you describe. I've actually never had an error from Whisper.
First question, is there another STT you have used which works better for you?
Second question, is there any reason your voice might be considered unusual, like having a strong Welsh, Irish, or Indian accent, or being Deaf or Hard of Hearing?
Yeah, whisper is pretty good out of the box in my experience, but the vast majority of the time I’m using it in my car. So the conditions aren’t ideal, or are out of distribution for Whisper. However CarPlay is detectable and common enough from what I’ve heard.
Second, even if the transcription is correct, it cuts me off at inappropriate times. It’s hard to talk naturally without pauses.
I haven’t used a better transcription model, no.
Oh that's really interesting. Probably an acoustic environment it's not used to, like you said, but also people talk differently when they're driving. Like the cadence of our speech is significantly different because of the way our mental focus changes. I have to imagine that changes some things.
It is probably cultural or linguistic. I love audio books but I cringe when I find a book I want to listen to that has an English voice actor. I don't think it is just the accent but all the pacing and emphasis.
Though I also don't like most of the ChatGPT voice models, besides Sky. Sky to me is really good. Robertson Dean reading an audio book is perfection, but Sky is pretty awesome.
I should add that as an American there are a ton of American voice actors that ruin books for me too. Sometimes this can be fixed if played at 1.2X speed.
Of all of them, Sky's voice seems the most sober. I've been using it as a Plus subscriber for several weeks now and am also very impressed.
Yes, sometimes it thinks I'm done speaking when I'm not, but on the whole it's very good. Siri/Alexa, et al are not only unusable but are now supremely frustrating.
It is very much a cultural thing, the voice equivalent of decorating your Instagram. Ordering pizza after a 60h work week? Well, better make it sound like fun!!
Nope, you're not the only one. I posted as well: they sound to me like your classic, well-trained call center agents: Fake friendly (but please kill me now).
Reminds me way too much of some of the people I had to talk to, when cleaning up my mother's affairs. Places trying to get me to pay bills I did not owe, call center agents "cheerfully" following scripts that they themselves hated. The voices sound exactly like that.
Give me a neutral voice. This is a computer I'm talking to, not a fake friend.
Can somebody who knows their stuff about voice synthesis explain to me why ChatGPT sounds like it's speaking German with an American accent when I speak to it in German? It's really surprising how believably it sounds like a native speaker of American English speaking excellent, just not accent-free, German.
I would imagine that voice synthesis models would somehow be trained on data from native speakers, so why the accent?
The same thing happens with Polish, it's your typical Polish Chicago resident.
Can you make a comparison between Polish / Chicago Polish and any two US accents?
My immediate assumption was that Chicago has a high Polish population, not necessarily that it's any different from other Polish American ones.
edit: It would appear so:
Chicago is a city sprawling with Polish culture, billing itself as the largest Polish city outside of Poland, with approximately 185,000 Polish speakers, making Polish the third most spoken language in Chicago.
I am going to guess that 99% of the source data, models, training, correction, and verification is translated by Americans.
Translating a language is difficult, and regional dialects on top of that create additional complications.
If someone is talking long enough to me I can identify their birth State based upon their American accent. Every State has a different pronunciation of certain words which can leak location data.
Guessing you're Californian, educated parents, about 34 years old. No siblings.
Not sure how you arrived at that guess, but generally I guess that describes the largest demographic of HN users.
Without AI but based on some freely available statistics combined with post history, we can say definitively (assuming honesty on OP’s part) that OP is 35 or 36, and was either born in Germany or moved there when they were young, and they likely still live there or at least have a strong affinity to Germany.
We can reasonably speculate that the fact that they speak German and likely live there, as well as the standard of their English drastically increases the likelihood their parents were educated. Germany’s current total fertility rate is 1.5. It was likely higher when OP was born back in 1986/7 but I couldn’t immediately find any data on that and didn’t look too hard. Given the fact that TFR is a population mean, this suggests a high proportion of single-child families, and generally the more educated one is, the fewer children, so it’s a decent guess that OP is an only child (apparently more than 50% of German families have only one child), but I don’t see anything that would make this a sure fire bet.
I was able to complete some further analysis which could reveal more likely truths about OP such as gender, political orientation, sexuality, etc., but I think this goes far enough without starting to doxx them.
models would somehow be trained on data from native speakers, so why the accent?
you wouldn't believe it, but models haven't been trained yet. as usual.
I expect that the training data for the voice AI is overwhelmingly English. Native speakers of other languages will be a small minority. I'm sure it will improve over time.
It’s a good question. I’m not sure of exactly the answer, but I suspect the answer is similar to the answer “why do Americans speak German with an American accent?”
If you learn how to pronounce specific vowels, consonants, etc. in a particular way, it takes a LOT of effort to learn how to pronounce these in a different way. You can approximate to a good extent, but most researchers say that if you don’t develop this skill as a child, you won’t ever be able to pronounce things in a way that sounds like a native speaker.
Presumably, the models have been exposed to significantly more American accents than other accents, and learning how to pronounce phonemes with subtle differences without accepting a “close enough” approximation is a big challenge, especially given that there is already a threshold for acceptability at which level you can still sound like a native English speaker.
What is the privacy policy; how long are recordings retained, are they trained on, etc.?
A chat is a chat is a chat. There is a setting in the app under Data Controls where you tell it not to use your chats for training.
The illusion of control. We check all of these boxes yet our data still continues to get sold
I don't disagree but I'm just stating what is available in the app. Voice or text, they treat chats as a single thing.
I understand the downvotes for showing disapproval at OpenAI's policies (or big tech's), but I don't know what else I could say. Should I have just replied with a snarky comment saying "a-ha! you expect them to respect your privacy? pffff"?
The setting is only saved client-side, which is a very odd design decision. I guess they made it that way so that if you disable the flag at one point (yes, it's active by default), the change only affects future requests. This whole privacy situation with OpenAI makes me really angry with Sam Altman, who seems not to care about it at all; this setting seems specifically designed to be privacy-disrespecting. It reminds me of the early days of Facebook, when they set all your profile information to be public, but OpenAI is much worse, because the information you put into the prompt is likely much more sensitive.
What bothers me about HN currently is that the majority seems to not be bothered by it, stating that OpenAI tries to be ethical, that Sam can be trusted etc.
Has everyone forgotten about Google? Facebook? Every other SV Tech-Startup? Why would OpenAI be different?
"They're a non-profit". Yeah, sure. As if Wall Street, M$ et al would just not try to monopolize and commercialize AI as soon as possible.
I enjoy using the voice mode to learn about subjects when I’m driving.
Does it tell you the truth, or just stuff that sounds right?
Have you not been using ChatGPT for programming questions this year? The implied criticism you're making became out of date when GPT-4 was released. Of course it makes things up when you stretch it, but the utility is no longer in question. What did you think all the fuss has been about?
Have you not been using chatGPT for programming questions this year?
No, I'm a fighter pilot in EVE Online.
Yes
Hasn't really been a problem. The way I use it is like
"Explain this concept to me."
"Give examples from the physical world and every day life."
"Elaborate on that last example."
"Give some examples from the domain of ___."
"Tell me about the history of the concept and why it was necessary to invent it."
"How does this concept relate to the concept of ___."
So I'm trying to approach what I want to learn from many different angles and build context and intuition. Sort of like talking to a tireless grad student tutor. Errors would become more obvious due to contradiction. I've caught it making some mistakes but they were sort of trivial and I could see the spirit of what it was getting at. Plus, "truth" tends to "click."
Remember: If it’s free. You are the product.
Remember: Even for things you spend lots of money on, you are still the product.
Late stage capitalism!
hoeing for prompts
It's not free, though. There's a free tier, but the product is ChatGPT Plus, metered access to API/newer models, etc.
Usually this truism is invoked in reference to advertising business models where the actual customer is advertisers, and you're just inventory so from cable TV to web search the business has usually been happy to take a shit in your mouth whenever it serves their advertisers.
We will see how things pan out with ChatGPT, as the interests of a ChatGPT Plus user likely align closely with the interests of someone on the free tier. Historically, I think freemium business models tend to be kinder to their customers than advertising-based ones; at worst, they simply lean on you until you upgrade to a paying customer (if they do enshittify the free tier, I think you would see an amazing conversion rate to ChatGPT Plus, because the product is so good and unique).
I wonder why they launched it today, when it was supposed to be a week-off for all OAI employees and amidst whatever else is going on?
One possibility is to capture Thanksgiving gatherings as a growth hack where people demo this to their families/friends and increase app downloads for OAI.
Another possibility is that people who build products don’t actually care about leadership drama as much as Twitter or HN
Almost all of the OpenAI team signed a letter saying they would resign if the board is not replaced, so safe to say that they do care about the leadership drama; perhaps more than the rest of us.
There was at least one OpenAI employee on Blind [0] saying that they (unclear who "they" is but presumably some other employees) pressured employees into signing and called him in the middle of the night to do it. I just googled it and found a thread as well [1].
[1] https://www.teamblind.com/post/OpenAI-employees-did-you-sign...
My theory is that it was nearing ready for release so they pushed it out to deflect some of the bad press over recent drama. I'd suspect the challenge with releasing something like this is about 2% "the code" and 98% scaling it out to everyone and making that not explode.
Where is this available? I have the app installed but I don't see a headphones button, just one that seems to record my voice and then attempts to transcribe it for submission (and fails every time). I'm on Android, App icon is the black ChatGPT logo on a white background. Thanks!
For me, it didn't appear until a few hours later. Pending update, maybe? Closing and re-opening the app 2-3 times? No idea, but eventually the headphone icon did appear (bottom right, for me on Android).
Thanks! It was there this morning for me as well. Didn't even have to do an app update.
I had to click the pencil/edit button first.
It fails to transcribe due to an API outage at the moment but I imagine that's the correct button.
One of the reasons I don't want OpenAI to die isn't only the (impressive) power of GPT-4.
Actually, it's the mobile application. They could have opted for a boring PWA, but instead they developed a nice native application with good UI/UX. I also appreciate that they created integrations with the Shortcuts app, which allows you to do very powerful things. I really hope they extend this, because it's text only for now, but with GPT-4V and the new ability to generate files, you could do impressive things with Shortcuts.
I wish I could use the mobile app on iOS. Getting a weird error when trying to login, something about date/time being wrong on my device. Since date/time is autosynced, it must be something else. I wonder why I have this tendency to hit on bugs that nobody else has...
Could there be a MITM on your connection? That could cause SSL errors that may be reported as "make sure your date and time are correct"
That MITM would have needed to follow me around, since I tried in several networks. But nothing is ruled out so far. I wish the error was a bit more informative, since my date/time settings are clearly OK, so I am kinda stuck here.
I wish I could have the text output instead of getting a vocal answer. We all can read way faster than we can listen to speech, and I don't need a lot of the fluff I get from LLMs. But I guess that's what most users want? Something that's closer to a conversation with a human? But I don't care about being polite with an LLM, it feels like an anti-pattern because it's too much overhead.
Ideally I would:
* query by voice
* get the response in text as soon as possible, maybe even as I'm still talking, with the LLMs correcting the response as I give more information, and without needing me to touch anything
* follow up by voice, even before the response was finished, and without needing to press anything
I hope we'll get there eventually!
We all can read way faster than we can listen to speech
I can’t. At least not with reading every word. I can finish an audiobook much faster than reading the book on my own.
And I also can't pick up as much reading by myself. My preferred way of consuming is now reading along while a screen reader or audiobook plays at 1.5x or faster speed. Much better retention that way.
Interesting, it's exactly the opposite for me. My understanding of voice is much lower than of reading, and much slower. I think the biggest factors are that reading is faster than words are typically spoken, yet I can also pause and dwell on sections I don't get the first time, so I really get it and then proceed.
Obviously, brains work very differently, so there really is no one-size-fits-all solution; we need the varied methods.
At least not with reading every word.
That's the crux of the issue though, isn't it? I don't read words by reading single letters, and I don't think I read sentences by reading every single word either. That'd be the main reason why listening is slower than reading: the forced linearity.
To be fair, my "all" was clearly an exaggeration, I'm sure there are people for whom this doesn't apply, but I'd still expect it to apply to 50%+ of the populations of countries where literacy is not a significant issue anymore.
Will this be available on the browser version?
callannie.ai has browser support, and video characters too
What does this have to do with ChatGPT with voice?
Also, Call Annie requires Google sign-in.
That person developed Call Annie, so it was probably just a way to plug their own solution :)
I don't get why, in the midst of the LLM craze, Amazon is ditching Alexa devs. It seems like abandoning a product for which the right technology was just invented.
Alexa is a great example of a business building a market leading product that a lot of people love and then just completely failing to do anything with it.
They seem to have a similar concept: https://www.youtube.com/watch?v=UqS3NxJ2L_I But I think very few think of Alexa as a personal assistant, most people are just happy playing music and toggling room lights.
Yes, this is the missing piece to make all those voice assistants actually... assistants. Alexa is currently just a music player for me because it's so dumb. With ChatGPT, it would be a whole different device.
You can ask questions in any language and it automatically detects the language! Tried it and it works amazingly well. It replies in English though.
It replies in the language it thinks follows from what you said. It may be that your non-English speech sounded like an English speaker's?
It understands more languages than it speaks, so no, it does not always reply in the same language.
Always replied in Portuguese for me if requested, with a weird English accent to be fair. It seems whisper will transcribe everything you say (translated if needed) into just one language, usually English, if you mix them. This is with GPT4 though.
Does that example really work?
It's suggesting to ask it where to order 195 pizzas from.
Does it have access to active business records like that?
They re-enabled browsing access, so I believe it has the ability to search Bing again now.
It seems much better now too. Before, it would do exactly one query and read one or two results and then give up. Yesterday I asked a question and it did several queries and read probably a dozen results before answering. And it correctly said that little information was available on the topic I asked about, which impressed me because I expect it to hallucinate in that situation instead.
I'm working on making it possible to feed LLMs live data from websites. It is challenging because injecting webpages into LLM prompts directly has a mixed result, especially if there's a smidge of ambiguity or complexity in the webpage HTML, so part of my work is on automatic methods to economically prune, format and feed web data to LLM RAG pipelines.
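As a rough illustration of the kind of pruning described above (the function names and heuristics here are my own sketch, not the commenter's actual pipeline), even Python's stdlib `html.parser` can strip scripts, styles, and markup so only visible text reaches the prompt, truncated to a token-ish budget:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>/<style>/<noscript> contents."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # >0 while inside a tag whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def prune_html(html: str, max_chars: int = 4000) -> str:
    """Reduce a page to whitespace-normalized text under a character budget."""
    p = TextExtractor()
    p.feed(html)
    return " ".join(p.chunks)[:max_chars]

page = ("<html><head><style>p{color:red}</style></head>"
        "<body><h1>Title</h1><script>var x=1;</script>"
        "<p>Body text.</p></body></html>")
print(prune_html(page))  # -> Title Body text.
```

A real pipeline would go further (boilerplate removal, chunking, relevance scoring), but even this cheap pass avoids wasting context window on CSS and JavaScript.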
You can follow my progress on twitter if you're interested (I tweet about other things, don't expect LLM/Data stuff only :)
App Store on iOS 17.1.1 in Europe has ChatGPT app version 1.2023.319 (14269) released three days ago. No sign of ChatGPT Voice. No headphone icon to be found. There’s also no «New Features» entry in Settings on the mobile app and so no way to opt into voice conversations as the blog article suggests. https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
Edit: The headphone icon appeared on Nov 22, 2023 around 6am CET without an update to the app. There is no «New Features» entry in the Settings.
Anyone having trouble getting this working, just close the app and reopen.
thank you!!!
Hello [UserName], my name is Doctor Sbaitso.
I am here to help you. Say whatever is in your mind freely, our conversation will be kept in strict confidence. Memory contents will be wiped off after you leave.
So, tell me about your problems.
Wow! We had a random conversation at work a couple of years ago and my boss mentioned some other old "chat bot", and suddenly my Doctor Sbaitso memory came rushing back. Along with lots of memories of trying to pick my soundcard in DOS games.
I just used this to practice my Japanese and it was really helpful having a judgement free correction of my grammar. It definitely takes longer than English because it’s using more tokens but the feedback is very thorough. I wish I could make the pause before sending automatically a bit longer. (I know I can hold to do it manually but the automated send is nice.)
Does the audio go direct to chatgpt, or through a transcription model first?
I could see it being a bit inaccurate if the transcription silently corrects my error.
I tried telling it to speak slowly but it just kept rattling off at full speed while saying things like "I understand. I'll slow down. How is this for you?" Etc
An unfortunately absent bit of accessibility that would really help second language learners
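One low-tech way to catch the silent transcription corrections worried about above (a sketch of my own, not a ChatGPT feature): diff the sentence you meant to say against the transcript, so any "fixed" words stand out.

```python
import difflib

def transcript_diffs(intended: str, transcript: str):
    """Return (intended_words, transcript_words) pairs where they differ."""
    a, b = intended.split(), transcript.split()
    diffs = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs

# If transcription "fixed" a deliberate grammar mistake, it shows up here:
print(transcript_diffs("watashi wa gakusei desu ka", "watashi wa gakusei desu"))
# -> [('ka', '')]
```

An empty list means the transcript matches what you intended, so the correction you get back really is grading your grammar, not the transcriber's.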
I’ll ask the obvious question - does this make it more likely they’re rejoining OAI?
I think it's just him acknowledging his team's (at least until very recently) likely months long efforts to have this ready to go live.
It's just a retweet. Probably just a reflex action.
I'm unconvinced. Just a brief experiment, but I find the voices grating. They are really well synthesized, but they sound like the classic well-trained call center agent: fake friendly.
I think the dialect is part of it as well: identifiably US West coast, presumably SF-Bay. Something more neutral, say, mid-Atlantic would be preferable for non-USaians. I asked ChatGPT if it could change its dialect - it said yes, but sounded exactly the same anyway.
Maybe you get used to it? Dunno, I'll try again, if I have a use case...
I asked ChatGPT if it could change its dialect - it said yes, but sounded exactly the same anyway.
Dialect, or accent?
In English, I asked it to change to a Scottish "accent". It said sure, but nothing changed. In German, I asked it to change to the "dialect" spoken in the Wallis - I got a bunch of info about the dialect, but all still in high German.
tl;dr: I tried both...
Announced by the OpenAI account, and retweeted by Brockman
Yes, let's keep our implosion drama separate from our product announcements shall we?
Changed from https://twitter.com/gdb/status/1727067288740970877 above, plus I degregged the title. I'm sure he won't mind. (submitted title was "ChatGPT Voice for All Free Users Announced (By Greg Brockman)")
Thanks!
Is there a way to paste text into the mobile app and get the voice to read it? I can't for the life of me figure out how to do this. There's no text input shown while in the voice feature despite ChatGPT telling me to "just type what you want me to read"
I'm not sure if there's a better way but I've typed out requests via text, and as part of the request I type "read the results to me when I ask for them" or something along those lines. Then I turn on the voice mode and make the request.
It says it doesn't work when you have chat history turned off - and thus data collection, which is combined into the same setting. Even on the paid version.
It's not because OpenAI wants to force people to enable history. It's just a technical limitation in the way it currently works.
Seems like the service is down. Maybe this is the cause.
Still (or again) down for me, 11 hours later.
Oops! Our systems are a bit busy at the moment, please take a break and try again soon.
I just wish it was a little bit faster. When this reaches imperceptible levels of latency, the future will be here.
callannie.ai is faster and has video characters too
I am wondering if this is kind of an "enjoy it while OpenAI still exists" thing.
I assume there will be a MS ChatGPT if OpenAI folds though. Or just Bing.
Bing app already has gpt4 with voice input and voice output.
I’ve been using this for a while now and will bring ChatGPT as an additional conversational partner when having discussions where we don’t know the topic that well.
It’s wild. Very close to Star Trek land. I never thought I’d see this.
It's what people said about Google!
We went from debating a topic to Googling the answer and resolving a whole night's debate in seconds.
And I will tell you that Google didn't always get it right. Often a tweak to the search term could get the wrong answer, but win the argument.
There are three topics banned at my table. Religion, politics, AI, Sam Altman, punctuation and maths.
I'm wondering which part of this link I should focus on: (1) ChatGPT Voice is now free for all users (2) Greg Brockman is retweeting an announcement from a company he left
Maybe it is Greg_Brockman_30T_fp16.gguf?
How can I use this without a computer like Alexa for conversations?
Like an Android device with a smart speaker? I think there is an OpenAI app.
Serious question.
I've been using it on my iPhone for a few weeks. It works great with AirPods.
The reference to [OpenAI's] exact number of employees is interesting in context.
I laughed. It has indeed been a long night(s) for the team.
Is this available to API users?
This is going to make me order a pizza tonight, fml.
As an aside, it's a shame that ChatGPT doesn't work on LineageOS phones even with microG spoofing. I hope they fix this; many hackers who could provide valuable feedback are unable to use it at the moment.
Wow, I just tried the app with voice and the quality is amazing. It felt creepy. It's not a person I was talking to, but the response time and quality are so good that I can't help but think I'm talking to a real person. It's eerie.
Edit: Based on the response time, I don't understand where the audio is being generated, is it local?
So that's why ChatGPT has failed to load or work for the past few hours. No really, it's down right now.
Open source generic LLM frontends such as bigAGI (https://github.com/enricoros/big-agi) have had this feature for many months now. The good news: it even works with open source and local LLMs.
ChatGPT - yet another mechanism to mine data from you.
Why do you need humans if you have ChatGPT?
Online communities are echo chambers, imagine now talking to a bot all the time.
that is just... incredible. the intonation, the response.
Got to love the builders. Even the EA decels can't stop them.
"Hey everybody! Look over here!"
Refuses to work for me all day.
"it's been a long night [...] how many pizzas should I order for 778 people".
Hits differently than it would have last week. :-/
What a coincidence, I was just looking for something similar for local models and stumbled upon this; the repo seems full of TTS/STT projects.
I wish I could talk to ChatGPT on the website. I end up using Microsoft Word's dictate feature and then copy pasting.
Considering the API is currently broken, I low-key think it's time for me to find a new partner...
This voice is so horrible. It sounds nothing like a real human voice. This kind of voice generator would have been OK in 2008, but not in 2023...
I’ve been using this for several weeks now and it’s replaced podcasts for me. I’ll just pick a topic and then ask for a summary and then ask clarifying questions. Other times I’ll just goof off. In one chat, I’m rewriting the Star Wars sequels movie by movie with ChatGPT. I’ve realized that I’m built to learn socially, and ChatGPT never tires of my questions or gets annoyed.
One interesting story. My 15 year old kid was talking to it about Percy Jackson, and as Gen Z is prone to do, was using the phrase “kind of” quite a bit. At some point in the conversation, ChatGPT started using “kind of” as well in the audio. I imagine the underlying prompt is “make sure your answers are clear to the user” and somehow it’s determining that “kind of” is a way to build clarity in that context, or the phrase is informing the token set. Unsure but it was a little unsettling.
For my part I find the natural pauses and accentuations really helpful. It’ll slow down when it’s hitting an “important” or complicated bit of information, and just like with humans, the pause and change in tone are indicators.