I might be missing something, but what are the non-questionable, or at least non-evil, uses of this technology? Because every single application I can think of is fucked up: porn, identity theft, impersonation, replacing voice actors, stealing the likeness of voice actors, replacing customer support without letting the customers know you're using bots.
I guess you could give realistic voices to people who lost their voices by using old recordings, but there's no way that this is a market that justifies the investment.
Text-to-speech is very close to being able to replace voice actors for a lot of lower-budget content. Voice cloning will let directors and creators get just the sound they want for their characters; imagine being able to say "I want something that sounds like Harrison Ford with a French accent." Of course, there are going to be debates about how closely you can clone someone's voice, diction, etc. Both extremes are wrong: perfect cloning will hurt artists without bringing extra value to directors and creators, but if we outlaw things that sound similar, the technology will be neutered to uselessness.
That's basically replacing voice actors and stealing their likeness: both are arguably evil, and mentioned. So, I haven't missed them.
P.S.: "But what about small indie creators?" That's not who's gonna embrace this the most; it's big studios, and they will do it to fuck over workers.
As someone involved in the AI creator sphere, that's a very cold take. Big studios pay top-shelf voice talent to create the best possible experience because they can afford it. Do you think Blizzard is using AI to voice Diablo/Overwatch/Warcraft? Of course not. On the other hand, there are lots of small indie games being made now that use TTS, because the alternative is no voice, the voice of a friend, or a very low-quality voice actor.
Do I want to have people making exact clones of voice actors? No. The problem is that if you say "You can't get 90% close to an existing voice actor," then the technology will be able to create almost no human voices; it'll constantly refuse like Gemini, even when the request is reasonable. This technology is incredibly powerful and useful, and we shouldn't avoid using it because it'll force a few people to change careers.
Have you seen how big studios treat vfx artists? They absolutely will replace voice actors with AI.
Also:
At what, exactly? The only "useful" case you presented is "actually, replacing voice actors with AI isn't so bad".
You want a world where only the rich can create beautiful experiences. You're either rich or short-sighted.
Edit: If you've got a cadre of volunteer voice actors who don't suck hidden somewhere, you need to share, buddy. That's the only way your comments make sense.
I don't know what else to tell you, I just think people deserve to be paid for the work they do.
Your vision of a world where anyone can create voices for their projects cheaply CANNOT exist without someone getting exploited. Nor is it sustainable, really.
You said this world would be worth some people losing their careers, but what do we gain? More games/audiobooks of questionable quality? Is this really worth fucking a whole profession over?
We agree that people should be paid for the work that they *DO*. Your view smacks of elitism, and voice actors don't have any more right to be able to make decent money peddling their voice than indie game devs have to peddle games with synthetic voices.
Your view smacks of contempt for workers, particularly in the arts. Especially the emphasis on "do", as if voice actors don't actually work and just live off royalties or something. It's the kind of worldview that the rich and the deluded working poor tend to share.
Professions disappear, it's a natural side effect of progress. Stablehands aren't really that common anymore, because most people drive cars instead of horses.
I really hope we can deprecate a whole bunch of professions related to fossil fuels, including coal miners and oil drillers etc.
I sympathise with the people working in those professions, I do, but times change and professions come and go, and I don't buy the argument that we should stop inventing new stuff because it might outcompete people.
As for positive uses of this technology, it might be used to immortalise a voice actor. For example Sir David Attenborough probably won't be around forever, but thanks to this technology, his iconic voice might be!
You have a narrow view of what a beautiful experience is. It does not require professional-level voice acting.
It is not unfair that, in order to have voice acting, you must have someone perform voice acting. You don't have the natural right to professional-level voice acting for free, nor do you need it to create beautiful things.
The tech is simply something that may be possible, and it has tradeoffs, and claiming that it's an accessibility problem does not grant you permission to ignore the tradeoffs.
I also don't have the natural right to work as a professional-level voice actor.
"Natural rights" aren't really a thing, the phrase is a thought-terminating cliché we use for the rhetorical purpose of saying something is good or bad without having to justify it further.
A few times as a kid, I heard the meme that the American constitution allows everything then tells you what's banned, the French one bans everything then tells you what's allowed, and the Soviet one tells you nothing and arrests you anyway.
It's not a very accurate meme, but still, "permission" is the wrong lens: it's allowed until it's illegal. If you want it to be illegal to replace voice actors with synthetic voices, you need to campaign to make it so, as this isn't the default. (Unlike with using novel tech for novel types of fraud, where fraud is already illegal and new tech doesn't change that.)
The lightness with which you treat forcing tens of thousands of people to change their career is absurd. Indie games are hardly suffering for a lack of voice acting, even if you only look at it from a market perspective and ignore that voice acting is a creative interpretation and not simply reading the words the way the director wants.
Yes, we should avoid using it because it will upend the lives of a significant amount of artists for the primary benefit of "some indie games will have more voice acting and big game companies will be able to save money on voice actors". That's not worth it, how could you think it is?
Suppose all existing voice actors, and, to be maximally generous, everyone who had spent >1 year training to be a voice actor, was given a pension for some years, paying them the greater of their current income or some average voice actor income. And then there would be no limits on using AI voices to substitute for voice actors.
Would you be happy with that outcome, or do you have another objection?
Only tens of thousands? Cute. For most of the 2010s, I was expecting self-driving cars to imminently replace truck drivers, which is a few million in the US alone and I think around 40-45 million worldwide. I still do expect AI to replace humans for driving; I just don't know how long it will take. (I definitely wasn't expecting "creative artistry" to be an easier problem than "don't crash a car". I didn't appreciate that nobody minds if even 90% of the hands have 6 fingers, while everyone minds if a car merely equals humans by failing to stop in 1 of every (3.154e7 seconds per year * 1.4e9 vehicles / 30000 human driving fatalities per year ~= 1.47e+12) seconds of existence.)
Almost every nation used to be around 90% farm workers, now it's like 1-5% (similar numbers to truckers) and even those are scared of automation; the immediate change was to factory jobs, but those too have shifted into service roles because of automation of the former, and the rest are scared of automation (and outsourcing).
Those service-sector roles? "Computer" used to be a job; graphic artists are upset about Stable Diffusion; anyone working with text, from Hollywood scriptwriters to programmers to lawyers, is having to justify their own wages vs. an LLM (for now, most of us are winning this argument, but for how long?).
If we get this wrong, it's going to be a disaster; if we get it right, we're all living better than the 0.1%.
I tried indie game development for a bit. I gave up with something like £1,000 in my best year. (You can probably double that to account for inflation since then).
This is because the indie game sector is also not suffering from a lack of developer talent, meaning there's a lot of competition that drives prices below the cost of living. Result? Hackathons where people compete for the fun of it, not for the end product. Those hackathons are free to say if they do or don't come with rules about GenAI; but in any case, they definitely come with no budget.
A few hours ago I was in the Deutsches Technikmuseum; there's a Jacquard Loom by the cafe: https://technikmuseum.berlin/ausstellungen/dauerausstellunge...
The argument you give here is much the same argument used against that machine, back in the day: https://spectrum.ieee.org/the-jacquard-loom-a-driver-of-the-...
Why do you think those textile workers lost the argument?
And to pre-empt what I think is a really obvious counter, I would also add that the transition we face must be handled with care and with courtesy to people's economic fears. To all those who read my comment and think "and therefore this will be easy and we should embrace it, just dismiss the naysayers as the Luddites they are": why do you think Karl Marx wrote the Communist Manifesto?
Do you think Blizzard won't when the tech gets cheap and good enough?
I don't disagree with the thought that large companies are going to try to use these technologies too, with typical lack of ethics in many cases.
But some of this thinking is a bit like protesting the use of heavy machinery in roadbuilding/construction because it displaces thousands of people with shovels. One difference with this type of technology is that using it doesn't require massive amounts of capital, unlike the heavy machinery example, so more of those shovel-wielders will be able to compete with those who are only bringing capital to the table.
I'm not saying that this should be forbidden or something. I just wonder what the motivation is for the people pitching and actually developing this. I'm all for basic, non-profit-driven research, but at some point you gotta ask yourself "what am I helping create here?"
Saying something is evil would seem to suggest that you think it should be forbidden. Maybe you should choose a different word if that’s not your intention.
I disagree on three of your points.
It is creating a new and fully customisable voice actor that perfectly matches a creative vision.
To the extent that a skilled voice actor can already blend existing voices together to get, say, French Harrison Ford, for it to be evil for a machine to do it would require it to be evil for a human to do it.
Small indie creators have a budget of approximately nothing, this kind of thing would allow them to voice all NPCs in some game rather than just the main quest NPCs. (And that's true even in the absence of LLMs to generate the flavour text for the NPCs so they're not just repeating "…but then I took an arrow to the knee" as generic greeting #7 like AAA games from 2011).
Big studios may also use this for NPCs to the economic detriment of current voice actors, but I suspect this will be a tech that leads to "induced demand"[0], though note that this can also turn out very badly and isn't always a good thing either: https://en.wikipedia.org/wiki/Cotton_gin
[0] https://en.wikipedia.org/wiki/Induced_demand
I have the same concerns generally. But one non-evil popped into my head...
My dad passed away a few months ago. Going through his things, I found all of his old papers and writings; they have great meaning to me. It would be so cool to have them as audio files, my dad as the narrator. And for shits, try it with a British accent.
This may not abate the concerns, but I'm sure good things will come too.
Serious question: is this a healthy way to treat ancestors? In the future will we just keep grandma around as an AI version of her middle aged self when she passes?
There's a Black Mirror episode about something like that, though I don't remember the details.
Yup, "Be Right Back", S2E1
And possibly another one, but that would be a spoiler
I remember a journalist actually doing it, but just the AI part of course, not the robot.
Fair question. People have kept pictures, paintings, art, belongings, etc. of their family members for countless generations. AI will surely be used to create new ways to remember loved ones. I think that is very different from "keeping around grandma as an AI version of herself" and pretending she is still alive, which I agree feels unhealthy.
Not sure if this is related to this tech, but I think it is worthwhile: The Beatles - Now And Then - The Last Beatles Song (Short Film)
https://www.youtube.com/watch?v=APJAQoSCwuA
I think better-quality audio content generated from text would be a killer application. As someone else mentioned: pipe in an EPUB, output an audiobook or video game content. With additional tooling (likely via AI/LLM analysis), this could enable things like dramatic storytelling with specific character voices and dynamics interpreted from the content of the text.
I can see it empowering solo creators in similar ways that modern music tools enable solo or small-budget musicians today.
That falls into “replacing voice actors”, mentioned by the OP.
No, it really doesn’t. There are thousands of very smart and talented creators without the budget to hire voice actors. This lets them get a start. AI voices let you lower the barrier to entry, but they won’t replace most voice actors because the higher you go up the stack, the more the demand for real actors will also go up because AI voices aren’t anywhere near being able to replace real voice actors.
As another reply put it, I'm very skeptical that the benefits for small content creators will offset the damage to society as a whole from increased fraud and harassment.
That is as absurd as saying LLMs are increasing the demand for writers.
Even if that were true—which it is not; the current crop is more than adequate to read long texts—it assumes the technology has reached its limit, which is equally absurd.
What if I want to listen to my notes in my own voice?
Or my favorite books in my own voice.
Or my lecture notes in my professor's voice.
Or, when it gets fast enough, someone could have their own personal dub of video games (BlazBlue Central Fiction) or TV shows and such.
AI girlfriend... ok I'm done.
It's 2024. Are nerds still trying to turn any technology of sufficient ability into Kelly LeBrock?
this is going to be a real thing for gen z, but replace kelly with any girl from anime
Jeeze, I can't imagine why women feel so alienated from the tech industry.
It's almost as if any time some sort of way to make computers more human-like emerges, the first thing a subset of the men in the space do is think "How can I use this to make a woman who has absolutely no function other than my emotional, practical, and physical gratification?"
Humans in desiring deep emotional and sexual connections with people of their desired gender and being driven to weird behaviours when they can't achieve it in the way you personally approve of shock
Then work on it. Ask friends for feedback. Go to therapy. Have some damned introspection instead of just reducing 51% of the people on the planet to a bangmaid.
Non robotic screen readers for blind people
That would be non-evil, sure. But I wonder if blind people even want it? They're already listening to screen readers at insane speeds, up to 6-8x, I think. Do they even care that it doesn't sound "realistic"?
Well, I’m sure the blind readers of HN (which I am certain exist) can answer this question, and you, a sighted person, don’t need to even wonder from your position of unknowing.
I mean, I explicitly used "wonder" because I don't wanna assume about blind people's experiences and needs. What else should I have done so you wouldn't come in kicking me in the nuts?
In this thread there's a bunch of "non-evil" responses, and your replies are all "I'm skeptical" or just dismissing them outright.
It appears from the outside that you've decided this is Officially Bad technology and aren't genuinely seeking evidence otherwise.
You're assuming worse of me than I'm assuming of the technology.
There's almost no reply here with a use that a) is not somewhat bad and b) has enough of an upside to compensate for the downsides.
Except maybe this one, but I do know enough about accessibility to know how blind people generally use computers, which is why I asked the question.
It lets you use your favorite audiobook reader’s voice for all your TTS needs. E.g. you can have HN comments read to you by Patrick Stewart, or by the Honest Trailers voice. Maybe you find that questionable? ;)
So, replacing voice actors with unpaid clones of their voices, effectively stealing their identity.
The range of use goes from totally harmless fun to downright evil.
If I take pictures of someone and decorate my home with AI-generated copies of those pictures, I’m not stealing their identity.
The existence of Photoshop doesn't mean that you can put Kobe Bryant on a Wheaties box without paying him. There's no reason that a voice talent's voice can't be subject to the same infringement protections as a screen actor's or athlete's likeness.
Utterly questionable.
What about for remembering lost loved ones? There are dead people I would love to hear talk again, even if I know it's not their personality talking just their voice (and who knows, maybe with LLM training on a single person it could even be roughly their personality, too).
I can imagine a fairly big market of both people setting it up before they die, with maybe a whole load of written content and a schedule of when to have it read in future, and people who've just lost someone, and want to recreate their voice to help remember it.
I can't, and if I could, I think this would be fairly dystopian. Didn't Black Mirror have an episode about something similar? I vaguely remember an Asimov/Arthur C. Clarke short story about the implications of time-travel(ish) tech in a similar context. Sounds like a case of "we've built the Torment Nexus from classic sci-fi novel 'Do Not Build the Torment Nexus'".
We already have ways to preserve the voices of people past their lives. Cloning their voices and writing things in their names is not only wrong but deceptive.
Jack Crusher did something similar for Wesley.
A long term goal of mine is to have a local LLM trained on my preferences and with a very long memory of past conversations that I could chat with in real time using TTS. It would be amazing to go on a walk with Airpods and chat with it, ask questions, learn about topics, etc.
I do that already with the chatgpt mobile app, but not with my own voice.
I'd like it if there were more (and non-American) voice options, but I don't think I'd ever want it to be my voice I'm hearing back.
Yeah, I wouldn't necessarily want it to be my own voice either, but it would be very cool to make it be the voice of someone I enjoy listening to. :)
You can use it to easily fix voice-overs on your videos without needing to re-record, etc.
Reasonable, but I'm skeptical of the market
Why is replacing voice actors evil? How is it worse than replacing any other job using a machine/software?
Agreed. I think the framing of "stealing" is a needlessly pessimistic prediction of how it might be used. If a person owns their own likeness, it would be logical to implement legal protections for AI impersonations of one's voice. I could imagine a popular voice actor scaling up their career by using AI for a first draft rendering of their part of a script and then selectively refining particular lines with more detailed prompts and/or recording them manually.
This raises a lot of complicated issues and questions, but the use case isn't inherently bad.
It could be used by people who can write English fluently, but are slow at speaking it, as a more personal form of text-to-speech.
Personally, I'm eager to have more control over how my voice assistant sounds.
Similarly, a real-time voice-to-voice translation system that uses the speaker's voice would be really cool.
iPhone Personal Voice [1] is one. It helps people who are physically losing their voice and the ones around them to still have their voice in a different way. Apple takes long voice samples of various texts for this though.
[1]: https://www.youtube.com/watch?v=ra9I0HScTDw
That's kinda what I was thinking on the second paragraph. Still, gotta be a small market.
Organized crime should be happy to invest in that. Especially the "Indian scam call center" type of crime.
Huh? Replacing human labor with machines is evil? You wouldn't even be able to post this comment without that happening, because computers wouldn't exist, or we wouldn't have time for it because many of us would be working on farms to produce enough food without human-replacing technologies.
In the same way that machines made it possible to produce an abundance of food with less labor, voice AI combined with AI translation can make information more accessible to the world. Voice actors wouldn't be able to voice all the useful information in the world (especially for the more niche topics and the smaller languages), because it wouldn't be worth paying them, and humans are also slower than machines. We are not far from near-real-time voice translation from any language to any other. Sure, we can do it with text-only translation, but voice makes it more accessible for a lot of people. (For example, between 5–10% of the world has dyslexia.)
There's a huge gap in uses where listenable, realistic voice is required, but the text to be spoken is not predetermined. Think AI agents, NPCs in dynamically generated games, etc. These things are currently not really doable with the current crop of TTS because either they take too long to run or they sound awful.
I think the bulk of where this stuff will be useful isn't really visible yet b/c we haven't had the tech to play around with enough.
There is also certainly a huge swath of bad-actor stuff that this is good for. I feel like a lot of the problems with modern tech fall under the umbrella of "We're not collectively mature enough to handle this much power", and I wish there were a better solution for all of that.
The ability to use my own voice in other languages so I can do localization on my own youtube videos would be huge.
With game development as well, being able to be my own voice actor would save me an immense amount of money that I do not have and give me even more creative freedom and direction of exactly what I want.
It's not ready yet, but I do believe that it will come.
I used it to translate a short set of tv shows that were only available in Danish with no subtitles in any other language and made them into English for my personal watching library.
The episodes are about 95% just a narrator with some background noises.
ElevenLabs did a great job with it, and I cranked through the 32 episodes (about 4 minutes each) relatively easily.
There is a longer series (about 60 hours) only in Japanese that I want to do the same thing for, but I don't want to pay ElevenLabs prices to do it.
It's not just about justifying the investment. You can make just about anything worth the investment as measured by a 90-day window of fiscal reporting. H!tmen were a wildly profitable venture for La Cosa Nostra.
It's about not justifying the societal risk.
I like the idea of cloning my own voice and having it speak in a foreign language
The first couple I've come up with are training courses at scale, or converting videos with accents you have a hard time understanding to one you can (no one you'll understand better than yourself)
If I am learning new content I can make my own notes and convert them into an audiobook for my morning jog or office commute using my own voice.
If I am a content creator, I can generate content more easily by letting my AI voice narrate my slides, say. Yes, that is cheap and lower quality than a real narrator who can deliver more effective talks... but there is a long tail of mediocre content on every topic. Who cares, as long as I am having fun, sharing stuff, and not doing anything illegal or wrong?
Maybe having better real time conversations in computer games. Like game characters saying your name in voiceovers.
I don't know how stressful my life will be then, but I thought about reading to my kids later and creating audiobooks with my voice for them, for when I am traveling for work, so they can still listen to me "reading" to them.