
OTranscribe: A free and open tool for transcribing audio interviews

btown
18 replies
3h8m

Are there any open-source or paid apps/shareware/freeware that can:

- Transcribe word-by-word in real time as audio is recorded

- Work entirely locally

- Use relatively recent open-source local models?

I've been using otter.ai for real-time meeting transcriptions. It lets me multitask and instantly catch up if I'm asked a question by skimming the most recent few seconds' worth of the transcript. But it's far from perfect: occasionally their real-time service has significant transcription delays, and it requires internet connectivity.

Most of the Whisper-based apps out there, though, as well as (when I last checked) the whisper.cpp demo code, require an entire recording to be ingested at once. There are others that rely on e.g. Apple's dictation frameworks, which are a bit dated in capability at the moment.

Anything folks are using out there?
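One common workaround for batch-only models is to buffer incoming audio and hand the model overlapping fixed-size windows. The following is only a minimal sketch of that buffering step; `WindowBuffer`, `window`, and `hop` are hypothetical names, and nothing here is taken from otter.ai, whisper.cpp, or any app mentioned in this thread.

```python
from typing import List, Sequence


class WindowBuffer:
    """Collects streamed samples and emits overlapping fixed-size windows."""

    def __init__(self, window: int, hop: int) -> None:
        if window <= 0 or hop <= 0:
            raise ValueError("window and hop must be positive")
        self.window = window  # samples per emitted window
        self.hop = hop        # samples to advance between windows
        self._buf: List[float] = []

    def push(self, chunk: Sequence[float]) -> List[List[float]]:
        """Add a chunk of samples; return every window that is now complete."""
        self._buf.extend(chunk)
        ready = []
        while len(self._buf) >= self.window:
            ready.append(self._buf[:self.window])
            # Drop only `hop` samples so consecutive windows overlap.
            del self._buf[:self.hop]
        return ready
```

Each returned window could then be passed to whatever transcription call is available; merging the overlapping text is a separate problem that this sketch does not address.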

uohzxela
8 replies
3h2m

I have built my own local-first solution to transcribe entirely locally in real time, word by word, driven by a different need (I'm hard of hearing). It's my daily driver for transcribing meetings, interviews, etc. Because it is local-first, I do not have to worry about privacy when transcribing meetings at work, as all data stays on my machine. It's about as fast as Otter.ai, although there's definitely room for improvement in terms of UX and speed. The caveat is that it only works on MacBooks with Apple silicon. Happy to chat over email (see my HN profile).

WaitWaitWha
6 replies
2h34m

I have some staff with combined hearing and visual needs. Have you researched the one-, two-, and all-party consent requirements? Asking because I hope to identify transcription as "non-recording".

btown
3 replies
2h28m

California has an exception for hearing aids and other similar devices, but it’s unclear if transcription aids count, or if this has been tested in court. https://codes.findlaw.com/ca/penal-code/pen-sect-632/ (Not a lawyer, this is not legal advice.)

noah_buddy
2 replies
1h45m

If it were ephemeral? Would that change this? Say, recording the meeting locally in 5-minute frames, then updating a meeting summary?

smeej
1 replies
1h30m

Do you mean ephemeral, or are you actually wondering about something implanted under the skin? I'd think/hope if it goes under the skin, it ends up in "hearing aid" territory. I'm less sure about if it doesn't persist.

noah_buddy
0 replies
1h1m

Yup, typo, sorry

hansvm
0 replies
1h2m

Two/all-party consent are hacky workarounds for the actual harm being inflicted (valid goals including not having your microwave inform Google's ad servers, not recording out-of-context jokes as evidence to imprison people, ... -- invalid goals caught up in the collateral damage include topics like the current one about hearing issues (note that a sufficiently accurate transcription service has all the same privacy problems 2-party consent tries to protect against, maybe more since it's more easily searchable)).

I'd be in favor of some startup pulling an Uber or AirBnB and blatantly violating those laws to the benefit of the deaf or elderly if it meant we could get something better on the books.

CyberDildonics
0 replies
37m

What did your own research turn up?

smeej
0 replies
1h31m

I was so excited until the very end. I have the wrong hardware.

baby_souffle
4 replies
2h46m

> Are there any open-source or paid apps/shareware/freeware

Google Pixel phones have this feature and it works _very_ well.

ericjmorey
2 replies
1h51m

How is that feature accessed? Or what does Google call it so I can search for it.

abecedarius
0 replies
59m

Live Transcribe in the accessibility settings. AFAIK it's available on any fairly recent Android phone. I bought a Pixel tablet for no other reason but to run it -- nothing else I've tried comes close for local-only continuous transcribe-as-they-speak. (iOS has a similar feature also under accessibility; it's good but not at the same level. Of course I'd love to see an open-source solution.)

This was for English. One problem it took me a while to realize: when I switched it to transcribe a secondary language, it was not doing it on-device anymore. You can tell the difference by setting airplane mode.

neves
0 replies
2h15m

Have you tried it with non-English languages?

New Microsoft Surfaces have this feature, but it only works for English.

alfredgg
1 replies
1h28m

I helped code oTranscribe+ [0], which does something similar to what you are asking for. It is a desktop application built with ElectronJS on top of what was then the current version of oTranscribe. It also exists as a web version and a PWA [1].

The language models were those from BSC (Barcelona Supercomputing Center) at the time. The transcription is done via WASM, using Vosk [2] as the base.

I hope it fits.

[0] https://github.com/projecte-aina/oTranscribe-plus

[1] https://otranscribe.bsc.es/

[2] https://github.com/alphacep/vosk-api

smeej
0 replies
5m

Is there a way to get it to punctuate? Or does it only jot down words?

smeej
0 replies
1h32m

I've been using Transcribro[0] on Android/GrapheneOS. It's FOSS and only local, and while it's not word-for-word real-time, it doesn't have to wait for the whole audio to be uploaded before it can work. This is on a Pixel 5a, so hardly impressive hardware.

It works well enough that I use it with Telegram to shove messages over to my Linux machine when I don't feel like typing them out, which is such an unsophisticated hack, but is getting the job done. I spent a couple hours trying to find a Linux-native alternative, or even get this running in Waydroid, and couldn't find anything that worked as well, so I decided not to let the "smooth" become the enemy of the "good enough to get the job done."

[0] https://github.com/soupslurpr/Transcribro

andrei-akopian
0 replies
2h22m

futo.org has a FOSS voice input Android app (voiceinput.futo.org) and live captions (https://github.com/abb128/LiveCaptions) for Linux. They specifically developed their own model that does fast real-time transcription.

Not sure if that helps for your specific use case.

cube2222
15 replies
10h2m

I needed to do this this week (transcribe an interview with multiple speakers) and used https://github.com/MahmoudAshraf97/whisper-diarization

It worked excellently.

It generates both a file that just contains a line per uninterrupted speaker speech prefixed with the speaker number, as well as a file with timestamps which I believe would be used as subtitles.
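For anyone wondering what turning timestamped segments into subtitles involves, here is a small sketch that renders them as SRT text. The segment shape (dicts with `start`, `end` in seconds, and `text`) is an assumption for illustration, not the actual output format of whisper-diarization.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

Calling `to_srt` on a list of such segments gives a string that can be written to a `.srt` file as-is.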

wanderingmind
4 replies
7h38m

The problem with using OpenAI Whisper is that it's too slow on CPU-only machines. whisper.cpp is blazing fast compared to Whisper, and I wish people would build better diarization on top of that.

stavros
2 replies
7h3m

What's OpenAI Whisper vs whisper.cpp? Do you mean whisper-diarization uses the API?

stavros
0 replies
6h49m

Ah I see, thanks. Hm, I would imagine that it's not hard to make something that works with both (the surface area of the API should be fairly small, I imagine), odd that projects use the former and not the latter.

aidenn0
0 replies
3h10m

Another advantage of whisper.cpp is that it can use cuBLAS to accelerate models too large for your GPU memory; I can run the medium and large models with cuBLAS on my 1050, but only the small one if I use the pure GPU mode.

adipasquale
3 replies
9h47m

I have had very good results using Spectropic [1], a hosted Whisper diarization API service. I found it cheap, and way easier and faster than setting up and using whisper-diarization on my M1. Audiogest [2] is a web service built upon Spectropic; I have not yet used it.

Disclaimer: I am not affiliated in any way, just a happy customer! I had some nice mail exchanges after bug reports with the (I believe solo) developer behind these tools.

---

[1] https://spectropic.ai/

[2] https://audiogest.app/

thomasmol
2 replies
7h23m

Thanks for the shout-out and kind words!

Thomas here, maker of Spectropic and Audiogest. I am indeed focused on building a simple and reliable Whisper + diarization API. Also working on providing fine-tuned versions of Whisper for non-English languages through the API.

Feel free to reach out to me if anyone is interested in this!

dchuk
1 replies
4h1m

Great-looking API. Do you offer, or plan to offer, automatic speaker identification based on labeled samples of speakers' voices? It would be great to basically have a library of known speakers that are auto-matched when transcribing.

thomasmol
0 replies
2h34m

Thanks! That is something I might offer in the future and is definitely possible with a library like pyannote. Would be really cool to add for sure.
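To make the "library of known speakers" idea concrete: one way it is often done is to compare a voice embedding for each diarized speaker against stored reference embeddings. The sketch below assumes the embeddings already exist as plain lists of floats (for example from an embedding model such as those in pyannote); `identify`, `library`, and the `0.7` threshold are hypothetical choices, and this is not how Spectropic works.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(
    embedding: Sequence[float],
    library: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # assumed cut-off; would need tuning on real data
) -> Optional[str]:
    """Return the best-matching known speaker, or None if nobody clears the threshold."""
    best_name, best_score = None, threshold
    for name, reference in library.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` for weak matches keeps unknown voices labeled generically instead of forcing them onto the nearest known speaker.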

I am also experimenting with post-processing transcripts with LLMs to infer speaker names from a transcript. It works pretty decently already, but it's still a bit expensive. I have this feature available under the 'enhanced' model if you want to check it out: https://docs.spectropic.ai/models/transcribe/enhanced

RamblingCTO
3 replies
9h57m

I had better success with whisperx, as whisper-diarization sometimes has weird issues I couldn't resolve: https://github.com/m-bain/whisperX

cube2222
2 replies
8h39m

iirc whisper-diarization uses whisperx under the hood.

I’ll be honest, I haven’t dived much into this as I just needed something transcribed quickly, but when I was looking at WhisperX I couldn’t find a CLI that would just out of the box give me a text file with a line per speaker statement (not per word).
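If a tool only gives you per-segment or per-word output with speaker labels, collapsing it into one line per uninterrupted speaker turn is a short post-processing step. This is a rough sketch; the segment shape (dicts with an optional `speaker` key and a `text` key) is an assumption loosely modeled on diarized JSON output and should be checked against what your tool actually writes.

```python
from typing import Dict, List


def speaker_lines(segments: List[Dict]) -> List[str]:
    """Merge consecutive segments from the same speaker into single lines."""
    lines: List[str] = []
    current_speaker = None
    parts: List[str] = []
    for seg in segments:
        # Segments without a label are grouped under a placeholder name.
        speaker = seg.get("speaker", "UNKNOWN")
        if speaker != current_speaker and parts:
            lines.append(f"{current_speaker}: {' '.join(parts)}")
            parts = []
        current_speaker = speaker
        parts.append(seg["text"].strip())
    if parts:
        lines.append(f"{current_speaker}: {' '.join(parts)}")
    return lines
```

Writing `"\n".join(speaker_lines(segments))` to a file gives the plain per-statement transcript described above.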

RamblingCTO
0 replies
7h24m

I use it like this:

whisperx $file --compute_type int8 --min_speakers 3 --max_speakers 3 --language de --hf_token $token --diarize

hubraumhugo
0 replies
3h36m

Fascinating how traditionally very complex and hard ML problems are slowly becoming commodities with AI:

- transcription

- machine translation

- OCR

- image recognition

H8crilA
0 replies
9h32m

I often subtitle old, obscure, foreign language movies with Whisper. Or random clips found on foreign Telegram/Twitter channels. Paired up with some GPT for translation it works great!

You can do this locally if you have enough (V)RAM, but I prefer the OpenAI API, as usually I don't have enough at hand. And the various Llamas aren't really on par with GPT-4 in quality. If you only need Whisper, and no translation, then local execution is indeed very viable. High-quality Whisper fits in 4GB of (V)RAM.

jagermo
5 replies
10h22m

Fantastic tool; I used it a lot to transcribe interviews during plane travel, when there was no internet and I needed to fill the time. Really useful to have if you do a lot of interviews.

dotancohen
4 replies
8h28m

From the homepage:

A free web app to take the pain out of transcribing recorded interviews

How did you use a web app on the plane with no internet?

jagermo
0 replies
7h13m

it works offline if you preload the website :)

grandfunction
0 replies
8h21m

Ran the server on his or her laptop...

You don't need the internet to use a web browser

Havoc
0 replies
8h21m

It’s MIT licensed so presumably self hosted

kimoz
3 replies
7h51m

Does anyone know a free tool for generating subtitles for movies and series?

drtgh
0 replies
3h25m

SubtitleEdit is one of the most complete and has many online tutorials from users.

Make sure they are recent tutorials, because they will probably mention how to use the automated generation tools/plugins that weren't available years ago.

https://github.com/SubtitleEdit/subtitleedit

doug_life
0 replies
7h18m

https://github.com/McCloudS/subgen worked very well for me. I had a TV series where somehow the last few seasons' timestamps did not match up with the subtitle files I could find online. I used subgen and it worked surprisingly well.

TrojanHookworm
3 replies
10h10m

Use this a lot. It's nice and simple and has exactly the tools you need (playback speed control, easy pause/play) and nothing more. Greatly prefer it over automatic transcription tools that give you 40 pages of 'umm's and 'ahhhh's to filter through and edit.

stavros
2 replies
6h49m

Can you not give the transcript to an LLM to remove the umms and ahhs?
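For the simplest cases you may not even need an LLM. Below is a crude regex pass (a swapped-in technique, not an LLM and not anything oTranscribe does) that strips standalone fillers. The pattern list is an assumption and will miss or mangle edge cases, such as a filler that ends a sentence.

```python
import re

# Standalone filler words such as "um", "umm", "uh", "ahhh", "erm", "hmm",
# optionally followed by a comma and trailing whitespace.
FILLER = re.compile(r"\b(?:u+m+|u+h+|a+h+|e+r+m*|h+m+)\b,?\s*", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and collapse leftover whitespace."""
    cleaned = FILLER.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

The word boundaries keep it from touching words that merely contain these letters, such as "umbrella" or "her". Note that it also removes real words like "ah" or "err", so review the output.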

BiteCode_dev
1 replies
5h56m

People not used to AI have blind spots that prevent them from seeing obvious use cases like this.

I'm always surprised at the amazed look of my friends when they see me concretely use the tool. They just didn't picture it until they saw it in action.

stavros
0 replies
5h50m

It's not even people not used to AI, I developed a tool that uses AI to do something, and then kind of couldn't be bothered to fix some of the output manually. It only occurred to me days later that I can ask the AI to fix it.

leiferik
2 replies
3h21m

You're always welcome to try my service TurboScribe https://turboscribe.ai/ if you need a transcript of an audio/video file. It's 100% free up to 3 files per day (30 minutes per file) and the paid plan is unlimited and transcribes files up to 10 hours long each. It also supports speaker recognition, common export formats (TXT, DOCX, PDF, SRT, CSV), as well as some AI tools for working with your transcript.

rsingel
1 replies
2h29m

This looks great. Do you have an API, or plan to release one?

leiferik
0 replies
2h18m

Thanks! Nothing to announce on the API front right now, but appreciate you asking :)

jrochkind1
2 replies
5h59m

Kinda surprised to not have AI integration.

You do still need to proof and QA even AI results, if you want a publication quality result, and do things like attribute who is speaking when (at least Whisper can't do that), and correct "unusual" last names and things. So I feel like people using AI still need good tools for the correcting/finishing/proofing too, that would be similar to the tools for non-assisted transcription.

MattieTK
1 replies
5h22m

This was written a really long time ago by a former WSJ Graphics reporter (Elliot Bentley) who is now at Datawrapper.

It is now operated by Muckrock and hasn't seen changes made to it in a while.

That's why it doesn't have any of these integrations, the technology just didn't exist.

jrochkind1
0 replies
5h3m

Aha, good to know! That's actually important context, that this is not a recent release, and doesn't necessarily have a lot of ongoing development.

choya-love
2 replies
9h23m

Any new language support in the future? Fingers crossed for Japanese.

fabianmg
0 replies
8h25m

Am I missing something? From what I checked, it supports every language, since you are the one transcribing by hand. This is just a UI for playing the video or audio while you type.

comradesmith
0 replies
6h1m

https://tactiq.io is made for meetings, but also does uploaded transcripts and supports Japanese!

accidbuddy
2 replies
4h7m

Does anyone know one that does transcription and translation in real time?

Nowadays, I use libretranslate/libretranslate and pluja/whishper to do this, but not in real time.
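For anyone wiring the two together by hand, the translation half is a single HTTP call. The sketch below builds a request for LibreTranslate's `/translate` endpoint using only the standard library. The base URL `http://localhost:5000` assumes a default self-hosted instance, the helper names are hypothetical, and an instance that requires an API key would need an extra `api_key` field.

```python
import json
from urllib import request


def build_translate_request(
    text: str,
    source: str,
    target: str,
    base_url: str = "http://localhost:5000",  # assumed local LibreTranslate instance
) -> request.Request:
    """Build a POST request for LibreTranslate's /translate endpoint."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    return request.Request(
        f"{base_url}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text: str, source: str, target: str) -> str:
    """Send the request and return the translated text (needs a running server)."""
    with request.urlopen(build_translate_request(text, source, target)) as resp:
        return json.loads(resp.read().decode("utf-8"))["translatedText"]
```

Feeding it each transcript segment as it is produced would get closer to live translation, although latency then depends on both the transcription and translation steps.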

Bayko
1 replies
4h3m

Ah, this brings back memories. When I was in college with limited money, I used to pirate movies, and most of them didn't have subtitles, so I used to daydream of writing a VLC plug-in that would generate subtitles in real time. But I had better things to do, like play video games...

space_oddity
0 replies
1h38m

Many of us have had those ambitious tech ideas...

BetterWhisper
2 replies
8h20m

If you are looking for something automatic that also allows you to interact with your transcripts ChatGPT-style, then I would recommend https://www.videototextai.com/

Terretta
1 replies
5h12m

That cookies box though... Dark pattern (accept lots + accept all, fake drag affordance, covering a quarter of the page) for cookies doesn't bode well for privacy protections around the transcripts.

BetterWhisper
0 replies
2h2m

You are allowed to delete any transcription you make and with that we do not keep any copy of the transcripts :) . The cookie banner is there to comply with the EU laws.

nullbar
1 replies
8h54m

Maybe it isn't perfectly clear, but OTranscribe isn't an automatic speech-to-text tool, but instead, a UI for assisting in manual transcribing.

So no AI here, folks.

space_oddity
0 replies
1h42m

Yep, it's designed to assist with manual transcription

neves
1 replies
2h23m

Has anybody tested it with Brazilian Portuguese? It is a hard problem, since we have too many accents.

dmd
0 replies
1h37m

I don't understand what the issue is. You don't know how to type the different diacritical marks? Or the textbox isn't accepting them? (Which seems like it would be a browser issue, not an issue with the site.)

bcherny
1 replies
5h6m

Looks cool! Unclear from the docs, but does it support non-English languages? How about mixed-language interviews?

avodonosov
0 replies
4h45m

Yes! Any language you understand is supported!

ulrischa
0 replies
2h58m

Pretty amazing what a webapp can do. I wish there were more like it, and not all these native apps.

tkgally
0 replies
4h47m

I was curious how good a transcription I could get from what may be the best multimodal LLM currently, Gemini-1.5-Pro-Experiment-0801, so I had it transcribe five minutes of an interview between Ezra Klein and Nancy Pelosi from earlier today. The results are here:

https://www.gally.net/temp/20240809geminitranscription/index...

Aside from some minor punctuation and capitalization issues, Gemini’s transcription looks nearly perfect to me. There were only one or two words that I think it misheard. If I had transcribed the audio myself, I would have made more mistakes than that.

One passage struck me in particular:

  And then he comes up with "weird," which becomes viral and the rest, and here he is.

How did Gemini know to put “weird” in quotation marks, to indicate, correctly, that the speaker was referring to Walz’s use of the word as a word? According to Politico, Walz first used the word in that context in the media on July 23.

https://www.politico.com/news/2024/07/26/trump-vance-weird-0...

space_oddity
0 replies
1h43m

oTranscribe is a free option for transcription but in many cases it's just too simple

matejmecka
0 replies
2h57m

Just pitching in a transcription tool that lets you transcribe video and audio files using Whisper and WASM in your browser, and get a .txt, .srt, .vtt file. Maybe in the future support for Whisper Turbo?

https://video2srt.ccextractor.org/

Disclaimer: Working on this project.

kgdiem
0 replies
4h16m

I started making an open source macOS app to do this with whisper and potentially pyannote.

It is functional but a bit slow. I think using whisper directly instead of swift bindings will help a lot.

Really interested in adding diarisation but having a lot of trouble converting Pyannote to CoreML. Pyannote runs so slowly with torch on CPU. Haven’t gotten around to putting my latest work for that on GitHub yet.

Happy to accept contributions —

Some priorities right now:

* Fixing signing for local builds

* Replace swift whisper with whisper cpp

* Allowing users to provide their own models

https://github.com/Stack-Studio-Digital-Collective/Auditif

justinclift
0 replies
3h15m

From their FAQ:

    Does oTranscribe automatically convert audio into text?
    
    Sorry! It doesn’t. oTranscribe makes the manual task of transcribing
    audio a lot less painful. But you still have to do the transcription.

ilt
0 replies
9h6m

I currently use Aiko’s free iOS app which does offline transcription using OpenAI’s Whisper model. It has been working pretty well for me so far. It can export in formats like SRT, TXT, CSV, JSON and text with timestamps too. https://sindresorhus.com/aiko

dmitrykan
0 replies
6h35m

I'm working on a tool that includes AI. My initial goal is to test it on my https://www.youtube.com/c/VectorPodcast by offering something like what Lex Fridman does for his episodes.

Current features:

1. Download from YT

2. Transcribe using Vosk (output has time codes included)

3. Speaker diarization using pyannote - this isn't perfect and needs a bit more ironing out.

What needs to be done:

4. Store the transcription in a search engine (can include vectors)

5. Implement a webapp

If anyone here is interested in joining forces, let me know.

ciaran00
0 replies
5h51m

Talio.ai allows you to do this with ChatGPT-style chat with the transcript, plus numerous other features: https://talio.ai

bilater
0 replies
39m

If you just want quick transcriptions of YouTube videos, this works pretty well: https://www.you-tldr.com/

avodonosov
0 replies
4h52m

I made a similar tool for making tables of contents for youtube videos: https://youtoc.by/

Not developing it actively after I created tables of contents for the several videos I needed, years ago. If I ever need it again, I will probably work on mobile UI (aka responsive)