
How to do OCR on a Mac using the CLI or just Python

zavertnik
21 replies
22h31m

Nice post, OP! I was super impressed with Apple's Vision framework. I used it on a personal project that involved OCRing tens of thousands of spreadsheet screenshots and ingesting the results into a Postgres database. I tried other CPU-based OCR methods (since macOS and Nvidia still don't play nice together), such as Tesseract, but found the output to be incorrect too often. The Vision framework not only produced the highest-quality output I had seen, but also used the least compute. It was fairly unstable, but I can chalk that up to user error in my implementation.
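A minimal sketch of that kind of screenshot-to-database step, assuming each recognized line arrives as `(text, x, y)` with Vision-style normalized coordinates (origin at the bottom-left, so a larger `y` is higher on the image). The row tolerance, table name, and helper names are hypothetical, and `sqlite3` stands in for Postgres so the example runs anywhere:

```python
import sqlite3


def group_into_rows(observations, tolerance=0.01):
    """Group (text, x, y) observations into rows, top row first.

    Assumes normalized coordinates with a bottom-left origin, so larger
    y values are higher on the image. Lines whose y is within `tolerance`
    of the row's first line are treated as the same row.
    """
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    for text, x, y in sorted(observations, key=lambda o: -o[2]):
        if rows and abs(rows[-1][0] - y) <= tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # Within each row, order cells left to right.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


def ingest(rows, conn):
    """Store each row as one record, with cell values joined by tabs."""
    conn.execute("CREATE TABLE IF NOT EXISTS ocr_rows (line_no INTEGER, cells TEXT)")
    conn.executemany(
        "INSERT INTO ocr_rows (line_no, cells) VALUES (?, ?)",
        [(i, "\t".join(cells)) for i, cells in enumerate(rows)],
    )
    conn.commit()


if __name__ == "__main__":
    obs = [("Price", 0.5, 0.90), ("Item", 0.1, 0.905), ("1.25", 0.5, 0.80), ("Apple", 0.1, 0.80)]
    conn = sqlite3.connect(":memory:")
    ingest(group_into_rows(obs), conn)
    print(conn.execute("SELECT cells FROM ocr_rows ORDER BY line_no").fetchall())
```

A fixed tolerance is the simplest choice; screenshots with mixed font sizes may need a tolerance proportional to each line's height instead.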

I used a combination of RhetTbull's vision.py (for the actual implementation) [1] and ocrmac (for experimentation) [2], and was pleasantly surprised by the performance on my i7-6700K hackintosh.

I wouldn't call myself a programmer, but I can generally troubleshoot anything given enough time, and this did take time.

[1]: https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac5...

[2]: https://github.com/straussmaximilian/ocrmac
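For reference, a rough sketch of calling Vision's VNRecognizeTextRequest directly through PyObjC, in the spirit of the vision.py gist above but not copied from it. It assumes macOS with the `pyobjc-framework-Vision` package installed; the Vision imports live inside the function so the pure confidence filter can be exercised on any OS. The function names and the `min_confidence` threshold are hypothetical choices:

```python
def recognize_text(image_path, accurate=True):
    """Return (text, confidence) pairs for each line Vision finds in an image.

    macOS only: requires pyobjc-framework-Vision. Imports are local so the
    rest of this module still loads on other platforms.
    """
    import Vision
    from Foundation import NSURL

    url = NSURL.fileURLWithPath_(image_path)
    handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(url, {})
    request = Vision.VNRecognizeTextRequest.alloc().init()
    level = (Vision.VNRequestTextRecognitionLevelAccurate if accurate
             else Vision.VNRequestTextRecognitionLevelFast)
    request.setRecognitionLevel_(level)
    request.setUsesLanguageCorrection_(True)

    # PyObjC returns the NSError out-parameter as the second tuple element.
    ok, error = handler.performRequests_error_([request], None)
    if not ok:
        raise RuntimeError(f"Vision request failed: {error}")

    pairs = []
    for observation in request.results() or []:
        candidates = observation.topCandidates_(1)
        if candidates:
            best = candidates[0]
            pairs.append((str(best.string()), float(best.confidence())))
    return pairs


def keep_confident(pairs, min_confidence=0.5):
    """Drop low-confidence lines and join the rest with newlines."""
    return "\n".join(text for text, conf in pairs if conf >= min_confidence)
```

On a Mac, `print(keep_confident(recognize_text("scan.png")))` would print the recognized lines; passing `accurate=False` trades accuracy for speed.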

kkielhofner
9 replies
16h44m

Tesseract alone is widely known to be "meh" at this point.

If you look at RAG frameworks, as one example, they'll typically use/support a variety of implementations. Tesseract is almost always supported, but it's rarely ideal, with projects like Unstructured[0] and DocTR[1] being preferred. By leveraging more-or-less SOTA vision models[2][3], they embarrass Tesseract.

I haven't compared them to the Apple Vision framework, but they're absolutely better than Tesseract and potentially even better than Apple Vision.

There are also various approaches to use these in conjunction but that gets involved.

[0] - https://github.com/Unstructured-IO/unstructured-inference

[1] - https://github.com/mindee/doctr

[2] - https://github.com/mindee/doctr#models-architectures

[3] - https://github.com/Unstructured-IO/unstructured-inference#mo...

fancy_pantser
3 replies
15h44m

https://github.com/mindee/doctr/issues/1049

https://github.com/JaidedAI/EasyOCR#whats-coming-next

Happy to see OCR is advancing lately, but I really need HWR (handwriting recognition).

I am looking for something this polished and reliable for handwriting; does anyone have any pointers? I want to integrate it into a workflow with the e-ink tablet I take notes on. A few years ago I tried various models, but they performed poorly (around 80% accuracy) on my handwriting, which I can read almost 90% of the time.
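To put figures like "80% accuracy" on a common footing, one conventional measure is character accuracy: 1 minus the character error rate (CER), where CER is the Levenshtein edit distance between the OCR output and a hand-typed reference, divided by the reference length. A minimal sketch, not tied to any particular OCR engine (the function names are illustrative):

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text, reference):
    """1 - CER; can go negative if the OCR output is much longer than the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 1 - levenshtein(ocr_text, reference) / len(reference)


print(levenshtein("kitten", "sitting"))  # → 3
print(char_accuracy("meeting at 3pm", "meeting at 5pm"))
```

Scoring a few transcribed pages this way makes it easier to compare engines on your own handwriting rather than on general impressions.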

riveducha
0 replies
12h41m

This is maybe not a solution, but how does ChatGPT do on your handwriting if you upload a photo? If that works well then maybe you can use the API?

animal_spirits
0 replies
1h52m

AWS Textract is by far the best OCR engine we've used; it does great with handwritten text.

Someone
0 replies
9h27m

Reading https://heartbeat.comet.ml/comparing-apples-and-google-s-on-... (2017), I expect this code to work for handwritten text.

How well it works on your handwriting is for you to test, but if you, having all kinds of contextual information, can’t read it well, I guess it won’t, either.

mdani
1 replies
12h3m

Does anyone know what languages Apple supports? The docs don't have a list. Tesseract might be "meh", but it is probably the best open-source option available for Devanagari scripts or Persian, for example.

lelandfe
0 replies
11h20m

I've used it on a number of Cyrillic languages (Russian, Bulgarian, etc.), Hungarian, and Turkish, along with the typical ones (Spanish, German, French, Italian, Portuguese). I've heard it supports Chinese. I just tried Persian and Devanagari samples on my Mac, and it could not do either.

mcbetz
0 replies
10h20m

I found this detailed comparison of OCRs (both open source and cloud services) super helpful: https://source.opennews.org/articles/our-search-best-ocr-too...

docTR comes out as the strongest open solution.

haolez
0 replies
15h52m

Looks nice! Do you know if they can do table structuring as well? Something similar to what Amazon Textract does[0].

[0]https://docs.aws.amazon.com/textract/latest/dg/how-it-works-...

beembeem
0 replies
16h27m

I have found Tesseract to be both better than I expect (it feels great when it works most of the time) and worse than I expect (not quite enough correct data to fully rely on).

RockRobotRock
6 replies
19h22m

It's better than Tesseract? That's really impressive.

Could you run a farm of macOS machines and turn this into an API for profit? Would that be legal?

wcedmisten
1 replies
16h11m

You could run a farm of iPhones to OCR memes if you felt so inclined

https://findthatmeme.com/blog/2023/01/08/image-stacks-and-ip...

oarsinsync
0 replies
10h39m

That blog post is glorious. Thanks for sharing.

lelandfe
0 replies
18h32m

In my experience using it constantly, it is far beyond Tesseract’s.

I have never gotten truly garbled output from Apple’s, whereas Tesseract will frequently produce random Unicode characters from text.

Apple’s also handles things like overlapping text or changing font sizes and typefaces far better than any open-source OCR I’ve used.
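If you're stuck with an engine that occasionally emits stray Unicode, a cheap guard is to flag output where too many characters look like debris. A rough heuristic sketch; treating control/private-use characters and non-ASCII symbols as debris, and the 5% threshold, are arbitrary assumptions to tune (legitimate symbols such as € or ° would also be counted):

```python
import unicodedata


def suspicious_ratio(text):
    """Fraction of non-whitespace characters that look like OCR debris.

    Counted as debris: control/format/private-use/unassigned characters
    (category C*) and non-ASCII symbols (category S*, e.g. box shapes).
    ASCII symbols such as $ or + are treated as normal text.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    odd = 0
    for c in chars:
        cat = unicodedata.category(c)
        if cat.startswith("C") or (cat.startswith("S") and ord(c) > 127):
            odd += 1
    return odd / len(chars)


def looks_garbled(text, threshold=0.05):
    """Flag text whose debris ratio exceeds an (arbitrary) threshold."""
    return suspicious_ratio(text) > threshold
```

Accented letters count as letters (category L*), so ordinary non-English text is not flagged.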

laborcontract
0 replies
15h11m

IMO it goes head to head with the Amazon/Google cloud OCR services. It works superbly.

gvkhna
0 replies
19h11m

Yes, as long as you pay for the mac hardware it’s yours to do with as you please. I’m not an attorney and this is not legal advice.

bufo
0 replies
18h31m

Way, way better than Tesseract!

sumedh
3 replies
17h16m

Is there a tutorial on how to extract tables from a PDF or image with the Apple Vision framework? I tried the two links in your post, and they just extract the text without maintaining the table structure.

AWS Textract provides sample Python code to extract tables into CSV, which works great.
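Since Vision returns lines of text with bounding boxes rather than tables, the structure has to be rebuilt afterwards. A minimal sketch of one common approach, assuming rows are already grouped and each cell is `(x, text)` with `x` the left edge in normalized coordinates; the column boundaries are supplied by hand here, and the function names are hypothetical:

```python
import bisect
import csv
import io


def assign_columns(row, boundaries):
    """Place (x, text) cells into columns split at the given x boundaries.

    boundaries must be sorted; n boundaries produce n + 1 columns. Cells
    that land in the same column are joined with a space.
    """
    columns = [[] for _ in range(len(boundaries) + 1)]
    for x, text in sorted(row):
        columns[bisect.bisect_right(boundaries, x)].append(text)
    return [" ".join(parts) for parts in columns]


def rows_to_csv(rows, boundaries):
    """Render rows of (x, text) cells as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(assign_columns(row, boundaries))
    return buf.getvalue()


rows = [
    [(0.05, "Item"), (0.40, "Qty"), (0.70, "Price")],
    [(0.05, "Red"), (0.12, "apples"), (0.42, "3"), (0.71, "1.25")],
    [(0.05, "Pears"), (0.72, "2.00")],  # empty Qty cell stays empty
]
print(rows_to_csv(rows, boundaries=[0.30, 0.60]), end="")
```

Hand-picked boundaries work for a fixed layout; for varied tables you would derive them from gaps in the x positions or from the header row.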

mcbetz
1 replies
10h23m

I've had repeated success extracting tables from PDFs using Camelot (Python, https://github.com/camelot-dev/camelot)

sumedh
0 replies
7h40m

Thanks will check it out.

Have you compared it with Textract?

dkjaudyeqooe
0 replies
13h47m

The best way I've found for extracting tables from PDFs in a well formatted way is Adobe's free online service:

https://www.adobe.com/acrobat/online/pdf-to-excel.html

tough
4 replies
21h23m

I'm a huge fan of this little OCR tool, installed through brew onto my MacBook: https://github.com/schappim/macOCR

nemosaltat
2 replies
20h44m

Same, and for my purposes, I just wrap that utility in a macOS Shortcut I can click from my menu bar, or launch from Quicksilver.

schappim
0 replies
8h41m

Great to hear! Shottr also has nice OCR these days.

bogeholm
0 replies
20h19m

Quicksilver, now there’s a blast from the past! I don’t think I’ve installed it on any Mac in the past 5 years, but I used to love it.

What are the advantages over native macOS shortcuts these days?

schappim
0 replies
8h42m

Awesome to hear!

hintymad
3 replies
18h54m

I did notice that many Mac apps, including Safari and Preview and Notes, do OCR on images automatically. It's pretty neat that I can easily select text in an image and copy and paste it somewhere else.

sen
1 replies
18h11m

It’s kinda ridiculous how good it is, you can even select text from inside a YouTube video while it’s playing (or pause if needed).

Also if it’s text of a URL/domain or a QR code (eg in a photo of a poster, or in a video) you can hold-press/hold-click to open the link directly from the image.

H4rryp0tt3r
0 replies
15h4m

Thanks for sharing this! I had no clue about it.

lostlogin
0 replies
9h1m

The Photos app too. It’s just so good at conferences or when you need a long string digitised (iso default router password!). Photo > select > copy > then paste on phone or Mac (via that actually awesome Handoff feature).

HelloImSteven
3 replies
22h11m

I'll throw my solution into the mix: https://skaplanofficial.github.io/PyXA/tutorial/images.html#...

PyXA uses the Vision framework to extract text from one or more images at a time. It's only a small part of the package, so it might be overkill for a one-off operation, but it's an option.

wahnfrieden
2 replies
21h8m

fyi you're using the old and less accurate api, VNRecognizeTextRequest

ImageAnalyzer is newer and much better

I bet this shortcut from OP is also using the older API under the hood

HelloImSteven
1 replies
20h22m

ImageAnalyzer is Swift-only and has no corresponding Objective-C method, so it's not available in PyObjC. I can look into bridging it at some point.

gvkhna
0 replies
19h9m

This would probably be pretty easy to do with Swift and Python processes running side by side, talking over gRPC.

stephenr
2 replies
14h16m

The article was posted... yesterday, and the entire reason given for not using the built-in Shortcuts sharing feature is... an article from 2 years ago, about a bug in the Shortcuts hosting service, which has obviously been fixed.

I get that some people will want to create it from scratch themselves or incorporate the actual meat of it into a larger shortcut... but not sharing one that does what the article says, because of a bug 2 years ago, is a bit of a weird take.

gregsadetsky
1 replies
12h54m

sorry, that link may have been a cheap shot... but I did try to export the shortcut I created, and kept getting an error about not being signed in to iCloud...! and I am signed in to iCloud. it's just so confusing.

why can't shortcuts be exported as ... shortcut files?

it's not ideal to have people recreate the shortcut step by step (which is what I ended up describing in my post) but... I couldn't find a better way..! :-)

if you'd be able to recreate the shortcut and share it, and post the link here (and/or email it to me), I'd love to place that in the blog article! thank you

stephenr
0 replies
10h37m

It seemed to work on iOS (https://www.icloud.com/shortcuts/cd7d2c5e63d8482ab0618e163bb...)

I'll try it again on macOS when I'm back at my desk.

Edit: also works on macOS Sonoma (https://www.icloud.com/shortcuts/6216aa9072144846adcaae69a5a...) - this one has all input sources selected, the iOS created one has only images/media/pdfs/files/rich text selected for input.

srott
2 replies
22h27m

You can use the clipboard with the pbpaste/pbcopy commands:

    shortcuts run ocr-text -i "$1" && pbpaste

llimllib
1 replies
21h13m

It also outputs to the command line if you pipe it to cat

    shortcuts run ocr-text -i new-haven-pizza.jpg | cat
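The same thing from Python, as a sketch: run the shortcut with `subprocess` and capture stdout. Because `capture_output=True` connects stdout to a pipe, this should presumably behave like the `| cat` trick. It assumes a shortcut named `ocr-text` exists, as in the post; the helper names are hypothetical:

```python
import subprocess
from pathlib import Path


def shortcut_command(image_path, shortcut="ocr-text"):
    """Build the argv for `shortcuts run <name> -i <file>`."""
    return ["shortcuts", "run", shortcut, "-i", str(Path(image_path))]


def ocr_image(image_path, shortcut="ocr-text"):
    """Run the shortcut on one image and return what it writes to stdout (macOS only)."""
    result = subprocess.run(
        shortcut_command(image_path, shortcut),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the shortcut fails
    )
    return result.stdout.strip()
```

Passing an argument list instead of a shell string sidesteps quoting problems with spaces in filenames.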

philsnow
0 replies
13h28m

Oddly enough, if you enable it as a "Quick Action", when you run it Finder creates a file in the same directory as the image containing the OCRed text (and named after the first line of OCRed text).

I went back into my shortcut, and Shortcuts had added a pseudo-action: "Stop and output <Copy to Clipboard>; if there's nowhere to output: <Do Nothing>". I would think "Do Nothing" means don't create a file, but I guess Quick Actions have some special meaning: all the other ones seem to be intransitive actions, implying that the user wants a file as the output.

melonamin
2 replies
17h29m

I've built an open-source tool that gives you both a CLI and a nice UI. It's free.

https://trex.ameba.co

chanandler_bong
1 replies
10h26m

+1000 for Trex!! I use it daily, thank you for creating it!

I am impressed how it handles handwriting and crappy screen grabs.

082349872349872
0 replies
9h7m

It's not so well known that one of the original rationales for "offside rule" programming languages is that it works just as easily for handwritten code as it does for typed.

Will we ever have programming languages that are primarily designed to take input from whiteboard grabs? (ie where not only handwriting, but also placement, connectivity, and maybe shape are meaningful?)

gist
2 replies
21h58m

To place the contents in a file (not claiming this is the most efficient way, but it works):

    OCRTHISFILE="ocr-test.jpg"
    shortcuts run ocr-text -i "${OCRTHISFILE}"
    pbpaste > "${OCRTHISFILE}.txt"

Or to view the output and place it in a file:

    OCRTHISFILE="ocr-test.jpg"
    shortcuts run ocr-text -i "${OCRTHISFILE}"
    pbpaste | tee "${OCRTHISFILE}.txt"
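A Python take on the same idea that processes a whole folder, as a sketch. The OCR step is passed in as a function (for example, a small wrapper around `shortcuts run ocr-text -i <file>`), so nothing here is tied to the Shortcuts app. The extension list and the `<image>.txt` naming mirror the shell version, but both are assumptions you can change:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".heic", ".tiff"}


def ocr_folder(folder, ocr, overwrite=False):
    """Write <image>.txt next to every image in folder; return the paths written.

    ocr is any callable taking a Path and returning the recognized text.
    Existing .txt files are skipped unless overwrite is True.
    """
    written = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Same naming as the shell example: ocr-test.jpg -> ocr-test.jpg.txt
        out = image.with_name(image.name + ".txt")
        if out.exists() and not overwrite:
            continue
        out.write_text(ocr(image), encoding="utf-8")
        written.append(out)
    return written
```

Skipping existing outputs makes it safe to rerun the script after an interrupted batch.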

msxbel
1 replies
19h51m

Or use macOS Shortcuts to output the OCR text as a file (action: "Append to Text File").

gist
0 replies
46m

Yes, it took a bit of fiddling, but that does work. Thanks.

eigenvalue
2 replies
22h35m

Weird, I couldn't get it to work on a bunch of different files, even using very simple file names. Kept getting this error:

Error: The operation couldn’t be completed. (WFBackgroundShortcutRunnerErrorDomain error 1.)

Oras
1 replies
22h5m

I suppose you haven't renamed the new shortcut to `ocr-text`

eigenvalue
0 replies
5h41m

I did do that.

rikafurude21
1 replies
21h25m

Are iOS and macOS shortcuts cross-compatible? I didn't know there were Shortcuts for the Mac; it seems pretty powerful to be able to run them from the terminal too. Thanks OP

diegof79
0 replies
21h11m

Yes, they are compatible as long as you use actions available on both platforms. For example, you can use AppleScript or shell actions on macOS, but they will not work on iOS. However, if you use shortcuts from cross-platform apps, it works even when you write files into the iCloud folder. For example, I made a shortcut that takes today’s events from Calendar and appends the list to a Markdown file in an Obsidian vault on iCloud. I use it to scaffold meeting notes, and it works on my phone too.

pugio
1 replies
13h35m

I would really love an `ocrmypdf` like tool which uses Apple Vision to create searchable PDFs from scanned images. I've been searching every week or so for some kind of project but so far haven't found anything. Perhaps it's time to make it myself...
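One detail an `ocrmypdf`-style tool built on Vision would have to get right is coordinates. Vision reports bounding boxes normalized to the range 0 to 1 with a bottom-left origin; PDF user space also has a bottom-left origin but is measured in points, while most image libraries use pixels from the top-left. A small sketch of both conversions (the `Box` type and function names are hypothetical; actually writing an invisible text layer with a PDF library is left out):

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def to_pdf_points(box, page_width_pt, page_height_pt):
    """Normalized bottom-left box -> PDF user space (points, bottom-left origin)."""
    return Box(box.x * page_width_pt, box.y * page_height_pt,
               box.width * page_width_pt, box.height * page_height_pt)


def to_top_left_pixels(box, image_width_px, image_height_px):
    """Normalized bottom-left box -> pixel box with a top-left origin (y grows downward)."""
    top = (1.0 - box.y - box.height) * image_height_px
    return Box(box.x * image_width_px, top,
               box.width * image_width_px, box.height * image_height_px)
```

Because both Vision and PDF use a bottom-left origin, the PDF conversion is a plain scale; only the pixel conversion needs the vertical flip.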

gregsadetsky
0 replies
12h53m

that sounds bonkers useful!! you should definitely prototype the smallest version possible and publish it (and post it here as a Show HN!)

I know that I'd definitely use it!

mushufasa
1 replies
21h41m

Very cool. Anyone know how this compares to AWS Textract in general? Does the Apple Vision framework support table recognition?

llimllib
0 replies
21h4m

It looks like it does, but you need to handle it at a pretty low level, this shortcut won't get you there: https://developer.apple.com/videos/play/wwdc2019/234?time=19...

minimaxir
1 replies
20h57m

Surprisingly, the Extract Text from Image action is available on Intel Macs; normally, features like automatic image OCR are limited to Apple Silicon Macs.

stephenr
0 replies
14h24m

It's almost as if the constant clucking about "planned obsolescence" and deliberately withholding features is a load of bollocks.

justinl33
1 replies
20h14m

Awesome! Is there a similar technique for the Apple vision ‘Copy Subject’ feature? I’ve become extremely reliant on it, but it feels very limited in access.

pimlottc
0 replies
19h55m

I had to Google this, do you mean the feature in Photos on mobile where you can "extract" items from a picture and make them into stickers? Apple seems to call it "lifting subjects" [0] [1].

0: https://support.apple.com/guide/iphone/lift-a-subject-from-t...

1: https://developer.apple.com/videos/play/wwdc2023/10176/

EDIT: Try replacing the "Extract text" action with "Remove background". When running the shortcut, use "-o" to specify the output image filename.

    shortcuts run remove-background -i ~/Downloads/portrait-beard.avif -o beard.jpg

TimeBearingDown
1 replies
22h32m

Very cool, and seems handy!

I’ve always had good results from Preview.app. I wonder how this engine compares with free alternatives in the number of errors on a difficult source.

smrtinsert
0 replies
18h55m

Yeah preview app is everything. I take screenshots now for deliverables.

BoppreH
1 replies
21h22m

I tried doing something similar on Windows, and realized that PowerToys[1], a Microsoft project I already had installed, actually contains a very good OCR tool[2]. Just press Win+Shift+T and select the area to scan, and the text will be copied to the clipboard.

[1] https://learn.microsoft.com/en-us/windows/powertoys/

[2] https://learn.microsoft.com/en-us/windows/powertoys/text-ext...

mywacaday
0 replies
20h2m

I use AutoHotkey + PowerToys to append screenshot data to a CSV; it works great with its own key mapping.

systemtrigger
0 replies
13h36m

This works great for local files. I can't seem to modify the shortcut correctly for an image hosted at a public URL.

sigoden
0 replies
11h50m

use LLMs (gpt-4-vision or LLaVA) with aichat

`aichat -f tmp/test.png -- output only text in the image`

https://github.com/sigoden/aichat

schappim
0 replies
8h42m

If you want to do this a lot easier use: https://github.com/schappim/macOCR

predictsoft
0 replies
20h41m

On Windows, A9T9 does a great job of OCR'ing scanned JPEG files (and any JPEG file). It's also free.

I scanned about 100 A4 documents in just a couple of minutes.

osbkca
0 replies
10h3m

I'm using https://xclippy.com/ app. It also has an OCR feature.

novagameco
0 replies
21h27m

On Windows I recommend text extractor from powertoys:

https://learn.microsoft.com/en-us/windows/powertoys/text-ext...

loevborg
0 replies
4h45m

CleanShot X (which is great) also allows you to OCR from your screen ("Capture Text")

krudnicki
0 replies
9h3m

I made a Shortcut + PHP script to get text from a screenshot, ask ChatGPT to make a task name from the text, create a new task in ClickUp, and attach the screenshot. I use it often.

jmz1
0 replies
2h34m

Raycast (macOS only) is also nice as it's able to search images by text. It also allows you to copy text from those images. Quick official demo here: https://www.youtube.com/watch?v=c96IXGOo6E4

gvkhna
0 replies
19h8m

Are there any benchmarks on speed/compute/accuracy anywhere comparing it to Tesseract v5?

ggm
0 replies
10h25m

This is really "how to interact with the built-in OCR via the CLI". "Doing" OCR, to me, means choosing the OCR tooling, knowing which fonts it recognises, and handling all the associated package management and tuning, not "how I configure the GUI and UI to let me use the tool they shipped with the OS".

geniium
0 replies
20h15m

Have you guys tried ChatGPT or other alternatives?

est
0 replies
17h20m

Speaking of the need for OCR, I found a relevant and quite funny comment:

we already have a common, portable data format for social media. It's screenshots of tweets

https://news.ycombinator.com/item?id=38841569

elpakal
0 replies
18h57m

I don't know why, but instead of pasting the copied text to make sure it worked, I made it read the text aloud:

    shortcuts run ocr-text -i <A PATH TO SOME IMAGE> | say -v Fred

dotsam
0 replies
17h41m

I have played around with the OCR on my mac, and have been very impressed. It has been consistently better than tesseract for my purposes.

However, when creating a PDF from images using Preview and exporting using ‘Embed Text’ option to OCR, I have noticed the text is worse than if you OCR the exact same images using the shortcut above or using a script. Presumably Preview is using the Vision framework’s less accurate fast path when preparing the PDF. https://developer.apple.com/documentation/vision/recognizing...

djhn
0 replies
7h14m

Does anyone know of a straightforward library or setup to scan newspapers and/or magazines and detect and extract images and advertisements?

cyberax
0 replies
20h0m

It doesn't work for Chinese characters :(

andreasley
0 replies
19h46m

macOS Ventura and newer actually have basic OCR functionality integrated into the Image Capture UI. When using an AirPrint-compatible scanner and scanning to PDF, the checkbox "OCR" is shown in the right pane.

CodeNest
0 replies
19h2m

Python is quite basic and might not be very helpful for advanced users. It seems overly detailed for such a simple task.