
Stirling-PDF: local web application to perform various operations on PDFs

Alifatisk
46 replies
18h58m

It's scary how such a widely used format (PDF) is almost entirely under Adobe's control. I have yet to see a true competitor to Adobe Acrobat. The only one that has come really close is the one that comes built into macOS. It's a hidden gem.

bla3
14 replies
18h1m

PDF has an open spec and oodles of programs supporting it. I don't understand where this comment is coming from.

webel0
12 replies
17h55m

Adobe Acrobat (and maybe Reader) is really the only app that fully supports the full PDF spec as understood by the authors of the spec. There are ridiculous parts of the spec that allow support for things like JS, etc.

bdd8f1df777b
4 replies
15h15m

That is true, but I have never encountered a PDF that is not produced by me and cannot be faithfully represented in third party PDF readers. And I give up the idea of producing those kinds of PDFs because I know the people I send to will complain about me rather than their PDF readers. So, Adobe Acrobat doesn't have any monopoly power here, since almost no one cares about those things only they can do.

Moru
2 replies
10h7m

We often get PDFs that do not work in our pipeline, and it's always blamed on the pipeline, not on the creating software. The user usually converts the PDF to an image with Adobe Reader and a screenshot, loads up LibreOffice, pastes it in and exports it as a PDF archive.

bdd8f1df777b
1 replies
9h34m

So the PDF that does not work in your pipeline is created by LibreOffice rather than Adobe Acrobat? That doesn't seem to add any strength to the argument that "Adobe Acrobat has unusual powers because only it can handle the full spec of PDF".

Moru
0 replies
8h43m

No, you misread. The PDF that works is created by anything that does not use the full spec of new PDF versions. We have chosen LibreOffice because we already use it for other things. If we recreate the PDF in LibreOffice as a PDF archive version, it works just fine. The problem is usually a pamphlet created by some ad agency using the absolutely latest version of some layout program, neither Adobe nor LibreOffice. The PDF usually works just fine in Adobe but not in our pipeline, which uses all sorts of Linux programs to process it into a JPG in the format and orientation our system needs. No one has had the time or energy to fix it since most stuff works, so for now it gets downsampled by a screenshot and just shoved into the system. The added benefit is that the PDF shrinks from 150 MB to 300 kB in the process.

Adobe Acrobat is the only thing that can handle all cases, yes. All other programs use (different) special cases each, and most of them fail in some edge cases. It can be funny letters showing up because of fonts not working properly, or images disappearing, or all sorts of things. I have given up on fixing them all. I still have a library of PDFs that we used to run through to try to get as many as possible to work.

izacus
0 replies
7h39m

That is true, but I have never encountered a PDF that is not produced by me and cannot be faithfully represented in third party PDF readers.

That's because PDF is well designed and has a fallback for advanced page elements so more primitive readers can still render them.

cmrx64
3 replies
17h47m

I don’t think it’s ridiculous to want a scriptable document, especially for complicated forms. Likewise for the other much-dragged features for 3d scenes.

https://pdfa.org/resource/pdf-in-manufacturing/ is a great use case.

fuzztester
2 replies
14h5m

The PDF spec even supports attachments in PDF files.

rudasn
1 replies
4h48m

How does that work exactly? Is it widely supported?

I recently had to add an embed feature to our pdf rendering, to allow users to embed other pdfs inside the one we generate for them. Since we use headless Chrome, I used pdfjs from mozilla to render the embedded pdf on screen before generating the pdf, so you can actually see and read the embedded pdf.

Works pretty well, but was wondering about this attachment feature of pdfs.

cmrx64
0 replies
3h32m

PDF is a container format and you can just shove files in there. pdftk supports this with attach_files, and at the very least the Linux PDF readers I’ve used know how to deal with them.
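To make the pdftk route concrete, here is a minimal sketch that only builds the command lines for attaching and unpacking files. It assumes pdftk is installed and on the PATH; the helper names and the file names in the usage comment are made up for illustration.

```python
import subprocess
from typing import List, Sequence


def attach_cmd(src: str, attachments: Sequence[str], dst: str) -> List[str]:
    """Build: pdftk <src> attach_files <files...> output <dst>"""
    if not attachments:
        raise ValueError("need at least one file to attach")
    return ["pdftk", src, "attach_files", *attachments, "output", dst]


def unpack_cmd(src: str, out_dir: str) -> List[str]:
    """Build: pdftk <src> unpack_files output <out_dir>"""
    return ["pdftk", src, "unpack_files", "output", out_dir]


def run(cmd: List[str]) -> None:
    """Run a prepared command and raise if pdftk reports an error."""
    subprocess.run(cmd, check=True)


# Hypothetical usage (placeholder file names):
# run(attach_cmd("report.pdf", ["data.csv"], "report-with-data.pdf"))
# run(unpack_cmd("report-with-data.pdf", "extracted/"))
```

Whether the recipient's reader shows the attachments is a separate question; that depends on the viewer, as noted above.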

wolverine876
1 replies
16h40m

I've seen many third party PDF viewers; I think all supported JavaScript. It's commonplace, not 'ridiculous' at all.

Adobe Acrobat (and maybe Reader) is really the only app that fully supports the full PDF spec

The full spec is large and afaik has many obscure pieces, including 3-D, etc. Like many specs, they don't match reality and nobody takes completeness too seriously. For almost all users, supporting the entire PDF spec doesn't matter (does it matter for any user - does any person or organization use the entire spec over their lifetimes?).

Also, do we know that Adobe supports the entire spec?

mpweiher
0 replies
8h12m

Yeah, even Adobe doesn't really use the full spec. Or at least didn't.

There's a fairly big chunk in the spec of special presentation attributes for slideshows. When I implemented them I was surprised that slide shows produced by Acrobat didn't work. Well, obviously my implementation was buggy.

Er, no, Adobe didn't use their own slide show attributes for the slide shows produced by Acrobat. They used JavaScript instead.

Oh well. ¯\_(ツ)_/¯

KeplerBoy
0 replies
17h29m

What's wrong with using a sane subset of the spec aka PDF/A?

wolverine876
0 replies
16h38m

Agreed. Also, possibly the most commonly used reader is pdf.js, the FOSS component used by at least some major web browsers.

pmarreck
10 replies
16h48m

It’s incredible to me that not only has Preview.app been the best non-Adobe way to use PDFs for decades now, and only on macOS (perhaps because NextStep, its roots, used PostScript natively?), but that Linux actually also seems to have better tooling in this space than Windows (where you’re pretty much stuck with Adobe Reader if you want a free solution).

wolverine876
4 replies
16h36m

Preview.app been the best non-Adobe way to use PDFs for decades now

Where is all this stuff coming from? Why would you say Preview is the best? Foxit? Nitro? There are endless PDF applications much more powerful and capable, some designed for professionals.

ornornor
2 replies
11h33m

For annotating and adding a hand drawn signature to PDFs, Preview really is the best: lightweight, straightforward, free, comes with the OS. I don’t know any comparable app for Linux (or Windows, although I rarely use it).

nip
1 replies
11h4m

https://simplePDF.eu will be the closest « Preview-like » experience on Linux (and any OS really).

It’s local only (the document you load and data you fill in never leave the browser) and free

Disclosure: I’m the developer behind it

Alifatisk
0 replies
37m

Any plans on implementing redaction?

fauigerzigerk
0 replies
10h6m

"Best" isn't necessarily the same as "has the most features".

I think many people find that Preview.app does everything they ever wanted to do with PDFs. It really is surprisingly capable. It's also fast and far less convoluted than most PDF tools I have seen.

And of course it comes free with every Mac, which often makes it "best" in terms of value for money.

It doesn't help that many PDF editors (including the two you mentioned) are full of the most ridiculous pricing shenanigans.

wpm
1 replies
15h37m

Isn't SumatraPDF a decent program for Windows?

In regards to Preview, I still find it insane that it doesn't have an iOS/iPadOS equivalent. Bits of the functionality are scattered all over the place, usually in ways that don't feel as good as they do on the Mac. Sometimes I just want to open a PDF and leave it open, and not have to do it from Files which assumes I want to do something else with it than just looking.

sphars
0 replies
15h1m

I personally use SumatraPDF on Windows, but it's basically just a fantastic PDF viewer. It does little else in regards to editing/modifying PDFs. Even the PDF viewer in Edge does more.

But for a lightweight, bloat-free experience, SumatraPDF is the way to go.

izacus
1 replies
7h37m

What a bizarre comment - Preview.app isn't even the best PDF software on Apple platforms (it's more likely something like Readdle's PDF Expert).

Alifatisk
0 replies
6h51m

There is of course better software than Preview.app, but only if you are willing to pay for it. Preview.app is free.

kergonath
0 replies
7h38m

perhaps because NextStep, its roots, used PostScript natively?

And OS X and its successors use Display PDF natively, which is why it is trivial to save almost anything that can be displayed into a PDF file. The PDF stack that Preview.app leverages is a foundation of the OS itself.

asdfologist
6 replies
17h59m

I find Chrome's built-in PDF viewer much snappier than Adobe Acrobat.

noAnswer
5 replies
17h41m

Sadly I had to install Adobe Reader on my father's PC again after he had documents* with formulas. Chrome would calculate the numbers wrong. Everything was off by 10.

*To get reimbursed from a union or something.

asdfologist
2 replies
17h39m

Whoa, I had no idea PDFs can have formulas.

SturgeonsLaw
1 replies
16h10m

You can embed Javascript in PDFs

fuzztester
0 replies
14h4m
manmal
0 replies
8h51m

I recently had issues with macOS's Preview.app and formulas. It’s a nice feature, but probably not widely supported.

aragonite
0 replies
15h1m

If you occasionally need Adobe Reader/Acrobat exclusive features but don't want to install, you can use the free online version of Acrobat. It's pretty decent though it doesn't have all the features:

https://acrobat.adobe.com/us/en/

philsnow
4 replies
18h35m

Does that one have a name? I’m a MacOS transplant and have never gotten terribly familiar with the territory. Thanks!

cmdrriker
1 replies
18h28m

Preview.app which owes its heritage to NextStep. https://en.wikipedia.org/wiki/Preview

philsnow
0 replies
12h58m

... ah, thanks, I misread. I thought GP was talking about a document format in the same space as PDF that was native to MacOS.

tonyedgecombe
0 replies
18h33m

Preview is the app on macOS that lets you view and even edit PDF files.

jwong_
0 replies
18h29m

preview.app

simondotau
2 replies
14h58m

PDF has been a free-as-in-beer standard since 1993 and a free-as-in-speech ISO standard since 2008. The reality is that PDF is open, reliable, useful, feature rich, and widely accepted. It has no serious competitors.

There's a reason why, unlike raster image formats, there aren't any serious competitors. The thing to realise about printed page file formats is that even if you set aside all of the silly "multimedia" and "interactivity" features, there's still a gargantuan rabbit hole of non-trivial features that need to be implemented absolutely perfectly, from kerning to spot color. PDF does it all very well. There's really no scope for a competitor to come along to make something that's obviously better.

izacus
0 replies
7h40m

Especially since there's a PDF/A standard subset for archival, which makes those documents readable decades after any other format would rot.

fuzztester
0 replies
14h11m

PDF has been a free-as-in-beer standard since 1993 and a free-as-in-speech ISO standard since 2008.

Yes. The first sentence of the Wikipedia article about PDF is:

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.

And the last sentence of the first paragraph of the same article is:

PDF was standardized as ISO 32000 in 2008.[5] The last edition as ISO 32000-2:2020 was published in December 2020.

whatisyour
1 replies
16h32m

What about Okular?

Alifatisk
0 replies
6h58m

There is no support for redacting text properly on Okular.

rmbyrro
1 replies
17h10m

Foxit Phantom is pretty good

Alifatisk
0 replies
6h57m

It costs money; I might as well pay for Acrobat.

hyperthesis
0 replies
15h46m

Adobe is expert at software standards. They aren't compulsive about control, yet don't give the farm away. They know when to be open and how much. That is how they dominate.

RyanShook
19 replies
21h26m

I have a PDF problem that I thought was simple but has proven difficult to solve and there is no paid solution I’ve found…

I want to forward an email to an inbox, have the email body converted into a PDF, and then email that attachment to someone all automatically. I’ve tried Make, Zapier, pdf.co, pdftool, and a few other tools but have had no success. Has anyone solved this problem reliably?

victorbojica
5 replies
21h16m

If you are able to code or can ask someone, then you should be able to do it with some email API service (Nylas, AWS SES, etc.) or a headless client that gets the body of the email, converts it to PDF using wkhtmltopdf, and then sends it as an attachment using the same service as before.

Using low/no code tools might be very hard/unlikely

RyanShook
4 replies
20h39m

Thanks, yes I think this is the right direction. Surprised it doesn’t exist as SAAS, I guess demand isn’t there.

geraldwhen
2 replies
19h25m

If you want the pdf to look anything like the email, you will need to render it in a browser and capture a pdf. It’s not particularly hard if you know what you’re doing.

vinibrito
1 replies
13h37m

Any libs to help with that? Thanks.

la_fayette
0 replies
1h15m

A headless browser driven by e.g. Puppeteer would do the job. I use it a lot for exactly that purpose...

cyanydeez
0 replies
5h55m

I'm pretty sure you can do this with Office 365 and its automation stack, Power Automate.

Obviously it's only an option if your org has already sunk deep into Microsoft-of-things (MoT) universe.

toomuchtodo
1 replies
21h20m
RyanShook
0 replies
20h31m

Thanks for sharing!

brailsafe
1 replies
21h8m

Probably depends on the purpose of the pdf and why it needs to be an attachment, but I'd just skip all the steps and print the email since that's more or less what pdf is for. Print it and re-attach or just print at the destination.

RyanShook
0 replies
20h43m

This is what I currently do. I was just hoping to automate the process.

JumpCrisscross
1 replies
18h25m

I think Mail.app in macOS can do that with Automator. At the very least, it can handle the "PDF emails coming from an email address" and "forward as attachment" bits.

mlfreeman
0 replies
17h55m

Anecdotal bug warning:

Mail.app can "Export as PDF" from the File menu, but I noticed on 13.6 that it exports blank pages if the email is plain text only.

I had to choose to print the emails and then save as PDF from the print dialog.

rqtwteye
0 replies
21h12m

I did something like this 10 years ago as an internal tool for a company. Back then I did it with Outlook VBA.

reachableceo
0 replies
17h13m

If you need to send an email anyway, why not print to PDF and email the PDF?

karl_gluck
0 replies
21h11m

Google Apps Script can do all of this. Take the email body and put it into a Google doc, then export the doc as a pdf to drive and attach it from there to send.

hkhanna
0 replies
18h11m

I needed this for expensing receipts that come via email. I created an API for it where you POST the email to an endpoint and get back a PDF.

Email me and I’ll give you access for free.

fgonzag
0 replies
21h9m

It seems quite doable, but you'd need scripting skills to set it all up. Read the incoming queue, pass it to wkhtmltopdf, then pipe the result to the mail command. For Windows, I believe I once used a Java SMTP server (Apache James) that allowed you to set custom code as an incoming email handler. After that, the conversion and email sending are trivial.
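The read, convert, send pipeline described above could be sketched roughly like this in Python. It is only an illustration under stated assumptions: wkhtmltopdf is installed and on the PATH, the incoming message arrives as raw bytes (how you fetch it is up to your mail setup), and the function names and attachment file name are invented for the example.

```python
import subprocess
from email import message_from_bytes, policy
from email.message import EmailMessage
from html import escape
from typing import Callable


def html_body(raw: bytes) -> str:
    """Return the HTML body of a raw message, wrapping plain text if needed."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("html", "plain"))
    if part is None:
        raise ValueError("message has no text body")
    text = part.get_content()
    if part.get_content_type() == "text/plain":
        # Wrap plain text so the converter still receives HTML.
        text = f"<pre>{escape(text)}</pre>"
    return text


def wkhtmltopdf(html: str) -> bytes:
    """Pipe HTML through wkhtmltopdf; '-' means stdin for input, stdout for output."""
    proc = subprocess.run(
        ["wkhtmltopdf", "--quiet", "-", "-"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return proc.stdout


def build_reply(
    raw: bytes,
    sender: str,
    recipient: str,
    to_pdf: Callable[[str], bytes] = wkhtmltopdf,
) -> EmailMessage:
    """Build a new message that carries the original body as a PDF attachment."""
    original = message_from_bytes(raw, policy=policy.default)
    out = EmailMessage()
    out["From"] = sender
    out["To"] = recipient
    out["Subject"] = f"PDF: {original.get('Subject', '(no subject)')}"
    out.set_content("The forwarded email is attached as a PDF.")
    out.add_attachment(
        to_pdf(html_body(raw)),
        maintype="application",
        subtype="pdf",
        filename="email.pdf",  # placeholder name
    )
    return out
```

Sending the result is then a matter of handing it to something like `smtplib.SMTP(...).send_message(...)` with your own server details. Note that wkhtmltopdf will not render the message exactly the way a mail client would; for closer fidelity, the headless browser approach mentioned elsewhere in the thread is the usual alternative.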

davchana
0 replies
15h41m

I use Google apps script for it.

A filter labels a specific email.

A timed trigger runs a script.

Script fetches all emails with that filter.

Script runs in a loop. Converts each message into a blob, blob gets converted to PDF. PDF gets saved in Google Drive. Email gets label removed.

My code was based on this https://www.labnol.org/code/19117-save-gmail-as-pdf

cookie_monsta
0 replies
19h9m

If you have access to O365 (or whatever it is called this week) this would be easy to do using Power Automate.

mrfumier
14 replies
12h20m

But... why? WHY??

Why would I run a docker container, a webserver, start a browser, navigate webpages... just to do some operations on a pdf locally?

A native program of a few kilobytes like PDFtk (https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/) does the job perfectly.

I don't understand the point of bloating software like this. Not even speaking of the very bad consequences for the planet.

ornornor
6 replies
11h38m

A web app makes it cross platform. If you have a homelab, deploy it only once for every client.

And PDFtk doesn’t do annotations afaik which is a huge pain point on Linux (at least for me) because there are no applications that I know of to easily do things that are trivial on OSX like adding text or hand drawn signatures to PDFs. Masterpdf can do it but with a watermark and some limitations.

Maybe it doesn’t suit your particular use case but I wouldn’t say pdftk can replace this project.

subtra3t
2 replies
9h53m

I think Edge would be perfect for you

ornornor
1 replies
4h51m

Edge the browser?

subtra3t
0 replies
3h45m

Yes, its pdf editor is very good.

plugin-baby
0 replies
8h29m

no applications that I know of to easily do things that are trivial on OSX like adding text or hand drawn signatures to PDFs

Try xournal++

jraph
0 replies
8h14m

Xournal++ was already mentioned. Okular also has annotations and, I think, adding hand drawn signatures.

Though I welcome (new) work in this area.

adrenvi
0 replies
10h38m

Firefox now has some simple built-in PDF editing tools. Text and images can be added on top, but existing text can't be modified.

zelon88
1 replies
4h57m

lol, you think sending TCP packets of your 25mb PDF to Google, so they can send it back, so you can send it to Google again in an email attachment, so they can send it to another Google server to another Google user, so that user can download it and upload it to Google, so they can print it on an 8.5" x 11" piece of paper is saving the planet?

You just sent how much wattage around the globe 100 times for what? To print the paper you already had on your screen?

You sound like the kind of people who put their very important network documentation on Google Drive, so when your network goes down you have no way to access the information required to bring it back up. I'd rather have one engineer who knows the ins-and-outs of a LAMP stack than 10 who only know how to provision cloud VMs.

rawling
0 replies
3h27m

How did you get "sending it to Google" from

A native program of a few kilobytes like PDFtk

ungerik
0 replies
4h44m

Because PDF is a minefield and PDFtk does not solve all problems on all platforms. You'll learn that if you try to process millions of PDFs in the wild that may or may not comply with any of the numerous specs.

maltris
0 replies
9h5m

Tf are you talking about

kristiandupont
0 replies
10h4m

Because it's a simple (for me as the user) and reliable way to get a UI without having to send my personal files to some server somewhere. What's the big deal?

ikurei
0 replies
7h2m

It may not be right for you, but I can see situations where this would be preferable.

If because of your job you find yourself doing these operations very often, and the ability to do them from several devices with different OSes is valuable, it might be great to throw this on a server.

Or if several people at your work, maybe not very technical folks, do them often. I’ve worked in a couple of places where this could’ve come in handy.

Also, it includes an API. Also, being open source, if in the future you’re creating a web app that needs some of these features, you could learn from/copy from its code.

I think this is a good contribution to the world.

I don't understand the point of bloating software like this. Not even speaking of the very bad consequences for the planet.

I partially share this concern. I wouldn’t deploy this for myself unless I had a very easy way to stop it and start it, but anyway while not in use it should only be consuming a bit of RAM, and there’s plenty of very efficient hardware suitable for small servers these days.

chaps
0 replies
7h51m

"does the job perfectly"

Um. I work with PDFs a LOT and.. nah PDFtk is pretty weak. It doesn't even do OCR!

This looks like a wonderful tool that solves a lot of problems that existing tools don't typically put together. It's an achievement, and your post is unnecessarily crass.

This is very much the sort of tool that you host internally at a newsroom so journalists don't have to wrangle with software or write code. Like, in that situation, who cares if it's on top of docker? The users definitely won't give a shit.

Please consider that you're not the target audience....

d4rkp4ttern
14 replies
20h35m

I’ll join some other commenters, to add my favorite difficult pdf problem that I haven’t found a ready to use (even paid) solution for: extract key value pairs from a filled form such as this medical claims form:

https://imgur.com/a/EJDi7L7

There are two levels of difficulty: the starting file could be an image (pdf or png or jpg), which is the most difficult scenario. The slightly easier one is where it’s a text-based pdf so no OCR is needed.

I threw this as an image file at Google's form parser but it did poorly, i.e. it missed quite a few fields.

Froodle
5 replies
20h10m

Dev here for the above Stirling-PDF app. Please raise features like this as a feature request issue on GitHub and we can try to address it in the future!

kpandit
4 replies
19h9m

I would do exactly what you have done here if I were the dev of the said app. But with the luxury of being an outsider, a user has expressed an inconvenience and it seems to make sense, then if I were to be the dev of the app here, wouldn't I go and create the ticket in whatever system with a link to this post instead of asking the user of the app to follow the red tape? I know there are places where this is not incentivised so this is a question for your org and not for you.

d4rkp4ttern
0 replies
18h36m

I tend to agree. As an open source dev myself, I avoid asking folks to create issues, as it puts a burden on the user. I’ve seen some highly respected open source leads do this, and I’m not faulting them, as I think they’re coming from a good place; it may be a difference of opinion on what’s best practice.

cyanydeez
0 replies
5h36m

This is open source software sir, it needs multiple steps to ensure users actually need these features and are willing to use them.

Sai_
0 replies
17h11m

Not OP. My take is that if the requester can’t be bothered to create a GH issue, it’s likely that this isn’t really a problem for them. An annoyance possibly but has not risen to “pain” levels.

Froodle
0 replies
18h31m

I see what you're saying, and for simple features I agree. However, without the OP creating the ticket there can be no feedback loop on the feature. I'd want it tested for their use case, with their input and confirmation on whether it's what they wanted, improvements for the workflow, etc. If I base the whole feature on this comment it could end up only doing half a job. I'd rather have that communication loop open!

Closi
5 replies
19h36m

Have you tried Azure AI Document Intelligence?

In theory it's exactly this...

brianjking
4 replies
19h28m

I second this, that or have you tried GPT-4 Vision or Donut?

d4rkp4ttern
2 replies
18h32m

Still waiting for GPT4V but doubt it will do this. Yes I’ve tried Donut and other options but this is a very gnarly problem.

One option is to extract text blocks along with their coordinates (unstructured.io gives this, probably based on another pkg because it’s basically a container for many pkgs). Then do the same with a blank template, and you then have an algorithmic problem of matching the filled values spatially with the key locations from the template.
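A rough sketch of that spatial matching step, assuming the text extraction has already happened. Everything here is hypothetical: the `Box` type, the cell names, and the simple "center of the text block falls inside a template cell" rule are one possible way to do it, not something any of the tools mentioned provides. It also assumes image-style coordinates, with y growing downward.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; both inputs must share one coordinate system."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, point: Tuple[float, float]) -> bool:
        x, y = point
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def match_fields(
    cells: Dict[str, Box],
    blocks: List[Tuple[str, Box]],
    labels: FrozenSet[str] = frozenset(),
) -> Dict[str, str]:
    """Map each template cell name to the filled-in text found inside it."""
    found: Dict[str, List[str]] = {}
    # Sort top-to-bottom, then left-to-right, so multi-word values keep their order.
    ordered = sorted(blocks, key=lambda b: (b[1].center[1], b[1].center[0]))
    for text, box in ordered:
        if text in labels:
            continue  # printed label that also exists on the blank template
        for name, cell in cells.items():
            if cell.contains(box.center):
                found.setdefault(name, []).append(text)
                break  # first containing cell wins
    return {name: " ".join(parts) for name, parts in found.items()}
```

The hard parts in practice are upstream of this: getting reliable cell boxes from the blank template, and aligning a scanned, possibly skewed page to that template before comparing coordinates.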

brianjking
1 replies
18h30m

I'm fairly confident GPT-4V will do this just fine, tbh.

You just need to extract each of the elements into a structured JSON or something, right?

I'll try with your example later today.

d4rkp4ttern
0 replies
18h20m

Exactly, the form has filled values in named cells, so we need a JSON of cellName -> filledValue mappings.

Let me know how GPT-4V does!

qingcharles
0 replies
19h20m

I second trying GPT-4 Vision, though they have dumbed it down a bit since launch.

ylk
1 replies
18h50m
dave8088
0 replies
18h15m

Their scummy website doesn’t list their prices in any way I can see. Hard pass.

101008
13 replies
21h15m

I still couldn't find a tool for a difficult problem to solve. I have some magazines in PDF, with layouts in two columns, etc. I want them to be transformed into Markdown. I know, it should identify automatically the two columns, different layouts, etc.

I am not looking for something perfect - I can fix it if there are some errors, but so far nothing has given a good result.

stavros
5 replies
19h3m

Depending on how much you're willing to pay, the OpenAI GPT-Vision API can definitely do this extremely well.

101008
4 replies
17h44m

I am willing to pay for this, I only have around ~80 files, with 30 pages each (average). Is there a quick way to test this without wasting too much on the code part?

stavros
3 replies
17h42m

Yes, ChatGPT will do it if you upload photos of the page.

101008
2 replies
17h20m

I just tried it with the API and it worked better than expected. Now I need to find a way to convert PDF to JPG with an API, and find the best prompt to ask GPT to only convert articles to Markdown, and not pages with columns, ads, etc. Thanks!

stavros
0 replies
17h17m

No problem! There are many Linux programs (some in this thread) to convert a pdf to images, and the prompt will hopefully not be too hard.

I started making an app to read our board game cards out loud (with voice) for our horror board game nights (https://boardguru.net) and GPT-4 could read cards that I couldn't make out!

slig
0 replies
2h52m

Now I need to find a way to convert PDF to JPG with an API

You can do that very easily with a locally installed ImageMagick. ChatGPT can help with the commands needed, but it should take just one command to convert a PDF to a number of JPGs, plus a small shell script to run it on all your files.
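As a sketch of what that could look like, here is the same idea in Python instead of a shell script. It assumes ImageMagick 7 (the `magick` binary; on ImageMagick 6 the binary is `convert`) with Ghostscript available for PDF rasterization. The function names, the 150 DPI default, and the output naming pattern are arbitrary choices for the example.

```python
import subprocess
from pathlib import Path
from typing import List


def pdf_to_jpg_cmd(pdf: Path, out_dir: Path, dpi: int = 150, binary: str = "magick") -> List[str]:
    """Build an ImageMagick command that writes one JPG per PDF page."""
    # %03d is expanded by ImageMagick to the zero-padded page index.
    pattern = out_dir / f"{pdf.stem}-%03d.jpg"
    return [binary, "-density", str(dpi), str(pdf), "-quality", "90", str(pattern)]


def convert_all(src_dir: Path, out_dir: Path) -> None:
    """Convert every PDF in src_dir, stopping at the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(src_dir.glob("*.pdf")):
        subprocess.run(pdf_to_jpg_cmd(pdf, out_dir), check=True)


# Hypothetical usage (placeholder directories):
# convert_all(Path("magazines"), Path("pages"))
```

On some Linux distributions the default ImageMagick security policy blocks PDF input, so if the command fails with a policy error, that is the first thing to check.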

solardev
0 replies
19h1m

I use Briss for this: https://github.com/mbaeuerle/Briss-2.0

It overlays all the pages on top of each other, you (the human) draw rectangles around the stacked columns in the easy GUI, and then it processes them into pages.

rmbyrro
0 replies
17h4m

I bet GPT vision will do that like cutting a piece of cake. It'll even do the OCR for you and organize the text nicely.

qingcharles
0 replies
20h36m

This is a hard problem. Cut the PDF down so it's only the pages of the article you want and then try feeding it through GPT Pro or Claude?

pklee
0 replies
17h41m

Have you considered marker? It does a very good job of turning PDFs into Markdown: https://github.com/VikParuchuri/marker

layer8
0 replies
21h4m

This can be arbitrarily difficult to do, depending on the PDF. This is generally called PDF reflowing. Another approach is to use column-aware OCR software.

jftuga
0 replies
20h28m

Have you tried this (for at least solving part of the problem)?

https://github.com/pdfcpu/pdfcpu

jbverschoor
0 replies
18h57m

The built-in apple preview application does exactly what you want..

It just loses bold etc.

rodlette
10 replies
22h3m

Nice. I've been looking for something like this to self-host, to avoid my partner uploading sensitive documents to random PDF manipulation websites.

Any better alternatives I should be considering?

jftuga
1 replies
20h29m

A really nice, stand-alone command line tool is pdfcpu.

https://github.com/pdfcpu/pdfcpu

rodlette
0 replies
9h21m

Looks great, but my partner needs something more convenient.

It needs to be web based and work on desktop/mobile.

Etheryte
1 replies
21h31m

If you happen to be on macOS, the Preview app does an absurd number of things to PDFs, and it does it well. To be honest I'm always surprised it isn't highlighted more by Apple, it's a great tool that pretty much always just works. You can split files, join them, rotate, add signatures, drawings, annotations, redact sections, etc. The feature list is long, especially considering that by the name of the application you'd think it could just preview files, not edit them.

sumedh
0 replies
16h29m

to be honest I'm always surprised it isn't highlighted more by Apple

Probably because it's not so intuitive; I have to google how to use some of the advanced features of Preview.

tony69
0 replies
19h8m

I use pdftool.org which I saw on HN a while back

thephotonsphere
0 replies
21h51m

You can simply use poppler-utils on your own computer? It's a collection of command-line tools for PDF manipulation. More information can be found here: https://pypi.org/project/poppler-utils/

somethingsome
0 replies
21h6m

I often use PDF Sam (basic) and usually it works quite well and is offline.

https://pdfsam.org/

jbc1
0 replies
19h18m

Does it need to be a self hosted web based tool or do you just need PDF software? If the latter I find PDF Expert to be powerful and nice to use.

cde-v
0 replies
21h19m

Edge is surprisingly decent for marking up PDFs.

bayindirh
0 replies
20h17m

KDE’s Okular. Works on Linux, Windows and macOS.

If you’re already on macOS, Preview already has you covered.

nirav72
8 replies
10h3m

Any recommendations for a desktop/CLI PDF optimization tool that will reduce the size of a PDF? I've tried a few, and the best one so far is the one included in the subscription version of Adobe Acrobat. But I only need it occasionally, and it's not worth paying for a $20/month subscription.

aidos
4 replies
9h49m

Ghostscript maybe? Depends on what you’re doing but it can downsample images etc.

nirav72
3 replies
9h45m

Yes, it's mainly to reduce image size for scanned documents. I'll give Ghostscript a try.

aidos
2 replies
9h9m

Something like this is probably a good starting point:

    ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

From this gist https://gist.github.com/guifromrio/6390547#

nirav72
1 replies
8h46m

Very helpful tip. Thank you! - Ran it through GS and I got 30% reduction in a 50mb pdf file. I think if I play around with some options - such as converting images to grayscale, I might be able to reduce it by another 10-20%.

mijoharas
0 replies
6h59m

I guess it depends on your document, but I'm surprised you only get that much compression. For scanned to pdf documents I often get orders of magnitude.

I'm not at my computer, but try messing with the `/printer` in the above command. From memory, there are other options (possibly `/ebook`?) that control the compression ratio.
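For reference, the `-dPDFSETTINGS` presets Ghostscript documents are `/screen`, `/ebook`, `/printer`, `/prepress` and `/default`, with `/screen` and `/ebook` downsampling images the most. A small sketch that builds the command above with a selectable preset follows; the helper name is made up, and the `gs` binary name is an assumption (on some systems the executable is called `ghostscript`, as in the command above).

```python
import subprocess
from typing import List

# Presets accepted by Ghostscript's -dPDFSETTINGS, roughly smallest output first.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def shrink_cmd(src: str, dst: str, preset: str = "ebook", binary: str = "gs") -> List[str]:
    """Build a Ghostscript command that rewrites src to dst using a quality preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    return [
        binary,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


# Hypothetical usage (placeholder file names):
# subprocess.run(shrink_cmd("input.pdf", "output.pdf", preset="screen"), check=True)
```

For scanned documents it is worth trying `ebook` and `screen` and comparing the results by eye, since the size savings come almost entirely from image downsampling.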

senjin
0 replies
36m

I recently found out Mac's built in Preview app can do this. Go to export then change the Quartz Filter to "Reduce File Size"

mpweiher
0 replies
8h16m

If you're on a Mac, (my) PdfCompress is fairly smart about doing a good job.

Haven't updated it in a while though... :-/

evrflx
0 replies
9h49m

Did you already give ghostscript a try?

mderazon
5 replies
16h35m

I'm looking for a tool other than adobe pdf reader that lets me upload an image file of my signature to sign a pdf. Most of the tools I found let me draw a signature and I can't draw my signature on a track pad or with the mouse

whatisyour
0 replies
16h32m

Xournal++ does what you are looking for.

voltaireodactyl
0 replies
16h7m

PDF Expert is the best I’ve found that also does this, and while expensive, is a really robust and well done program. PDFpen also has this ability.

thebiglebrewski
0 replies
4h48m

Firefox can actually do this as long as your signature is in an image file.

gpff
0 replies
2h41m

I made myself such a tool: https://pdf.rere.re

It's js only, nothing is sent to the server. It automatically makes the background of signatures transparent. The result is a raster pdf as if you printed, signed and scanned the document. I use it on desktop, not sure if it works well on phones.

ajot
0 replies
15h48m

I use GIMP for that, and if I need it to look like printed-signed-scanned some ImageMagick incantation [0] or https://lookscanned.io/

[0] https://news.ycombinator.com/item?id=30024658

lordofgibbons
5 replies
21h3m

How easy or difficult would it be to turn this into an electron app so that non-technical users can use it easily too?

layer8
1 replies
20h54m

Better use existing applications like PDFsam [0] or PDF-XChange [1].

[0] https://pdfsam.org/pdfsam-basic/

[1] https://pdf-xchange.eu/pdf-xchange-editor/

ornornor
0 replies
11h29m

Why “better use something else”?

Froodle
1 replies
20h34m

Dev here. It totally could; we dismissed it at first because Electron is quite bulky, bundling a whole Chromium instance inside the exe. Instead, we kept the exe version as small as possible. Truth is, it's not too hard to port to Electron. We have plans for a full UI version in V2. We are releasing V1 (SPDF is currently in beta) sometime this month, but have begun work on a V2 port to a different language and framework.

cyanydeez
0 replies
5h47m

Quasar.dev provides a full "interoperable" solution to get to electron and others. All the code can be written as regular Vue3, then built for:

- SPAs (Single Page App)

- SSR (Server-side Rendered App) (+ optional PWA client takeover)

- PWAs (Progressive Web App)

- BEX (Browser Extension)

- Mobile Apps (Android, iOS, …) through Cordova or Capacitor

- Multi-platform Desktop Apps (using Electron)

Might be worth considering if you're going full client.

judge2020
0 replies
18h24m

It would be nice if it were WASM-based. Then someone could host that version of the app and it'd still be local processing.

adamnemecek
4 replies
21h30m

I have not looked into this yet but can someone recommend an application for repairing pdfs? For example, I have PDFs where selecting text highlights a line above or below.

me_jumper
1 replies
21h10m

Try converting it to PDF/A

adamnemecek
0 replies
20h52m

That's not it.

layer8
1 replies
20h49m

That doesn’t sound like the PDF is broken, just that it uses unusual font metrics or line displacements. Tools that could amend this are unlikely to exist.

More generally, the PDF format is too flexible to decide, in many cases, what is “broken” and what really is as intended. It’s a bit like asking for a tool that repairs “broken” source code where it’s really just the business logic that is broken.

adamnemecek
0 replies
17h32m

Some OCR, no?
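If OCR is the route taken, one option is OCRmyPDF, which can replace a page's text layer. A minimal sketch, assuming `ocrmypdf` is installed; the function name and the `OCRMYPDF` override variable are hypothetical. Note that `--force-ocr` rasterizes every page before running OCR, so vector text and graphics are lost.

```shell
# Rebuild the text layer of a PDF whose selectable text is misaligned.
# --force-ocr rasterizes each page and runs OCR again, discarding the old text layer.
# Usage: redo_text_layer INPUT OUTPUT
# OCRMYPDF is a hypothetical override for the binary (defaults to "ocrmypdf").
redo_text_layer() {
  "${OCRMYPDF:-ocrmypdf}" --force-ocr "$1" "$2"
}
```

For example, `redo_text_layer broken.pdf fixed.pdf`. If the existing text layer came from an earlier OCR pass, `--redo-ocr` may be a gentler alternative, since it tries to leave the visible page content alone.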

zikohh
3 replies
21h32m
kleiba
2 replies
21h21m

What about pdftk?

eyegor
1 replies
17h25m

pdftk has an issue where it occasionally corrupts PDFs on modern Windows Server versions. Had a weird bug in a random helper service at work that we narrowed down to pdftk mangling documents sometimes. Never looked much into it since it only took a couple of hours to replace it with another tool, and we haven't had issues since. I think all we used it for was merging and adding watermark text.

anjanb
0 replies
14h18m

"replace it with another tool": which tool was that?

ziofill
2 replies
20h51m

It says this started as a 100% ChatGPT project!

jwilk
1 replies
20h40m

What does it mean?

monospaced
0 replies
20h5m

From my understanding they mean the code was generated by instructing OpenAI’s ChatGPT (contrary to writing the code themselves).

rozman50
2 replies
21h40m

https://tools.pdf24.org/en/creator

This tool is not open source, but it’s free. Files should remain on the local PC. The developers claim that they make money only from advertising on their website.

cyanydeez
0 replies
5h57m

Great tool. Stirling-PDF looked exactly the same, except on a server.

I wonder why it's not open source by now.

Alifatisk
0 replies
19h0m

Wow, it covers almost anything, including redacting text.

pmarreck
2 replies
16h41m

No one commented yet on how this entire app was built by ChatGPT?

sagarpatil
1 replies
15h21m

Yep. That's the most interesting thing here. OP can you elaborate on how you got it to develop a full fledged application?

Froodle
0 replies
9h27m

So the whole app is not made in ChatGPT. It started like that 11 months ago, though: yes, I made the website and 7 PDF operations with ChatGPT as a test to investigate ChatGPT's power and applications. Everything after that has been manual, though, and basically all the code has been changed by now.

karol
2 replies
21h18m

Why can't this be an electron app?

Froodle
1 replies
20h34m

Dev here. It totally could; we dismissed it at first because Electron is quite bulky, bundling a whole Chromium instance inside the exe. Instead, we kept the exe version as small as possible. We have plans for a full UI version in V2. We are releasing V1 (SPDF is currently in beta) sometime this month, but have begun work on a V2 port to a different language and framework.

eyegor
0 replies
17h19m

As an alternative, you could write some automation scripts to handle all the requirements for a self-hosted install. If you look at oobabooga's text-generation-webui [0] you can see what I mean. Although ease of install also leads to a larger user base, which can be a double-edged sword for something as ubiquitous as a PDF app. It's much easier to get people to submit issues than to help solve them.

[0] eg, windows install script https://github.com/oobabooga/text-generation-webui/blob/main...

christkv
2 replies
21h59m

Can it add attachments to PDF files? Until this year I did not even know that this was possible, but a government agency asked me to add files as attachments to a PDF, as their website only allowed uploading valid PDF files.

layer8
0 replies
20h51m

You can use Acrobat Reader for that.

alexzeitler
0 replies
21h26m

Have learned about it this year as well

alex_suzuki
2 replies
11h26m

Smallpdf [1] probably deserves a mention here. Not OSS and not self-hosted, but I‘ve used it occasionally and it has always worked really well. When I was running an agency, we inherited their first office – very cool folks.

[1] https://smallpdf.com/

buryat
1 replies
11h9m

damn, that's a huge team

https://smallpdf.com/about

gnyman
0 replies
10h30m

I'm not surprised... I mean, just the specification for PDF 1.7 is, what, ~1300 pages?

And then there is 2.0, and all the extensions [1]

And multiply that with the number of implementations.

If the goal is to make something that "always" works, you probably need a big team to keep up with the moving field of various bugs and reimplementations

[1] https://www.loc.gov/preservation/digital/formats/fdd/fdd0000...

Ldorigo
2 replies
11h27m

I'm surprised no one mentioned LibreOffice Draw. It doesn't always work perfectly (I guess it doesn't support some parts of the spec), but when it does, it's by far the most powerful PDF editor I've found, letting you do things like move elements around, edit them (as in actually edit, not just annotate), etc. It's cross-platform and FOSS.

For page-level edits (rotating, reordering etc) pdftk in the cli (+ChatGPT to find the right incantations) works very well.
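For reference, a few of the pdftk incantations for page-level edits look roughly like this. A sketch assuming `pdftk` is on the PATH; the function names and the `PDFTK` override variable are illustrative only, and the reorder example assumes a document with at least four pages.

```shell
# PDFTK is a hypothetical override for the binary (defaults to "pdftk").

# Rotate every page 90 degrees clockwise relative to its current orientation.
rotate_clockwise() { "${PDFTK:-pdftk}" "$1" cat 1-endright output "$2"; }

# Reverse the page order.
reverse_pages() { "${PDFTK:-pdftk}" "$1" cat end-1 output "$2"; }

# Move page 3 to the front, keeping the remaining pages in order.
page3_first() { "${PDFTK:-pdftk}" "$1" cat 3 1-2 4-end output "$2"; }

# Concatenate two files into one.
merge_two() { "${PDFTK:-pdftk}" "$1" "$2" cat output "$3"; }
```

Each function takes the input file(s) first and the output file last, e.g. `rotate_clockwise in.pdf rotated.pdf`. pdftk refuses to overwrite its own input, so the output needs a different name.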

dpacmittal
1 replies
10h33m

Yeah, LibreOffice Draw and Inkscape are usually my go-to tools for editing PDFs.

cyanydeez
0 replies
6h13m

Inkscape works well for single pages.

The problems with PDFs I encounter, however, are large-scale 1000-page PDFs compiled from multiple sources that clearly have multiple different types of encodings, fonts, etc.

I'd love to have a pipeline that properly 'shrinks' everything. Not sure that's what this thing does, but it looks like they're moving towards configuring pipelines that could get there.

nf3
1 replies
9h37m

As long as I can fill in forms and add a signature, I'm sold. I loved being able to do this on macOS, but now that I'm on Linux I still haven't found any app that can do this.

thebiglebrewski
0 replies
4h49m

Firefox can actually do this as long as your signature is in an image file.

marwis
1 replies
11h15m

Looks like the pdf-lib.js used by the project can do most of the advertised features right in the browser and there is even a wasm build of tesseract out there.

Have you considered making a serverless/browser-only version?

Froodle
0 replies
9h35m

Yes, the v2 version we are working on is exactly this! We plan to migrate all functionality to the client, with a server-side version for API requests as well.

cryingpotato
1 replies
15h38m

This is really neat! Is there a paid product backing this or planned for the future? I'm curious what motivates all the dev hours going into it.

Froodle
0 replies
9h30m

No paid backing, just running on donations at the moment. I am tempted to try adding a paid feature for AI integration or some high-end office features, as I have a fair few offices that use this software now. But to be honest, I would always want it free, and it's just been a hobby.

ObscureScience
1 replies
19h37m

What I have mainly been looking for in the free software ecosystem is a good tool to work with PDF tagging/structure/element attributes.

At work I really have only been able to do the work I need on random PDFs with Adobe Acrobat. It seems strange that this is the case as PDF is now an open standard.

Ldorigo
0 replies
11h26m

LibreOffice Draw can do that (not sure about tagging).

11235813213455
1 replies
20h2m

Does it support adding / managing named form fields?

Froodle
0 replies
19h54m

Dev here. Not currently, but it's a planned feature.

zeagle
0 replies
15h42m

Would be really interesting to use this with paperless-ngx to annotate or watermark documents.

tobinfricke
0 replies
19h58m

Probability density functions, presumably. Oh, partial differential equations?

For the document files, I love PDF Studio: https://www.qoppa.com/pdfstudio/

toasted-subs
0 replies
21h37m

Self hosted sites are pretty awesome. Love seeing these here.

tbarbe
0 replies
8h41m

I made a small front-end-only app for that. It's open source if you want to check it out (disclaimer: I'm not a front-end dev, and the UI is not good): https://timothebarbe.github.io/pdfModer/

ryloric
0 replies
13h20m

Can this add paragraph numbers? I see page numbers in the README, but nothing on how I can number paragraphs.

nashashmi
0 replies
16h3m

Bluebeam PDF has an amazing Stapler tool. I can have a job that combines various pages of various PDF files and does a few other operations on them. When time for print comes, I run the job and output a PDF. For a kind of work that has to frequently put together various pages of various PDFs repeatedly to send as draft for review, this is a tool that makes life easy. https://support.bluebeam.com/articles/revu-21-revu-configure...

However, I have yet to find an equivalent tool from any other PDF application. And that includes this one.

nadermx
0 replies
18h10m

It's funny to see this #1 on HN. I have a PDF converter site [0] that I did a Show HN [1] for years back, and I am currently pushing updates to it as I work on an entire site redesign, since the PDF niche is massive. I'm relieved to see that someone actually made a package for PDF OCR [2], and that they are using it [3]. It will finally make what I was doing less hacky.

[0] https://www.pdf.to

[1] https://news.ycombinator.com/item?id=23238862

[2] https://github.com/ocrmypdf/OCRmyPDF

[3] https://github.com/Frooodle/Stirling-PDF#technologies-used

lukew3
0 replies
12h58m

Does anybody know of a similar foss suite of pdf tools that runs as a static site only using local javascript? I would prefer that to something like this.

laurensr
0 replies
21h31m

What seems to be missing is an OSS tool to add/remove form fields

c0mbonat0r
0 replies
9h21m

Reminds me of smallpdf.com, but open source.