PDF is a fabulous format. I mean, it’s an awful format in so many ways, technically speaking, but the net effect of having a self-contained static file in your custody stands in blissful contrast to the user-hostile dynamic/SaaS website that can be taken away at a moment’s notice. PDF/A is the true PDF - it strips out most of the dangerous cruft.
Anyway, if you like weird PDF hijinks, here’s a polyglot PDF/A CSV file that is also its own original soundtrack as a polyglot Amiga soundtracker mod:
PDF is an executable file. Many people are worried about running Javascript but still use PDF files without problems.
Yeah but the javascript can only do things inside the pdf.
What JavaScript can escape my browser (edit: or an HTML page), for example?
XMLHttpRequest to send anything the site knows anywhere.
And row hammer, to breach the sandbox.
https://en.wikipedia.org/wiki/Row_hammer
Row hammer is an exploit. It wasn't "by design".
While that may technically be "escaping the sandbox" it's a different case, because it was never meant to work, will be fixed and often is fixed.
Almost every "escaping the sandbox" is due to some kind of bug.
Sure if the PDF standard exposed a "globalThis.runBlobAsNativeExecutable" function it would be worse, but it is still escaping the sandbox.
Are the non-browser PDF readers more vulnerable? Do most even execute the Javascript?
I would expect so simply because browsers are fairly hardened pieces of software. Adobe Acrobat is decently hardened but it seems to be far behind browsers.
It is worth noting that Chromium and later Firefox both added PDF viewers that live inside the browser sandbox. They are essentially web-apps that render the PDF. When I worked at Google they strongly recommended using Chrome for opening PDF files because they felt much more comfortable about its security and sandboxing than other PDF readers.
Another perspective is that you are likely browsing the internet anyway. In fact, you likely got the PDF by visiting a website, so you have already exposed a huge attack surface (your browser) to a possibly hostile adversary. It is better to expose them to the same attack surface again (plus whatever security the PDF reader itself provides) than to give them a fresh new attack surface.
It is not about JS. Look into BadPDF as an example.
Famous last words :)
For better or worse, the years I spent working on Preview for Apple (and PDFKit) I felt bad that our (Apple's) PDF implementation was far short of Adobe's.
Radars would show up with PDFs attached, "Preview Does Not Display 3D Image in PDF Like Acrobat" or similar. And I would feel so ... inadequate.
PDFKit could render and capture basic annotations ... and that was about it. We could show you forms, allow editing, but if the PDF had Javascript that would add two fields and put the sum in a third field I had to shrug and say, "Oh well." The effort of hoisting a JavaScript interpreter/runtime was beyond my skillset anyway.
But then I kind of came to see our subset of PDF support as a kind of feature. It's true, we left out the kitchen sink. Adobe was/is clearly interested in putting everything into PDF.
And I mean, as pointed out here, at least you could open a PDF in Preview and not worry about any Javascript executing. ;-)
If it makes you feel any better, Preview is by far the best PDF viewer and editor (I use it for signatures and adding text) I've ever used. I like that the PDF previews in Finder are instant and accurate. I like that it shows as much PDF and as little UI/menubar as possible. I like that it never asks me to upgrade or log in. The search tools work well. I can stitch PDFs together (if I google how to, always forget) and pull certain pages out as their own files.
For all of the PDFs I've ever encountered, Preview has been sufficient and capable. Thank you for your hard work!
And somehow Acrobat (current paid version from Creative Cloud) is the worst PDF form filling option.
Seconded. At least most pleasant to use for most things, and never balked at anything I needed to see, fortunately.
Thank you, thank you, THANK YOU for not having put all that cruft in, and by Apple's sheer size, effectively discouraging many from producing and circulating those abominations.
Adobe has an awful track record of security (how many exploits in the past 25 years were in Acrobat (not the PDF spec, the actual Acrobat software) and in Flash?) but PDF is an amazing gift to the world, and, thanks to people like you, effectively safer than how Adobe designed it :))
Unfortunately I have the full Acrobat on my work computer, mandated by my employer, sigh, but that's another story.
When I ordered an official PDF copy of my college diploma, the order form had an option to enable "tracking" in the PDF file. Sure enough, when the recipient opened the PDF file (and when I tried it myself on a different machine), I got a notification from the company that generated the PDF...
That's horrific! I had no idea that was even a feature of PDFs!
PDFs are roughly on par with web pages feature-wise, including JavaScript or other actions that execute on load. Adobe did this, of course, to stave off the competition from the early web. Nowadays, PDF readers disable most of that by default (if they even support it).
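To make the "actions that execute on load" point concrete, here is a rough Python sketch that counts a few name tokens commonly associated with active content. The function name and the marker list are my own illustrative choices, and a raw byte scan like this is an assumption-laden heuristic: it will miss anything inside compressed object streams or names written with hex escapes, so treat a zero count as "nothing obvious", not "safe".

```python
import re

# Name tokens commonly associated with active content in a PDF.
# Illustrative list only, not exhaustive.
SUSPICIOUS_NAMES = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch"]


def count_active_content_markers(data: bytes) -> dict[str, int]:
    """Count occurrences of each marker in the raw bytes of a PDF.

    Naive on purpose: it does not parse objects or decompress streams.
    """
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Require that the next byte is not a letter or digit, so that /AA
        # does not match the start of some longer name.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    sample = (
        b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj "
        b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj"
    )
    print(count_active_content_markers(sample))
```

For a real file you would read it with `open(path, "rb").read()` and pass the bytes in; a proper parser is needed for anything beyond a first look.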
Is it really executable on the OS? Doesn't it require a native application to run it on an OS?
No, they are not executable by the OS (generally).
Formats are on a gradient between "completely code" and "completely data" and PDFs are quite close to the "completely code" extreme; I guess this is what the parent meant.
“PDF is a fabulous format”
I will never forgive the pain PDF caused me when I worked on a project to parse millions of PDF files from various sources. Just reconstructing paragraphs was a huge effort not even mentioning parsing tables. I think we should do better for something that’s basically a standard. PDF manuals also suck big time.
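For anyone who has not hit this: a PDF page typically gives you positioned text fragments, not paragraphs. Below is a minimal Python sketch of the kind of heuristic involved, assuming the fragments have already been extracted with a baseline position. The `Fragment` type, the function name, and the two thresholds are hypothetical, chosen for illustration; this is not the approach used in the project described above, and it ignores columns, tables, rotation and hyphenation.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float   # left edge of the text run
    y: float   # baseline; PDF convention, larger y is higher on the page
    text: str


def rebuild_paragraphs(fragments, line_tol=2.0, para_gap=18.0):
    """Group fragments into lines by baseline, then lines into paragraphs.

    line_tol: max baseline difference for two fragments to share a line.
    para_gap: vertical gap between lines that starts a new paragraph.
    Both thresholds are guesses and would need tuning per document.
    """
    # Step 1: walk fragments top to bottom and bucket them into lines.
    lines = []  # list of (baseline_y, [fragments])
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tol:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))

    # Step 2: join each line left to right, split paragraphs on large gaps.
    paragraphs, current, prev_y = [], [], None
    for y, frags in lines:
        line_text = " ".join(f.text for f in sorted(frags, key=lambda f: f.x))
        if prev_y is not None and prev_y - y > para_gap:
            paragraphs.append(" ".join(current))
            current = []
        current.append(line_text)
        prev_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Even this toy version shows where the pain comes from: every threshold is a guess, and a two-column page or a table breaks the "same baseline means same line" assumption immediately.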
PDF is supposed to be a printer format, not a word processing document format. While I too would love to nail down a PDF subset to be a standard (for example requiring the accessibility tags that make text extraction easy) perhaps trying to create a hybrid format, one that satisfies both printers and resizable windows, is already an impossible goal.
(I've always had to keep my love of PDF a secret from fellow nerds. But here's another secret, I like printing documents out from time to time.)
I really appreciate what PDF can accomplish, but I also really dislike that it turns into a black box. There really ought to be something that can describe a document structure and also describe document layout in a durable and portable manner. In the range of XML/JSON <-> HTML+CSS <-> PDF <-> PS <-> RAW, it really does feel like there's something missing between HTML and PDF.
And it can't be LaTeX, because the document shouldn't be a programming language at all. "The document is a program" has proven itself to be a terrible scheme overall.
PDF includes optional document structure information. Most PDF creation software chooses to not generate it, though.
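A quick way to see whether a given file carries that structure information is to look for the catalog entries that tagged PDFs use, `/StructTreeRoot` and `/MarkInfo` with `/Marked true`. The Python sketch below is a rough check under stated assumptions: the function name is made up, it scans raw bytes instead of parsing the file, and it will report false negatives when the catalog sits in a compressed object stream or `/MarkInfo` is an indirect reference.

```python
import re


def tagged_pdf_signals(data: bytes) -> dict[str, bool]:
    """Report whether the raw bytes contain the usual tagged-PDF markers.

    Heuristic only: no object parsing, no stream decompression.
    """
    has_struct_tree = (
        re.search(rb"/StructTreeRoot(?![A-Za-z0-9])", data) is not None
    )
    # Only matches an inline dictionary such as: /MarkInfo << /Marked true >>
    marked = (
        re.search(rb"/MarkInfo\s*<<[^>]*/Marked\s+true", data) is not None
    )
    return {"struct_tree_root": has_struct_tree, "marked": marked}
```

Both signals being present only says the producer claims to have tagged the document; how useful the tags are for text extraction still varies a lot.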
ePub is kind of trying to be that? Or maybe that hews too close to HTML.
It can reflow but tries to paginate HTML ... the way printing a web page tries to paginate HTML, ha ha.
I wonder a bit whether we would have an easier time extracting data, resizing pages, etc. if we sent HTML files instead of PDF. Are even half of PDFs printed at all?
The "in your custody" part is important, when Amazon starts yoinking books from your account.
https://www.nytimes.com/2009/07/18/technology/companies/18am...
I buy all my books paperback, even if I listen to them on audible, for posterity’s sake.
Do you buy them after you’ve finished listening?
Does it do anything else? Maybe pwn me via my PDF viewer? ;)
It contains Bitcoin hashes, rendering them one by one as it mines them.
about pdf/a... until recently there was not even an easy way to figure out if a pdf is really pdf/a; now there is (verapdf) and it's a crazy complex piece of software
and maybe I'm wrong but the only way to convert an arbitrary pdf to pdf/a with open source software is to convert it to postscript and back with ghostscript - which is affero licensed... with all the possible problems it entails. (there is an old version that is just gpl, works on most pdfs but is 15 years old or such.)
i needed to deal with pdf/a in a previous job... was not fun.
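Part of why checking is hard: reading what a file claims to be is easy, but proving the claim is what needs a validator like veraPDF. A PDF/A file declares its level in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties, and the Python sketch below only reads that declaration. The function name is hypothetical, and it assumes the XMP packet is stored uncompressed and is findable with a simple pattern; it says nothing about whether the file actually conforms.

```python
import re

# XMP allows both element form (<pdfaid:part>1</pdfaid:part>)
# and attribute form (pdfaid:part="1"); accept either.
PART_RE = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
CONF_RE = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes):
    """Return the PDF/A level a file claims (e.g. 'PDF/A-1b'), or None.

    This reads the declaration only; it performs no validation.
    """
    part = PART_RE.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = CONF_RE.search(data)
    if conformance is not None:
        level += conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

Anything that returns a level here still has to go through a real validator before you can rely on it; plenty of files carry the claim without meeting the requirements behind it.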
You could use the pdfium library as an alternative to Ghostscript.
As someone working in the graphic industry, I'd say PDF/X is the true PDF, but ymmv. :-)