I have been using this in a CI pipeline to maintain a business-critical healthcare PDF-generation app (started circa 2010, I think). Here are the RSpec helpers I'm using:
https://gist.github.com/thbar/d1ce2afef68bf6089aeae8d9ddc05d...
The repository contains reference PDFs stored in git; the test suite regenerates them and asserts that nothing has changed.
It has helped a lot when auditing visual changes and PDF library upgrades!
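The gist isn't reproduced here, but the reference-file ("golden file") pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the gist: `check_golden_pdf` and the `UPDATE_GOLDEN` environment variable are hypothetical names, and the default byte-for-byte comparison is only a placeholder. Pass a block to plug in a looser comparison.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical golden-file helper (NOT the linked gist's code).
# Compares freshly generated PDF bytes with a reference file kept in git.
# Setting UPDATE_GOLDEN=1 (an assumed convention) rewrites the reference on purpose.
def check_golden_pdf(reference_path, generated_pdf, &same)
  # Default: exact byte comparison. .b forces binary encoding on both sides so
  # strings with identical bytes compare equal regardless of their encoding tag.
  same ||= ->(expected, actual) { expected.b == actual.b }

  if ENV["UPDATE_GOLDEN"] == "1" || !File.exist?(reference_path)
    # First run (or deliberate update): store the output as the new reference.
    FileUtils.mkdir_p(File.dirname(reference_path))
    File.binwrite(reference_path, generated_pdf)
    return :written
  end

  expected = File.binread(reference_path)
  same.call(expected, generated_pdf) ? :match : :mismatch
end
```

In an RSpec example this would typically be wrapped as `expect(check_golden_pdf(path, pdf)).to eq(:match)`. Because a missing reference is written rather than treated as a failure, newly added reference PDFs should be reviewed like any other change in the commit.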
Could you not just compare the source (or even the hash) of the PDF and assert on that?
I use some custom tools for PDF comparison (visual, textual, and perceptual hash) for my personal records/accounting purposes.
A number of the financial and medical institutions I deal with regenerate PDFs every time you request them, but the content is 99-100% identical. Sometimes just a date changes. So I use a perceptual hash and content comparison to automatically tell truly new documents apart from ones that have only changed slightly.
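The commenter's tools aren't shown. As an illustration of the general idea, here is a minimal Ruby sketch of one common perceptual hash, the average hash (aHash), with a Hamming-distance comparison. It assumes each page has already been rendered and downscaled to an 8x8 grid of grayscale values (0-255) by some other tool; that step is not shown. The function names and the threshold of 5 differing bits are illustrative guesses, not tuned values.

```ruby
# Average hash ("aHash"): one bit per pixel, set when the pixel is brighter
# than the mean of the grid. ASSUMPTION: gray_grid is an 8x8 array of
# grayscale values produced by rendering and downscaling a PDF page elsewhere.
def average_hash(gray_grid)
  pixels = gray_grid.flatten
  mean = pixels.sum.to_f / pixels.size
  pixels.each_with_index.reduce(0) do |bits, (value, index)|
    value > mean ? bits | (1 << index) : bits
  end
end

# Number of bits that differ between two hashes.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

# Pages whose hashes differ in only a few bits are treated as "the same".
# The default threshold (5 of 64 bits) is a hypothetical starting point.
def probably_same_page?(grid_a, grid_b, threshold: 5)
  hamming_distance(average_hash(grid_a), average_hash(grid_b)) <= threshold
end
```

A small local change, such as a re-stamped date, barely moves the mean, so most bits stay the same; a genuinely different layout flips many bits. The text-content comparison mentioned above would be a separate check on extracted text.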
If the document is a legally required disclosure (a bank's fee schedule, for example), then you need to check the rendered document directly rather than its source code. PDFs are horrible, and a lot can go wrong between writing and publishing them.
Hashes can change on every regeneration because of metadata such as creation dates and document IDs. Source checks may also require some filtering or preprocessing before comparison. Visual comparison is the best option here, especially for complex documents with multiple third-party components, which may change both the hash and the source while leaving the visual appearance the same.
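As a rough illustration of the "filtering or preprocessing" step, here is a minimal Ruby sketch that blanks out commonly volatile fields (the Info dictionary's `/CreationDate` and `/ModDate`, the trailer `/ID`, and XMP date tags) before hashing. It assumes these fields appear as plain, uncompressed text; PDFs that keep them in compressed object streams will slip past these regexes, which is part of why visual comparison remains the more robust option. `VOLATILE_PATTERNS` and `normalized_pdf_digest` are hypothetical names.

```ruby
require "digest"

# Hypothetical list of fields that typically change on every regeneration.
# ASSUMPTION: they appear as uncompressed text; compressed object streams
# would hide them from these regexes.
VOLATILE_PATTERNS = [
  %r{/CreationDate\s*\([^)]*\)},                                   # Info dict creation date
  %r{/ModDate\s*\([^)]*\)},                                        # Info dict modification date
  %r{/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]},             # trailer file identifier
  %r{<xmp:(?:CreateDate|ModifyDate|MetadataDate)>[^<]*</xmp:\w+>}  # XMP metadata dates
].freeze

# Removes volatile fields, then hashes the rest. The stripped bytes are only
# hashed, never written back, so the broken xref offsets do not matter.
def normalized_pdf_digest(pdf_bytes)
  normalized = VOLATILE_PATTERNS.reduce(pdf_bytes.b) do |text, pattern|
    text.gsub(pattern, "")
  end
  Digest::SHA256.hexdigest(normalized)
end
```

Two regenerations that differ only in timestamps and IDs then produce the same digest, while any change to the remaining bytes still changes it.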
Are you using signed digests in the PDFs?