
Portable EPUBs

ijhuygft776
23 replies
15h54m

Portable epubs? All the epubs I ever downloaded are portable... not sure how it could NOT be the case... not reading an article with a title like this.

SilentM68
8 replies
13h24m

One thing I dislike about PDFs is that dark themes usually don't render well, especially the embedded images, whereas the EPUB format seems to render them just fine. If a new EPUB format is created, I would suggest that it support pagination, since post-secondary courses usually ask students to cite chapters, pages, etc. Most EPUBs that I've come across don't have pages. The last thing I'd suggest is that the new standard, if created, should incorporate accessibility features, so that the file is readable by screen readers. PDFs are rarely designed with accessibility in mind, and making them accessible is a gigantic pain. The technology behind any new EPUB document standard should have native accessibility support by default. People with print disabilities will thank you.

steve1977
3 replies
12h26m

dark themes usually don't render well

PDFs should not render dark themes at all. PDFs should look exactly like they were produced. So if they were produced with black text on white, that's how they should render, in any circumstance.

o11c
1 replies
11h53m

Or it can just invert the L component of all colors in the HSL colorspace at the very last stage of rendering, which only requires a couple subtractions to do in sRGB.

Unlike the unfortunately common "invert directly in sRGB", this preserves the colors changing only the brightness, and honestly it's pretty good. Colorspace nerds will no doubt complain that there are better colorspaces available, but in practice, most consumer devices implement "sRGB" perceptually such that this works better than fancier methods (which only work for carefully calibrated displays in carefully calibrated rooms).
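As a rough sketch of the "couple subtractions" idea above: keeping hue and saturation fixed and replacing L with 1 - L in HSL works out to adding the same offset, 1 - (max + min), to each channel. The function names below are made up for illustration, and channels are assumed to be floats in [0, 1]; the second function uses Python's colorsys module only as a cross-check.

```python
import colorsys


def invert_lightness(r, g, b):
    """Invert HSL lightness directly on RGB channels in [0, 1].

    With L = (max + min) / 2, keeping hue and saturation while mapping
    L to 1 - L shifts every channel by the same amount: 1 - (max + min).
    """
    shift = 1.0 - (max(r, g, b) + min(r, g, b))
    return (r + shift, g + shift, b + shift)


def invert_lightness_reference(r, g, b):
    """Same operation via an explicit round trip through HLS."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return colorsys.hls_to_rgb(h, 1.0 - l, s)


if __name__ == "__main__":
    # A dark red becomes a light red instead of cyan.
    print(invert_lightness(0.5, 0.0, 0.0))
```

A plain per-channel inversion (1 - r, 1 - g, 1 - b) would turn that dark red into a light cyan, which is the hue change the comment is complaining about.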

steve1977
0 replies
10h33m

I didn’t say PDF cannot do it, I said it should not. It defeats the purpose of PDF.

Vecr
0 replies
12h4m

Zathura[0] has a dark mode[1], it works pretty well.

[0]: https://pwmt.org/projects/zathura/

[1]: ^r Recolor (grayscale and invert colors)

starkparker
3 replies
12h31m

If a new EPUB format is created, I would suggest that it support pagination, since post-secondary courses usually ask students to cite chapters, pages, etc. Most EPUBs that I've come across don't have pages.

From the post:

I think we just have to give up on citing content by pages. Instead, we should mandate a consistent numbering scheme for block elements within a document, and have people cite using that scheme.

The point of a citation is to specifically reference an assertion. Any method of specifically referencing an assertion works.

If anything, referencing by section and paragraph is more portable than referencing by page number. It's more consistent across different print formats of the same text (hardcover vs. paperback, and mainstream print editions vs. large-text or braille editions) as well as different digital formats.
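A minimal sketch of what a section-and-paragraph numbering scheme could look like, assuming well-formed XHTML input. The label format ("section.block"), the set of block tags, and the function name are all assumptions for illustration; the article does not specify a scheme.

```python
import xml.etree.ElementTree as ET

# Assumed tag sets; a real scheme would have to pin these down.
BLOCK_TAGS = {"p", "blockquote", "pre", "li"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def number_blocks(xhtml):
    """Return (label, text) pairs, one per block element, in document order.

    Every heading starts a new section; blocks are counted within it.
    Nested blocks (a <p> inside a <blockquote>) each get their own label.
    """
    root = ET.fromstring(xhtml)
    section = 0
    block = 0
    labels = []
    for el in root.iter():
        tag = el.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if tag in HEADING_TAGS:
            section += 1
            block = 0
        elif tag in BLOCK_TAGS:
            block += 1
            el.set("id", f"b{section}-{block}")  # hypothetical anchor id
            labels.append((f"{section}.{block}", "".join(el.itertext()).strip()))
    return labels


if __name__ == "__main__":
    doc = "<body><h1>Intro</h1><p>First.</p><p>Second.</p><h1>Next</h1><p>Third.</p></body>"
    for label, text in number_blocks(doc):
        print(label, text)
```

A citation like "1.2" then points at the same paragraph regardless of font size, screen size, or print edition.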

jxdxbx
2 replies
8h11m

this is an issue in the legal world. court opinions accessed only online are cited according to their “page number” in some reporter or another. it’s better to cite paragraph numbers when possible but most American legal documents are un-numbered.

dsr_
0 replies
3h32m

Laws already have problems with HTML: numbered lists are specified in a way which is incompatible with many jurisdictions' numbering schemes, including the US Federal standard.

crabmusket
0 replies
6h48m

It sounds like the legal world can continue to use PDFs. That's fine!

xtracto
7 replies
13h34m

Well... if you actually read the article instead of just the header, you would learn the reason for the need of a portable version.

emmanueloga_
6 replies
13h25m

"portable EPUB: an EPUB with additional requirements and recommendations to improve PDF-like portability."

IMO epub is fine for fiction but not for any sort of technical material. EPUB docs are slow to reflow and the layout is pretty much always broken in some way, especially when there are tables and graphics involved. PDFs are a lot faster to render and navigate, the fixed page size being one of the reasons.

lxgr
4 replies
12h42m

Having suffered through many PDFs on my phone I’d take slow reflow and sometimes broken layouts over no reflow and guaranteed horizontal scrolling any time.

auggierose
3 replies
9h37m

Not sure why anyone would want to read PDFs on their phone.

offices
1 replies
7h54m

Imagination error. My phone has hundreds of downloaded PDFs from emails containing things such as tickets, job specs, pseudo-letters, bills, etc.

Anything where one might wish to read a Document in a Portable Format.

auggierose
0 replies
6h27m

Yeah, I guess. The thing with these is I don't care about the "quality" of a bill or ticket, it's enough if the tax man / concert venue accept it.

Many people advocating for a "better" PDF don't understand the quality aspect of a PDF. I am not willing to compromise on that when reading a book. It beats all other aspects, including the fact that I cannot read it on my phone. Basically, PDF's are a perfect translation of books into the digital medium. Gimmicks and features on top of what PDF can do are fine, but never a replacement, given that books also don't have these features.

lxgr
0 replies
1h54m

I quite often find interesting research papers during the day that I don’t have time to read in the office, and there’s no stable cell signal on my commute, so it’s in a way the perfect reading environment for these for me.

My commute isn’t long enough and often too crowded to warrant pulling out a tablet though. Reading on a single-hand device is ideal, and I prefer physical to physical books for that reason. So why shouldn’t I read research papers the same way? I just want a portable document format for an actually portable device.

wolverine876
0 replies
13h14m

The OP addresses some of those issues.

zwayhowder
5 replies
14h35m

I had the same thought, but wanted to know why the Author thought they were not.

For example, a major issue for self-containment is that EPUB content can embed external assets. A content document can legally include an image or font file whose src is a URL to a hosted server. This is not hypothetical, either; as of the time of writing, Google Doc's EPUB exporter will emit CSS that will @include external Google Fonts files. The problem is that such an EPUB will not render correctly without an internet connection, nor will it render correctly if Google changes the URLs of its font files.

The article raises some interesting ideas. Much like PDF and PDF/A, I would say an EPUB/A standard would be potentially useful.
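For anyone who wants to check their own files for the external-asset problem quoted above, here is a rough sketch that scans the text files inside an EPUB (which is a ZIP archive) for http(s) references in src attributes, CSS url() values, and @import rules. The regexes are a heuristic, not a real HTML/CSS parser, and the function names are made up for this example. Ordinary hyperlinks (a href) are deliberately ignored.

```python
import re
import zipfile

# Heuristic patterns for embedded (not merely linked) remote resources.
REMOTE_PATTERNS = [
    re.compile(r"""src\s*=\s*["'](https?://[^"']+)""", re.I),
    re.compile(r"""url\(\s*["']?(https?://[^"')\s]+)""", re.I),
    re.compile(r"""@import\s+["'](https?://[^"']+)""", re.I),
]
TEXT_SUFFIXES = (".xhtml", ".html", ".htm", ".css", ".svg")


def find_remote_refs(text):
    """Return the distinct remote URLs referenced as embedded assets in text."""
    found = []
    for pattern in REMOTE_PATTERNS:
        for url in pattern.findall(text):
            if url not in found:
                found.append(url)
    return found


def scan_epub(path):
    """Map each content file in the EPUB to the remote assets it references."""
    report = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(TEXT_SUFFIXES):
                text = zf.read(name).decode("utf-8", errors="replace")
                refs = find_remote_refs(text)
                if refs:
                    report[name] = refs
    return report
```

An empty result from scan_epub("book.epub") suggests (but does not prove) that the file is self-contained.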

adamzochowski
2 replies
13h22m

But the same font problem exists with PDFs. If a font is not embedded into the PDF, or rendered into a vector shape that is embedded, then the PDF will display garbage.

BHSPitMonkey
1 replies
12h57m

Isn't that solved in PDF/A, which the GP was implying could also be done for EPUB?

jasomill
0 replies
12h34m

Yes: among other things, PDF/A requires all fonts to be embedded.

BarbaryCoast
1 replies
10h52m

Thanks for that. I can't read the article, probably because I block WASM (and Javascript) for security. None of my ebook readers have Internet access (for security and for privacy), so none of those internet-only epub files would work for me.

This might be "legal", since XHTML was intended for the web, but I assume Google's using it to collect more user interaction data that they can sell to data brokers.

FWIW, PDF is simply Postscript that's been compressed. As far as I can tell, almost all documents these days are created with Microsoft Word, TeX, or Postscript. I'm lumping things like PageMaker and LaTeX in with the base they were derived from.

Symbiote
0 replies
10h13m

The article is a WASM EPUB viewer. There's a link shown in the viewer to the EPUB file:

https://willcrichton.net/notes/portable-epubs/epubs/portable...

geokon
18 replies
9h14m

It feels like a bit of a doomed project simply bc browsers don't open EPubs. You can link a PDF and while it's a bit of a context switch, the browser will open and display it

Since as described EPubs are basically HTML, it's kinda dumb browsers don't open them - but good luck convincing the Chrome/Mozilla bureaucrats

I think another discouraging aspect is that HTML and CSS are so huge and bloated at this point that few people can implement a "reader" for EPUB/HTML. It's basically "go implement a new browser". It makes one think an easy-to-parse markdown (like Djot) with some extra rendering bells and whistles would be a more likely long term solution

My personal interim compromise solution is embedding everything (CSS, SVGs, scripts and base64 images) into an HTML file. It's similar to an EPUB. It's a bit bloated and ugly but with a bit of care it works and naturally browsers (and by extension basically every user) can open it

Unfortunately a user has no way to really know "oh I can download and store this web page offline". It'd be nice to have something like a .htmls extension that indicates it's an HTML file without any external resources.
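The single-file approach described above can be sketched roughly like this for images: local img sources are replaced with base64 data: URIs so the HTML no longer depends on sibling files. This is an illustration only; the regex is a heuristic rather than an HTML parser, the function name and the load_bytes callback are made up, and CSS and scripts would need the same treatment.

```python
import base64
import mimetypes
import re

# Matches <img ... src="..."> and captures the prefix, the quote, and the path.
IMG_SRC = re.compile(r"""(<img\b[^>]*?\bsrc\s*=\s*)(["'])(.*?)\2""", re.I | re.S)


def inline_images(html, load_bytes):
    """Replace local <img src> paths with base64 data: URIs.

    load_bytes is a caller-supplied function mapping a relative path to the
    file's bytes. Remote and already-inlined sources are left untouched.
    """
    def replace(match):
        prefix, quote, src = match.groups()
        if src.startswith(("http://", "https://", "data:")):
            return match.group(0)
        mime = mimetypes.guess_type(src)[0] or "application/octet-stream"
        payload = base64.b64encode(load_bytes(src)).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{payload}{quote}"

    return IMG_SRC.sub(replace, html)
```

With files on disk, load_bytes could be something like `lambda p: (pathlib.Path("site") / p).read_bytes()`. Base64 inflates the payload by about a third, which is the "bloated" part of the trade-off.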

jbverschoor
5 replies
7h52m

You know.. HTML used to be hyper text. Some links. Add some figures/images, tables.

But then we 'needed' magazine-like design/layout. Still, it was document based, so actually pretty good.

After that, we tried to shoehorn HTML to an application distribution platform. Current css layouts are (finally) more like traditional layout engines for applications.

The last 20 years i.m.o. was pretty much a waste of effort because there was no proper way to distribute (cross-platform) applications, well besides java...

foofie
4 replies
5h11m

You know.. HTML used to be hyper text. Some links. Add some figures/images, tables.

It still is.

But then we 'needed' magazine-like design/layout. Still, it was document based, so actually pretty good.

Styling is not handled by HTML. It's a separate concern assigned to CSS. For convenience HTML offered default styling.

After that, we tried to shoehorn HTML to an application distribution platform.

It's not shoehorned. It's the use case: render documents. A document is a tree of ui elements. It's the same with GUI frameworks like Qt or WPF.

mapreduce
3 replies
4h15m

"It still is."

It still is but you are missing the point of the thread. HTML still is hypertext, some links, some images, some tables. No doubt about that. But HTML is also so much more than that. The spec is a beast. Anyone who wants to implement an HTML based reader has a mammoth task in front of them. It's like "go implement a new browser" like someone said in this thread above.

"Styling is not handled by HTML. It's a separate concern assigned to CSS."

Missing the point again. We know styling is not handled by HTML. The point of the thread was to tell how big of a task it is to create your own HTML based reader. If you want to create your reader like it or not you have to implement support for CSS too and that too is a mammoth spec.

So our only options are: A. Go implement a new browser. B. Use something like Webkit. C. Implement a small subset of the HTML and CSS specs.

foofie
1 replies
4h3m

The spec is a beast. Anyone who wants to implement an HTML based reader has a mammoth task in front of them.

That's true for basically any non-trivial document rendering format. For example, take a look at the PDF spec. Even basic things like parsing the document format is a formidable task. HTML in comparison is a trivial format. The same goes for technologies like TeX or even Microsoft's own Word format, which Microsoft famously had lots of problems supporting. It is a hard problem for all formats, not just HTML.

The point of the thread was to tell how big of a task it is to create your own HTML based reader.

You're confusing some things. A document format is one thing, but a renderer with specific capabilities is an entirely different thing. You're commenting on the document format and the styling and layout system, and now you're shifting the conversation to what it takes to implement a renderer.

Debates on document formats are entirely separate and orthogonal to debates on how to implement renderers. Renderers for the most trivial things are tremendously complex. There are a myriad of good reasons why we're seeing GUI frameworks built on top of webviews in spite of all the complaints about the formats that webviews support, and in spite of the myriad low-level rendering frameworks already available.

To understand the point, try to think through the requirements list to implement a renderer for Markdown. It's a document format with half a dozen features. Would you call it trivial?

mapreduce
0 replies
3h46m

You're confusing some things. A document format is one thing, but a renderer with specific capabilities is an entirely different thing. You're commenting on the document format and the styling and layout system, and now you're shifting the conversation to what it takes to implement a renderer.

If you follow the comment you replied to the discussion was about implementing a renderer. So no, I am not shifting the conversation to implement a renderer. The conversation is about implementing a renderer. That it is incredibly difficult to do today with the modern specs is the point.

math_dandy
0 replies
29m

What about option D. Use a WebView. This is exactly what the author did. The point of the proposal is to identify which features of a WebView can be used (and which must not) if the goal is to produce nice text layouts in multiple form factors. But the rendering of HTML and CSS, and the execution of JavaScript, are solved problems.

zerof1l
4 replies
8h52m

It feels like a bit of a doomed project simply bc browsers don't open EPubs.

Not that long ago, browsers could not open PDFs either. Now all browsers come with a PDF reader written in ASM/JS. I see nothing that prevents browsers from doing the same for EPUBs. There exist browser extensions that do exactly this already. It's a matter of the EPUB format gaining popularity.

geokon
2 replies
7h2m

I think Google is actively against offline data. It's not aligned with their business interests

My mental analogy is, you can also have offline apps on Android. You can specify this in the app manifest. But internet access isn't exposed to the user as a permission.

Like the author says, Google already injects online fonts into the EPubs they generate. Meanwhile PDF is a battle they've already lost

Shorel
0 replies
6m

Like the author says, Google already injects online fonts into the EPubs they generate.

I didn't notice that before, but now I will actively avoid Google generated EPUB files.

BlueTemplar
0 replies
2h55m

Indeed, and I would add that there's no reason for browsers to be able to open PDFs: this sounds like yet another attempt by Google to wrestle with Microsoft over the control of the OS (by having everything happen in the browser instead).

And also probably why we still have to rely on 3rd party hacky browser extensions to be able to save web pages as a single file.

rchaud
0 replies
3h38m

Epubs usually are not accessed/downloaded in a browser. PDFs definitely are, as they are freely shared online, whereas epubs are usually DRM'ed and not freely shared.

leoedin
1 replies
8h2m

I think another discouraging aspect is that HTML and CSS are so huge and bloated at this point that few people can implement a "reader" for EPUB/HTML. It's basically "go implement a new browser". It makes one think an easy-to-parse markdown (like Djot) with some extra rendering bells and whistles would be a more likely long term solution

This feels like the biggest hurdle to me. The author says "Portable HTML generation principle: when possible, systems that generate portable EPUBs should output portable HTML.". I don't think this is going far enough. If the goal is for this format to be everywhere and repeatable then it needs to be standardised and easy to implement a new rendering engine. Relying on webviews doesn't feel like the way forward. The beauty of PDF is that it is incredibly reliable - a PDF from a decade ago still renders the same today as it used to.

I suspect if an effort like this is to get off the ground, the scope of the document needs to be scaled right back. The subset of XHTML allowed should be very limited. The ability to render a document that looks the same everywhere should be prioritised - fixed layout at a fixed page size first, reflowable second. It needs a standard with a comprehensive test suite of documents + render outputs.

crabmusket
0 replies
7h14m

The ability to render a document that looks the same everywhere should be prioritised

IMO actually this is the question the whole effort hinges on.

If the goal is to replace PDF for the uses that require pixel-perfect rendering on every client just as the designer intended, then this approach is dead-on-arrival.

But if that's not the goal, then that has to be extremely well-communicated by the project, so that people who need that know they need to stick with PDF. Indeed, the project needs to explicitly say that it's not a goal, and that clients should be free to make reasonable rendering decisions within certain specified bounds.

jasomill
0 replies
2h36m

It feels like a bit of a doomed project simply bc browsers don't open EPubs.

While browsers don't provide a convenient UI for opening EPUBs, they should have no problem rendering the chapter HTML files contained within.

In the absence of browser support, writing a server-side EPUB-to-browsable site proxy that adds chapter navigation controls and simple layout options shouldn't be too difficult.

Incorporating the necessary DRM support required to view the majority of commercial ebooks through such a proxy would very likely be legally problematic, of course.

Come to think of it, any form of publisher-approved DRM EPUB browser support sounds like it'd be about half a technical step away from DRM support for web pages in general, which is a horrifying prospect.
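As a starting point for the kind of proxy described above, here is a sketch that recovers the chapter files in reading order. It relies on the standard EPUB layout: META-INF/container.xml points at the package (OPF) document, whose manifest maps ids to files and whose spine lists those ids in reading order. The function names are made up for this example; error handling, DRM, and the navigation UI are left out.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_hrefs(opf_xml):
    """Return manifest hrefs in spine (reading) order from an OPF document."""
    root = ET.fromstring(opf_xml)
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", NS)
        if ref.get("idref") in manifest
    ]


def chapter_paths(epub_path):
    """Return archive paths of the chapter files, in reading order."""
    with zipfile.ZipFile(epub_path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF file
        return [posixpath.join(base, href) for href in spine_hrefs(zf.read(opf_path))]
```

Each returned path can be read out of the archive and served as an ordinary HTML page, with previous/next links derived from its position in the list.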

foofie
0 replies
5h17m

I think another discouraging aspect is that HTML and CSS are so huge and bloated at this point that few people can implement a "reader" for EPUB/HTML. It's basically "go implement a new browser".

I don't think that's true. At the very least, you can use a WebView and feed it regular HTML. If the whole industry uses webviews for GUIs, it's hardly a stretch to use one to render Epub docs.

eviks
0 replies
5h55m

There is a SingleFile extension that can save a page as a single self-extracting zipped HTML file, so you don't need to waste space base64-encoding anything, and you can unzip it to a folder and view the images as-is without the page

baq
0 replies
4h11m

It feels like a bit of a doomed project simply bc browsers don't open EPubs.

I guess that's why the article is actually an epub opened with a WASM epub viewer :)

Shorel
0 replies
8m

It feels like a bit of a doomed project simply bc browsers don't open EPubs.

If you read the article, then you just did open an EPUB.

diebeforei485
18 replies
9h57m

I personally think PDF's are a terrible legacy format with unnecessary complexities[1] and most uses of PDF's do not involve printing so the typesetting arguments don't make sense to me. For the vast majority of use cases it's far more important to be readable on phone, tablet, and computer.

I was surprised when the author mentioned iBooks doesn't support scrolling view, so I tried it myself. Turns out iBooks on macOS does not support scrolling for ePub files, but it does on iOS and iPadOS. Very strange decision by Apple.

1. https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...

adrian_b
13 replies
9h22m

PDF is an annoying specification, but there exists absolutely no replacement for it.

I have never seen any kind of technical documentation published in any other format than PDF that is comfortable for reading and searching, even when that is done on a mobile phone.

I do not want a document that changes appearance depending on the device used for reading or depending on its temporary state, like window size. I want a document whose layout has been well conceived by its author and which is fixed, regardless of what I happen to use for reading it.

When I happen to read it on a smaller screen or window, except for trivial text-only documents, I do not want changes in layout, but I only want a smart reader, with comfortable means for fast zoom and pan, and which does not have stupid behaviors (like some Android readers), for instance where scrolling vertically (including Page Up/Page Down) also moves the document horizontally (preventing the easy reading of a column of text).

The traditional recommendations for the maximum width of a text column are good enough, if observed, to ensure comfortable reading even on a mobile phone. Only when the author breaks the traditional typographic rules by making extra-wide columns, the reading on a mobile phone becomes inconvenient.

broscillator
6 replies
7h14m

I find reading PDFs on my phone and even on my kindle really uncomfortable.

On my phone I have to either zoom in or turn on landscape mode (which usually means turn it on globally, I can't do it just for the reader app).

On kindle, a full page has too small font due to so much margin, and fitting the width shows me 80% of the page, and then I have to scroll down for the last 20% and my eyes have to find where exactly I was reading.

baq
5 replies
6h24m

I'm keeping my not-sure-how-old iPad 5 around specifically because it's the device form factor to read pdfs.

broscillator
3 replies
4h39m

That kind of highlights how non-versatile PDF is despite some comments.

However it does sound handy, I kinda want a dedicated tablet for sheet music.

rchaud
1 replies
3h32m

It's plenty versatile. Not everything needs to be phone-friendly. Phone screens weren't designed for reading PDF-size documents. Even so, options exist to reflow the text, view in landscape or pan and zoom.

broscillator
0 replies
3h4m

There is one device which fits PDFs well, an iPad. It can be fairly awkward on laptops and desktops as well.

view in landscape or pan and zoom.

This is awkward, that's the issue I mentioned above: how annoying it is to have to do that if you're reading for a 30-60 minute session.

baq
0 replies
4h18m

You're absolutely right PDFs are super rigid, but that's kinda their point - so with the proper device, like a sheet of paper or a 10+ inch tablet screen it makes sense.

Would I prefer more content to be reflowable etc.? Yes - but with a tablet it isn't strictly necessary, just nice to have.

pseingatl
0 replies
1h4m

Or the Kindle DX, RIP.

crabmusket
5 replies
7h5m

I have never seen any kind of technical documentation published in any other format than PDF that is comfortable for reading and searching, even when that is done on a mobile phone.

Can you provide an example of what you mean? My experience is completely the polar opposite.

adrian_b
4 replies
5h53m

I refer to something like a 3000 page manual of some microcontroller, or the datasheets of some integrated circuits or the specifications of some Arm architecture variant, or the standards for some programming language, e.g. C++ or System Verilog.

These are concrete examples of documents that I might have read during some flights or when waiting for some flight, on a smartphone.

When reading something like a fiction novel, reflowing the text based on the window width may be acceptable.

On the other hand, the navigation through a huge document half of which are tables, figures, diagrams, schematics and graphics is extremely painful when it is in HTML format so the layout changes based on the device and window used and there are no means to jump quickly e.g. to page 1436, then to page 2117. When zoom, pan and scroll are correctly implemented, which unfortunately happens seldom, they are much less distracting than the random changes in page layout caused by rendering as done by a browser.

I strongly dislike whenever a company provides only a Web documentation that is hard to navigate, instead of also providing a PDF manual.

Web documentation may be acceptable for very small documents, but not for most of the current technical documentation, where many thousands of pages for a manual are common.

Perhaps an EPUB format extended with everything necessary to completely describe a fixed page layout might become competitive with PDF, but I will have to see an example to believe it.

For now, whenever I see a book or any other document both in PDF and in EPUB formats, I always choose the PDF variant, because without exception it provides a better quality of the rendered pages.

lxgr
2 replies
4h33m

I work with the same type of documents regularly, and I’d give up both exact referencing and stable rendering in a heartbeat in exchange for something reflowable that I can reliably search in and copy paste from.

adrian_b
1 replies
1h33m

The PDF documents allow reliable search and copy/paste, but unfortunately only when the author of the document has taken care to ensure this. Nevertheless, this usually happens automatically when the PDF has been created by exporting a document created with some Office suite, unless the author has changed the default options to forbid these features.

Even many of the PDFs created by scanning printed documents allow reasonably reliable search/copy/paste, if they had been processed by an OCR.

lxgr
0 replies
1h14m

The PDF documents allow reliable search and copy/paste

Are you sure about that? As far as I understand, extracting text from an ultimately vector-graphics-like PDF heavily depends on OCR-like heuristics on the PDF consumer's side.

The ToUnicode mapping table can help with the glyph-to-codepoint mapping aspect of this, but figuring out the difference between the gap between two letters and two words seems hard.

I've seen bothtypesofissues mentioned in the following article i n t h e p a s t, including in a specification document I use multiple times per day for my job:

https://web.archive.org/web/20220328102205/https://filingdb....
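To make the word-gap problem above concrete, here is a toy version of the kind of heuristic an extractor might use: glyphs on a line arrive as positioned boxes, and a space is guessed wherever the horizontal gap exceeds some fraction of the glyph width. The input format, the threshold, and the function name are assumptions for illustration, not the behavior of any particular PDF library.

```python
def join_glyphs(glyphs, gap_ratio=0.3):
    """Rebuild a line of text from positioned glyphs.

    glyphs: list of (char, x_start, width) tuples, sorted left to right.
    A space is inserted when the gap to the previous glyph is larger than
    gap_ratio times the current glyph's width.
    """
    out = []
    prev_end = None
    for ch, x, width in glyphs:
        if prev_end is not None and x - prev_end > gap_ratio * width:
            out.append(" ")
        out.append(ch)
        prev_end = x + width
    return "".join(out)


if __name__ == "__main__":
    line = [("h", 0, 5), ("i", 5, 5), ("t", 14, 5), ("o", 19, 5)]
    print(join_glyphs(line))                 # threshold low enough to see the gap
    print(join_glyphs(line, gap_ratio=1.0))  # threshold too high, words merge
```

The same threshold that splits words correctly in one document will insert spaces between letters in a document that uses wide letter-spacing, which is how both kinds of failure show up in practice.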

crabmusket
0 replies
5h14m

I accept your points and agree that the kind of documentation you're thinking about sounds like a poor use case for HTML/EPUB. I do not regularly encounter this sort of documentation.

I've been boosting the idea in the OP, but more for things like "your local council's meeting minutes" or "your English class assignment" or "a research paper".

Though I do want to point out that even moderately complex specs, when designed for the web, can work well. For example, the HTML spec doesn't reference page numbers, but has extensive internal hyperlinking: https://html.spec.whatwg.org/

Perhaps an EPUB format extended with everything necessary to completely describe a fixed page layout might become competitive with PDF

I highly doubt this will ever happen, for use cases which require fixed layout. But there are plenty of use cases where fixed layout is unnecessary and inferior.

jxdxbx
1 replies
8h24m

people don’t want fixed layout documents only for printing. they want them because they want to fix the layout of their documents more than they care about small screens.

crabmusket
0 replies
7h29m

Those people are welcome to continue using PDFs, and I really hope that in some utopian future that they will receive a lot of requests from their readers along the lines of 'can I please have a portable epub version too?'

baq
0 replies
9h51m

but but but... if I really need to print something, a PDF is the most reliable times portable route. I guess a multipage svg would work, too, maybe, if exported to a pdf to properly print multiple pages first. (Looking at you, inkscape...)

BlueTemplar
0 replies
3h1m

Not sure about this specific case, but I suspect at least some of these readers might do it for consistency with e-paper devices, where no scrolling is a hardware limitation (very low refresh rates, weak processor, battery savings).

So it seems to be a bad idea to try to have a one-size fits all standard : we're much better off with two digital document standards : one with full multimedia and interactive capabilities (short of networking), and another, a subset of the previous with the limitations like : monochrome, no multimedia, interactivity mostly limited to (still in-document) hyperlinks...

And guess what, we already have two formats that are almost there ! HTML (see also MHTML=EML) and EPUB.

(And of course a 3rd one for physical archival and the rare digital fixed layout documents, for which PDF/A already seems to be decent enough.)

simonw
15 replies
12h51m

Here's some really insightful feedback on this idea from Baldur Bjarnason, who has spent significant time working with various W3C groups relevant to EPUB: https://toot.cafe/@baldur/111819472053623911

Example note: "EPUB originally didn't support remote resources and people put a lot of work into changing. Loading stuff over the network is HTML's killer feature. Blocking network assets is a setback for format adoption, not progress."

starkparker
6 replies
12h14m

Oh for Christ's sake, someone pry Baldur off his cross again.

Blocking network assets is a setback for format adoption

People are standing here telling him that _allowing_ network assets _is a setback for format adoption_ and he's just going to keep pounding this stupid, obnoxious drum of his until he runs everyone off.

almost all of the problems described would be solved by getting OS vendors (Google, MS, Apple) to invest more money in EPUB

That's back-asswards. Google, MS, and Apple don't give a shit about EPUB, they never will, and it's arguable that we're better served with them not buying a seat at that table considering how poorly their "help" has helped web standards, as much of the rest of his dismissive thread helpfully notes.

If he wants money for EPUB standards he should shake the cup at IDPF members who rely on it, and particularly Amazon, to whom he quite vocally abdicated the publishing space 12 years ago.[1]

Barking at operating system companies is nonsense at best and how we wind up with another, even more avoidable situation where the space is held hostage by them at worst. At least Amazon can chuck some goodwill money at EPUB development while continuing to kick its ass up and down the market with MOBI.

(Aside from all this, his dismissals of the "clunky" reading system complaint, citing how EPUB has "too many divergences", only further proves to me how tunneled the vision is of the people involved. To hell with forking or improving EPUB, then, because it can't be improved if that's the attitude of the people most involved with or influential within it. What bloody point is there in the customizability of a format that _nobody_ can effectively build tools for or consume?)

1: https://www.baldurbjarnason.com/notes/amazon-wins/, in which he also admits that he has no idea how to work with IDPF, which is a really great sign of how long things have been going this badly in this space

the_lucifer
0 replies
2h6m

I will go so far as to argue that Apple is the only one of the major vendors actually adopting EPUB books.

sirsinsalot
0 replies
8h13m

I also don't want my e-reader phoning home (to publisher/author) read time and page turns because the EPUB loads a pixel.

mft_
0 replies
5h6m

> Blocking network assets is a setback for format adoption

People are standing here telling him that _allowing_ network assets _is a setback for format adoption_ and he's just going to keep pounding this stupid, obnoxious drum of his until he runs everyone off.

I don't know the background that you're frustrated about, but I'd suggest that the answer might be: 'it depends' - and it depends on the intended purpose of the format in question. PDF is self-contained, and can be read (mostly) reliably on almost any device with the right software; PDFs having to have internet access to be read or opened would be a bad thing; further, the same goes for most formats - including EPUB (as you say) and audio files, picture files, etc.

dsr_
0 replies
3h45m

It would be quite nice if Firefox opened EPUBs properly instead of requiring the just-good-enough EPUBreader add-on.

I'd value that a lot more than Pocket (which I always turn off).

crabmusket
0 replies
6h57m

I really enjoyed this response, though I feel your points could have been made with a little less personal vitriol. Do you have history with Baldur? I don't ask because I think that would undermine your arguments, I'm just interested why you had such a strong reaction.

Someone
0 replies
8h8m

Google, MS, and Apple don't give a shit about EPUB

https://www.w3.org/groups/wg/epub/former-participants/ certainly shows Google and Apple participated in the working group. Apple also has a book store selling EPUB books and has EPUB readers for both MacOS and iOS. Google also has an app that handles EPUB.

jxdxbx
2 replies
8h28m

My view as a heavy ebook reader: ePubs should be inert data. No javascript, no interactivity, no network resources. Just a fancy text file with some appearance settings all of which the reader can override.

harshreality
1 replies
6h7m

Think about what javascript exclusion means, and all the things a good universal ebook format needs to support. Nicely rendered math? Currently the best option to do that is embedded mathjax (maybe you could pre-build mathml and ship that, but I'm not sure that covers all cases). Graphs or charts? There are nice js libraries for that, while doing it manually means exporting images or svgs. Even static svgs are annoying and brittle to font-size changes without javascript to adjust the svg size appropriately.

Don't confuse what's necessary for standard fiction books with what the format should support.

JS and interactivity are fine, in technical books, reports, or niche fiction.

What I absolutely agree on is that epubs don't need networking. Resources on the internet get stale after years or decades anyway, so inclusion of any network assets into an epub guarantees that the work will degrade over the years. References can be web links, but nothing from the internet should be embedded.
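To make the "links are fine, embeds are not" distinction concrete, here is a rough sketch of a checker using only Python's standard library. The tag list and the function name `find_remote_resources` are my own choices for illustration, not anything from the EPUB spec or an existing tool, and it ignores CSS `url(...)` references entirely.

```python
from html.parser import HTMLParser

# Tags whose URL attribute pulls content into the page. <a href> is left out
# on purpose: a plain link is only a reference the reader may choose to follow.
EMBEDDING_ATTRS = {
    "img": "src",
    "script": "src",
    "iframe": "src",
    "audio": "src",
    "video": "src",
    "source": "src",
    "link": "href",  # stylesheets, fonts, icons
}

REMOTE_PREFIXES = ("http://", "https://", "//")


class RemoteResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for embedded resources that live on the network."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        wanted = EMBEDDING_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and value.strip().lower().startswith(REMOTE_PREFIXES):
                self.remote.append((tag, value))


def find_remote_resources(markup):
    """Return every embedded remote resource found in an (X)HTML string."""
    finder = RemoteResourceFinder()
    finder.feed(markup)
    finder.close()
    return finder.remote
```

Running it over each content document of a book would flag tracking pixels and remote stylesheets while leaving ordinary hyperlinks alone.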

criddell
0 replies
2h12m

EPUB 3 includes MathML.

https://www.w3.org/TR/epub-33/

zaphirplane
1 replies
10h1m

Allowing loading of network resources is not good for security. Surprised this isn’t a worry; tbh I didn’t read Baldur’s writing.

criddell
0 replies
8h13m

But it’s good for tracking what books people are reading. If the history of the internet shows anything, it’s that if surveillance is possible, it will eventually happen.

Lots of governments around the world would love to know what their citizens are reading. Few would be bold enough to go after this directly, but if some company operating in their country has the data then there’s a path for that government to get the data.

bmacho
1 replies
8h7m

"EPUB originally didn't support remote resources and people put a lot of work into changing. Loading stuff over the network is HTML's killer feature."

And it is a feature of books that they stay the same.

Both can be a feature: the ability to change (e.g. they fix something in the cloud), and the inability to change (e.g. you can have it as you bought it).

criddell
0 replies
7h51m

I don’t get the impulse for homogenization everywhere. PDFs, EPUBs, Word documents, HTML documents, etc… all have different strengths and weaknesses and I think that’s a good thing. Never needing an internet connection is a strength of EPUB IMHO.

watwut
0 replies
9h28m

Just about the last thing I want is for epub to stop working offline on my phone, because the damn book needs to download something.

livrem
13 replies
7h42m

I use the SinglePage add-on for Firefox (think it is available for Chrome as well?) to save the current page DOM to HTML as a self-contained file (inlined CSS and data:-URLs for all images) with no dependencies and all the scripts removed etc. It is not perfect, and I do not trust browsers to always remain backwards compatible, but I prefer it to save pages as PDF or as multiple files.
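For anyone unfamiliar with the data:-URL trick, the idea is just to base64-encode the bytes of an asset and put them where the URL would go. A minimal sketch of that idea, not the add-on's actual code (the function name `to_data_url` is made up for this example):

```python
import base64
import mimetypes


def to_data_url(data, filename):
    """Encode raw bytes as a data: URL, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        # Unknown extension: fall back to a generic binary type.
        mime = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can be dropped straight into an `src` attribute, at the cost of the file growing by roughly a third because of the base64 encoding.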

Interestingly one of the few pages I ever saw it fail on was this article on portable EPUBs. Guess it has too much magic going on to make the formatting work. The saved page is perfectly readable, but the style is nothing at all like what the original page was for some reason.

I like how fbreader on Android just displays all books exactly the same, and as configured in the app rather than using any of the styling from the EPUB file. I never noticed that it tried to apply CSS or run scripts included in files and I hope it never tries to do either of those things. Loading external dependencies sounds like an even worse idea and I did not think that was even allowed.

actionfromafar
7 replies
6h30m

SingleFile, right?

Edit:

On that note, what's up with the Firefox Add-Ons?

Currently, they are all set up so that to do something interesting, they need all the permissions.

Which leads to a natural market being created, of bad actors shopping for Add-Ons they can take over.

Can something be done about this? For instance for this "SingleFile" addon. It needs to access the rendered document in the DOM to be able to introspect and save it all to a file.

But why does it need access to everything? Can't it have just permissions:

- "snapshot DOM once"

- "write to a single file"

gildas
4 replies
5h17m

Author here. I wish SingleFile could use fewer permissions, but it's unfortunately not possible from a technical point of view. Anyway, if you run some code I've written but are suspicious, you have to trust me or review the code, which is open-source.

darkteflon
1 replies
4h59m

SingleFile is amazing - one of my most-used extensions on both desktop and mobile by far. It’s elegant, unobtrusive and it just works. Thanks for making!

gildas
0 replies
4h43m

Thank you for trusting me and for your kind words ;)

vertis
0 replies
3h33m

I use SingleFile ALL the time. Thank you so much for it.

I Perma-web anything that I find interesting, after discovering by going back in my notes that half the bookmarks I'd added no longer existed 5 years later.

I don't think I could do half as good a job if it wasn't for your extension.

I owe you a coffee/beer -- actually i just found your donation page, but still a drink IRL if we ever run into each other at a conference/etc.

actionfromafar
0 replies
1h12m

I have no qualms about you or SingleFile. I used it today, it's great!

I just think market pressure created by the permissions systems is unfortunate, in aggregate. With x thousands of add-ons, bad stuff has happened, and is going to happen. Any improvement to the permissions which could mitigate that at least somewhat, would be nice.

pbronez
0 replies
5h38m

Agree. There are many extensions that I want to use on arbitrary websites, but rarely. This could be handled by the browser locking the extension out from everything, until the moment I manually invoke it, at which point it’s allowed access to the page I’m currently looking at.

Now, maybe that’s how it already works, but I have no confidence in it.

livrem
0 replies
5h15m

SingleFile, yes. Thanks. Did not see it within the edit-window.

I agree about permissions. In this case it looks like it needs a bit more, since it has some options like enabling auto-save after page-load for tabs for instance. Not a feature I have used, but I am sure it can be useful for semi-manually scraping sites.

JadeNB
1 replies
4h56m

I use the SinglePage add-on for Firefox (think it is available for Chrome as well?) to save the current page DOM to HTML as a self-contained file (inlined CSS and data:-URLs for all images) with no dependencies and all the scripts removed etc. It is not perfect, and I do not trust browsers to always remain backwards compatible, but I prefer it to save pages as PDF or as multiple files.

Ha, I think that one of my first HN comments, 10 years or so ago, was how I wanted to be able to save HTML web pages as HTML, not as PDF. I'm sure I didn't explain (or understand) my reasoning well, but it was roundly regarded as a ludicrous thing to want to do. I'm glad to hear that I was just a decade out of sync.

gildas
0 replies
4h37m

Actually, the first release of SingleFile is 13+ years old but it was less popular at the time because Chrome (it didn't yet support MHTML) had a negligible market share. People generally saved their pages in MAFF or MHTML format in Firefox or IE. It was when Firefox abandoned XUL extensions that SingleFile was able to rise from the ashes, because there was once again a real interest in it.

dsr_
0 replies
4h12m

FBReader uses CSS from the document by default; you can turn it off in, IIRC, four stages.

KOreader gives you more control but in a less friendly manner: in addition to choosing specific CSS from several supplied files, you can write your own.

(Also wins for KOreader: excellent OPDS support, and easy self-hosted sync server.)

dotancohen
0 replies
6h9m

Another comment explains why the page is difficult to parse:

  > It's a very well thought through article by the developer of Nota, trying to bring EPUB format up to parity with PDF. It's a serious start and they've already written a viewer. In fact, the article itself is displayed in a browser-based wasm port of the viewer (and looks good!).

Shorel
0 replies
9m

Yes, but this article has an EPUB download button.

This feature makes SinglePage unnecessary for any page using this system.

asimpletune
13 replies
4h29m

This is something I deeply care about, as I'm also very interested in the intersection of ebooks, security, and a LowJS web.

1. absolutely we should have a single-file, portable ebook format, and since PDF doesn't reflow text then it's not that.

2. HTML + CSS in 2024 is capable of reproducing virtually any kind of printed medium, but it can also reflow text.

3. I don't personally think JS should be a requirement, but a lot of conversations break down at this point so let me do my best to explain, and please understand I'm not simply being religious about this. In my view, an ebook is a book that can change its size in a way that makes sense. Bearing this in mind, I believe that if a reader has JS turned off, this book should work as intended. In other words JS shouldn't be required for it to perform its basic function. However, if there is some necessity to add interactivity or to augment the book, then yeah, why not, use JS. That's what it's there for. However, from the perspective of being a book, if it doesn't work as a book without JS then it's a bug in my view. (For standard ereader capabilities, I think the browser should offer that, but the book itself shouldn't be shipped with an ereader.)

4. I think it's a mistake to embed all the styling as that may violate some people's CSP's. It's safer to specify separate styling as resources relative to the HTML, and ebooks should forbid loading resources from a separate domain. In this way, ebooks would always work offline, but it has the added benefit of working online too and would automatically adhere to the strictest possible CSP. This achieves the same goal as offline, but in a safer and more universally compliant way.

5. Finally, just distribute it in a zip file. That's how ebooks already work right?

charlieyu1
5 replies
2h1m

Not a fan of zipping everything.

One thing I missed about working with old .doc files instead of .docx, was that it was very fast to search a folder with hundreds or thousands of files for a specific word. Not possible for zip formats. (I just saved a file in .doc right now and open it with a hex editor and it does contain the contents in plain-text.) It is a problem to search zip formats or pdf files with grep, and I really doubt that the zipping is necessary for text with maybe a few thousand characters.

kimixa
1 replies
1h25m

Zip files don't need to be compressed, if there's a need maybe you can promote adding the option upstream to whatever apps you use (if it's not already there somewhere). Or a .zip equivalent for zgrep.

Though pattern matching in files feels very fragile - simple text patterns would still rely on the text being kept in a contiguous chunk and no embedded data/markup within the section you're searching for...

Though I generally see the prevalence of zip files a result of people assuming that collections of files/streams need to be packed into a single object for "user simplicity", but we already have this with OS support in the form of directories. It just seems people have embedded the assumption that tools and UI handle directories as single units poorly, when that doesn't really need to be the case.

lupusreal
0 replies
1h14m

Browsers don't really have good support for downloading a whole directory.

lupusreal
0 replies
1h15m

Just use zipgrep to search inside zip files? Or just unzip it then use regular grep. I don't think it makes sense to gimp a file format just to make life slightly easier for people using archaic Unix utilities.
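If zipgrep isn't available, the same idea is a few lines with Python's standard `zipfile` module. This is only a sketch under my own assumptions: the name `grep_zip` is invented here, every member is decoded as UTF-8 (binary members just turn into replacement characters), and matching is line by line.

```python
import re
import zipfile


def grep_zip(archive, pattern, encoding="utf-8"):
    """Yield (member_name, line_number, line) for each line matching pattern.

    `archive` may be a path or a file-like object holding a ZIP container,
    which covers .epub and .docx files as well as plain .zip.
    """
    regex = re.compile(pattern)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # zipfile decompresses transparently, so stored and deflated
            # members are handled the same way.
            text = zf.read(info).decode(encoding, errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    yield info.filename, number, line
```

It has the same weakness mentioned upthread: a phrase that is split by markup or spans a line break inside the XML will not match.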

WorldMaker
0 replies
1h0m

ZIP is a real simple format under the hood.

There are zip-aware grep tools that you could investigate if they fit your workflows.

There's also nothing stopping you from storing the contents of zips as unzipped directories and rezip when/as necessary. (I wrote a tool called musdex years ago to automate that flow in the context of source control: source control the contents of something like .docx expanded into a full directory structure, but preserve the ability to "double click the Word file to edit".)

As a wrapper of "collections of deeply related files, some which may be text and some may be binary", ZIP is one of the better choices that we have (compare to TAR or MIME envelopes, for instance).

Arelius
0 replies
1h31m

To be fair. Iirc, an epub is stored uncompressed. So you should be able to get a directory of them no problem.

gildas
4 replies
4h23m

5. Finally, just distribute it in a zip file. That's how ebooks already work right?

What about self-extracting ZIP files like this page? https://gildas-lormeau.github.io/ (note that it includes the CSP to make it safe)

asimpletune
3 replies
4h3m

I don't think there's any CSP? https://observatory.mozilla.org/analyze/gildas-lormeau.githu...

In any case, ideally, any ebook solution should have all resources loaded as files relative to the current document, and nothing inline. Like this, the book would be compatible both offline and to be hosted online, without requiring any changes to the book.

The one file thing is cool, but again it requires JS to show anything, so that's not really in line with what I was talking about. In my view, an HTML document should work like a book, and any JS is purely to augment and extend that book if necessary. Those situations are rare though, and most JS is just to give a reader-like experience.

I think if someone is going to go through the trouble to host a book, they probably don't mind unzipping a file before putting it on their server. The one file thing I said earlier was more about sharing the book, similar to an app package. But once it's on someone's servers, presumably it's ready to be read.

Basically .webarchive

gildas
2 replies
3h37m

The index page is protected via CSP. The "bootstrap" page which unzips the page isn't, but it could be. I just did not include it for reasons I've forgotten.

The JS is not essential, there's nothing to stop you treating the file as a ZIP file and unzipping it beforehand to view it.

asimpletune
1 replies
3h22m

Ok, well maybe the example provided doesn't communicate the intention, because when I try to 'extract' the page, following the download links, they don't work, except the png. Cool concept though.

gildas
0 replies
3h16m

Unfortunately, the file poses a problem for basic unzipping software (e.g. a file explorer). However, the file is 100% valid with respect to the ZIP specification. You just need to use "true" unzipping software like 7zip, unzip etc. to read it as such.

Turing_Machine
0 replies
2h55m

That's how ebooks already work right?

Sort of, in the sense that any unzip tool will unpack an EPUB file (possibly after renaming it to a .zip extension rather than .epub).

However, it doesn't necessarily work the other way. You can't just zip the files at random and wind up with a valid EPUB, even if that exact same set of files was a valid EPUB before it was unzipped.

The "mimetype" file in an EPUB zip is special. It has to be the very first entry in the archive. It also has to contain the string "application/epub+zip" (and nothing else) and must be stored uncompressed.

A surprising number of zip tools make it hard to do this (e.g., by altering the file order as more files get added to the zip, or making it hard or impossible to store one file uncompressed while compressing the others). Most of the command line zip programs can do this with the proper command line flags, but zip libraries often make it a PITA.

Source: have written EPUB generation software.
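For what it's worth, Python's standard `zipfile` module is one of the libraries that makes this straightforward, because the compression method can be set per entry. A minimal sketch of the ordering and storage rules described above (the function name `write_epub` and the dict-of-bytes interface are just for illustration, and this does nothing to validate the rest of the book):

```python
import zipfile


def write_epub(path, files):
    """Write `files` (a dict of archive name -> bytes) as an EPUB container.

    `path` may be a file name or a writable file-like object.
    """
    with zipfile.ZipFile(path, "w") as zf:
        # The mimetype entry is written first and stored without compression.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            if name == "mimetype":
                continue  # already written above
            # Everything else can be compressed normally.
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Entries are written in call order, so as long as nothing re-sorts or appends to the archive afterwards, `mimetype` stays first.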

Shorel
0 replies
33m

About #3: I would say that usability without JS should be a requirement. We don't need JS, especially in a low-power device like an e-reader.

#5: Why change something that is already working well? The extension is .epub, even if it is a zip file.

mr_mitm
9 replies
13h3m

I'm fully on board with the author's "I want to replace PDF" sentiment.

It's true that running code in the document has some downsides, but the vast majority of people does it all the time in their browsers. And it comes with tremendous upsides. Just imagine large amount of data presented in interactive tables which can sort, filter and export or interactive graphs inside the document. We already use HTML+JS so much, why should we stop at documents? Yes, they can't be printed, but in my observation less and less people even own a printer these days, and I see no reason why this trend should not continue. I bet the future will be mostly living, interactive documents.

It's funny that I just mentioned this in the other thread [1], but I also felt that there is a need for a format that is self-contained and widely supported by standard software (by which I mean browsers). A well-specified open format would be great, but until then I tackled the self-containedness problem with JS and wrote a Python script that zips and bundles all assets and embeds them as a SPA into one HTML file [2]. The focus is on Sphinx docs but it should work in general with all distributed HTML docs.

[1] https://news.ycombinator.com/item?id=39138444

[2] https://github.com/AdrianVollmer/Zundler

jxdxbx
3 replies
8h17m

I understand all this but there needs to be a simple format for just regular books without all this complication. I thought that’s what ePubs were for. What I want is basically an ebook format that is mostly zip files of plain text.

mr_mitm
2 replies
6h4m

What are you missing in epub?

jxdxbx
1 replies
3h59m

Simplicity? A guarantee that the file will be readable in 20 years? Project Gutenberg still treats plain text as the default format for a reason.

velcrovan
0 replies
3h18m

The ePub standard is 17 years old, it consists of HTML which is 31 years old and CSS which is 27 years old, packaged in ZIP format which is 34 years old, and all are still in widespread active commercial use and very easy to write parsers for. I think you'll have problems with the physical media you use to store your plain text files before you ever have problems finding software to read ePub file contents.
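As a small illustration of how little is needed to get started: an EPUB's `META-INF/container.xml` points at the package document, and finding it takes nothing beyond the standard library. A sketch, with the function name `find_package_document` being my own:

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_document(epub):
    """Return the archive path of the package (.opf) document inside an EPUB."""
    with zipfile.ZipFile(epub) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml lists no rootfile")
    return rootfile.get("full-path")
```

From the package document you would then read the manifest and spine to get the XHTML files in reading order, which is more of the same kind of XML parsing.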

fodkodrasz
2 replies
11h27m

It's true that running code in the document has some downsides, but the vast majority of people does it all the time in their browsers.

This probably has to do something with them having nothing to do, as the big companies managed to convince the frontend dev community that the single best thing to generate layout is on the client machine on the fly. Of course they did it so the users will have a hard time selectively blocking the layout scripts from the ad/spyware most contemporary (web)software development is about.

This led us to the point where saving (or God forbid printing!) an article needs a lot of effort in many cases.

My observation is: when I need to go to work on the field, I need printed documents. Printed documents don't need firmware updates, their batteries don't run out, and no, I don't need interactivity in documents.

Self contained HTML is a good - and necessary - step, but interactivity and executable code is not something we usually need in documents, I only saw somewhat legit need for it on corporate abomination of documents (and some teaching materials possibly).

morelisp
0 replies
8h38m

Some of us are equally nonplussed by modern web dev but still quite miss PostScript.

jxdxbx
0 replies
8h16m

Thank you. It’s really frustrating that people want to make documents as unreliable and annoying as the web.

larme
1 replies
5h43m

It's just a fucking book. Don't push your shits like js or SPA or d3 or webgpu to a fucking book. I just want to read it like a dead tree book.

mr_mitm
0 replies
5h26m

Not all documents are books. And I'd appreciate it if you stated your criticism in a more civilized manner.

wolverine876
7 replies
13h7m

It's a very well thought through article by the developer of Nota, trying to bring EPUB format up to parity with PDF. It's a serious start and they've already written a viewer. In fact, the article itself is displayed in a browser-based wasm port of the viewer (and looks good!).

One issue is how precisely EPUB, which is really XHTML, can reproduce layout. What are the possibilities here? The OP's standard is that the document will look "reasonable". They imply that HTML would need new layout capabilities to match PDF, at least for line breaking:

There's two ways to make progress here. One is for browsers to provide more typography tools. Allegedly, text-wrap: pretty is supposed to help, but in my brief testing it doesn't seem to improve line-break quality. The other way is to pre-calculate line breaks, which would only work for fixed-layout renditions.

Also, though the author mentions annotations, I don't see how they intend to implement them.

nine_k
1 replies
3h21m

PDF does not have any capabilities of line breaking. It is a picture format, similar to SVG, only more rigid. That's why it can't have text reflow, etc.

What an ebook format needs is a semantic form of markup, which adapts to devices it is rendered on. HTML + CSS were invented for this goal.

With that, book layout authors should consciously relinquish some control on how the book looks, and hand it to the reader. Slight visual imperfections are a small price to pay for this. Who needs visual perfection should go for a PDF.

This, of course, becomes hard if any interactive stuff is involved. I would suggest that larger interactive elements should open in a dedicated view when needed, and tiny interactive elements should embrace reflow.

kps
0 replies
2h2m

HTML + CSS were invented for this goal.

HTML (with SVG and MathML) is probably fine for most books, but CSS has spent 30 years resolutely resisting basic typography, i.e. default text baseline alignment.

joshjob42
1 replies
11h27m

The author discusses fixed layout epubs. Effectively, the epub can give a default pagination, line-breaks, font, font size, page size, and positioning for images etc., making it render identically on everything (one might optionally omit pagination if opening in a browser but keep everything else). This can be done already in epub3. But that's not ideal, because then it doesn't look good anymore on a phone, etc. Depending on the reader though, you could override the default, but then you have to hope that your reader does a good job of making a nice document. An alternative is for the epub to specify multiple renderings, for various common screen types.

I don't think this is unreasonable as a solution. By all means let's try to get a smart reader, but letting people create defaults for their documents that can be overridden if desired by the user is a good middle ground.

idoubtit
0 replies
9h2m

Indeed, EPUB3 provides all the features that the author wishes. His "portable EPUB" format is just a loosely specified subset. It's unclear if some extra features are included in the format as they are in his "Bene" tool, like the rendering of references (i.e. links with a data-target attribute).

The EPUB3 standard is much more complex than EPUB2 (media overlays, mixing fixed layout with reflowed, MathML…). In my experience the implementations are much more varying, and most of them aren't complete. So a "Portable EPUB" may not render as expected because the reader tool lacks some specific feature. The author also requires full JS support, which I suppose does not help with portability.

zozbot234
0 replies
10h13m

Doesn't CSS support layout capabilities for paged media out of the box? An EPUB reader just has to implement good old-fashioned "Print Preview" display mode, and you're set.

zdunn
0 replies
5h46m

Also, though the author mentions annotations, I don't see how they intend to implement them.

It's discussed at the very end of section 8 and all of section 9 that interactive functionality would use web components.

BlueTemplar
0 replies
3h16m

Ironically, the very example the author uses for annotations doesn't work properly for me : on touchscreen Android Firefox I get a link instead of a popup when press-holding.

And aren't annotations (and references) already part of the EPUB specification, and probably even the HTML specification ?!?

Finally, I disagree with the press-and-hold for popup being better than the usual practice of hyperlink anchors, IMHO their jumping around is much less disruptive. (As long as the reader's "return" function is working properly, and/or - for the bijective ones - they provide a "back" hyperlink.)

thayne
6 replies
10h41m

A PDF is a single file that contains all the images, fonts, and other data needed to render it.

A PDF can include the fonts. But it often doesn't, and relies on system fonts. One reason for that is because including fonts in the PDF can dramatically increase the size of the file. In some cases a single font could be larger than the entire rest of the file. I've also worked on implementing embedding fonts in some software that generated PDFs. It was surprisingly difficult to figure out how to get it to work reliably.

PDFs are rendered consistently.

Not as much as you would think. There are several cases where the same PDF will render differently depending on which PDF viewer you use. Usually the differences are pretty subtle, but occasionally there are edge cases that result in pretty significant differences. I've even run into a case where the same version of Acrobat reader will render a PDF differently depending on what OS you are using.

geraldhh
2 replies
9h19m

A PDF can include the fonts. But it often doesn't, and relies on system fonts.

found this out, after 20-something years of consistent pdf renderings, in a job interview because my docs allegedly looked odd :/

the daily wtf ...

pseingatl
0 replies
1h6m

The Arabic glossary of legal terms distributed by the State of California is unreadable unless you open the file in Adobe Reader, search for the name of the font used, download and install the font on your system, close the file and reopen it. I suppose there are many instances of this happening.

jxdxbx
0 replies
8h20m

Yeah, MS Office PDF generation (at least some time ago) did not generate PDFs with embedded fonts, and I’d often come across weird-looking documents where the system is using a font with the wrong characteristics. Print-to-PDF usually avoids this.

EE84M3i
1 replies
9h25m

Is there software that minimizes the fonts by removing code points that aren't used in the document?

adrian_b
0 replies
9h0m

This is a standard feature of the PDF format.

Normally all PDF documents include only the glyphs corresponding to the code points actually used in the text rendered with that font.

That is why you can go for instance to any site of a vendor of fonts and you can download freely a PDF sample text of an expensive font. You can easily extract the font from the sample PDF, but it will be useless, as it will contain only the few letters that had been used in the sample text.

adrian_b
0 replies
9h8m

There are only a few standard system fonts that can be omitted from a PDF file and the document assumes that whatever fonts will be used for rendering match in metrics the traditional Times, Helvetica, Courier, etc., typefaces. Therefore with compatible system fonts there should be no changes in the layout of the rendered document. There are of course examples of system fonts which are advertised as compatible in metrics with the ancient Adobe PostScript fonts, but which nonetheless have subtle differences.

Except for the small number of standard system fonts, for the other fonts the PDF document normally includes only a small subset of their glyphs, corresponding to the characters that are actually used in the text that is to be rendered with that font.

mrich
5 replies
10h26m

Ironically this did not render in Firefox on Android (just the spinner kept spinning). Worked in Chrome.

That said, epubs are great for reading books on mobile. The advantage for pdfs is that they contain highlights/notes, so you can directly import them into Zotero and all your annotations are there. For epub, you have to hope there is a way to export the annotations that are stored by the reader app, and then you have to process them further. Readera is a great reader for mobile that makes this possible. I'm currently working on a script that will convert an epub to pdf, extract the annotations from Readera, and mark them in the pdf. Then I can import the pdf into Zotero, while still retaining the great reading experience of epubs.

Symbiote
1 replies
10h18m

Works fine in Firefox for Android 122.0 for me.

mrich
0 replies
5h6m

Also loads instantly for me now, didn't make any changes.

zozbot234
0 replies
10h11m

There is a Web Annotation standard that could be used to export the notes to.
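Roughly, a highlight plus note in the W3C Web Annotation data model is a small JSON-LD object with a body (the note) and a target (the document and a selector for the quoted text). A sketch of building one in Python; the function name, the use of a random `urn:uuid:` identifier, and the choice of a `TextQuoteSelector` are assumptions for illustration, and how any particular reader app stores its highlights is a separate question.

```python
import json
import uuid


def make_annotation(source, quoted_text, note):
    """Build a Web Annotation (as a dict) for a highlighted passage and a note."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": note, "format": "text/plain"},
        "target": {
            "source": source,  # IRI of the annotated document
            "selector": {"type": "TextQuoteSelector", "exact": quoted_text},
        },
    }


def export_annotations(annotations):
    """Serialise a list of annotation dicts to a JSON string."""
    return json.dumps(annotations, indent=2, ensure_ascii=False)
```

A selector that only carries the exact quote is ambiguous when the same phrase occurs twice; the data model also allows `prefix` and `suffix` on the selector to disambiguate.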

staz
0 replies
10h8m

It is working for me on my Firefox on Android.

One of the nice benefits I can already experience in his document is the working TOC sidebar, which allows navigation in the document. (Compared to classical HTML, not PDF.)

mwilliamson
0 replies
9h25m

I had a similar problem loading the page on Firefox for desktop with private browsing. It turns out service workers don't work in private browsing, which it seems Bene (the software rendering the page) requires. Switching to a normal Firefox window solved the problem.

znpy
4 replies
7h14m

ePub aren't that great either. PDF might not be perfect, but it's likely the best we have and the best we'll have in a long time.

ePub renders wildly differently on my eReader (kobo), my linux laptop (various apps), my iPad and my iphone. And i'm not talking about screen sizes, i'm talking about various elements being rendered largely incorrectly (and there's a matrix of incorrectness across implementations).

PDF documents on the other hands... they render just right, everywhere. I have to zoom and scroll, but I will never have to ask myself "will I be able to actually read this document?" when dealing with PDF.

Oh and by the way... I still print stuff from time to time. Yeah way less than it was needed in the past, but it's still a necessity. Can you even print stuff (sections? pages? selections?) from an ePub?

Bene is designed to make opening and reading an EPUB feel fast and non-committal. The app is much quicker to open on my Macbook (<1sec) than other desktop apps.

This is elitist at best. Claiming something "is fast" on top-class hardware is misleading at best (if not dishonest).

Try running that on low class hardware (stuff like chromebooks but also laptops from at least 7-8 years ago) and let's see if it's still "fast".

I'm not convinced.

broscillator
2 replies
7h10m

PDF documents on the other hand... they render just right, everywhere. I have to zoom and scroll,

To me it feels the other way around, if I have to zoom and scroll, they render just almost right.

And that almost is extremely important for actual reading. If you're quickly skimming a PDF, sure. But to sit down and read for 30 minutes? One hour? Fuck zooming and scrolling. My kindle just displays a page, and I tap it and it goes to the next page. Can't really get a much better reading experience than that.

znpy
1 replies
4h20m

If I’m reading that long, I don’t need zooming and scrolling on my ipad.

Epubs on ereaders work well but only if you’re reading fiction. Most images and almost all tables and charts have been messed up by epub rendering anyway. And ereaders are black and white, so you’re losing information anyway.

broscillator
0 replies
3h32m

Right, this goes against what you said about rendering well everywhere, given that you mentioned your iPad. On any other device you will likely need to zoom and scroll constantly.

Tables and charts are also a specific use case. There's no mention of them whatsoever on the website so one can assume this is talking about mainly text.

In other words, you described the only time when PDFs are more comfortable: if you have an iPad and you need to read charts/images/tables. Far from the claim of "they render just right everywhere".

arp242
0 replies
2h42m

I have a six-year-old laptop and regularly read ePubs on it (also my PocketBook e-reader) – it's fast enough. The initial page calculations can take a few seconds, but this can be (and is) cached and done in the background. It's not that bad and "time to something useful on screen" is more than acceptable. Large PDFs also aren't exactly fast by the way.

Most e-readers are "low class hardware". ePubs work fine on most of them.

In terms of performance there isn't a clear winner here; both can be somewhat slow at times for large documents, but are also "fast enough" for the common case, even on older low-spec hardware.

I do think the general software ecosystem surrounding ePubs is not quite there yet, but that's mostly a matter of UX and "software that hasn't been written yet". As a format ePub is the clear winner for many (not all) scenarios. I struggle reading many PDFs because the "zoom and scroll" that you mention is a right pain if you have to constantly do it (which you often do if you zoom text). Comfortably reading PDFs on my phone or e-reader is basically impossible.

morelisp
4 replies
8h35m

The bar for epubs is so fucking low I have a hard time believing this matters at all. Just last week I bought a book set in the late Middle Ages which managed to transcribe all “þ” as “p”. Until publishers care about that stuff, none of these high-falutin technical discussions change anything.

offices
3 replies
7h58m

I don't see how this relates to the link.

morelisp
2 replies
7h45m

The file format doesn’t matter one bit when the reading and authoring tools are shit and the editors can’t/don’t fix anything. And papers will generally have a lot fewer resources to deal with this than major book publishers, who have been epub-focused for over a decade now and actually make money from it.

harshreality
1 replies
5h50m

Any decent publishing or html editing tools fully support utf-8 by now. It's not the tools.

Publisher and editor laziness may be a reason to be cautious about epubs currently for niche or esoteric works, but that's not the same thing.

I bought a book set in the late Middle Ages which managed to transcribe all “þ” as “p”. Until publishers care...

The book market these days makes it challenging to do high-quality editing up front for republishing niche books in a new format. Publishers try to cut corners, outsourcing epub conversions to people who don't care and don't know what they're doing, or they OCR it, have an in-house editor (who also doesn't have a personal affinity to the subject) give it a once-over (maybe), and release it.

BlueTemplar
0 replies
1h46m

As an aside : Unicode support was still an issue in TeX last I checked, because most of the LaTeX tools don't support it (well, having been made before it was expected).

Now, there are some attempts to fix this situation by Xe(La)TeX and Lua(La)TeX, but since TeX seems to be so much tied to PDF these days, it should probably just be abandoned by most scientific publishing in favor of the likes of GNU TeXmacs (note : it's NOT TeX in GNU Emacs) and HTML with MathML.

reacharavindh
3 replies
10h54m

The thing I wish for the most with epub, or technically the epub readers, is the ability to scribble and handwrite notes in them using a stylus, and for them to keep those notes when I read the book again. I do that with PDFs on my iPad, but I have a lot of tech books for which I took notes by hand that are nowhere to be found now. Even if I did find them, they would not be inline with what I was reading and thinking.

eviks
0 replies
10h44m

In general, the modern docs should be easily editable, not just allow annotations, since it's easy to preserve the original content/layout

beckerdo
0 replies
6h18m

I agree, I would like an ePub to have a robust note taking and exporting ability.

For instance, if I highlight in Chapter 8 "In 539" [next paragraph] "Belisarius" [next paragraph] "marched on Ravenna" [10 paragraphs later] "In 540 Belisarius entered Ravenna".

I would like to export this with the Chapter header and detailed highlight locations OR just as one sentence with subtle links to the locations.
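For what it's worth, the Web Annotation standard mentioned elsewhere in this thread can already express that kind of export. Below is a minimal sketch, not anything Bene or any reader is known to implement: the function name `highlight_to_annotation` and the `chapter8.xhtml` source are hypothetical, while the JSON keys (`@context`, `TextQuoteSelector`, `exact`, `prefix`, `suffix`) come from the W3C Web Annotation Data Model.

```python
import json
import uuid


def highlight_to_annotation(source, exact, prefix="", suffix="", note=None):
    """Build one W3C Web Annotation (as a dict) for a highlighted passage.

    `source` is a hypothetical path or IRI of the content document.
    """
    annotation = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": uuid.uuid4().urn,  # every annotation needs an identifying IRI
        "type": "Annotation",
        "motivation": "highlighting",
        "target": {
            "source": source,
            # A TextQuoteSelector locates text by content, not by page number,
            # so it survives reflow across screen sizes.
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }
    if note is not None:
        annotation["motivation"] = "commenting"
        annotation["body"] = {
            "type": "TextualBody",
            "value": note,
            "format": "text/plain",
        }
    return annotation


highlights = ["In 539", "Belisarius", "marched on Ravenna"]
exported = [highlight_to_annotation("chapter8.xhtml", h) for h in highlights]
print(json.dumps(exported, indent=2))
```

Joining the `exact` values gives the "one sentence" form, and each `target` keeps enough information to link back to the location.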

AlanYx
0 replies
1h35m

I'm on the same page. I convert all my ePubs to PDF because I want to keep my handwritten annotations in-place alongside the text I'm annotating, including things like circled words. Recent Kobos (Elipsa and Sage) take a decent stab at solving this problem while retaining the ePub format and reformattability, but it breaks too easily.

Finnucane
3 replies
4h6m

I see a couple of roadblocks for this. One is that his suggestion of a restrictive subset of HTML for coding seems like a potential accessibility problem, which is to say, you'd have to make your documents less semantically rich. For instance, he seems to be suggesting using. It's already hard enough to get epubs to work right when reading systems lag behind what browsers can support; saying 'let's have less' is not going to make things easier if you have complex content. The problem is not that there is too much HTML or CSS, the problem is that reading systems don't support them properly.

Also, most dedicated reading systems (Kindle, Kobo, etc) don't allow javascript, which means your components will not work. That might of course change, but I wouldn't hold my breath for it.

velcrovan
2 replies
3h37m

One is that his suggestion of a restrictive subset of HTML for coding seems like a potential accessibility problem, which is to say, you'd have to make your documents less semantically rich.

"less semantically rich" than what? Web pages? Or less rich than PDFs, which is what he's actually proposing to replace?

Finnucane
1 replies
2h47m

Than the epub standard as it exists. I guess I don't understand what the advantage here is, really. If you want to make an epub file that works universally, you can do that, within the existing standard. That is, if makers of reading systems and software would actually support the full standard, which they currently don't. If you don't want to use PDFs, don't use PDFs. PDFs get used for a lot of things they aren't actually very good for.

velcrovan
0 replies
2h4m

If you want to make an epub file that works universally, you can do that, within the existing standard.

Yes, that’s what the author is proposing we do.

DeathArrow
3 replies
10h24m

I didn't know that EPUB is based on HTML. I always had the impression that it has its own binary format.

Using HTML as a base makes a lot of sense.

simongray
0 replies
9h1m

W3C standards basically always build on top of other existing W3C standards.

arp242
0 replies
2h39m

It's just a zip file with HTML documents and some (ePub-specific) XML files to define metadata, chapters, and a few things like that. I use this "epub-edit" script to edit them:

  #!/bin/zsh
  #
  # Extract epub file to a temp directory, launch shell to edit it, and re-zip
  # it. Nothing about this is really epub-specific as such.
  echo " $@" | grep -q -- ' -h' && { sed '1,2d; /^[^#]/q; s/^# \?//;' "$0" | sed '$d'; exit 0; }  # Show docs
  [ "${ZSH_VERSION:-}" = "" ] && echo >&2 "Only works with zsh" && exit 1
  setopt err_exit no_unset no_clobber pipefail
  
  full=$1:a
  
  tmp=$(mktemp -d)
  bsdtar xf $1 -C $tmp
  
  cd $tmp
  print "Editing $1; press ^D to exit"
  zsh ||:
  
  mv -f $full $full.orig
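  # EPUB wants "mimetype" as the first entry, stored uncompressed; the rest is
  # added recursively. (-f only freshens an existing archive, so it can't
  # recreate the file that was just moved away.)
  zip -X0 $full mimetype
  zip -Xr9D $full * -x mimetype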
  cd -
  rm -r $tmp
And then I use vim to edit the HTML files and such.
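To make the "zip file with HTML documents and some XML files" layout concrete, here is a minimal Python sketch that lists an EPUB's content documents in reading order. The function name `spine_documents` is made up for illustration; the file names and namespaces (`META-INF/container.xml`, the `rootfile` element, the package `manifest` and `spine`) are the ones the EPUB container and package formats define. It assumes an unencrypted, well-formed file and does no error handling.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(epub_path):
    """Return the archive paths of the content documents in reading order."""
    with zipfile.ZipFile(epub_path) as zf:
        # META-INF/container.xml points at the package document (.opf).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")

        package = ET.fromstring(zf.read(opf_path))
        # The manifest maps item ids to hrefs relative to the .opf file.
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", OPF_NS)
        }
        # The spine lists those ids in reading order.
        base = posixpath.dirname(opf_path)
        return [
            posixpath.join(base, manifest[ref.get("idref")])
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
        ]
```

Calling `spine_documents("book.epub")` (with any EPUB you have at hand) returns the paths you would then open in an editor after extracting.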

anthk
0 replies
6h9m

It's just a zip file. Under Linux/Mac/BSD you can trivially write a script which unzips the ebook and outputs its HTML files as one large text stream; that output can be used as the input of a text-mode web browser, allowing you to read ebooks everywhere with just two lines of code.
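The same idea without a text-mode browser, as a rough Python sketch using only the standard library. The names `TextExtractor` and `epub_to_text` are hypothetical, and the code assumes that archive order roughly matches reading order, which is not guaranteed (the spine in the package document is the authoritative order).

```python
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def epub_to_text(epub_path):
    """Return the visible text of every HTML/XHTML file in the archive."""
    parts = []
    with zipfile.ZipFile(epub_path) as zf:
        # Assumption: archive order approximates reading order.
        for name in zf.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                parser = TextExtractor()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                parts.extend(parser.chunks)
    return "\n".join(parts)
```

Printing `epub_to_text("book.epub")` and piping it through a pager gives a crude but workable reader.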

zvmaz
2 replies
9h11m

The author is a post-doc advised by Shriram Krishnamurthi [1], the author of Programming Languages: Application and Interpretation (PLAI), and one of the authors of Data-Centric Introduction to Computing (DCIC). I am currently reading both PLAI and DCIC and I am truly delighted by the minute care the authors have put into making the books pedagogical works of art. That's true love!

[1] https://willcrichton.net/

verisimi
1 replies
9h8m

the minute care the authors have put into making the books pedagogical works of art. That's true love!

Works of art. True love! That's very high praise.

zvmaz
0 replies
8h59m

I mean it. Both books are free, and PLAI has an interactive tutorial on the language used in the book called SMoL [1] done by one of Shriram's students. The tutorial is not a passive one by any means; it forces you to think and highlights pitfalls students often fall into when reading the material.

This whole ethos of focusing on readers' learning is in stark contrast to books that feel like the authors are showing off how smart and clever and profound they are instead of caring about their readers' comprehension. I include very highly praised books, even on HN.

[1] https://www.plai.org/#direct-links-to-the-tutor

N.B. The author of this post, portable EPUBs, also works on language learning. A whole ethos... https://arxiv.org/abs/2401.01257

xoac
2 replies
5h11m

I love this unreservedly. It's almost an embarrassment that something like this does not exist already. Big thank you to Will Crichton for putting all of this together and actually giving this idea a chance to take hold.

foofie
1 replies
4h31m

It's almost an embarrassment that something like this does not exist already.

To be fair, Epub has been largely ignored and neglected by everyone in the world. Virtually no reader supports rendering math notation, and virtually no significant publisher on earth, including the likes of Arxiv, offers Epub downloads. Commercial publishers force DRM onto every format, which excludes Epub, and non-commercial either stick with PDF or don't care.

julielit
0 replies
4h10m

Sorry, but this is not true. Every significant publisher produces EPUB these days. Most reading apps support EPUB (including apps from Apple and Google); Readium and EDRLab offer open-source SDKs that ease the development of mobile, desktop and Web reading software with strong EPUB 3 support, including MathML. Readium LCP is a DRM for EPUB, especially for public libraries that need an e-lending end date. Moreover, EPUB is much more accessible than PDF for blind people and other people with disabilities. PDF is of no interest for ebooks (but for short documents, yes).

BonoboIO
2 replies
7h2m

Wow, I'm blown away by how fast this site is. It loads instantly on my iPhone. Perfect text sizes … really nice.

crabmusket
1 replies
6h55m

You didn't get a huge multi-second loading spinner?

BonoboIO
0 replies
4h25m

Nope. Normally, even with adblockers, I'm used to slow loading times from bloated websites with megabytes of useless libs. This was refreshingly quick.

xnx
1 replies
13h5m

8 days ago, 134 points: "Portable Web Documents – An Alternative to PDF Based on HTML5 (2019)" https://news.ycombinator.com/item?id=39036774

crabmusket
0 replies
7h27m

And this current post is exactly what I was wishing for in my reply[1]. Really glad this was posted!

[1] https://news.ycombinator.com/item?id=39037135

harshreality
1 replies
5h37m

I think precisely dictating layout is the wrong objective, although some areas (legal documents, academic papers) are still obsessed with that. Arxiv recently started offering a subset of papers in html, which is a short step from epub.

If quick high-resolution referencing (page x, yth paragraph, zth line) is necessary, I think the way to handle it is to reference a phrase on that line ("the cat ran"...), which the reader can search for. If the search interface is lacking, that's an epub reader failure, not a format failure. Or, if the search option is considered insufficient (it does require typing with a [virtual] keyboard), paragraphs can be numbered—as many ancient works or works in translation already are, because such works have many editions and can't be layout-exact copies of each other.

If paragraph referencing is necessary, visibly styling the paragraphs with numbers helps dramatically. There's no reason it has to be exclusive to high-profile ancient or classical works, like Plato [1].

Classic poetry and plays, where referencing needs to be most exact and fast, already tend to give up any hope of everyone using the same edition, and simply avoid flowed text and then number the lines.

[1] https://en.wikipedia.org/wiki/Stephanus_pagination
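As an illustration of how cheap paragraph numbering could be for XHTML content documents, here is a minimal Python sketch. The function `number_paragraphs` and the `p1, p2, ...` id scheme are made up for this example, not part of any EPUB specification. It assumes well-formed XHTML without named entities such as `&nbsp;` (the standard-library parser rejects those), and it overwrites existing `id` attributes, which would break internal links in a real book.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)


def number_paragraphs(xhtml, prefix="p"):
    """Give every <p> an id (p1, p2, ...) in document order."""
    root = ET.fromstring(xhtml)
    for n, para in enumerate(root.iter(f"{{{XHTML_NS}}}p"), start=1):
        para.set("id", f"{prefix}{n}")
    return ET.tostring(root, encoding="unicode")


doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>First.</p><div><p>Second.</p></div>"
    "</body></html>"
)
print(number_paragraphs(doc))
```

A citation can then point at a fragment such as `chapter.xhtml#p2`, and a stylesheet could display the number in the margin.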

foofie
0 replies
5h24m

Classic poetry and plays, where referencing needs to be most exact and fast (...)

I believe that paragraph referencing is, by far and without any contest, used primarily by any text subject to reviews and revisions. This means technical reports and academic papers.

All academic papers I subjected to review were forced to use templates that enforced paragraph numbering. Even though each version of those documents was only read by a dozen readers or so, all papers submitted to those journals had to use the template. This means hundreds of documents (think half a dozen revisions per paper submitted for each edition) had that hard requirement, and this took place for each edition of a single journal.

watwut
0 replies
9h27m

PDFs are rendered consistently. A PDF specifies precisely how it should be rendered, so a PDF author can be confident that a reader will see the same document under any conditions.

And that is why PDF sux for reading on the phone. And why epub is massively better if you want to read articles and books.

upofadown
0 replies
5h59m

You might dislike the idea that document authors can run arbitrary Javascript on your personal computer.

How I feel about this depends a lot on how much I trust the people who created the document and/or the person who sent it to me. I would trust a website I specifically selected with a defined TLS web of trust more than I would trust a random spam email.

When we think about the risks associated with the complex and inherently insecure format known as HTML we tend to assume the level of trust available on the web. If we package up a bunch of HTML in a standalone document then we lose that assumption.

teekert
0 replies
10h36m

As someone who occasionally tries to read scientific literature on their e-reader, which is nice, I can just mail it to my PocketBook account and it shows up, I have a deep hate for PDF. Please let this be a popular thing.

sirsinsalot
0 replies
8h15m

I agree with most of the content of this post. One of the key things for me would be a requirement that figures and diagrams which can be expressed as SVG should be.

Raster images should be limited to things that need a grid-of-pixels representation, like photographs.

ryukafalz
0 replies
3h55m

I did a double-take when I reached this part:

Therefore I decided to build a lighter EPUB reading system, Bene. You're using it right now. This document is an EPUB — you can download it by clicking the button in the top-right corner.

Because, reading this on a desktop browser, I didn't even notice until it was pointed out. It's more obvious on mobile because the header takes up more of the viewport, but it otherwise behaves pretty much like a normal web page.

This is probably a good thing.

For what it's worth, I didn't see (or at least didn't notice) a spinner when loading the doc for the first time like some other people in the comments reported. I did notice it on my phone, but it went by pretty quickly. I'm not sure if that's the WASM program loading and if it only happens the first time you load the page.

nmz
0 replies
11m

* PDFs cannot easily express interaction. PDFs were primarily designed as static documents that cannot react to user input beyond filling in forms.

I am glad about this. I do not want to download a document and have it require any input. A document should be a document, nothing more. If I'm getting a book to read (PDF), I expect a book, not a webapp.

gmuslera
0 replies
4h33m

Another standard? (https://xkcd.com/927/) It's not like the publishing world will switch to it overnight; a lot is tied to the devices they sell, so there might not be enough motivation. Isn't it simpler to convert epubs into epubs with all remote content embedded and relinked?

Another single file book format that used to be popular many years ago was chm, that had its own security problems. Maybe adding the possibility of executing (js) code is not the best for something that should be mostly static, and css be used to enable some level of safe interactivity.
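On the "embed and relink remote content" idea: the first step would be finding what an EPUB loads from outside its own archive. A rough Python sketch of that audit follows; the names `RemoteRefFinder` and `remote_resources` are hypothetical, and the tag list is an assumption rather than an exhaustive one (CSS `url(...)` references, for example, are not checked). Downloading and relinking are left out.

```python
import zipfile
from html.parser import HTMLParser

REMOTE_PREFIXES = ("http://", "https://", "//")


class RemoteRefFinder(HTMLParser):
    """Record resource URLs that point outside the archive."""

    # Tags that load a resource, and the attribute holding its URL.
    # Plain hyperlinks (<a href>) are deliberately ignored.
    RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href",
                      "audio": "src", "video": "src", "source": "src"}

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith(REMOTE_PREFIXES):
                self.remote.append(value)


def remote_resources(epub_path):
    """Map each content document to the remote resources it references."""
    found = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                finder = RemoteRefFinder()
                finder.feed(zf.read(name).decode("utf-8", errors="replace"))
                finder.close()
                if finder.remote:
                    found[name] = finder.remote
    return found
```

An empty result from `remote_resources("book.epub")` suggests the file is already self-contained as far as this check can tell.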

eviks
0 replies
10h50m

Commendable effort of trying to get rid of the ancient paper-based legacy in the digital world that is PDF

Though I'm curious whether the clunky old-but-still-living HTML (especially in its ugly XML variety) + CSS are the right foundations for the portable format of the future? Since the author has also developed a whole new document language, it would be nice to read a more in-depth overview on that subject. Or why limit the future to the ugly duckling of JS when WASM exists?

content by pages. Instead, we should mandate a consistent numbering scheme for block elements within a document, and have people cite using that scheme.

that's indeed the proper and more precise approach, though we could still have those "fixed layout epub" pages as a backup coordinate system

emayljames
0 replies
10h16m

The downloaded epub of the page displays outside the viewport in the Google Books app.

Bene seems to be in alpha stage.

czierleyn
0 replies
9h43m

When the IDPF merged with the W3C a couple of years back they tried to develop a new standard called PWP, Portable Web Publications, which was supposed to be a new 4.0 version of EPUB, as far as I know. But there was much resistance from the publishing community and the project was shelved a couple of years ago.

See: https://w3c.github.io/dpub-pwp/publishing-snapshots/FPWD/Ove...

crabmusket
0 replies
7h6m

There's two ways to make progress [on document aesthetics] here. One is for browsers to provide more typography tools. ... The other way is to pre-calculate line breaks, which would only work for fixed-layout renditions.

The third way is to develop non-browser clients, like the author's own Bene. While it currently uses Tauri and therefore the system webview, there's no reason it should always do that, or that another client couldn't be developed with a focus on typography.

Want your documents to look super nice with all the kerning and line breaking your heart desires? Get a proper reader app.

Just want the content? Sure, open it in a browser or a basic reader.

aabdulllah
0 replies
1h29m

p.s. sorry I am new to engineering.

aabdulllah
0 replies
1h29m

Hi, I was wondering if someone could explain what software solution I need to create and implement in my calculator for it to have a fast response time when calculating. How does this work, in other words what enables it to work so fast?

SamBam
0 replies
58m

One thing I'm very interested in, as a grad student who has to consume a huge number of PDFs, is whether there are good tools for converting existing PDFs to portable EPUBs or HTML documents.

If I use, for instance, CloudConvert [1], I generally get a document that gets flowing text roughly right, but still interrupts the text with page numbers and book titles (that were originally at the top of each page) and includes additional bizarre line breaks, etc.

Every so often I wonder if this is an LLM problem ("please reformat the following text to...") but I think that one shouldn't reach for an LLM for these kinds of things.

1. https://cloudconvert.com/pdf-to-epub

BlueTemplar
0 replies
5h39m

See also : "The decades long quagmire of encapsulated HTML" (2022) :

https://www.russellbeattie.com/notes/posts/the-decades-long-...

(Which still hasn't been posted as a "news" it seems... should I just submit it myself ??)