Standard Ebooks

acabal
66 replies
1d

Editor-in-Chief here, happy to answer any questions!

Of interest might be my blog post on how SE runs on a small VPS using classic web tech: https://alexcabal.com/posts/standard-ebooks-and-classic-web-...

(This post is slightly out of date as there is a database now; but it's used for managing Patrons - and soon a cover art listing and approval system - not for serving the actual ebooks, which are still served as described in the post.)

Our volunteers have spent the last few months preparing a few notable books published in 1928 to be released today, Public Domain Day. Those are the top 5 books in the ebook list, starting with The Mystery of the Blue Train. Check them out!

We welcome new contributors if you'd like to work on producing a new ebook. In the next week we'll also have a brand new cover art database launched, so if you'd rather help by cataloguing new cover art for future ebooks, get in touch at our mailing list!

devashishp
14 replies
21h24m

I’m curious, why do you have a policy against hosting religious books?

weijiacheng
9 replies
21h7m

The site actually hosts several "religious books" (try filtering by the "Spirituality" tag -- I've even produced several books on religious topics myself for SE). What it doesn't host are "Religious texts from modern world religions" (what some might call "scriptures," e.g. the Bible or the Quran) which is a much narrower category than "religious books."

As a religious person myself, I actually think this policy is very sensible. Most (nearly all?) religious texts of major world religions were originally written in languages other than English, and so if SE were to try to host those texts the site would have to make an editorial call about which translations of those texts are the "best." That quickly enters very murky theological territory, where one side of a given religion might push for one particular translation, whereas another side would push for another translation.

To give the Bible as an example, Catholics and Orthodox Christians include the deuterocanonical books (e.g. Tobit, Judith, Sirach) in their canons whereas Protestants exclude these. Would the SE version of the Bible include these? Some American fundamentalist Christians claim that the King James Version is the only valid English translation of the Bible, whereas the Revised Version (also available in the public domain) is based on more reliable Greek manuscripts. But some conservative Christians reject the Revised Version and its descendants based on certain theological premises...

Do you catch my drift? IMHO it's very sensible for SE to avoid these sorts of debates entirely by sticking to books where you could argue (with some degree of handwaving) that there really is a "best version" :)

devashishp
5 replies
13h47m

I think that makes sense, but it still seems a bit arbitrary, I don’t see bookshops having these issues

pasc1878
2 replies
7h24m

Part of the issue would be that the books are translations, and the copyright date would be that of the translation.

So modern versions of e.g. the Bible could not be in Standard Ebooks. So easiest to not carry any translations.

Bookshops have no problem with this as part of the purchase price will go to the copyright owners of the translation.

mahalex
1 replies
7h17m

Modern versions of e.g. Tolstoy's "War and Peace" could not be in Standard Ebooks. So easiest to not carry any translations?

weijiacheng
0 replies
3h35m

One of the funny things about Bible translations is that more modern translations are based on older manuscripts than older translations, due to advances in archeology. SE can't carry any translations that incorporate the insights of the Dead Sea Scrolls, and having access to some of the oldest Hebrew manuscripts is a pretty big deal when it comes to translating the Tanakh.

It's true, modern versions of War and Peace can't be hosted at SE, but those modern versions generally don't reflect revolutionary leaps in archeology :)

weijiacheng
0 replies
13h2m

Yes, bookshops will sell one version of the Bible to Catholics, another to Protestants, another to fundamentalists, another to progressives, etc. :)

In contrast, part of the SE editorial philosophy is that it tries to host the best (based on academic scholarship, translation quality, academic acclaim, etc.) version of each text available in the public domain, which excludes that "something for everyone" sort of play available to a commercial bookstore. You could rightly argue that this is losing something (it's good to have multiple translations to compare if you're reading a text for critical purposes), but the SE editorial philosophy avoids a certain amount of confusion and clutter for the general reader. So there's a deliberate (you could call it "arbitrary" in some sense, if you wish) tradeoff being made here.

opminion
0 replies
10h2m

US Barnes & Noble can have a few meters of shelves with different versions of the Bible, and a buying guide. It is quite striking if you are not used to it.

azangru
1 replies
7h18m

Most (nearly all?) religious texts of major world religions were originally written in languages other than English, and so if SE were to try to host those texts the site would have to make an editorial call about which translations of those texts are the "best."

Is there a technical reason to disallow multiple translations of the same text? I can see on the "wanted ebooks" page a number of translated titles[0]; so the project does seem to make editorial decisions about which translations to work on. Obviously, where one translation exists, there may be others that have other advantages.

[0] - https://standardebooks.org/contribute/wanted-ebooks

robin_reala
0 replies
5h7m

We try to pick the “best” translation that’s in the public domain in the US. Quite often, that’s a single translation unfortunately, but if there are multiple we do try to evaluate them from a reader’s point of view.

mahalex
0 replies
7h18m

Most (nearly all?) religious texts of major world religions were originally written in languages other than English, and so if SE were to try to host those texts the site would have to make an editorial call about which translations of those texts are the "best."

The site already hosts a number of works that were originally written in languages other than English, and yet it had no problems making an editorial call about which translations of those texts are the "best." The obvious solution would be to just allow multiple translations of foreign-language books.

trillic
1 replies
21h19m

I'd imagine that if they host one religion's books, many more religions will come out of the woodwork and demand their books also be included, leading the site to be largely religious texts.

krapp
0 replies
21h6m

Numerous sites, platforms, stores, etc. host religious books, and that has never happened.

pard68
1 replies
21h5m

My thought was that many/most religious works are public domain and are already readily available elsewhere.

kyawzazaw
0 replies
16h36m

Actually, everything SE has now is also available on other sites.

darkflame91
9 replies
21h58m

I have a suggestion: You could optimize the website to be easily readable and navigable on the Kindle's web browser, and recommend it as an option. I've often found it to be the easiest way to get non-store books on my Kindle. I've also noticed that cover images are handled correctly when the ebook is downloaded straight onto the device, with no need for a separate image file.

A hurdle for this though, is that building a good website for the Kindle browser is a pain, as the browser's support for various html/css/js features and standards is all over the place, with no debugging tools available.

acabal
8 replies
21h42m

I believe our website does have some basic Kindle browser support. The problem, as you noted, is that Kindle's browser is terrible.

I say the same thing in every ebook thread: On a purely technical level Kindle is a terrible ereader designed by people who seem to hate books. Buy almost anything else.

donw
5 replies
21h33m

Recommendations?

meristohm
3 replies
21h9m

Kobo, with either stock OS or KOReader (I use this, in part because the font size can be easily increased for my daughter who so far needs text larger than stock) or Plato.

willdr
2 replies
17h41m

Is the build quality and backlighting as good as the Kindle? And do they have a seamless option (no notched screen)?

themadturk
0 replies
14h42m

I no longer have a Kindle to compare, but I'm very happy with build and lighting on my Kobo Libra 2. I've used Kindles since the Kindle 1, and there are some Kobo things that I don't like as well as Kindle, but it's a better-than-decent e-reader and I'm glad to be out from under Amazon's thumb.

femto
0 replies
7h46m

I've been happy with the half dozen Kobos in my house.

Weirdly, about half of them have developed a problem after about 5-7 years, whereby they intermittently stop charging. Replacing the battery doesn't seem to fix it. Might be a problem with the soldering of the USB connector to the PCB?

As a bonus they are Linux based, and you can do fun things like replace internal SD cards with bigger ones, log in using telnet and install new applications.

AB1908
0 replies
21h30m

I imagine the Kobo is high on that list.

deadly-penguin
0 replies
7h0m

A jailbroken kindle is okay - they make an adequate PDF reader and they can be found easily for less than the alternatives, at least in Britain. I do agree they're somewhat poor when used as intended.

They're also quite a nice embedded ARM Linux machine for a lot less than I could make one or buy one from elsewhere, but I suspect that isn't the core market for a kindle...

Machado117
0 replies
6h9m

What’s the problem with the kindle? I use an old paperwhite and I have no issues reading epubs in it that I send by email

noitamroftuo
6 replies
16h1m

It says on each book page "Compatible epub — All devices and apps except Kindles and Kobos." - but i think this is incorrect bc epub is now the preferred format for Kindle.

acabal
3 replies
15h48m
boznz
2 replies
14h35m

I email epubs to my kindle frequently and they open and read just like any ebook. I last tried this a week ago and it was fine.

acabal
1 replies
14h8m

When you use 'Send to Kindle' to send an epub to your device, you are not reading an epub. In the link above, it mentions how Kindle converts epub to an Amazon format before allowing your device to read it. Amazon's formats on the whole are inferior, with poor rendering capabilities, and an automatic conversion means all bets are off in terms of what the ebook will look like.

Kindle will not natively support epub until you can connect it to a USB cable, transfer an epub using a file manager, and it does not get secretly converted.

boznz
0 replies
13h9m

I did not know that, and it does kind of explain one book with pictures that looked a bit weird. On the whole though, text epubs are totally readable as I get most of my books from non-amazon sites such as smashwords and email them to the kindle to read.

nacs
1 replies
15h36m

Kobos also support epub.

Almost all my Kobo books are EPUB and work great.

Pet_Ant
0 replies
10h26m

Real EPUBs can crash Kobos and you need to specifically reformat them with a plug-in in Calibre. It may be a recent update that broke it, since I used to have fewer problems.

growingkittens
3 replies
22h4m

I see that you use public domain images for books - do artists also contribute work from scratch (with an appropriate release)?

acabal
2 replies
21h44m

Nobody has offered as of yet, and if someone did I think the quality would have to be extremely high for me to consider it.

growingkittens
1 replies
21h32m

Do you happen to have a wishlist of artwork or a particular project that would benefit from custom artwork? I would like to contribute art to the project, whether it ends up used or not. I used to work as a digital artist professionally.

acabal
0 replies
21h14m

Sci-fi works are the hardest to find cover art for as naturally there is zero public domain sci-fi themed fine art. If you can paint in a fine art style, contact me via email and let's chat.

NelsonMinar
3 replies
1d

Thank you for Standard Ebooks!

I remember when Manybooks used to be what you want. But quality dropped precipitously with self-published new novels; I suspect some money is changing hands somewhere.

What happened to Manybooks? Does Standard Ebooks have a plan for avoiding that?

boznz
1 replies
14h17m

Good to know that as a self-published author myself that the quality of any site is going to drop as soon as I put my book on there.

Every other book I read now is by an author with NO rating, I have read six this year, none were memorable, or my cup of tea I will give you that, but two of the four- or five-star offerings on Amazon were just as bad. As they say, if you don't open an oyster, you will never find a pearl.

TheCoelacanth
0 replies
1h27m

If you want to go to the work of creating a curated selection of high-quality, contemporary, self-published, public domain books no one is stopping you.

That's not the niche SE has chosen to target. You can't expect them to serve every possible use case.

acabal
0 replies
23h57m

I don't know anything about Manybooks' history, sorry.

At SE we focus exclusively on US public domain titles; that's one of the major philosophical points of the project. The other major point is a high quality standard, so it's in our best interest to keep pursuing that. SE became known due to its quality standard, not because it's just more free ebooks. Therefore if we strayed from those points then we'd be just another free ebook site, of which there is no shortage.

Quality is also why we reject self-published books that have been dedicated to the public domain, as those are typically low-quality content to begin with. (Though I wouldn't call every single book we host "high quality content" in the sense that each one is up there with Shakespeare. But books that have survived a hundred years tend to have survived because they're not slush.)

pauloxnet
2 replies
19h18m

Hi Alex, I shared the SE link here to help with donations and I hope it's working.

Thank you for your beautiful project.

For a few years, every January 1st, for public domain day, I have been promoting SE on social media, the thread on Mastodon is the one with the most involvement. https://fosstodon.org/@paulox/111680544393923401

It would be nice to have an SE account on Mastodon that posts about every new book published, since IMHO it's the social network more aligned with the spirit of SE.

squigz
0 replies
1h58m

For what it's worth, they have an Atom feed - https://standardebooks.org/feeds/atom/new-releases
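A rough sketch of how a volunteer-run bot could read a feed like that, using only Python's standard library. The sample feed, its titles, and the example.com links below are placeholders invented for illustration, not the real feed contents, and posting to Mastodon is deliberately left out:

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample data; a real bot would fetch the feed over HTTP instead.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample new releases</title>
  <entry>
    <title>First Sample Book</title>
    <link href="https://example.com/ebooks/first-sample-book"/>
    <updated>2024-01-01T00:00:00Z</updated>
  </entry>
  <entry>
    <title>Second Sample Book</title>
    <link href="https://example.com/ebooks/second-sample-book"/>
    <updated>2024-01-01T00:00:00Z</updated>
  </entry>
</feed>
"""


def list_entries(feed_xml):
    """Return a list of (title, link) tuples, one per Atom entry."""
    root = ET.fromstring(feed_xml.strip())
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link_el = entry.find("atom:link", ATOM_NS)
        href = link_el.get("href") if link_el is not None else None
        entries.append((title, href))
    return entries


if __name__ == "__main__":
    for title, href in list_entries(SAMPLE_FEED):
        print(f"New release: {title} ({href})")
```

Each (title, link) pair is about all an automated "new book" post would need.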

acabal
0 replies
16h52m

That's great, thank you! We've had various people ask about us getting on Mastodon but frankly I really dislike social media and have only the vaguest understanding of how Mastodon works.

If we did, then someone would have to volunteer to run the account, and also the account must be able to delegate posting powers to another user without exposing the account's master password (like Tweetdeck or Facebook are able to do). If that's possible and you're interested in helping, please send me an email!

krapp
2 replies
21h11m

Inevitably, like everyone who rejects PHP frameworks because "PHP is already a templating language", you just wound up reinventing the framework anyway.

I'm not complaining - It's just, there's a reason everyone goes for the existing frameworks and it isn't addiction to complexity. Raw PHP code is legendarily insecure and prone to XSS and other issues if you don't do things exactly right.

Nice site, though.

cowsandmilk
1 replies
20h14m

prone to XSS and other issues if you don't do things exactly right.

Not any more so than sites with frameworks. I’ve found XSS issues in Java Spring framework built sites that didn’t “do things exactly right”. A framework doesn’t magically fix that.

krapp
0 replies
19h39m

No one mentioned magic. Frameworks are designed to do what PHP developers wind up implementing in an ad-hoc, haphazard way themselves, and tend to be better at doing it on average. Any code can have security issues but I'd trust a battle-hardened open source PHP framework over some random coder's hubris any day of the week.
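To make the "do things exactly right" point concrete, here is a minimal sketch of output escaping. It is written in Python purely for illustration (PHP's `htmlspecialchars` plays the same role) and says nothing about how the SE site itself is coded; the function names are made up for the example:

```python
from html import escape


def render_comment_unsafe(comment):
    # Interpolates user input directly: any markup in it reaches the browser.
    return "<p>" + comment + "</p>"


def render_comment(comment):
    # Escapes &, <, > and quotes so the input is shown as text, not run as markup.
    return "<p>" + escape(comment, quote=True) + "</p>"


if __name__ == "__main__":
    payload = '<script>alert("xss")</script>'
    print(render_comment_unsafe(payload))
    print(render_comment(payload))
```

Forgetting the escape call in one template is all it takes, with or without a framework; frameworks mostly help by escaping by default.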

cxr
2 replies
21h34m

The page <title> for collections could stand to lose the "Browse free ebooks in the" preamble. It makes it harder to distinguish when looking at a list of open tabs. Consider:

- "Browse free ebooks in the Encyclopædia Britannica’s Gateway to the Great Books set[…]"[1]

- "Browse free ebooks in the Modern Library’s 100 Best Novels set[…]"[2]

- "Browse free ebooks in the Modern Library’s 100 Best Nonfiction set[…]"[3]

(Indeed, the titles are even much longer than that. It feels SEO-ish; not sure why that would be a priority for a free culture project like Standard Ebooks, especially given the momentum and cachet it already has.)

Collections should also have placeholders for unavailable titles. For example, currently the "Utopian Trilogy" collection[4] contains exactly one item, in spite of the true size of the set it actually belongs to. When an item is not available because of copyright, that (along with the year in which SE will first be allowed to make its own edition available) should be made clear. Where it's unavailable because no one has yet proofed the text for an SE edition, a clear call to action can be made.

And it's seemingly minor, but on the subject of editions, I wish SE followed closer to the print tradition instead of the modern Web milieu and clearly identified its microeditions as exactly that: distinct editions of the same text. (Yes, that means there are possibly dozens (or hundreds?) of different editions, given that errors can be found after the fact and the SE house style may even change, necessitating updates. No, that's not a problem.)

1. <https://standardebooks.org/collections/encyclopaedia-britann...>

2. <https://standardebooks.org/collections/modern-librarys-100-b...>

3. <https://standardebooks.org/collections/modern-librarys-100-b...>

4. <https://standardebooks.org/collections/utopian-trilogy>

Jiro
1 replies
17h16m

The page <title> for collections could stand to lose the "Browse free ebooks in the" preamble.

That looks like it might be search engine optimization.

cxr
0 replies
17h11m

It seems like maybe it's for SEO.

carlosjobim
2 replies
21h23m

Hi and thanks for the great work! Have you considered offering .mobi or .azw file formats of the books? With the 2023 browser update, even old Kindles now have a fast and functional web browser. It is almost possible to find and download Standard Ebooks directly from the Kindle browser, but for the file format.

acabal
1 replies
21h15m

We do offer azw3 files for all of our books. https://standardebooks.org/help/how-to-use-our-ebooks#kindle...

carlosjobim
0 replies
21h7m

Yes, Amazon has changed the game and they only allow downloads in .AZW, .PRC, .MOBI or .TXT format now.

I understand that this is their fault and not yours, but maybe it could be interesting for you to offer one of these formats now that the Kindle browser is actually usable?

Lukas_S
2 replies
1d

This is such a cool project. Every time it hits the front page I browse the selections like I’m at a book store.

Have you considered making books sortable by popularity? It might be more approachable for new users if they see books they recognize at the top.

acabal
1 replies
23h55m

That's a frequent request but it would also require having our catalog in a database, which we don't have right now. I do think the time is soon for doing that for several reasons, but there's no spare time in my day at the moment.

wood_spirit
0 replies
20h19m

Perhaps there’s no need for a db? If you have basic web logs, some volunteer can find out how many times a book was downloaded etc, and use that to do a one-off “best of 2023” etc? A kind of SE Wrapped thingy?
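A minimal sketch of that idea, assuming the server writes access logs in the usual Common/Combined Log Format. The sample lines and the `/ebooks/.../downloads/*.epub` path layout are hypothetical, made up for the example rather than taken from the real site:

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line.
REQUEST_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def count_downloads(log_lines, suffix=".epub"):
    """Count successful GET requests per path ending in `suffix`."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None or match.group("status") != "200":
            continue  # skip malformed lines and failed requests
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path.endswith(suffix):
            counts[path] += 1
    return counts


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2024:10:00:00 +0000] "GET /ebooks/author-a/book-one/downloads/book-one.epub HTTP/1.1" 200 512000',
    '203.0.113.6 - - [01/Jan/2024:10:01:00 +0000] "GET /ebooks/author-b/book-two/downloads/book-two.epub HTTP/1.1" 200 498000',
    '203.0.113.7 - - [01/Jan/2024:10:02:00 +0000] "GET /ebooks/author-a/book-one/downloads/book-one.epub HTTP/1.1" 200 512000',
    '203.0.113.8 - - [01/Jan/2024:10:03:00 +0000] "GET /ebooks/author-a/book-one/downloads/book-one.epub HTTP/1.1" 404 0',
    '203.0.113.9 - - [01/Jan/2024:10:04:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
]

if __name__ == "__main__":
    for path, hits in count_downloads(SAMPLE_LOG).most_common(10):
        print(hits, path)
```

Raw hit counts would still include bots and repeat downloads, so this is at best a rough popularity signal for a one-off list.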

veridies
1 replies
23h11m

I've been eagerly awaiting the new Lord Peter Wimsey novel! To avoid burnout, I've been reading them as they enter the public domain instead of reading the whole series all at once, and I was hoping that it would be in the first batch this year. Thank you so much for your hard work!

zem
0 replies
15h31m

heh, that reminds me of when I used to eagerly hunt used bookstores for anything by henry cecil (an out-of-print humorous writer). it was always exciting to find one I hadn't read before. and then his entire works got reprinted and you would have thought I would just buy and binge read the lot, but somehow the excitement went out of it and I just ended up reading a couple more. I should go back and catch up on him, actually, it's been years and years since I last read one.

pseingatl
1 replies
23h6m

What are the dimensions produced by se build-images?

robin_reala
0 replies
21h39m

The expected size for the JPG for the cover is 1400x2100.

harwoodjp
1 replies
1d

You could probably drop the server and use Cloudflare Pages and a SSG. I use Astro for https://sabine.press/

Edit: oh and Lambda for a total of 2 server functions

acabal
0 replies
23h47m

Well, the point is not to jump at the new-fangled tech and AWS cloud lock-in :)

ebooks-ta
1 replies
23h55m

Do you have any thoughts on providing manually pre-formatted PDF files? Em-dashes, curly quotes, etc. are all nice, it's a step in the right direction, but in the end the EPUB file needs to be interpreted by the ebook reader on the fly and in terms of typesetting quality the outcome is far from what physical books provide, since you still get orphans, weird hyphenations, ugly/misaligned chapter titles. For me, nothing beats reading a print-ready PDF file.

acabal
0 replies
23h49m

That's a common request but there are no plans to officially offer PDFs. We offer a variety of reflowable file formats, and each format is more burden to maintain; since PDF is a famously difficult format, maintaining it would be even more burden. A reader requiring a PDF can use a tool to convert any of our files to PDF. That's basically what we'd do at the end of the day, anyway.

There's been some mailing list chatter lately on how to best format PDF editions, but that's not being pursued on a project level.

crabmusket
1 replies
21h15m

Thank you so much for the work you and the whole team, and the contributing community, do! I've read a bunch of classics thanks to your editions, and have donated in the past. This post is a reminder for me to do so again!

aorth
0 replies
13h13m

Came to say the same. I have a recurring donation set up and I'm always happy to see updates and mentions of the project.

pauloxnet
0 replies
19h13m

In addition to the Newsletter and Feeds, it would be nice to have a Blog or News section where you can publish news every now and then, for example an article for the public domain day would have been very useful for making new publications known, simplifying sharing and attracting new volunteers or donors.

NKosmatos
17 replies
1d

For other curious HNers, what differentiates [0] them from Project Gutenberg [1] is the improved typography/styling and the full usage of modern reader techniques. Think of it like, etext != ebook.

[0] https://standardebooks.org/about/what-makes-standard-ebooks-...

[1] https://www.gutenberg.org

chrismorgan
16 replies
1d

So why don’t they contribute these things back to Project Gutenberg? Particularly the typography ones like curly quotes and proper dashes, as those are almost always corrections where the overly-ASCII Gutenberg source doesn’t match the original.

acabal
8 replies
23h53m

Like PG, our editions are blends of other editions, along with our own updates. Often our edition winds up looking nothing like the PG edition, for example when we combine volumes, extract footnotes into endnotes, remove pagination, and so on.

So submitting back to PG would be more like replacing a PG edition, instead of updating it; and I doubt the original PG submitter would like it if their hard work was simply replaced by someone else who thought their version was an improvement.

Our volunteers do sometimes submit typos they find back to PG. We don't require that, so some producers do, and others don't.

chrismorgan
7 replies
23h12m

Yeah, I was just looking through A Christmas Carol and observed a handful of editorial changes in the commits <https://github.com/standardebooks/charles-dickens_a-christma...> (bran-new → brand-new, frouzy → frowzy, and “Lowercase some gratuitously uppercased words”). Frouzy → frowzy I’m mildly in favour of. Ditching bran-new definitely loses character (he omitted the d on purpose!). One or two of the lowercased words were mildly strange capitalised (e.g. Idol was inconsistent with the previous paragraph); but the lowercasing of many introduces broad stylistic inconsistency, and direct local inconsistency sometimes; and most of the capitalisations were not gratuitous. In fact, more than a few were clearly to be pronounced, as a form of emphasis (e.g. Poor, One, Us); and some were distinctly proper nouns in the context, the removal of which increases the parse difficulty (e.g. One¹, /(Cold )?(Roast|Boiled)/); and some reflect customs still common or even preferred in their domains (e.g. Act, Angelic, Apostles, Star). I just reckon that commit should be reverted, because from my perspective it’s mostly actively bad, and the rest subjective. I’m curious what your reaction is to my opinion here.

But yes, I see that you’re practising some editorial oversight and not aiming to faithfully represent the original in all regards, which I gather is more generally Project Gutenberg’s goal; and this would obviously contraindicate upstreaming.

On the other hand, when it comes to more stylistic matters, I tend to wish Project Gutenberg had more consistency. There’s too much gratuitous variation in presentation and ridiculous 256-colour backgrounds. It’s often too obvious much of it is the work of a group of individuals rather than a coherent effort.

I’m curious about the footnote-to-endnote thing, because I’m not sure how the various formats in question handle them all, but in print endnotes are almost always just awful. If anything, I’d be expecting to replace endnotes with footnotes. (Me, I’m partial to sidenotes.)

—⁂—

¹ Hickory dickory dock, three mice ran up the clock; the clock struck one, and has been charged with assault and battery.

wharvle
6 replies
19h58m

Yeah, any edition of a book that's "updating" modern English loses me, including messing with capitalization. Not interested. I love the formatting on Standard Ebooks, but they're no use to me if they're "updating" language, aside from things like repairing typesetting and formatting lost or mangled in PG editions.

Agree on notes in print, side notes (on very-wide editions) are best, then foot, then end of chapter endnotes. Full end-of-work endnotes are awful. Maybe they're better in ebooks, than footnotes, though? E-readers' poor UX for not-even-that-advanced features of books is part of why I barely use them, and practically never for any work that'd have notes of any sort.

bentley
4 replies
18h50m

As someone who regularly compares different scans of old books, I counter: for centuries it’s already been common practice for publishers to update spellings, recapitalize, and even make more drastic changes. You just never noticed because print books don’t have a public commit log.

In the case of Standard Ebooks, “sound‐alike” changes are allowed (so spelling and capitalization changes are allowed when they make sense). Censorship, and even innocuous grammatical changes, are not. Despite generally appreciating old works in their own context, I find the tradeoff in readability for such a widespread practice to be worth it given how minor SE’s alterations are.

kevin_thibedeau
1 replies
12h13m

The problem is when irregular spelling is intended to capture a vernacular. It does a disservice to everyone, erasing the author's intent with homogenized language.

bentley
0 replies
11h52m

If the spelling is intended to be vernacular, the SE policy is not to change the spelling. I (a mere reader) have successfully reverted dialectal spellings in SE several times.

chrismorgan
1 replies
14h17m

Sometimes capitalisation matters are close to purely stylistic, but other times they really are part of the content, guiding pronunciation or emphasis, so that lowercasing them harms the work. What is your opinion of my assessment in the above comment of some of the specific changes in <https://github.com/standardebooks/charles-dickens_a-christma...>?

bentley
0 replies
13h29m

I haven’t looked into your example, but certainly it can be true that lowercasing can be harmful. It goes without saying, I think, that the SE policy is only to lowercase words when doing so doesn’t harm.

When I see erroneous changes in SE books, I argue to revert, and have generally been successful. In my experience it’s drama‐free, like fixing any other typo.

jcurtis
0 replies
7h9m

Editorial commits are all marked as such and contain no non-editorial changes. The tools for compiling ebook files are available at https://github.com/standardebooks/tools, so creating your own versions with only the work you're interested in is straightforward (and can be at least partially automated).

weijiacheng
5 replies
22h37m

In addition to what Alex has said, as an SE contributor I do try to submit errata to Project Gutenberg where I can find the time and energy. Part of the problem, though, is that PG's errata process (https://www.gutenberg.org/help/errata.html) is quite cumbersome since you have to write an email to their errata team with each individual error. That's a real hassle to try to keep track of and submit. Ideally, if PG had something like a pull request system, I would just be able to find those errors in their code and submit the changes directly, but unfortunately they don't have that, so far as I am aware.

That is one major advantage SE has, I think, which is that we do allow people to make pull requests against any of our ebook repositories and any PRs that get merged are automatically deployed to the site. This makes it much, much easier for tech-savvy people to submit proofreading corrections!

cxr
4 replies
20h30m

Part of the problem, though, is that PG's errata process (https://www.gutenberg.org/help/errata.html) is quite cumbersome since you have to write an email to their errata team with each individual error. That's a real hassle to try to keep track of and submit. Ideally, if PG had something like a pull request system, I would just be able to[...]

On the other side of the coin, Standard Ebooks's heavy endorsement/buy-in of GitHub-based workflows is off-putting to broader audiences. (It's pretty off-putting to me, and I'm not even non-technical; I just recognize it as a sort of Conway's Law + Law of the Hammer sort of thing, and it chafes.) I.e., for others what you describe is far less than "ideal".

acabal
2 replies
20h14m

You don't have to use GitHub if you don't want to, but you do have to use Git. We've had more than a few producers successfully produce ebooks without using GitHub or Google Groups.

starkparker
1 replies
18h40m

We've had more than a few producers successfully produce ebooks without using GitHub or Google Groups.

Can you share or document how? https://standardebooks.org/contribute suggests that "Technically inclined readers can produce ebooks themselves" but doesn't provide any point of entry to do so other than a link to the GitHub org, and "No technical experience is necessary. Contact the mailing list if you want to help." just links to the Google Group.

acabal
0 replies
18h37m

It's very uncommon; if you want to do that then just email me privately and we can set something up.

bentley
0 replies
18h43m

Typos can be reported by email on SE too. Git is only required when you’re publishing a new book. My observation from watching the mailing list is that emailed typos are fixed quickly. (I always fix typos using pull requests, and those are acted on quickly too.)

mouse_
0 replies
1d

Also wondering this.

Handprint4469
16 replies
1d2h

I would love if they offered a download option for a file you could just upload to Lulu (or similar service) to have it printed and mailed to you.

Every time I buy one of these public domain books from Amazon, they are invariably shitty, low-quality "printed by Amazon" versions. I miss the time when you could get a high-quality hardcover, but more and more those seem reserved only for the current week's NYT best-seller books.

dimmke
11 replies
1d1h

This could be a cool monetization strategy. I don't really read physical books, but the "classics" on Amazon are often complete ripoffs. Here's Crime and Punishment for $10 just to get the Kindle version: https://www.amazon.com/Punishment-Penguin-Classics-Fyodor-Do...

I feel like these public domain novels published by big publishing houses have the veneer of legitimacy, but I think projects like the one this thread is about could accomplish much more. Especially for authors whose work is translated into English. Plus the cover designs are much cooler.

I will say, the search on their website is kind of slow and could use some work.

paulcole
8 replies
22h52m

Why is that book a ripoff?

cxr
7 replies
20h1m

Without discounting the point made by smogcutter about Penguin's edition not actually being public domain: for a classic work, I'd expect to be able to get a paperback for less than $10.* And that involves a real-life physical artifact which (a) necessarily has lower margins than an ebook, and (b) doesn't come with the omnipresent threat that it will evaporate from your device (or your managed online locker or whatever), nor that you'll have to stop reading if your battery dies, nor that you're unable to easily hand it to someone else to let them thumb through or borrow it. For an ebook, $3 or $4 sounds about right. Maybe $5 for a relatively modern translation, as in the case here. Recall that Netflix in comparison is $X per month (fill this in; I don't actually know, but I know the number is not high) and libraries are free-ish. Price points at or around $10 per work or more feel like a shameless ploy to trigger the sensation of "economy" in "false economy" and push people into rent-seeking platforms where they consistently hand over a continual stream of monthly payments in perpetuity for "unlimited" access to select items, within the very limited one-month term that the payment gets you.

* NB: whether this is actually the case or not is a separate matter

dimmke
3 replies
18h46m

Yes, basically. I understand if a publisher commissioned a translation and put in work, but $10 for a DRM'd digital copy is too steep. Maybe $3-$4 would be reasonable? Crime and Punishment was written in the 1800s, and the author is long dead. And it's considered a historic and important piece of literature.

Regardless, it's great that these works are available in high quality for free.

paulcole
2 replies
17h8m

If it was $3 would you buy it today?

dimmke
1 replies
16h32m

Because there are high-quality alternatives that are free and have no DRM, there's no price point at which I would buy it. The only scenario in which I would is if I wanted to read a specific translation.

A physical copy if I wanted one I'd be willing to pay ~$6 for, less if used.

paulcole
0 replies
13h53m

If there’s no price point at which you’d buy it why weigh in with what you believe a reasonable price to be?

Qwertious
1 replies
16h54m

Physical media tends to be ~30-40% of the costs, so I think it's more like $6 or $7.

cxr
0 replies
16h12m

You think what's more like $6 or $7?

paulcole
0 replies
17h9m

Eh, I disagree. $10 to be able to have it this second on my Kindle vs. waiting to get the physical copy feels like a good value.

zuminator
0 replies
1d1h

In fairness though, if you sort by price, you can always find classics on Amazon for dirt cheap. E.g.

https://www.amazon.com/Greatest-Works-Dostoyevsky-Punishment...

Although there's no saying as to whether or not they will have proper spellcheck, TOC, if they are legitimately in the public domain, or even if it's the right book with all the pages. That's where a service like Standard Ebooks is superior to the potluck you get from Amazon.

smogcutter
0 replies
1d

Not the greatest example, as the translation is not public domain.

acabal
2 replies
1d

I've considered running a campaign to finance a print run of some of our (SE's) books. But the fact is that it's just so easy to find super cheap paper copies of these books almost anywhere. As long as you buy a copy that was printed before, say, 2005 - or from a reputable publisher like Oxford or Penguin - then the edition will already be pro quality. (After that, it's much more likely that you're buying a print-on-demand copy of a raw Project Gutenberg text.)

If we did offer print books, I think the value-add would be making them extremely ornate, one-of-a-kind editions like Arion Press or Folio Society make, and we'd charge a lot for a copy. But even then I'm still not sure the juice would be worth the squeeze, because that's also been done to death... how many more fancy editions of Dracula or whatever does the world need?

zopa
1 replies
23h41m

I think you might be underestimating the value-add, at least based on the existence of this thread! Yes, quality copies are out there, but easy to find for the Editor-in-Chief of Standard Ebooks doesn't mean easy for everyone. I suspect plenty of people would find a trusted, no-hassle source for a quality print copy worthwhile, just for the simplicity and convenience. Though I totally respect not wanting to waste ink and kill trees reprinting something that's already widely available.

acabal
0 replies
21h31m

Folio Society basically already does this, at a premium price. Used copies of well-set PD classics from respected publishers like Franklin Library or Modern Library go for pennies and can be shipped to your door from places like Abebooks, or you can easily find them at your nearby library sale/used bookstore/charity shop/etc.

I've been toying with the idea for a while but I think the market is just too saturated, even for premium editions. Maybe the focus should be on reviving more obscure works... not sure.

moonchild
0 replies
9h6m

Lulu does not make 'high-quality hardcovers'.

harwoodjp
5 replies
1d

I wonder if a scan -> OCR -> LLM proofreading pipeline is possible?

weijiacheng
2 replies
22h47m

I am one of the SE editors/regular contributors and I did play around with this a bit for a poetry collection: https://groups.google.com/g/standardebooks/c/IUvGLmvZrmM/m/s...

I'm sure someone sufficiently determined and good at prompt engineering, and integrating LLMs into a larger toolset, could come up with something even better. I'm personally very skeptical of LLMs as a technology, but even I have to admit that this was a pretty ideal and unobjectionable use of LLMs.

That being said, though it was a fun experiment, I later found that it was easier (and less wasteful of natural resources) to just do the same thing with a bit of custom markup and a search and replace script.

duskwuff
1 replies
18h50m

I don't think that's quite what the parent had in mind.

The most natural application of a language model in proofreading is to compute perplexity across the text; if all goes well, errors should be detectable as points of unusually high perplexity. (In principle, this should even be able to spot otherwise undetectable errors like missing words.)
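
A minimal sketch of that idea, with a toy add-one-smoothed bigram model standing in for a real LLM. The model, the function names, and the threshold are all illustrative assumptions, not anything SE or the parent actually uses. Each token is scored by its surprisal (negative log probability given the previous token), and tokens above a threshold are flagged for a human to look at:

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams in a trusted reference token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def surprisal(prev, word, unigrams, bigrams, vocab_size):
    """Surprisal in bits of `word` following `prev`, with add-one smoothing."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(p)

def flag_suspects(tokens, unigrams, bigrams, threshold):
    """Return (index, token, surprisal) for tokens scoring above `threshold`."""
    vocab_size = len(unigrams) + 1  # +1 reserves mass for unseen words
    suspects = []
    for i in range(1, len(tokens)):
        s = surprisal(tokens[i - 1], tokens[i], unigrams, bigrams, vocab_size)
        if s > threshold:
            suspects.append((i, tokens[i], s))
    return suspects

def perplexity(tokens, unigrams, bigrams):
    """Perplexity of the whole sequence: 2 ** (mean surprisal in bits)."""
    vocab_size = len(unigrams) + 1
    bits = [surprisal(a, b, unigrams, bigrams, vocab_size)
            for a, b in zip(tokens, tokens[1:])]
    return 2 ** (sum(bits) / len(bits))
```

In practice the token right after a typo tends to score high as well, since its context is now unfamiliar, so flagged positions are better read as "look around here" than as exact error locations. A real LLM would replace `surprisal` with the model's own per-token log probabilities.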

weijiacheng
0 replies
18h1m

I could see how that would be helpful, but at least for my use case I'm more interested in seeing how LLMs integrated with computer vision can speed up transcriptions. Since a thorough proofread by a human is already baked into the SE production process (and is indeed one of the major selling points), having more automated tools to aid proofreading is nice but doesn't do anything fundamentally different, from my point of view. Whereas if LLMs can be leveraged for transcription, SE producers no longer need to depend on external projects like Project Gutenberg or Wikisource to produce texts (which can take months) or transcribe texts from OCR results by hand (very tedious and error-prone; believe me, I'm speaking from experience!). It would drastically open up the range of possible books someone could reasonably produce (in a timely fashion) for SE.

eigenvalue
0 replies
4h15m

I made a tool like that, and I bet with a more powerful LLM like GPT4, and perhaps a better baseline OCR tool (like GPT4 vision), it could work really well for this sort of thing:

https://github.com/Dicklesworthstone/llama2_aided_tesseract

Forge36
0 replies
23h44m

As a first pass I'm sure it'll save some effort (i.e. l -> 1 in some fonts). I can't imagine it fully replacing any editing/proofreading passes.
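
For that specific l -> 1 confusion, here is a rough sketch of a plain regex pass (not an LLM, and not part of any SE tooling; the function name and the rule are assumptions for illustration). It only rewrites tokens made up entirely of digits and the letter "l" that contain at least one digit, so ordinary words are left alone:

```python
import re

# A whole token made only of digits and "l", containing at least one digit.
_NUMERIC_TOKEN = re.compile(r"\b(?=[0-9l]*[0-9])[0-9l]+\b")

def fix_l_for_one(text):
    """Replace a lowercase "l" with "1" inside otherwise-numeric tokens."""
    return _NUMERIC_TOKEN.sub(lambda m: m.group(0).replace("l", "1"), text)
```

Something like `fix_l_for_one("published in l928")` would give "published in 1928". A rule this narrow misses plenty of real errors, which is why a human proofreading pass is still needed.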

flxfxp
4 replies
23h42m

Very cool project. Does anyone know of something similar for audiobooks?

rsanek
1 replies
22h48m
jodrellblank
0 replies
14h54m
acabal
0 replies
22h57m

Librivox creates audiobooks of PD texts. I've heard good things about their work but I personally don't listen to any audiobooks in general.

Uvix
0 replies
23h14m

Any audio recording will have its own copyright separate from the base text, so it'll be a while before any quality audiobooks enter the public domain.

For now, your best approach would be to take high-quality ebooks like what Standard Ebooks offers, and use text-to-speech software.

contrarian1234
4 replies
14h17m

There are HTML versions but they don't seem to reflow

example:

https://standardebooks.org/ebooks/rudolph-erich-raspe/the-su...

A bit offtopic, but I never understood why .epub is a thing. For instance the linked HTML/XHTML version seems to work just fine (except for the reflow thing.. but I assume that's a CSS issue)

.epub seems to be mostly HTML with a few pieces missing. I guess I don't understand why we needed a new format? and not just use a strict HTML subset?

I'd love some strict HTML subset that indicated the file can be used offline. I personally try to make all my webpages so that they can be saved to disk and opened from a single file (though if you embed images/videos this becomes problematic). But I don't have a way to indicate to a reader "Hey you can Ctrl+S this webpage". I'd publish .epub, but the browser won't open them

boznz
3 replies
14h12m

epub is just a zipped collection of HTML, CSS, fonts, and images, and is as bog standard as you can get. You can open it with a Zip extractor and see.
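
A minimal sketch of checking this yourself with Python's standard `zipfile` module (the function names are made up for illustration, and DRM-free files are assumed). The EPUB container format requires a `mimetype` entry whose content is `application/epub+zip`, so reading it is a quick sanity check that you are really looking at an EPUB:

```python
import zipfile

def list_epub_members(path):
    """Return the names of the files stored inside an EPUB (a ZIP archive)."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()

def read_mimetype(path):
    """Return the content of the EPUB's required `mimetype` entry."""
    with zipfile.ZipFile(path) as archive:
        return archive.read("mimetype").decode("ascii").strip()
```

Calling `list_epub_members("book.epub")` on a downloaded file (the filename here is hypothetical) should show the XHTML chapters, stylesheets, and images alongside the `mimetype` and `META-INF/container.xml` entries.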

contrarian1234
2 replies
14h8m

Huh.. yeah.. so then why doesn't Firefox/Chrome open it?

robin_reala
0 replies
12h14m

Edge used to support ePubs directly, but they removed that functionality for some reason. There is a little more to it than just rendering. E.g. ideally you’d want some popup table of contents support.

boznz
0 replies
13h52m

Chrome opens the HTML pages fine once the epub is unzipped. Some epubs may have DRM, however. I don't have any of those, but they probably won't work.

dflock
2 replies
1d2h

A well run open source ebook project, producing the highest quality ebooks. Always looking for volunteers as well as donations.

dalanmiller
1 replies
19h38m

What’s the best way to volunteer?

acabal
0 replies
19h33m
knbknb
1 replies
2h13m

Who decides or sets the difficulty level of "reading ease" (which is a sortable metadata attribute on the search page) ?

Some classifications seem a bit... nonintuitive. For example, the Autobiography of John Stuart Mill is classified as "very difficult" whereas "The Tempest" by Shakespeare is classified as "fairly easy".

I would classify it the other way around, but what do I know, I'm a nonnative speaker anyway.

robin_reala
0 replies
1h9m
zinsn1
0 replies
7h31m

To have access to what I read and also remember it, I recently created a web app called bookeeper that exports your Kindle highlights to Notion and generates a personalized summary of them with AI as well. Try it here if you are interested: https://bookeeper.io/?utm_source=hacker_news&utm_medium=book

wazdra
0 replies
23h23m

This is very nice! I’d love to see this for French literature too.

pseingatl
0 replies
23h7m

What are the standard dimensions produced by se build-images?

mmastrac
0 replies
1d1h

I published a couple of books for the project during a sabbatical in 2021 (The Devil's Dictionary [0] and a cheesy, small H. Beam Piper book named Four-Day Planet).

The process and tools are quite nice and it's very rewarding to see your work in ebook form. It takes a _long_ time to proof and re-read a book, but it's surprising how many times you can do this and how differently you need to read to catch errors versus just enjoying the damn book.

The fascinating part of the project is a _strong_ editorial opinion, which IMO makes the project successful. There is a core group of people that upholds the standards for the project, and the resulting consistency of quality of output derives from that. The team clearly cares about the quality, and has demonstrably maintained that over the huge number of releases.

I even went to the archives of the "San Francisco News Letter and California Advertiser" to collect some of Bierce's original work, making it the most complete and most corrected open-source version of the book. [1] The one previously hosted by Project Gutenberg was quite old and, frankly, quite riddled with transcription errors.

I haven't tried reading the Devil's Dictionary back-to-back since I published it, but I might one day. There's a lot of detail in this work that I never saw until I had it under a microscope.

[0] https://standardebooks.org/ebooks/ambrose-bierce/the-devils-...

[1] https://archive.org/details/san-francisco-newletter-dec-11-1...

dimmke
0 replies
1d2h

This is really cool. I'm going to donate.

dang
0 replies
23h3m

Related:

Standard Ebooks - https://news.ycombinator.com/item?id=32215324 - July 2022 (256 comments)

Free and liberated e-books, carefully produced for the true book lover - https://news.ycombinator.com/item?id=25138534 - Nov 2020 (106 comments)

Standard Ebooks: Free public-domain ebooks, carefully produced - https://news.ycombinator.com/item?id=20594802 - Aug 2019 (129 comments)

Standard Ebooks: Free and liberated ebooks, carefully produced - https://news.ycombinator.com/item?id=14570035 - June 2017 (96 comments)

andrewedstrom
0 replies
23h14m

Standard Ebooks is fantastic! In fact, I love what they're doing so much that I actually built a little SaaS product on top of their ebook collection.

The site is called Modern Serial, and it lets you read books from Standard Ebooks in 10 minutes a day as Substack-style email newsletters.

https://modernserial.com/

100k
0 replies
23h8m

I’m happy to see Standard Ebooks here! I’ve read their editions of Nostromo by Joseph Conrad and Vanity Fair by William Thackeray and the quality is great. I recommend it if you’re interested in classic literature.