
Apple built iCloud to store billions of databases

tombert
86 replies
1d2h

Sadly I never got to work on this when I was at Apple (interviewed for it though!), but hearing about this a few years ago sort of made me realize something that should have been obvious: there’s not really a difference between a database and a file system.

Fundamentally they do the same thing, and are sort of just optimizations for particular problem-sets. A database is great for data that has proper indexes, a file system is great for much more arbitrary data [1].

If you’re a clever enough engineer, you can define a file system in terms of a database, as evidenced by iCloud. Personally, I have used this knowledge to use Cassandra to store blobs of video for HLS streams. This buys me a lot of Cassandra’s distributed niceties, at the cost of having to sort of reinvent some file system stuff.

[1] I realize that this is very simplified; I am just speaking extremely high level.
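Most of the "reinvent some file system stuff" work comes down to chunking large blobs and serving byte ranges. Here is a minimal sketch of that idea. It assumes a key-value or wide-column store keyed by `(blob_id, chunk_index)`, with a plain dict standing in for the Cassandra table. The chunk size and the metadata row at index -1 are illustrative choices, not details of the original setup.

```python
from typing import Dict, Tuple

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; tune to the store's value-size limits

# A dict stands in for a table keyed by (blob_id, chunk_index).
Store = Dict[Tuple[str, int], bytes]


def put_blob(store: Store, blob_id: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> int:
    """Split data into fixed-size chunks and return the chunk count."""
    count = 0
    for offset in range(0, len(data), chunk_size):
        store[(blob_id, count)] = data[offset:offset + chunk_size]
        count += 1
    # Hypothetical metadata row so readers know how many chunks exist.
    store[(blob_id, -1)] = count.to_bytes(8, "big")
    return count


def read_range(store: Store, blob_id: str, start: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Serve a byte range (e.g. one HLS segment) by fetching only the chunks it touches."""
    count = int.from_bytes(store[(blob_id, -1)], "big")
    first = start // chunk_size
    last = min((start + length - 1) // chunk_size, count - 1)
    buf = b"".join(store[(blob_id, i)] for i in range(first, last + 1))
    skip = start - first * chunk_size
    return buf[skip:skip + length]
```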

gumby
15 replies
1d1h

there’s not really a difference between a database and a file system. Fundamentally they do the same thing, and are sort of just optimizations for particular problem-sets.

Conceptually that is quite true, though the domain dependencies make a lot of the code end up looking quite different.

But the first true database (pre-relational!) was developed for SABRE, American Airlines' computerized reservation system, in the early 1960s. Before that tickets were issued manually and the physical structure of the desks and filing systems used to make reservations reflected the need!

Unfortunately I can't find the paper I read (back in the mid 80s) on the SABRE database, but I remember that the record size (which is still used today!) was chosen based on the rotational speed of the disk and seek latency. Certainly there was no filesystem: the concept barely existed, though Multics developed a hierarchical filesystem (intended to be quite database-like, as it happens) around the same time. The database directly manipulated the disk. I don't know when that changed; perhaps in the 1970s?

Like I said I can't quickly find the paper on the topic, but here's a nontechnical discussion with some cool pictures: https://www.sabre.com/files/Sabre-History.pdf. A search for "American Airlines SABRE database history" finds some interesting articles and a couple of good Wikipedia pages.

oneplane
3 replies
1d1h

I think direct manipulation never went away, but the abstractions that were provided for general use were too useful to pass up for most workloads.

Some kinds of storage, like cloud-scale object storage, use custom HDD firmware and custom on-disk formats instead of filesystems (±2005-era tech). We also have much newer solutions that work directly on disks, like HMR (not to be confused with HAMR or HAMMER2), where the host manages the recording of data on the disk. There are some generally available systems for that, but we also have articles like this: https://blog.westerndigital.com/host-managed-smr-dropbox/ (which mostly focuses on SMR, but this works on CMR too).

As for the record size in the DB vs. disk attributes, that's probably not used like that anymore, but I do know that filesystem chunks/extents/blocks are calculated and grouped to profit from optimal LBA access. For example, if you run ZFS, you have it auto-detect, or manually set, the ashift so that it matches the actual on-disk sector size. This was especially relevant when 512e and 4Kn (and the various manufacturers' 'real' and 'soft' implementations) weren't reliable indicators of the best sector access size strategy.
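ashift is the base-2 logarithm of the sector size ZFS assumes for a vdev. That means 512-byte sectors give ashift=9 and 4 KiB sectors give ashift=12. The sketch below shows that arithmetic. It also checks whether a given I/O would force a 512e drive into a read-modify-write of a physical sector. Both helper names are made up for illustration.

```python
def ashift_for(sector_bytes: int) -> int:
    """ashift = log2(sector size); only powers of two are valid."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a power of two")
    return sector_bytes.bit_length() - 1


def needs_rmw(offset: int, length: int, physical_sector: int) -> bool:
    """True if the I/O does not cover whole, aligned physical sectors."""
    return offset % physical_sector != 0 or length % physical_sector != 0
```

A pool created with ashift=9 on a 4K-physical drive can issue exactly the 512-byte writes that `needs_rmw` flags. That is why getting the ashift wrong hurts.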

kevin_nisbet
1 replies
22h48m

I could be wrong, but I think the Oracle I learned back in school (mid-2000s) supported putting a database on a raw block device. So it's been around a long time, but it would be uncommon in some tech circles.

HillRat
0 replies
22h24m

Yeah, until the mid '00s you would run your db directly to raw disk devices, both in order to optimize the use of larger contiguous disk regions (disk drives were slow in those days!) and, crucially, because if/when your server went down hard any pending OS-buffered writes would result in a corrupted database, lost data, and lengthy rebuilds from logs (generally after having to do a long fsck recovery just to get back into the OS). It wasn't until journaled filesystems became common and battle-tested that you saw databases living in the filesystem proper.

yencabulator
0 replies
3h52m

I believe the "least proprietary" interface to this, that looks like it'll cope with both SMR rotating disks and flash, is Zoned Namespaces.

With ZNS, you have a fixed number of fixed-size, append-only zones, each of which can only be erased as a whole. It starts to look a lot like a typical LSM tree.
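The zone semantics described above can be modeled as a small class. Writes land at a write pointer, overwrites are rejected, and a whole-zone reset is the only way to reclaim space. This illustrates the model only. It is not the NVMe or Linux zoned-block API, and the class and method names are hypothetical. `append` returning the offset loosely mirrors the idea of a device-chosen "zone append".

```python
class Zone:
    """Toy model of a ZNS zone: append-only, erasable only as a whole."""

    def __init__(self, size: int):
        self.size = size
        self.data = bytearray()

    @property
    def write_pointer(self) -> int:
        return len(self.data)

    def append(self, payload: bytes) -> int:
        """Write at the current write pointer and return where the data landed."""
        if self.write_pointer + len(payload) > self.size:
            raise OSError("zone full")
        offset = self.write_pointer
        self.data += payload
        return offset

    def write_at(self, offset: int, payload: bytes) -> None:
        """Sequential-write-required: only a write exactly at the pointer is accepted."""
        if offset != self.write_pointer:
            raise OSError("write not at write pointer: zones do not allow overwrite")
        self.append(payload)

    def reset(self) -> None:
        """Zone reset: discards everything and rewinds the write pointer."""
        self.data = bytearray()
```

The LSM analogy holds because SSTables are written once, sequentially, and dropped wholesale after compaction. That pattern maps directly onto append-then-reset.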

https://zonedstorage.io/docs/introduction/zns

rbanffy
2 replies
23h49m

I love that Amdahl mainframe (page 6) with that humongous 20" CRT console.

Most likely showing a 24x80 3270 console session, with 8x16 character cells (if that much), but, still, quite awesome.

I'm not aware of any that ended up in a museum, sadly.

For those with sufficiently cool IEEE memberships, there is quite a lot about Sabre in the Annals of the History of Computing magazine archives.

https://ieeexplore.ieee.org/document/397059

https://ieeexplore.ieee.org/document/1114868

https://ieeexplore.ieee.org/document/279229, which is not about Sabre, but Air Canada's system.

If you think about it, modern IBM mainframes have a lot of weirdness about their filesystems and the concept of a file. Those machines are very alien for people who grew up on Unix.

Scoundreller
1 replies
20h14m

seems like a good time to remind people that using sci-hub might be unlawful and/or blocked in your country

gumby
0 replies
16h19m

Looks like some people failed to understand your comment.

ChuckMcM
2 replies
1d

Good old ISAM (Indexed Sequential Access Method) before DASD (Direct Access Storage Device) took over. (Aren't you glad IBM didn't win the "name the things" contest? :-))

I'm going to guess that by "domain dependency" you're talking about how

   handle = open("foo.txt");
Looks semantically different than

   err = db->exec("SELECT * from DIRECTORY where NAME = 'foo.txt';", &result);
So yes, in that regard they certainly "feel" different, although at one point I needed a file system for an application and built a wrapper layer for SQLite that basically gave you open/read/write/delete calls and filled in all the other stuff to convert specialized filesystem calls into general-purpose database calls.[1]

The best thing you can say about the way UNIX decided to handle files was that it forced people to either use them as is or make up their own scheme within a file itself (and don't get me started on the hell that is 'holey' files).

[1] In my case the underlying data storage was a NAND flash chip, so the result you got back, which was nominally a FILE* like stdio's, had the direct address on flash of where the bits were. Read-modify-write operations were slow since they effectively copied the file (preserving flash sector write lifetimes).
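Here is a minimal sketch of the kind of wrapper described above. It uses Python's built-in sqlite3 and a single table named after the `DIRECTORY` in the example query. The class and method names are hypothetical, and the sketch ignores the flash-address trick from the footnote. Append is done as a read-modify-write of the whole value, which echoes the slow path mentioned there.

```python
import sqlite3
from typing import List


class SqliteFiles:
    """Hypothetical file-style API (write/read/append/delete/listdir) over one SQLite table."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS directory (name TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )

    def write(self, name: str, data: bytes) -> None:
        with self.db:  # each call is its own transaction
            self.db.execute(
                "INSERT OR REPLACE INTO directory (name, data) VALUES (?, ?)", (name, data)
            )

    def read(self, name: str) -> bytes:
        row = self.db.execute("SELECT data FROM directory WHERE name = ?", (name,)).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        return row[0]

    def append(self, name: str, data: bytes) -> None:
        # Whole-value read-modify-write: the DB stores values, not seekable byte streams.
        self.write(name, self.read(name) + data)

    def delete(self, name: str) -> None:
        with self.db:
            cur = self.db.execute("DELETE FROM directory WHERE name = ?", (name,))
        if cur.rowcount == 0:
            raise FileNotFoundError(name)

    def listdir(self) -> List[str]:
        return [r[0] for r in self.db.execute("SELECT name FROM directory ORDER BY name")]
```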

rbanffy
1 replies
23h48m

Funny enough, DASD is now, for the first time, more accurate than "disk".

But yes. Talking to mainframe people is a bit like talking to astronauts, in that their jargon is completely impenetrable to the uninitiated.

jasomill
0 replies
13h52m

In addition to disks, IBM direct-access storage options available in the middle sixties included a variety of magnetic drum devices and the short-lived, tape-based Data Cell Drive[1].

[1] https://en.wikipedia.org/wiki/IBM_2321_Data_Cell

jahewson
1 replies
1d

Not disk drives but tape drives. Most likely these:

https://en.m.wikipedia.org/wiki/IBM_729

gumby
0 replies
23h23m

SABRE was specifically disk drives, though given the capacity of drives in those days I'm sure tapes were very important (and you see a lot of them in the photos from the link I included)

randomdata
0 replies
14h7m

> Certainly there was no filesystem [...] I remember that record size

Sounds like a record-oriented filesystem to me.

Which comes as no surprise as there is no difference between a database and a filesystem.

lchen_hn
0 replies
21h20m

And I thought SABRE sells printers and acquired Dunder Mifflin

eep_social
0 replies
1d

Yep, and this is why you still get a six-character Passenger Name Record (PNR) for your flight booking.

paganel
8 replies
1d2h

there’s not really a difference between a database and a file system.

That was the promise of WinFS back in the day, which would have been really something had MS managed to bring it to fruition.

I still remember the hype from back then, in my opinion totally justified, too bad that things didn't come to be. I legit think that that project could have changed the face of computing as we know it today.

mike_hearn
7 replies
1d2h

They tried to adapt SQL Server, IIRC, but it wasn't the right approach for a desktop OS.

The issue with the filesystem-as-database concept is that unless you're doing it as a serverside thing to get RDBMS features for files, it doesn't give you much more power without very serious changes to applications.

The first problem is that databases are most useful when they index things, but files are just binary blobs in arbitrary formats. To index the contents you have to figure out what they are and parse them to derive interesting data. This is not best done by the filesystem itself though - you want it to be asynchronous, running in userspace and (these days) ideally sandboxed. This is expensive and so you don't want to do it on the critical file write path. Nowadays there are tools like Spotlight that do it this way and are useful enough.
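The split described above keeps the write path cheap and does parsing and indexing later, out of band. It can be sketched as a queue plus a background worker. This is a toy with several assumptions. A dict stands in for the filesystem. "Parsing" is just UTF-8 word splitting. Stale index entries from overwritten files are never removed. Real systems like Spotlight use per-format importer plugins instead.

```python
import queue
import re
import threading
from collections import defaultdict
from typing import Dict, Set


class AsyncIndexer:
    """Write path only enqueues; a background worker parses and indexes."""

    def __init__(self):
        self.files: Dict[str, bytes] = {}                   # stand-in for the filesystem
        self.index: Dict[str, Set[str]] = defaultdict(set)  # word -> file names
        self.pending: "queue.Queue[str]" = queue.Queue()
        self.lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data  # critical path: store and return
        self.pending.put(name)   # indexing is deferred

    def _worker(self) -> None:
        while True:
            name = self.pending.get()
            text = self.files[name].decode("utf-8", errors="ignore")  # the "figure out the format" step
            words = set(re.findall(r"\w+", text.lower()))
            with self.lock:
                for word in words:
                    self.index[word].add(name)
            self.pending.task_done()

    def search(self, word: str) -> Set[str]:
        with self.lock:
            return set(self.index.get(word.lower(), ()))
```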

If you don't do that then when it comes time to sell your shiny fs-as-a-db feature for upgrade dollars, you have to admit that your db doesn't actually index anything because no apps are changed to use it. Making them do so requires rewriting big parts from scratch. In that era I think the Office format was still essentially just memory dumps of internal data structures, done for efficiency, so making Office store documents as native database tables would have been a huge project and not yielded much benefit over simple text indexing using asynchronous plugins to a userspace search service.

Another problem is that databases aren't always great at indexing into the middle of blobs and changing them. Sometimes db engines want to copy values if you change them, because they're optimised for lots of tiny values (row column values) and not small numbers of huge values. But apps often want to write into the middle of files or append to them.

Yet another problem is that apps are very sensitive to filesystem performance (that's why the fs runs in the kernel to begin with). But databases do more work, so can be slower, which would make everything feel laggy.

So yeah it was a beautiful vision but it didn't work out. Note that operating systems started with databases as their native data storage mechanism in the mainframe era, and that was moved away from, because there are lots of things you want to store that aren't naturally database-y (images, videos, audio, source code etc).

edgyquant
3 replies
1d2h

So basically there is a difference between a DB and an FS

orand
1 replies
1d

In theory, there is no difference between theory and practice. In practice, there is.

edgyquant
0 replies
1h17m

In practice a file system is not at all like a traditional database as it lacks querying of the data itself

randomdata
0 replies
22h37m

No. The takeaway is basically that there is no reason for Windows to use a relational database for storing information about files when a hierarchical database does it better for the vast majority of use cases its users encounter.

It is, perhaps, possible another product with a different set of users with different needs could still find value in a relational filesystem, but Microsoft was unable to find that fit.

bombcar
1 replies
1d2h

Even now we see many cases where "files are stored in the database" eventually migrates to "we store files on the filesystem and pointers to them in the database". I know at least a few projects that have done that migration at some point.
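Here is a sketch of the "files on the filesystem, pointers in the database" layout. Every detail is a hypothetical choice for illustration: sqlite3 for metadata, content-addressed file names (SHA-256), and a two-character fan-out directory. Content addressing also deduplicates identical uploads for free. The file is written before its row is inserted, so the database does not point at a missing file.

```python
import hashlib
import os
import sqlite3
import tempfile
from typing import Optional


class BlobStore:
    """Content lives in files named by hash; the DB holds only metadata and a relative path."""

    def __init__(self, root: Optional[str] = None):
        self.root = root or tempfile.mkdtemp()
        self.db = sqlite3.connect(os.path.join(self.root, "meta.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS attachments "
            "(id INTEGER PRIMARY KEY, name TEXT, path TEXT, size INTEGER)"
        )

    def put(self, name: str, data: bytes) -> int:
        digest = hashlib.sha256(data).hexdigest()
        rel = os.path.join(digest[:2], digest)
        full = os.path.join(self.root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        if not os.path.exists(full):  # identical content is stored once
            with open(full, "wb") as f:
                f.write(data)
        with self.db:
            cur = self.db.execute(
                "INSERT INTO attachments (name, path, size) VALUES (?, ?, ?)",
                (name, rel, len(data)),
            )
        return cur.lastrowid

    def get(self, attachment_id: int) -> bytes:
        (rel,) = self.db.execute(
            "SELECT path FROM attachments WHERE id = ?", (attachment_id,)
        ).fetchone()
        with open(os.path.join(self.root, rel), "rb") as f:
            return f.read()
```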

vel0city
0 replies
22h38m

Some databases even ship with this out of the box.

https://learn.microsoft.com/en-us/sql/relational-databases/b...

yencabulator
0 replies
3h38m

Another problem is that databases aren't always great at indexing into the middle of blobs and changing them. Sometimes db engines want to copy values if you change them, because they're optimised for lots of tiny values (row column values) and not small numbers of huge values. But apps often want to write into the middle of files or append to them.

Then again, there's no such thing as overwrite on flash storage (just write-once or erase larger chunk), so maybe the next generation of storage for large objects as extents that are write-once is the way forward. Plenty of filesystems have already switched to this model.
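The write-once idea can be sketched as a toy copy-on-write file. A write never modifies stored bytes. Instead it appends a new immutable extent and updates a mapping from file ranges to extents. This is an illustrative model, not any particular filesystem's on-disk format, and the names are hypothetical.

```python
from typing import List, Tuple

# Map entries: (file_offset, extent_id, offset_within_extent, length)
MapEntry = Tuple[int, int, int, int]


class ExtentFile:
    """Toy copy-on-write file built from immutable, write-once extents."""

    def __init__(self):
        self.extents: List[bytes] = []  # append-only log of immutable extents
        self.map: List[MapEntry] = []   # non-overlapping, sorted by file offset

    def write(self, offset: int, data: bytes) -> None:
        if not data:
            return
        self.extents.append(bytes(data))
        new_id = len(self.extents) - 1
        end = offset + len(data)
        new_map: List[MapEntry] = []
        for f_off, ext_id, ext_off, length in self.map:
            f_end = f_off + length
            if f_end <= offset or f_off >= end:
                new_map.append((f_off, ext_id, ext_off, length))  # untouched range
                continue
            if f_off < offset:  # keep the part before the new write
                new_map.append((f_off, ext_id, ext_off, offset - f_off))
            if f_end > end:     # keep the part after it
                new_map.append((end, ext_id, ext_off + (end - f_off), f_end - end))
        new_map.append((offset, new_id, 0, len(data)))
        self.map = sorted(new_map)

    def read(self) -> bytes:
        size = max((f + n for f, _, _, n in self.map), default=0)
        buf = bytearray(size)  # unmapped ranges read back as zeros (a hole)
        for f_off, ext_id, ext_off, length in self.map:
            buf[f_off:f_off + length] = self.extents[ext_id][ext_off:ext_off + length]
        return bytes(buf)
```

An extent becomes garbage once nothing maps to it any more. Reclaiming that space, via garbage collection or compaction, is the part this sketch leaves out.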

jwr
5 replies
1d2h

there’s not really a difference between a database and a file system

Having written an interface to FoundationDB in preparation to moving my app over to it, I couldn't disagree more.

Even "has proper indexes" is not something we'd agree on. In my case, for example, I am extremely happy with the fact that my indexes are computed in my app code, in my language, and that I am not restricted to some arbitrary concept of database-determined "indexable fields" and "indexable types".

Then there are correct transactions, versionstamps (for both keys and values), streaming large amounts of data, all of that in a distributed database, it's really nothing like a filesystem.

toolz
3 replies
1d

I'm interested in having you expand on these thoughts, so I'll play devil's advocate here. I personally don't have strong opinions on the subject.

has proper indexes

Does it matter where in the code the index lives? Are you arguing that databases don't have proper indexes or that filesystems don't? I'm not sure I'd agree with either argument.

correct transactions

filesystems and databases have transactions, which one is "incorrect"?

versionstamps (for both keys and values)

filesystems have timestamps, not sure what a versionstamp is but I suspect it's some domain specific name for a more general concept that both databases and filesystems utilize.

streaming large amounts of data

many databases stream massive data and filesystems certainly do

all of that in a distributed database

every major PaaS has some form of distributed filesystem

drewbug01
2 replies
20h59m

Versionstamps are not a simple timestamp; they're a cluster-wide, orderable unique ID.

https://apple.github.io/foundationdb/data-modeling.html#vers...

toolz
1 replies
20h1m

How do they differ from vector clocks? Just a different implementation of the same thing maybe? Either way, distributed filesystems definitely have the same general concept.

azurelake
0 replies
18h45m

Vector clocks give a partial ordering. FDB version stamps give a total ordering by having a single process issue them for the entire cluster. There's a good breakdown here: https://blog.the-pans.com/notes-on-the-foundationdb-paper/
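The partial-versus-total distinction is easiest to see side by side. Below, a vector-clock comparison can answer "concurrent", while a single sequencer issues stamps that are always comparable. This sketch assumes nothing about FoundationDB's actual encoding. The real versionstamp also carries a within-batch order and an optional user version, and this ignores both.

```python
from itertools import count
from typing import Dict


def compare(a: Dict[str, int], b: Dict[str, int]) -> str:
    """Vector-clock comparison: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # partial order: neither happened-before the other


class Sequencer:
    """One process hands out commit versions, so any two stamps are comparable."""

    def __init__(self):
        self._next = count(1)

    def stamp(self) -> int:
        return next(self._next)
```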

timc3
0 replies
21h30m

I suggest you write a file system, then write a database and then re-evaluate whether you still think the same way.

mike_hearn
4 replies
1d2h

You don't really need to be a clever engineer, there are pre-made implementations out there for you.

For example if you have an Oracle DB, then it has a feature called DBFS that does this already:

https://docs.oracle.com/en/database/oracle/oracle-database/2...

You can instantiate a POSIX compatible FS using database tables, and then mount them using FUSE. From there you can export it via NFS if you wish. You can also export the FS via WebDAV and thus mount it over the network using the WebDAV support built in to Windows or macOS.

If you want to work with the FS transactionally, you have to do that using PL/SQL. POSIX doesn't define APIs for FS transactions, so some other approach is needed.

Because it's stored in the DB you can use all the other features of the RDBMS too like clustering, replication, encryption, compression and if need be, you can maintain indexes over the file content.

hansoolo
1 replies
1d

Thanks for that one. I just started a new job, where they use only Oracle DBs and that could be useful.

j33zusjuice
0 replies
1d

I’m so sorry.

tombert
0 replies
1d2h

Absolutely, I was referring to the cleverness of the engineers that actually made those implementations.

Making a FUSE file system is sort of a bucket list thing I haven’t gotten around to doing yet. Maybe I should hack something together while I am still unemployed…

divyenduz
0 replies
9h55m

Plug: I wrote something along these lines. It is a FUSE file system whose storage is SQLite.

https://github.com/divyenduz/zoid-fs

shuckles
3 replies
1d2h

I disagree because querying is an important feature of most databases as usually conceived, so I think filesystems are only a subset of a database.

travisgriggs
2 replies
1d1h

Grep, locate, find… aren’t these all query tools for file systems?

shuckles
1 replies
1d

Database queries are a lot more complex than a pattern match search. In addition, grep et al aren’t part of the file system in both the simple sense (they ship separately) and the meaningful sense (filesystems are rarely designed to facilitate them).

randomdata
0 replies
1d

> grep et al aren’t part of the file system in both the simple sense (they ship separately)

It seems you are confusing database with database engine or possibly database management system. Querying is not a function of a database.

In fairness, the lazy speaker often says "database" in place of "database engine" and "database management system" to save on the energy of having to make additional sounds when one can infer what is meant by context, but in this case as "database" means database...

> (filesystems are rarely designed to facilitate them)

Facilitating querying is a primary objective of a filesystem database. What do you think "path/to/file.txt" is? That's right. It's a query!
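To make the "a path is a query" point concrete, a hierarchical namespace can be stored as a table of directory entries. Resolving a path is then one indexed lookup per component, much as a filesystem walks its directory entries. Here is a sketch using Python's sqlite3. The `dirent` schema and sample tree are invented for illustration.

```python
import sqlite3
from typing import Optional

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE dirent (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT, UNIQUE(parent, name))"
)
# Hypothetical tree: id 1 is the root, with path/to/file.txt beneath it.
db.executemany(
    "INSERT INTO dirent VALUES (?, ?, ?)",
    [(1, None, ""), (2, 1, "path"), (3, 2, "to"), (4, 3, "file.txt")],
)


def resolve(path: str, root: int = 1) -> Optional[int]:
    """Each component is one lookup on the (parent, name) index."""
    node = root
    for part in filter(None, path.split("/")):
        row = db.execute(
            "SELECT id FROM dirent WHERE parent = ? AND name = ?", (node, part)
        ).fetchone()
        if row is None:
            return None  # the ENOENT case
        node = row[0]
    return node
```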

mannyv
3 replies
1d1h

"there’s not really a difference between a database and a file system"

The BeOS filesystem was basically a database.

But there are a lot of differences between a database and a file system. A better way of thinking about it is that a filesystem is just a specialized database.

From an old-school perspective, a database is really just a collection of data. An RDBMS is a relational database; a filesystem is just another kind of database, etc.

randomdata
0 replies
1d

> A better way of thinking about it is that a filesystem is just a specialized database.

Aren't all databases specialized?

mike_hearn
0 replies
21h24m

BeFS wasn't really a database as we'd normally understand it. It had no transactions, for one. It also only understood strings and numbers as datatypes.

It had what was basically a normal UNIX filing system, complete with /dev, /etc and so on, and it had support for indexing extended attributes. Your app was expected to create an index with a specific API at install time, and then after that writes to the indexed xattr would update a special "index directory". The OS could be given a simple query predicate with range and glob matching, and it would answer using the indexes, including a live query.
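The mechanism described above fits in a small toy. An index is created up front, writes to the indexed attribute update it, and predicate queries (range or glob) are answered from the index alone. This is not the BeOS API. The names are hypothetical, live queries are omitted, and the toy deliberately reproduces the no-backfill behaviour that comes up below. The attribute names follow the BeOS "Audio:Artist" style purely as an example.

```python
import fnmatch
from collections import defaultdict
from typing import Any, Dict, List, Optional


class AttrIndexes:
    """Toy BeFS-style indexed attributes."""

    def __init__(self):
        self.attrs: Dict[str, Dict[str, Any]] = defaultdict(dict)  # path -> {attr: value}
        self.indexes: Dict[str, Dict[str, Any]] = {}               # attr -> {path: value}

    def create_index(self, attr: str) -> None:
        # Like BeOS, existing files that already carry the attribute are NOT backfilled.
        self.indexes.setdefault(attr, {})

    def set_attr(self, path: str, attr: str, value: Any) -> None:
        self.attrs[path][attr] = value
        if attr in self.indexes:
            self.indexes[attr][path] = value

    def query(self, attr: str, lo: Any = None, hi: Any = None,
              glob: Optional[str] = None) -> List[str]:
        """Range match for numbers, glob match for strings, using only the index."""
        hits = []
        for path, value in self.indexes.get(attr, {}).items():
            if glob is not None and not fnmatch.fnmatchcase(str(value), glob):
                continue
            if lo is not None and value < lo:
                continue
            if hi is not None and value > hi:
                continue
            hits.append(path)
        return sorted(hits)
```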

This was neat, but you could implement the same feature in Linux pretty easily. Nobody ever has, probably because xattrs historically didn't work that well. They don't get transmitted via common network protocols and have a history of getting lost when archiving, although I think these days every archive format in real use supports storing them.

There's also the question of how it interacts with POSIX file permissions. BeOS was an aggressively single user system so just didn't care. On Linux you'd need to think about how files that you can't read are treated in the indexing process.

Multiple devices also pose problems. BeOS simply required that apps create an index on a specific device themselves. If you plugged in a USB drive, files there just wouldn't show up in search unless they had been created not only by BeOS, but by an app you had previously installed. Note that installing an app post hoc didn't work, because creating an index didn't populate it with existing files, even if they had the right xattrs.

And of course it only worked with files. If you had content where the user's conception of a thing didn't map 1:1 to files, it was useless. For example you couldn't index elements within a document this way. Spotlight can index app states and screens, which also obviously BeOS couldn't do.

So there were a lot of limitations to this.

The modern equivalent would be writing a search plugin:

https://developer.apple.com/documentation/corespotlight/maki...

The API is more complex but lets you create search results that aren't directly tied to specific filing systems.

lsaferite
0 replies
22h52m

I was about to bring up BeOS and decided to search to see if someone else had mentioned it already. Glad to know I'm not alone in remembering BeOS. :)

kennethrc
3 replies
1d2h

That was the idea behind the (ill-fated) ReiserFS, IIRC?

mike_hearn
2 replies
1d1h

Reiser argued that if you optimised a filesystem for very tiny files, then many cases where apps invent their own ad-hoc file-systems-in-a-file could be eliminated and apps would become easier to read/write and more composable.

For example, instead of an OpenOffice document being a zip of XMLs, you'd just use a directory of XMLs, and then replace the XMLs with directories of tiny files for the attributes and node contents. Instead of a daemon having a config file, you'd just have a directory of tiny files. He claimed that apps weren't written that way already because filesystems were wasteful when files got too tiny.

Git is an example of a program that uses this technique, to some extent at least (modulo packfiles).

In reality, although that may have contributed, there are other reasons why people bundle data up into individual files. To disaggregate things (which is a good place to start if you want a filesystem-db merge) you also have to solve all those other reasons, which ReiserFS never did and as a project that "only" wanted to reinvent the FS, could not have solved.

Apple hit some of those issues when they tried making iLife documents be NeXT bundles:

1. Filesystem explorers treat files and directories differently for UI purposes. Apple solved it nicely by teaching the Finder to show bundle directories as if they were files unless you right click and select "Show contents". Or rather partly solved ... until you send data to friends using Windows, or Google Drive, or anything other than the Finder.

2. Network protocols like HTTP and MIME only understand files, not directories. In particular there is no standardised serialisation format for a directory beyond zip. Not solved. iLife migrated from bundles to a custom file format partly due to this problem, I think.

3. Operating systems provide much richer APIs for files than directories. You can monitor a file for changes, but if you want to monitor a directory tree, you have to iterate and do it yourself. You can lock a file against changes, but not a directory tree. You can check if a file has been modified by looking at its mtime, but there's no recursive mtime for directory trees. You can update files transactionally by writing to a temporary file and renaming, but you can't atomically replace a directory tree. Etc.
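The "write to a temporary file and rename" idiom from point 3 is short enough to show in full. This sketch uses only the Python standard library. The temporary file lives in the same directory so the rename stays on one filesystem. For full durability you would also fsync the containing directory after the rename, which is omitted here.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Readers see either the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem as the target
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # contents are durable before they become visible
        os.replace(tmp, path)     # atomic rename over an existing file on POSIX
    except BaseException:
        os.unlink(tmp)
        raise
```

There is no directory-tree equivalent. On POSIX, rename(2) only replaces a directory if the target is empty, so you cannot atomically swap one populated tree for another this way.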

So the ReiserFS concept wasn't fully fleshed out, even had it been accepted into the kernel. Our foundational APIs and protocols just aren't geared up for it. I've sometimes thought it'd be a neat retirement project one day to build an OS where files and directories are more closely merged as a concept, so files can have sub-files that you can browse into using 'cd' and so on, and those API/protocol gaps are closed. It wouldn't give you a full relational database, but it'd be much more feasible to port apps to such an OS than to rewrite everything to use classical database APIs and semantics.

zer00eyz
1 replies
23h45m

>> 2. Network protocols like HTTP and MIME only understand files

Love when someone says something that makes my brain work!

For the most part you're spot on. HTTP has multipart messages that in theory could be extended to be composite of anything. So we could have those bundles! Oddly we can send to the server with a multipart message (forms)!!

I think that MIME is an interesting slice the OTHER way. You could store versions of the same document in a directory so HTML and JSON and XML OR a video or image in two formats and serve them up based on the MIME request.
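Serving several representations of one document from a directory, chosen per request, is ordinary HTTP content negotiation driven by the Accept header. Here is a simplified sketch. It handles q-values and `type/*` wildcards but ignores the HTTP spec's specificity precedence rules. The variant mapping is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """(media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if parts[0]:
            prefs.append((parts[0], q))
    return sorted(prefs, key=lambda pair: -pair[1])  # stable: ties keep header order


def negotiate(accept: str, variants: Dict[str, str]) -> Optional[str]:
    """Pick a file from a bundle directory, e.g. {'text/html': 'doc.html', ...}."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in variants:
            return variants[media_type]
        if media_type == "*/*":
            return next(iter(variants.values()), None)
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "image/"
            for candidate, filename in variants.items():
                if candidate.startswith(prefix):
                    return filename
    return None
```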

Now if we could make one of those a multi part message...

mike_hearn
0 replies
21h50m

The problem is the case where you want to upload or attach >1 document that's actually a directory. You need a way to signal that the first 3 files are a part of document A, and the next 5 are part of document B, and although you could invent a file name convention to express this nothing understands it. Email clients would show 7 attachments, web server APIs would show 7 files, browsers would need to be patched to let you select bundles in the file picker and then recursively upload them, how progress tracking works would need to change, etc.

And then how do you _download_ them? Browsers don't understand MIME at download time.

None of it is hard to solve. But, nobody ever did, and the value of doing things this new way is usually going to be lower than the value of smooth interop with everyone's different browser/OS/email/server combos.

hiAndrewQuinn
3 replies
1d1h

It's true. One of the projects in my little "Ridiculous Enough To Work" folder is SQLiteOS, which uses a giant SQLite database as the underlying filesystem.

hnlmorg
2 replies
1d

I once built a FUSE file system that used MySQL as the RDBMS. The idea being a remote file system.

IIRC read only access worked well but I had issues getting write access working.

sroussey
1 replies
1d

Somebody else must’ve done that as well, because I remember playing around with a MySQL fuse system, with both read and write.

folmar
0 replies
23h25m

fzeindl
3 replies
1d1h

The difference is that file systems need a lot of “mechanical sympathy” to account for the many quirks inside syscalls and actual physical disks.

There was a nice video about how it is really hard to implement file systems because disks just don’t do what you expect.

Databases are a layer up and assume that they can at least write a blob somewhere and retrieve it with certain guarantees. Those guarantees are a thousand hacks in the file system implementation.

mr_toad
0 replies
18h35m

Unfortunately those mechanical sympathies related to spinning disks, and now we have SSDs that have to fake like they are spinning disks for file system compatibility and all the software that expects file systems to behave that way.

jandrewrogers
0 replies
21h37m

Most non-trivial databases run on what is essentially their own purpose-built file system, bypassing many (or all) of the OS file services. Doing so is both higher performance and simpler than just going through the OS file system. Normal OS file systems are messy and complex because they are serving several unrelated and conflicting purposes simultaneously. A database file system has a fairly singular purpose and focused mission, and also doesn't have the massive legacy baggage of general purpose file systems, so there are fewer tradeoffs and edge cases to deal with.

The more sophisticated the database kernel, the more the OS is treated like little more than a device driver.
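At its lowest layer, a database's "own purpose-built file system" often looks like a page file. The engine opens one large file, addresses it as fixed-size pages at computed offsets, and decides for itself when to flush. This is a minimal sketch. It assumes POSIX `os.pread`/`os.pwrite` (Unix-only in Python) and a 4 KiB page. Real engines add O_DIRECT, checksums, free-space maps and a buffer pool.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumption: pages match a 4 KiB device block


class PageFile:
    """One big file addressed as fixed-size pages."""

    def __init__(self, path: str):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def write_page(self, page_no: int, payload: bytes) -> None:
        if len(payload) > PAGE_SIZE:
            raise ValueError("payload larger than a page")
        page = payload.ljust(PAGE_SIZE, b"\0")  # always write whole, aligned pages
        os.pwrite(self.fd, page, page_no * PAGE_SIZE)

    def read_page(self, page_no: int) -> bytes:
        return os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)

    def sync(self) -> None:
        os.fsync(self.fd)  # the engine, not the OS cache policy, decides durability points

    def close(self) -> None:
        os.close(self.fd)


if __name__ == "__main__":
    pf = PageFile(os.path.join(tempfile.mkdtemp(), "data.pages"))
    pf.write_page(3, b"hello")
    pf.sync()
    print(pf.read_page(3)[:5])
    pf.close()
```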

aeyes
0 replies
1d1h

What database are we talking about? Oracle best runs on ASM, which is basically its own filesystem.

And most journaling filesystems actually get in the way of databases which try to commit their own changelog to disk.

MenhirMike
3 replies
1d2h

If you look up WinFS (which is a cancelled Windows file system originally intended to ship with Windows Longhorn), its basic principle is exactly that, be a database that happens to work as a file system.

Not sure why exactly it failed, I assume that it just wasn't a suitable idea at the time given that most consumer devices (especially laptops) had very slow traditional hard drives, but in the age of NVMe storage, maybe it would be worth revisiting, assuming that Microsoft is still interested in evolving Windows in meaningful ways outside of better Ad delivery mechanisms.

bombcar
2 replies
1d2h

IIRC WinFS didn't precisely fail as much as get cancelled along with Longhorn, and parts of it migrated into other projects.

Much of the consumer-facing niceties of it got implemented in search tools that track metadata separately.

zeusk
0 replies
1d2h

ReFS has some learnings from WinFS

MenhirMike
0 replies
1d1h

It did fail in delivering the actual product that was intended, but yeah, they did salvage a lot of it and also, AFAIK, helped the SQL Server team improve a few things. So it's a bit like Intel's Larrabee (which did technically come out as a product, Xeon Phi): a high-profile R&D project.

tw04
2 replies
1d1h

Microsoft had the same idea in the early 2000s:

https://en.wikipedia.org/wiki/WinFS

GeekyBear
1 replies
1d1h

Prior to Longhorn, Microsoft had previously attempted "database as a file system replacement" as part of Cairo.

Cairo was the codename for a project at Microsoft from 1991 to 1996. Its charter was to build technologies for a next-generation operating system

https://en.wikipedia.org/wiki/Cairo_(operating_system)

pjmlp
0 replies
22h0m

Somewhere in my parents' attic there is a Windows magazine about the Cairo project and all the cool things it would bring.

In both cases, Longhorn and Cairo, the only thing that survived were a bunch of COM libraries.

kzrdude
2 replies
1d2h

If I understand correctly, bcachefs, that new hot filesystem, is pretty similar to a database - maybe someone knows more about this.

Tuna-Fish
1 replies
1d

The on-disk layout is very similar to many modern databases, but the interface that is offered to the user is pretty much just a normal filesystem.

lanstin
0 replies
19h29m

That's the difference: the API. As much as you can store a lot of data in either, SQL is not much like POSIX. The lower-level "distributed" APIs are like OS implementations of the POSIX API.

jerf
2 replies
23h15m

"there’s not really a difference between a database and a file system."

It depends on how abstracted you're getting. I sometimes talk about the 30,000 foot view, but in this case, I might stretch the metaphor to say that from Low Earth Orbit, there is indeed not much difference between a database and a file system. In fact, there's not much difference between those things and some function calls. You put some parameters out, you get some stuff back.

From just slightly higher, one realizes, or remembers, that it's all just numbers. You put some numbers into the system and get some other numbers out. Everything is built out of that.

You can build a database out of functions, a file system out of a database, functions out of a file system (albeit one beyond a blob store, think /proc or FUSE rather than ext2), you can mix network streams into any of these, anything you like.

And while it's helpful to be aware of that, at the same time, you are quite into architecture astronautics at that point and you are running low on metaphorical oxygen, and while the odd insight generated from this viewpoint might help here or there, if one wishes to actually build iCloud, one is going to have to come a great deal closer to Earth or one is going to fail.

Still, in the end, it's all just numbers in response to other numbers and the labels we humans put on exactly how the numbers are provided in response to other numbers are still the map and not the territory, even in the world of programming where arguably the map and the territory are as close as they can possibly be and still be in reality.

randomdata
0 replies
22h48m

And, of course, if you go the other way and get closer where databases and functions are different enough to be considered different things, the filesystem is still a database. It is meant to be a database in every sense of the word.

blago
0 replies
18h41m

One can probably say that there exists a level of abstraction where there’s not really a difference between a database and a file system. That's not a lot :-)

dhosek
2 replies
1d1h

I remember back in the 80s thinking that a file system that was organized like a relational database¹ would be a really wonderful thing. Files could live in multiple places with little difficulty and any sort of metadata could be easily applied to files and queried against.

1. I had read the original paper on database normalization over the summer and was on a database high at the time. I was young.
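dhosek's relational file system idea can be sketched in a few lines, assuming SQLite through Python's standard `sqlite3` module. The `files`/`entries`/`meta` tables and the `add_file`/`find` helpers are hypothetical; this illustrates the idea and is not how any shipping file system (or iCloud) works.

```python
import sqlite3

# Hypothetical schema: contents live once in `files`; `entries` maps a
# (folder, name) path to a file, so one file can sit in several folders;
# `meta` holds arbitrary key/value metadata per file.
SCHEMA = """
CREATE TABLE files   (id INTEGER PRIMARY KEY, data BLOB NOT NULL);
CREATE TABLE entries (folder TEXT NOT NULL, name TEXT NOT NULL,
                      file_id INTEGER NOT NULL REFERENCES files(id),
                      PRIMARY KEY (folder, name));
CREATE TABLE meta    (file_id INTEGER NOT NULL REFERENCES files(id),
                      key TEXT NOT NULL, value TEXT NOT NULL);
"""


def add_file(db, data, places, **metadata):
    """Store `data` once, link it at every (folder, name) in `places`, tag it."""
    file_id = db.execute("INSERT INTO files (data) VALUES (?)", (data,)).lastrowid
    db.executemany("INSERT INTO entries VALUES (?, ?, ?)",
                   [(folder, name, file_id) for folder, name in places])
    db.executemany("INSERT INTO meta VALUES (?, ?, ?)",
                   [(file_id, k, v) for k, v in metadata.items()])
    return file_id


def find(db, key, value):
    """Every (folder, name) path whose file has metadata key == value."""
    return db.execute(
        """SELECT e.folder, e.name FROM entries e
           JOIN meta m ON m.file_id = e.file_id
           WHERE m.key = ? AND m.value = ?
           ORDER BY e.folder, e.name""", (key, value)).fetchall()


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
add_file(db, b"%PDF...", [("/taxes/2023", "receipt.pdf"), ("/house", "receipt.pdf")],
         project="kitchen", year="2023")
add_file(db, b"\xff\xd8...", [("/photos", "sink.jpg")], project="kitchen")
print(find(db, "project", "kitchen"))
# [('/house', 'receipt.pdf'), ('/photos', 'sink.jpg'), ('/taxes/2023', 'receipt.pdf')]
```

Because a path is just a row in `entries`, putting one file in two folders costs one extra row, and "find everything tagged kitchen" is an ordinary join.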

rad_gruchalski
0 replies
1d1h
notaharvardmba
0 replies
1d1h

AS/400 was doing this in the 80’s…

daemonk
2 replies
1d

I tend to agree. I see databases as a type of file system with more strict constraints in terms of reading/writing.

One can maybe argue that file systems are just an address book and databases are a much more complicated address book.

randomdata
1 replies
1d

Pedantically, it is the file system that is a type of database. Traditionally, database is the low-level generic term, referring to any type of structured data stored on a computer. File system, also known as the hierarchical database, adds additional specificity, referring to a particular structuring of data. Another common one is the relational database, offering another particular structuring of data.

naikrovek
0 replies
23h13m

LDAP and the Windows registry are hierarchical databases, just like a traditional file system, so the “file system = database” makes a lot of sense to me.

0xbadcafebee
1 replies
1d

If anything a database is a form of filesystem, as the name filesystem comes from 'file system', a system of organizing files or records. But filesystems officially came after databases, as early databases were designed to make best use of hardware and storage devices to store and retrieve data efficiently, making it easier and faster for computers of the time to use the data. So databases were, effectively, the first filesystems.

But the distinction is pretty small. Both filesystems and databases are just wrappers around a data model. The former is primarily concerned with organizing data on a disk (with respect to space, speed, location and integrity), and the latter is primarily concerned with organizing and querying data (with respect to ease-of-use, speed and integrity).

People today seem to think relational databases were the first and only databases. But there are many types of database: flat, hierarchical, dimensional, network, relational, entity–relationship, graph, object-oriented, object-relational, object-role, star, entity–attribute–value, navigational, document, time-series, semantic, and more.

The earliest filesystem, CP/M filesystem, was basically a flat database. Successive filesystems have taken on other data models, such as hierarchical, network and navigational. Since filesystems are used as a low-level interface to raw data, they didn't need more advanced data models or forms of query. On the other hand, IBM DB2, Hadoop, and Google File System are all forms of database filesystems, combining elements of both databases and filesystems.

dboreham
0 replies
23h43m

Quick note that CP/M isn't even close to the "earliest filesystem".

wayfinder
0 replies
1d

And you start on the journey when you first learn about hash maps or binary trees.

theGnuMe
0 replies
1d2h

Yes but move beyond the file system view and head straight to objects..

qntty
0 replies
1d

See this talk for someone who tried to do this with MySQL on Linux: https://www.youtube.com/watch?v=wN6IwNriwHc

lupusreal
0 replies
1d1h

Filesystems are hierarchical databases, as opposed to relational databases (relational is usually implicit when people simply say "database", but this wasn't always the case.)

embit
0 replies
22h52m

If I remember correctly, the Indian Railway Passenger Reservation System was built using the DEC VAX/VMS file system.

api
0 replies
1d

AFAIK theoretically any database can be built on top of a key value store, and any transactional database on top of a key value store that also has transactions.

TiDB is an example of a distributed SQL on top of a transactional key value store called TiKV.
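A minimal sketch of the layering described above, assuming a toy in-memory ordered key-value store (`OrderedKV`, hypothetical) rather than TiKV or FoundationDB: a table row and a secondary-index entry are both just keys, and a query becomes a prefix range scan.

```python
import bisect


class OrderedKV:
    """Toy in-memory ordered key-value store; keys are tuples kept sorted."""

    def __init__(self):
        self._keys, self._vals = [], []

    def set(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with `prefix`, in order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            yield self._keys[i], self._vals[i]
            i += 1


# A hypothetical "users" table layered on the store.
def insert_user(kv, user_id, name, city):
    kv.set(("users", "row", user_id), {"name": name, "city": city})  # primary row
    kv.set(("users", "by_city", city, user_id), None)                 # index entry


def users_in_city(kv, city):
    """Roughly SELECT name FROM users WHERE city = ?, answered via the index."""
    return [kv.get(("users", "row", key[-1]))["name"]
            for key, _ in kv.scan_prefix(("users", "by_city", city))]


kv = OrderedKV()
insert_user(kv, 1, "Ada", "London")
insert_user(kv, 2, "Linus", "Helsinki")
insert_user(kv, 3, "Grace", "London")
print(users_in_city(kv, "London"))  # ['Ada', 'Grace']
```

The two `set()` calls in `insert_user` are not atomic here; a crash between them would leave the row and its index disagreeing, which is why a key-value store that also has transactions is the useful foundation. Updates and deletes (which must also remove stale index entries) are omitted.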

acchow
0 replies
18h35m

It's not about the indexes. Databases support transactions and ACID properties.

File systems do not.

They have some similarity in that they both store data, but that's about it.

Vicinity9635
0 replies
17h35m

Sadly I never got to work on this when I was at Apple (interviewed for it though!), but hearing about this a few years ago sort of made me realize something that should have been obvious: there’s not really a difference between a database and a file system.

Many years back I came to the realization that a database is just a fancy data structure. I guess a file system is too.

Mutttttioi
0 replies
1d2h

That's how Amazon made Aurora. Move all state onto the object storage layer, which is also at the end of processing (you go through the LB, then frontend, then backend, then database, and land on disk).

Stateless is basically moving everything to the back.

I'm pretty sure Google is doing the same thing/started with it.

Also, this makes it 'easily' horizontally scalable: as soon as you are able to abstract at the object level, you can scale your underlying infrastructure to just handle 'objects'.

BugsJustFindMe
27 replies
1d2h

I wish they'd build iCloud to store my Time Machine backups.

crazygringo
23 replies
1d1h

Agreed.

I'm utterly baffled why my iOS backups can live in Apple's cloud but not my Mac ones.

I honestly expected them to launch it years ago. The fact they still haven't seems to mean they've firmly decided not to for some reason, but I'm totally clueless as to what the reason could be.

Especially when making more money off services is a strategic priority for the company.

arrowleaf
17 replies
1d1h

I'm mildly surprised they haven't, but the reasons seem pretty obvious. Redundancy (in offerings), storage costs, and home network upload speeds.

Redundancy because the thing most people care about backing up is media and important documents, which are likely already stored in iCloud. If you care about Time Machine backups you probably want your whole filesystem with point-in-time restores. That's a lot more data for Apple to hang onto, for a small segment of its target market. Of course, Apple does have 2TB+ iCloud+ plans, but I would bet that the average iCloud+ subscriber is using nowhere near their limit.

Snow_Falls
6 replies
1d1h

But Apple charges for storage space? Surely people needing more storage is a huge plus for Apple. Maybe they had worries about scaling storage capacity? A company like Apple could certainly figure it out though, so that seems unlikely.

arrowleaf
5 replies
1d

My point is that I'm sure the only way iCloud is profitable or even break-even for Apple is if they rely on over-provisioning storage to users of the higher-tier paid plans. I started paying for the 200GB iCloud+ plan, and once my photos exceeded 200GB I ponied up for the 2TB plan. Unless I take up a photography hobby it'll be a long time until I get close to that 2TB, and I'd wager this is what Apple expects. Raising that baseline usage with Time Machine backups would mean it would need to be more expensive for end users, either by making iCloud+ more expensive or rolling out a new subscription product.

rexelhoff
3 replies
23h39m

it'll be a long time until I get close to that 2TB

I thought the same thing until I realised that, with Family Sharing and a house with teenage kids sending each other embedded videos in iMessage, the time wouldn't be that long...

Suddenly I find myself 1TB in, and desperate to find a fix!

killingtime74
1 replies
23h5m

Boot them out of family sharing and get them to pay for it themselves

spacedcowboy
0 replies
22h14m

Ninja economics for the win.

crazygringo
0 replies
23h17m

There's an interface on the phone to sort message attachments, by size, and delete -- exactly to reclaim that space.

On the other hand, that would require convincing your kids to do that...

crazygringo
0 replies
23h19m

But sure then -- just charge more, or a new subscription product as you suggest just for Time Machine. They can even tie pricing to the size of your Mac's disk if they want. They can definitely make the economics work if they choose to.

crazygringo
4 replies
1d1h

Of course, Apple does have 2TB+ iCloud+ plans, but I would bet that the average iCloud+ subscriber is using nowhere near their limit.

But that's my point. To sell the 2TB plans to people who are merely on the free 5 GB or paid 50 GB plan.

And yes -- I don't even keep many files on my Mac, it's mostly in the cloud already. But if it gets lost/stolen, I want to restore all my apps and preferences the same way I do with my phone. Which is why I use Time Machine with a NAS, but it's silly to need a NAS at all. I just want to use the cloud.

arrowleaf
2 replies
1d1h

I agree with you here and that's why I'm mildly surprised they haven't come up with a solution rolled into iCloud yet. Syncing apps and preferences shouldn't be that difficult, but unless they're App Store applications the binaries would take up a lot of space. Most of the apps I care about are from outside of the App Store. AFAIK our iOS backups don't actually back up application binaries.

The way I was looking at it is that Apple has successfully sold iCloud+ 2TB plans to a lot of people who don't need much more than 200GB. If everyone on the 2TB plan used even close to 2TB, I'd bet they'd have to charge me a lot more to make up the provisioning and usage costs of storage.

hot_gril
1 replies
1d

Wonder if there are economies of scale storing multiple users' backups that may partially contain a lot of the same data. If 10000 separate users' backups contain the same 10GB app binary...
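A rough sketch of those cross-user economies of scale, assuming whole-file content addressing with SHA-256 from Python's `hashlib`. `DedupStore` is hypothetical, and many real backup tools split files into chunks before deduplicating rather than working per file.

```python
import hashlib


class DedupStore:
    """Hypothetical cross-user backup store: identical content is kept once."""

    def __init__(self):
        self.blobs = {}      # sha256 hex digest -> bytes
        self.refcount = {}   # sha256 hex digest -> number of (user, path) refs
        self.index = {}      # (user, path) -> digest

    def put(self, user, path, data):
        if (user, path) in self.index:
            self.delete(user, path)          # overwrite: drop the old reference
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # only the first copy is stored
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.index[(user, path)] = digest

    def get(self, user, path):
        return self.blobs[self.index[(user, path)]]

    def delete(self, user, path):
        digest = self.index.pop((user, path))
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:       # last reference gone: free it
            del self.refcount[digest], self.blobs[digest]

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
app = b"\x00" * 1_000_000                     # stand-in for a large app binary
for user in ("alice", "bob", "carol"):
    store.put(user, "/Applications/Big.app", app)
    store.put(user, "/Users/prefs.plist", f"prefs for {user}".encode())
print(store.stored_bytes())  # 1000043: one copy of the app plus three small files
```

Three users' copies of the binary cost the same as one; only the small per-user files add up.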

crazygringo
0 replies
23h25m

Yeah, I mean Time Machine backs up the entire OS as well.

I would have no problem if Time Machine separated out OS and known signed application packages and basically just stored pointers to standard versions of them, as long as all that detection is done client-side.

There's no reason the backup would need to store anything but the list of those files (that list being encrypted), and then everything unique to me -- my configurations, my files, etc.
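The client-side scheme described above might look roughly like this, assuming a vendor-published set of SHA-256 digests for stock OS files and signed apps (hypothetical; no such list is described in the thread). Encrypting the manifest is left out to keep the sketch to the standard library.

```python
import hashlib


def build_manifest(files, known_digests):
    """Client-side split of a backup into stock-file pointers and unique data.

    files: dict of path -> bytes read on the client.
    known_digests: set of SHA-256 hex digests of stock OS files / signed apps
    (a hypothetical list the vendor would publish).
    Returns (manifest, uploads): the manifest maps every path to a digest and
    would be encrypted before upload (not shown); uploads holds only content
    whose digest is not in the known set.
    """
    manifest, uploads = {}, {}
    for path, data in sorted(files.items()):
        digest = hashlib.sha256(data).hexdigest()
        manifest[path] = digest
        if digest not in known_digests:
            uploads[digest] = data
    return manifest, uploads


stock = b"stock system file"
known = {hashlib.sha256(stock).hexdigest()}
files = {"/System/Library/thing": stock, "/Users/me/notes.txt": b"my notes"}
manifest, uploads = build_manifest(files, known)
print(len(manifest), list(uploads.values()))  # 2 [b'my notes']
```

All the detection happens before anything leaves the machine; the server only ever sees the (encrypted) path list and the content that is unique to the user.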

hot_gril
0 replies
1d

I haven't bothered with this in a while, but back in the day, I used to use Carbon Copy Cloner to get a true 1:1 backup. Time Machine was never exactly the same.

rollcat
3 replies
1d1h

At one org, we went for the highest-tier Google Drive plan (with unlimited storage), because we've had this 1% of our internal users who would really, really benefit from having it. We could only go all or nothing (and the lower tier would meet the needs of the 99%), but the cost-benefit of enabling it for everyone was still pretty good.

I suppose Apple is keeping track of these numbers as well (keep in mind they know exactly how much storage each Mac has - because you can't expand it). I am also hoping it's under intensive internal testing; the quality of their software has been going downhill for a while, no power user would ever care if they shipped another broken product.

lelandfe
2 replies
1d

you can't expand it

They’ve brought back SD slots in recent years: https://support.apple.com/en-us/102352

rollcat
0 replies
21h23m

Neither this nor an external SSD are very practical - ask me how I know.

Meanwhile NVMes are a dime a dozen, and some laptops can fit two.

arrowleaf
0 replies
1d

Even better, external NVMe SSD enclosures over Thunderbolt 3 can reliably read at 2500 MB/s and write at over 1500 MB/s. That's faster than internal SSD R/W speeds a few years ago. The newer generation of enclosures coming out claim to use the full 40 Gbps bandwidth of USB4 and get >3000 MB/s R/W.

baby_souffle
0 replies
1d

I'm mildly surprised they haven't, but the reasons seem pretty obvious. Redundancy (in offerings), storage costs, and home network upload speeds.

I'd bet that the rigid APIs on iOS also play a huge role here. Compared to the "anywhere you have permission to `open()` on disk" approach on macOS, iOS developers don't have as many options for where/how to store data. This probably makes backup / restore an order of magnitude simpler and more reliable.

newsclues
2 replies
1d

Perhaps it’s because iCloud is based on AWS and Azure and the economics don’t make sense at this scale?

pulisse
0 replies
14h42m

The overwhelming majority of Apple's cloud operations are in Apple-owned data centers.

crazygringo
0 replies
23h21m

I'm sure Apple is getting excellent rates unavailable to you or me.

If anybody can decide to start building out massive datacenters of their own, it's Apple, and AWS/Azure know that.

And Apple just passes its rates along to the consumer. It costs what it costs.

eh8
1 replies
1d1h

It isn't as polished as whatever first-party solution Apple has the potential to develop, but I just use OneDrive to restore my personal data + chezmoi to reprovision my dotfiles and it works pretty well.

About every six months I do a fire drill and completely factory reset my macbook. Takes about 10 minutes for me to go from a fresh device to one that has all my apps, data, and developer tools ready to roll. Only annoying thing you can't really automate is signing into services like OneDrive or Dropbox, but this isn't a problem if you use iCloud Drive.

https://github.com/eh8/dotfiles

greggsy
0 replies
21h58m

I’ve rebuilt using brewfiles a few times. Surprisingly painless.

hinkley
0 replies
16h27m

I don’t have that kind of bandwidth and I’m a developer.

crossroadsguy
0 replies
19h4m

It’s Hanlon’s razor at work.

camel_gopher
0 replies
1d1h

Same here, but the “lots of Cassandra instances” approach isn’t really oriented toward continuous versioning. One may notice the availability lags with the current iCloud implementation, which sometimes come across as inconsistency.

colesantiago
12 replies
1d2h

Does anybody other than Apple use FoundationDB in production?

jwr
2 replies
1d2h

Snowflake is the big well-known user, but it seems there are many smaller production users as well.

I'm planning to migrate to it. It's quite simply the best distributed database out there today.

meowtimemania
1 replies
21h3m

Would you self host foundationDB? It seems there aren't many providers.

jwr
0 replies
1h40m

Yes, definitely! I have been self-hosting everything for years now, and I'm very happy. Even a three-machine bare-metal cluster has impressive computing power and is difficult to grow out of. I can't envision growing out of a five-machine cluster.

I found higher-level solutions (like AWS) to be slow, complicated and expensive, and I really can't see any reasons to use them.

mk12
1 replies
1d2h

Snowflake used it when I worked there and I assume still does.

yukIttEft
0 replies
1d2h

Would you mind sharing what made you quit Snowflake? (I'm considering applying there)

theythem
0 replies
1d2h

Deno KV

superaking
0 replies
1d1h

Exoscale

lokar
0 replies
1d2h

Wavefront

jeffbee
0 replies
1d2h
esafak
0 replies
1d

SurrealDB

FelipeCortez
0 replies
1d2h
DASD
0 replies
1d2h

The open-source Stalwart e-mail (IMAP/JMAP) server recommends FoundationDB as the backend for distributed setups.

https://stalw.art/docs/storage/backends/foundationdb

Y-bar
11 replies
1d2h

…and still can't show which 114 images in my iCloud photo library cannot be synced.

My phone has said this for many years and many iOS updates now: 10365 photos synced. The Mac says: 10251 photos synced.

troupo
3 replies
1d2h

...and still can't sync read/unread and deleted status in iMessage between Mac and iOS

latexr
1 replies
1d2h

I may be misremembering, but I think the deleted status does not sync on purpose unless you have “Messages in iCloud” turned on. On the Mac it’s under System Settings > [Your Name] > iCloud > Show More Apps…

troupo
0 replies
1d1h

I have Messages in iCloud turned on, and still...

Deleted messages would disappear in Big Sur. But read status wasn't synced for unread messages.

In Catalina, if you delete an unread message on iOS, it will disappear on macOS, but Messages will still have a "1 message unread" badge.

crossroadsguy
0 replies
18h59m

Now sit back and wait for a million attempted explanations of how you are just not accepting it and how this is how it is expected to behave. And of how marking a message unread and then read again, which probably triggers some job that fixes it, is a normal and necessary step for this flawless feature to work.

You might also be told that you are supposed to delete those messages on every device and that if you expect it to work automatically then you don’t get it.

crimbles
2 replies
1d2h

This is easily fixed on the mac: https://support.apple.com/en-us/HT204967

Had the same problem and this fixed it.

Y-bar
1 replies
1d2h

I remember that tool, it did not work last time it was recommended.

crimbles
0 replies
1d1h

You have to leave it for a couple of hours to consolidate the local SQLite DB and iCloud. The results are not immediate.

throwaway_08932
0 replies
1d

Mom had an issue where all her iCloud photos were syncing except the ones she'd taken after they renovated the kitchen. She had photos of everything but the kitchen synced.

latexr
0 replies
1d2h

If you’re feeling adventurous, something you can try on the Mac is to trash `~/Library/Application Support/CloudDocs` and then restart the daemons by running `/usr/bin/killall bird cloudd`.

I only used that once, but it fixed all the months of odd syncing I had experienced.

koolba
0 replies
1d2h

So they’re both synced, but they’re not in sync?

Perhaps it’s a Heisenberg type situation where measuring whether a file is synced itself changes the sync status.

ipython
0 replies
22h25m

I had a similar issue when I was trying to back up all my iCloud photos to S3 through the PhotoSync app[0]. I had about 600 photos that could not be downloaded from iCloud photos onto my iPhone. I ended up disabling iCloud Photos on the iPhone, then re-enabling it. This did end up making those photos available for download and the sync worked... it was rather nerve wracking though.

[0] https://www.photosync-app.com/home

chazeon
6 replies
21h9m

With iCloud, Apple does indeed handle update conflicts in Apple Notes well. I tried to set up Obsidian and other Markdown-based note-taking systems, but sync conflicts happened so often that I had to give up. Apple Notes handles this pretty well, so I finally moved to Apple Notes.

noname120
1 replies
20h20m

I haven't had any issues with Obsidian Sync. If you attempted to synchronize your vault with iCloud, I'd recommend giving Obsidian Sync a try instead.

yosito
0 replies
20h15m

Honestly, Obsidian with iCloud is so bad, that I'm afraid to pay for Obsidian Sync because half the time the errors and freezing of the Obsidian app seem like they have nothing at all to do with iCloud. It's really hard to tell, because Obsidian doesn't surface any errors, it just randomly freezes and has trouble opening files that should be there.

jimmydoe
1 replies
16h57m

My experience is the opposite. I lost data twice with Apple iCloud Notes: once when a major upgrade deleted many of my notes, and once when most of my attachments became blank. I'm never getting on that boat again.

lawgimenez
0 replies
16h39m

I experienced data loss in Messages lately. But I understand it might take some significant time, since I set it to keep messages forever.

a_wild_dandan
1 replies
20h39m

I can't do without Obsidian now. Its default graph representation of knowledge matches how my scatterbrain works. It has the creature comforts I've come to expect: simple (local) text storage, a fast command/search palette, gobs of integrations (e.g. Excalidraw for my tablet). Watching one of my knowledge vaults evolve is incredibly satisfying.[1]

Obsidian is the only note app that I've stuck with. Notion/Apple Notes/Goodnotes/etc just had excessive pain points. Obsidian "just works" for my brain. Which is a relief, since the productivity app treadmill is exhausting.

[1] https://i.imgur.com/IrVy2mk.mp4

steve_adams_86
0 replies
17h39m

Something I really appreciate about Obsidian is that they seem to be keeping the core application constrained and clearly defined. I worried they would adopt plugins into the application and have things kind of bloat out of control, but they've maintained a clear separation (even now with many plugins not working with Obsidian Publish). That can be a hard line to maintain and protect when you have paying customers and they're doing a great job sticking to what they're good at.

ultra-jeremyx
4 replies
1d

This sounds a lot like AWS Aurora, which (I'm simplifying here) is a database interface on top of a distributed file store (S3).

gwright
2 replies
1d

This is the first time I've seen it suggested that Aurora is implemented on top of S3.

This overview doesn't mention S3: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

ultra-jeremyx
1 replies
23h37m
mr_toad
0 replies
17h41m

Athena is a query engine, not a data store. It’s actually built from Presto (or maybe it’s Trino now, I’m not sure).

jahewson
0 replies
23h8m

I assume you mean AWS Athena - but no this is quite different from FoundationDB. Athena separates compute from storage (it’s Presto https://prestodb.io/ under the hood). Think of it as an on-demand SQL compute cluster. FoundationDB is a traditional combined storage/compute cluster. The Record Layer does provide some ability to scale-out the higher-level aspects of querying but it’s just a client library, not a separate compute service.

monstrado
4 replies
1d1h

I leveraged FoundationDB and RecordLayer to build a transactional catalog system for all our data services at a previous company, and it was honestly just an amazing piece of software. Adding gRPC into the mix for the serving layer felt so natural since schemas / records are defined using Protobuf with RecordLayer.

The only real downside is that the barrier to entry for running FoundationDB at scale is quite a bit higher than for a traditional distributed database.

sidcool
3 replies
1d

Sounds cool. Any write-up on this? How did you approach the design? What was the motivation to use FoundationDB? How much did you/your team need to learn while doing it?

monstrado
2 replies
23h37m

No write up, but the main reason was reusing the existing database we were comfortable deploying at the time. We were already using FDB for an online aggregation / mutation store for ad-hoc time-series analytics...albeit, a custom layer that we wrote (not RecordLayer).

When RecordLayer launched, I tested it out by building a catalog system that we could evolve and add new services with a single repository of protobuf schemas.

sidcool
0 replies
15h7m

Thanks. What are the typical use cases for FDB? What can it do that, say, Cassandra can't?

extractionmech
0 replies
16h56m

can you do a concise +/- on FDB? I’ve always thought it was a fantastic architecture but never tried it. tia

alberth
3 replies
1d

SQLite & HCTree

Given that FoundationDB is built on top of SQLite, I wonder if that team is eyeing the HCTree engine for it.

It's still in experimental mode but provides literally 10x improvement on read/writes to SQLite.

Given Apple's size and the scale of iCloud, that seems like a massive win for them if that SQLite engine can mature to production stability.

https://sqlite.org/hctree/doc/hctree/doc/hctree/threadtest.w...

rapsey
1 replies
1d

FoundationDB only uses SQLite's B-tree implementation, and I'm not sure even that is still used, since I think they switched storage engines.

pulisse
0 replies
14h40m

Yes, to RocksDB.

georgelyon
0 replies
1d

They have built their own storage engine named Redwood, which has some very FoundationDB-specific optimizations (like prefix compression). Check out the "Storage Servers" section in this doc: https://apple.github.io/foundationdb/architecture.html
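Prefix compression (front coding) of sorted keys, sketched in Python as an illustration of the general technique only; this is not Redwood's actual on-disk format. Each key is stored as the length of the prefix it shares with the previous key plus the remaining suffix, which pays off because keys in an ordered store tend to share long prefixes.

```python
def compress(sorted_keys):
    """Front-code sorted byte keys into (shared_prefix_len, suffix) pairs."""
    out, prev = [], b""
    for key in sorted_keys:
        n, limit = 0, min(len(prev), len(key))
        while n < limit and prev[n] == key[n]:   # length of common prefix
            n += 1
        out.append((n, key[n:]))
        prev = key
    return out


def decompress(pairs):
    """Rebuild the original keys by re-applying each shared prefix."""
    keys, prev = [], b""
    for n, suffix in pairs:
        key = prev[:n] + suffix
        keys.append(key)
        prev = key
    return keys


keys = [b"user/42/photos/0001", b"user/42/photos/0002", b"user/42/prefs"]
packed = compress(keys)
print(packed)  # [(0, b'user/42/photos/0001'), (18, b'2'), (9, b'refs')]
```

The trade-off is that decoding a key requires walking from the start of the run, so storage engines usually restart the encoding periodically to keep lookups cheap.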

zaking17
2 replies
1d1h

How would you handle schema migrations in a system like this?

ananthakumaran
1 replies
1d1h

It depends on the layer, some of the layers might be able to take advantage of how the data is persisted. For example, if you use avro/protobuf, the decoder will handle it for you. If that's not the case, you would have to implement the migration by yourself. There is a paper[1] on this subject called "Online, asynchronous schema change in F1", which explains how to implement it.

1: https://dl.acm.org/doi/abs/10.14778/2536222.2536230
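For the "implement the migration yourself" case, one common pattern is lazy upgrade-on-read: tag each record with a schema version and run it through upgrade functions when it is decoded. This is a different, simpler technique than the online schema-change protocol in the F1 paper; the field names, version numbers, and JSON encoding below are hypothetical.

```python
import json

CURRENT_VERSION = 3


def _v1_to_v2(rec):
    # Hypothetical v2 change: split "name" into "first" / "last".
    first, _, last = rec.pop("name").partition(" ")
    rec.update(first=first, last=last)
    return rec


def _v2_to_v3(rec):
    # Hypothetical v3 change: add "tags" with an empty default.
    rec.setdefault("tags", [])
    return rec


UPGRADES = {1: _v1_to_v2, 2: _v2_to_v3}


def decode(raw: bytes) -> dict:
    """Parse a stored record and upgrade it step by step to CURRENT_VERSION."""
    rec = json.loads(raw)
    version = rec.pop("_v", 1)  # records written before versioning count as v1
    while version < CURRENT_VERSION:
        rec = UPGRADES[version](rec)
        version += 1
    return rec


def encode(rec: dict) -> bytes:
    """Always write records at the current version."""
    return json.dumps({"_v": CURRENT_VERSION, **rec}, sort_keys=True).encode()


print(decode(b'{"name": "Ada Lovelace"}'))  # {'first': 'Ada', 'last': 'Lovelace', 'tags': []}
```

Old records are never rewritten in bulk; each one is upgraded the next time it is read and stored at the new version the next time it is written.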

zaking17
0 replies
17h32m

thanks, i'm really enjoying that paper

yosito
2 replies
20h12m

Great! If only I could manage which of my files stay local, and which get offloaded to iCloud I might be impressed. But it seems that iCloud likes to offload recently used files, apps and photos to make room for my massive library of old photos. It frequently makes my iPhone unusable unless I'm on wifi, and then I still have to wait for everything I want to use to re-download from iCloud.

thirdsun
1 replies
7h52m

At least for apps and photos you can disable it. For files I'm not sure.

yosito
0 replies
5h42m

Yes, but I don't want to disable it. I want to be able to mark specific items to never be offloaded, and I wish offloading would offload old things before recently used things.

jelder
2 replies
21h39m

Very cool. This is the architecture that inevitably results when you start with boxed, native, desktop software and incrementally move towards cloud based storage and collaboration. You have to be really good at doing schema changes and version migrations, because they're happening at fantastic scale without administrator intervention: not when you launch, but when each individual customer chooses to use the next version.

Quite different from a SaaS-first approach where it actually makes sense to do "customer id column"-based multi-tenancy and one-migration-at-a-time schema changes that I think most of us at less-than-Apple scales are familiar with.
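The per-customer, migrate-when-you-upgrade model can be approximated with one SQLite file per tenant that upgrades itself when opened, using SQLite's `PRAGMA user_version` as the schema version counter. The `MIGRATIONS` list, table, and file naming are hypothetical.

```python
import sqlite3

# Hypothetical migration list: MIGRATIONS[i] upgrades schema version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)",
    "ALTER TABLE notes ADD COLUMN pinned INTEGER NOT NULL DEFAULT 0",
]


def open_tenant_db(path):
    """Open one customer's database file and migrate it to the latest schema."""
    db = sqlite3.connect(path, isolation_level=None)  # manage transactions by hand
    version = db.execute("PRAGMA user_version").fetchone()[0]
    for i in range(version, len(MIGRATIONS)):
        db.execute("BEGIN")
        try:
            db.execute(MIGRATIONS[i])
            db.execute(f"PRAGMA user_version = {i + 1}")  # PRAGMAs can't take ? params
            db.execute("COMMIT")
        except Exception:
            db.execute("ROLLBACK")
            raise
    return db


# Each tenant would get its own file, e.g. f"tenants/{customer_id}.sqlite".
db = open_tenant_db(":memory:")
print(db.execute("PRAGMA user_version").fetchone()[0])  # 2
```

Each step runs in its own transaction, so a tenant whose upgrade fails stays on the previous version instead of ending up half-migrated, and tenants that never open the new version are never touched.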

meowtimemania
1 replies
21h5m

Is there any writing about these types of schema changes? It's something I run into using dynamodb.

AtlasBarfed
0 replies
14h1m

At least with Cassandra, there are cell-level timestamps which are very useful for doing data migrations while active writes are still incoming.

You can simply mirror the writes to both systems, and then migrate the old data underneath. As long as the data transfer preserves the cell level timestamps, the read path resolves any differences and compaction will eventually clean up any duplicates. (and sstable loads will have the timestamps)

Dynamodb does NOT have cell level timestamps, I believe they have row level timestamps. How it is doing globally replicated data and mutation merges: I have no idea. It seemed like a handwave when they were announcing it about two or three years ago.
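The cell-level reconciliation described above amounts to a per-cell last-write-wins merge. A simplified sketch, assuming each cell carries its own write timestamp; the tie-break on value is an assumption added here so the merge gives the same answer regardless of argument order, and this is not a description of Cassandra's internals.

```python
from typing import Dict, Tuple

Cell = Tuple[str, int]    # (value, write timestamp)
Row = Dict[str, Cell]     # column name -> cell


def newer(a: Cell, b: Cell) -> Cell:
    """Pick the cell with the higher timestamp; break ties on value so the
    result does not depend on argument order."""
    return max(a, b, key=lambda cell: (cell[1], cell[0]))


def merge_rows(a: Row, b: Row) -> Row:
    """Per-cell last-write-wins merge of two copies of the same row."""
    merged = dict(a)
    for column, cell in b.items():
        merged[column] = newer(merged[column], cell) if column in merged else cell
    return merged


# Old data bulk-copied with its original timestamps, plus a mirrored live write.
copied = {"email": ("old@example.com", 100), "name": ("Ada", 100)}
live = {"email": ("new@example.com", 200)}
print(merge_rows(copied, live))  # {'email': ('new@example.com', 200), 'name': ('Ada', 100)}
```

With only a row-level timestamp, the newer live row would win as a whole and the copied `name` would be lost unless the live write happened to include it; per-cell timestamps keep both.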

throwitaway222
1 replies
1d

2005 - We need 1 database

2010 - We need 2 databases

2015 - We need 500 databases

2020 - We need billions of databases

2025 - Prediction: We need 1 database.

Spivak
0 replies
23h47m

I mean this is pretty much your prediction, one ginormous database that creates the facade of billions of logical databases within.
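The "one ginormous database" facade can be illustrated by prefixing every key with its owner. `LogicalDB` is a hypothetical wrapper, and a plain dict stands in for the shared physical store.

```python
class LogicalDB:
    """Hypothetical per-user database carved out of one shared physical store.

    Every key is stored as (owner, key), so users never see each other's data
    even though all of them live in the same underlying dict.
    """

    def __init__(self, physical, owner):
        self.physical, self.owner = physical, owner

    def set(self, key, value):
        self.physical[(self.owner, key)] = value

    def get(self, key, default=None):
        return self.physical.get((self.owner, key), default)

    def keys(self):
        # A full scan here; an ordered store would range-scan the owner prefix.
        return sorted(k for owner, k in self.physical if owner == self.owner)


physical = {}  # the "one database"
alice, bob = LogicalDB(physical, "alice"), LogicalDB(physical, "bob")
alice.set("notes/1", "buy milk")
bob.set("notes/1", "call mom")
print(alice.get("notes/1"), bob.get("notes/1"), len(physical))  # buy milk call mom 2
```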

redbell
1 replies
21h7m

On an unrelated note, having the original title edited by the system after being submitted, without the OP being notified, really annoys me, especially when the title starts with How, Why and other terms. It just makes it a little weird to read, and sometimes it breaks the meaning. I once submitted a story and had some people complaining about the title being somehow misleading. When I noticed this, it was too late to edit the title.

In the HN guidelines, you read: "Otherwise, please use the original title, unless it is misleading or linkbait; don't editorialize."

I hope this will be taken into consideration.

CharlesW
0 replies
20h45m

Your feedback may not be seen here, but the admins are supernaturally responsive to notes sent to hn@ycombinator.com.

oblib
1 replies
1d1h

CouchDB implements a DB per user approach. Personally, I've found it much easier to use than an SQL DB for web apps I've made, but I've heard others who've always used SQL say they were frustrated with it.

randomdata
0 replies
23h21m

The thing with SQL databases is that the API they offer is designed for low-latency operation. This is not a big deal (ideal, even!) when the application and database share the same memory space where latency is imperceptible. And when it was originally designed, that was the norm, but at some point someone got the idea that they could expose the same API over the network. The network where latency is higher. That is where things start to fall apart.

It is nothing you cannot overcome with the right hacks (what can't be overcome with the right hacks?), but it is frustrating that the network-based API wasn't designed for high latency use from the start. It didn't need to use the exact same API that was designed for a low-latency environment, but that's what we got. As SQL and web apps typically mean MySQL or Postgres, that means you are apt to encounter the API design problems.

Granted, it seems there is renewed interest in SQLite to move the SQL database back to the way it was designed to be used. Which isn't surprising as all things in computing come and go in cycles. Once we round out that cycle and get back to "database on the network", maybe we can get a more well designed API meant for high latency to remove those frustrations.
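The latency point, illustrated: a per-row query loop that is harmless in-process costs one network round trip per iteration once the database is remote. This sketch uses an in-memory SQLite database and only counts queries; `CountingConnection` is hypothetical and the 20 ms per-round-trip figure is an assumption, not a measurement.

```python
import sqlite3

ROUND_TRIP_MS = 20  # assumed per-query network latency, for illustration only


class CountingConnection:
    """Wraps an in-memory sqlite3 DB and counts queries as if each were a
    network round trip to a database server."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.round_trips = 0

    def query(self, sql, params=()):
        self.round_trips += 1
        return self.db.execute(sql, params).fetchall()


def totals_chatty(conn, user_ids):
    """One query per user: fine in-process, one round trip each over a network."""
    return {uid: conn.query(
                "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?",
                (uid,))[0][0]
            for uid in user_ids}


def totals_batched(conn, user_ids):
    """Same answer in a single query."""
    marks = ",".join("?" * len(user_ids))
    rows = conn.query(
        "SELECT user_id, SUM(amount) FROM orders "
        f"WHERE user_id IN ({marks}) GROUP BY user_id", tuple(user_ids))
    totals = dict.fromkeys(user_ids, 0)
    totals.update(rows)
    return totals


conn = CountingConnection()
conn.db.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
conn.db.executemany("INSERT INTO orders VALUES (?, ?)",
                    [(u, 10 * u) for u in range(1, 101) for _ in range(3)])
ids = list(range(1, 101))

for fn in (totals_chatty, totals_batched):
    conn.round_trips = 0
    fn(conn, ids)
    print(fn.__name__, conn.round_trips, "round trips ~",
          conn.round_trips * ROUND_TRIP_MS, "ms")
```

Both functions return the same totals; the difference is purely how many times the client has to wait on the network.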

clintonb
1 replies
1d2h

…and still struggles to sync one file from my laptop to my phone.

morelish
0 replies
22h9m

Yeah, I gave up on it trying to sync photos. The desktop and mobile apps gave no indication of their progress processing files. So after a large upload I was still waiting days later for replication to occur, and I didn’t know if it would ever complete.

citizenpaul
1 replies
22h29m

This reminds me of years back when I worked in banking. I vaguely recall there was a report system called Hyperion(an?) (IBM?). The system generated a new database for every single report it made. I thought that was kinda crazy at the time but I guess it was ahead of the times.

Someone feel free to correct my memory if needed, I was not the primary person for this system or anything so I could be totally wrong.

therein
0 replies
19h0m

Funny, Apple has an internal Hyperion that happens to be related to the iCloud aspect of Photos.

thund
0 replies
14h27m

We gave up on iCloud for file sync; it’s broken on dozens of devices, trying to “optimize” storage even when asked not to. Imagine having 4TB (size doesn’t matter) mostly empty hard drives and not being “allowed” to keep a file copy offline, because iCloud knows better…

Now Apple is asking all file sync products like Dropbox to do the same (see the File Provider API), breaking those as well. Really annoying.

sporkland
0 replies
22h39m

Previously:

"FoundationDB: A Distributed Key-Value Store" [https://news.ycombinator.com/item?id=36572658]

"FoundationDB Record Layer" [https://news.ycombinator.com/item?id=18906341]

"Apple Acquires FoundationDB" [https://news.ycombinator.com/item?id=9259986]

"How FoundationDB works and why it works" [https://news.ycombinator.com/item?id=37552085]

kjkjadksj
0 replies
22h17m

If only large downloads from Apple, like iCloud pulls, didn’t time out and you weren’t trapped in these databases as a result.