A random event from my calendar serializes to 740 bytes of iCalendar. An extremely busy calendar containing one event every 15 minutes, 9am-5pm, 7 days per week, only contains 11,680 entries for a full year (32 slots/day x 365 days), which at 740 bytes each works out to around 8MB.
I'd suggest that, instead of designing a schema at all, a calendar is a good example of a problem that might be far better implemented as a scan. Optimizing an iCalendar parser to traverse a range of dumped events at GB/sec-like throughput would mean the worst-case calendar above could be scanned in single-digit milliseconds.
Since optimizing a parser is a much simpler problem to solve once than changing or adding to a bad data model after it has many users, and since the very first task involving your new data model is probably to write an iCalendar importer/exporter anyway, I think this would be a really good trade-off.
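A rough sketch of the scan approach (Python rather than the GB/sec version; the file name is made up, and it only handles the bare local-time DTSTART form):

    from datetime import datetime

    def scan_events(path, start, end):
        # Linear scan over a dumped .ics file; yields the raw text of
        # every VEVENT whose DTSTART falls within [start, end).
        # Only handles bare "DTSTART:YYYYMMDDTHHMMSS" values; TZID and
        # UTC ("...Z") forms are real cases a production scanner needs.
        block = None
        with open(path, encoding="utf-8") as f:
            for raw in f:
                line = raw.rstrip("\r\n")
                if line == "BEGIN:VEVENT":
                    block = []
                if block is not None:
                    block.append(line)
                if line == "END:VEVENT":
                    for prop in block:
                        if prop.startswith("DTSTART:"):
                            dt = datetime.strptime(prop[8:23], "%Y%m%dT%H%M%S")
                            if start <= dt < end:
                                yield "\n".join(block)
                            break
                    block = None

    # e.g. all events in January:
    # for ev in scan_events("dump.ics", datetime(2024, 1, 1), datetime(2024, 2, 1)): ...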
iCalendar is designed as an interchange format, not as a storage format for fast access to calendar data.
The format shows its age - you can tell it was designed before XML/JSON were "hot".
see https://en.wikipedia.org/wiki/ICalendar
Or you can tell it was designed by people who cared more about performance.
> > you can tell it was designed before XML/JSON were "hot"
I doubt parsing iCal is significantly more performant than JSON for most use cases. In fact I can imagine it being less so in more cases than it is more so, and as close to the same as makes no odds in the vast majority of cases.
Parsing iCal was far more performant than parsing JSON as your parser could simply start at the first byte, and follow the spec — whereas a JSON parser would first need to wait for ten years before JSON was invented.
The iCalendar RFC shows Nov 1998 - https://www.rfc-editor.org/rfc/rfc2445
iCalendar is based on vCalendar, http://www.imc.org/pdi/vcal-10.txt (you need to go to archive.org to see it now), which shows Sept 1996.
From Wikipedia, XML is listed as first published in Feb 1998; JSON is early 2000s.
edit: XML'd iCalendar, 2011 - https://datatracker.ietf.org/doc/html/rfc6321
JSON'd iCalendar, 2014 - https://datatracker.ietf.org/doc/html/rfc7265
It is also a data model that amounts to a de facto standard across all real calendaring apps. The text serialization sure is ugly, and who would want to touch it unless they had to, but that's just a parsing problem. The wiki page shows 14 component types; who is ever going to think of that many when designing a replacement schema? Why not just use the data model you already have 'for free'?
It's a data model for an early-mid 1990s calendaring app (Lotus Organizer / Microsoft Schedule+).
If that works for you, go for it.
But what is the exact criticism here? E.g. that the serialized form doesn't look "nice" when you open the file?
A de facto standard for interchange ("feeds" announcing changes to activities), not for a database (remembering a large set of activities indefinitely).
The appropriate schemas are vastly different: a message like iCalendar should be simple and self-contained (leaving cleverness to the applications exchanging it), while a database like the article discusses should be normalized (nontrivial structures and queries aren't a problem).
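To make "normalized" concrete, a minimal sketch of the database side (SQLite via Python; the table and column names are just illustrative, not from the article):

    import sqlite3

    con = sqlite3.connect("calendar.db")
    con.executescript("""
        CREATE TABLE users     (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
        CREATE TABLE events    (id INTEGER PRIMARY KEY,
                                title     TEXT,
                                starts_at TEXT,   -- ISO 8601, stored in UTC
                                ends_at   TEXT);
        CREATE TABLE attendees (event_id INTEGER REFERENCES events(id),
                                user_id  INTEGER REFERENCES users(id),
                                PRIMARY KEY (event_id, user_id));
        CREATE INDEX events_by_start ON events(starts_at);
    """)

The iCalendar message stays flat and self-contained; the importer's job is to map it into, and back out of, something like this.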
Showed up to a doctor's appointment during their lunch break once, because iCal supports the time format "TZ is whatever is local, trust me" and the device I used to add the appointment was set to UTC, not UTC+1.
My doctor's calendar app vendor was pretty happy I found the root cause of their very occasional mystery appointment drift, too.
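For anyone who hasn't hit this: RFC 5545 allows three shapes of DTSTART, and the first is the "trust me" one (the times here are made up):

    DTSTART:20240705T133000                       floating: local time, wherever the reader happens to be
    DTSTART:20240705T133000Z                      absolute UTC
    DTSTART;TZID=Europe/Berlin:20240705T133000    anchored to a named time zone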
New years is midnight, local time, wherever you are. Trust me.
If I invite you (who lives in Spain, let’s say) to fly over and enjoy my New Year’s party (in New Zealand) it definitely is not going to occur at “midnight local time wherever you are” - Trust me.
(Actually there’s a famous missing-persons case related to a New Year’s party in New Zealand … I don’t think that time zones were part of what went wrong, but I can’t be sure.)
It’s like the email format: it’s the wild west trying to parse these things, and it doesn’t help that date/time is already a tricky subject.
In the Wild West things were easier —
“Meet me at high noon!”
Was never met with -
“High noon? What time zone?”
Wouldn't you have to re-invent things like range searches, which are 'BETWEEN ... AND ...' queries in SQL? The same goes for finding events for users 1, 2 and 3, etc.
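E.g. the kind of query you'd get for free (SQLite via Python; assumes an events table with a starts_at column and an attendees join table, names made up):

    import sqlite3

    con = sqlite3.connect("calendar.db")
    rows = con.execute(
        """SELECT e.id, e.title
             FROM events e
             JOIN attendees a ON a.event_id = e.id
            WHERE e.starts_at BETWEEN ? AND ?
              AND a.user_id IN (1, 2, 3)""",
        ("2024-01-01", "2024-02-01"),
    ).fetchall()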
In a real application you'd probably already have user accounts of some sort stored in a relational database, and then you'd suddenly have to scan for events in a directory and connect them back to those records in the database.
So there might be some specific set of applications where you are right, but there are specific things that a database is really good at, which would make it a really good choice. With the proper indices you'd probably get the same or even better throughput, unless you come up with some clever directory structure for your events, which would in fact just be an index, and only on one dimension, whereas in a database you'd be able to create indices for many dimensions and combinations of dimensions.
So you are right: trade-offs.
I didn't mean to imply avoiding use of a database entirely; almost any DB system tasked with copying a few long strings around in a simple query won't perform much worse than a literal raw disk file.
Even just something like: CREATE TABLE calendar(id, user_id, blob)
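i.e. keep the DB as a dumb blob store and do the filtering in application code (SQLite sketch; the fast scanner from upthread would take each blob):

    import sqlite3

    con = sqlite3.connect("calendar.db")
    con.execute("CREATE TABLE IF NOT EXISTS calendar "
                "(id INTEGER PRIMARY KEY, user_id INTEGER, blob TEXT)")

    # All the interesting filtering happens outside SQL: fetch one
    # user's raw iCalendar blobs and scan them in the application.
    blobs = [row[0] for row in
             con.execute("SELECT blob FROM calendar WHERE user_id = ?", (42,))]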
Yes, sure. I can imagine, though, that normally you'd also want to be able to query on details of an event, in which case having most things in columns would make sense because you can combine them with JOIN queries, etc.
Also, in the context of web applications, you probably already have a database and probably don't have persistent disks on your application servers, which adds complexity to the file-based scenario. In that case using blobs is a perfectly fine solution indeed.
Still, you are right that in many cases, let's say a desktop application, you are probably better off reading directly from tens of files on disk rather than having to deal with the complexity of a database.
The same applies to vector databases. I read an article a few months ago about just storing vectors in files and looping through them instead of setting up a vector database, and the performance was pretty much the same for the author's use case.
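The brute-force version really is tiny (NumPy sketch; assumes the embeddings were dumped once as a pre-normalized float32 array, file name made up):

    import numpy as np

    def top_k(query, vectors, k=10):
        # vectors: (n, d) float32, rows pre-normalized; query: (d,).
        # Cosine similarity then reduces to a dot product, plus a
        # partial sort to pull out the k best matches.
        scores = vectors @ query
        idx = np.argpartition(scores, -k)[-k:]
        return idx[np.argsort(scores[idx])[::-1]]

    vectors = np.load("embeddings.npy", mmap_mode="r")
    best = top_k(vectors[0], vectors)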
And then you quickly get to the not-the-author's use cases, where you have to start reinventing wheels, poorly.
If I understood him correctly, I think this is where some language collections libraries ought to shine, PHP/Laravel Collections or C# LINQ for example. Tell it how to load a record into the collection, add one-liners for each criterion you want to define, and in a few dozen lines you're free to go.
Such an approach might also provide a neater solution to the infinite time-slots problem mentioned in the article.
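A rough Python rendering of that pipeline idea (the record fields are made up; the Laravel Collections or LINQ version would read much the same):

    from datetime import datetime

    def events_for(records, user_id, start, end):
        # One chained pass over already-loaded records; each filter is
        # a one-liner you can add or drop as the criteria change.
        hits = (r for r in records if r["user_id"] == user_id)
        hits = (r for r in hits if start <= r["starts_at"] < end)
        return sorted(hits, key=lambda r: r["starts_at"])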
Maybe you can do without a DBMS, but you still need a good schema for your data, and it isn't a bunch of iCalendar records: for example, instead of a stable user ID there are only mutable and repeated email addresses.
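For illustration, the closest thing a VEVENT has to identity (syntax per RFC 5545, addresses made up):

    ORGANIZER;CN=Bob Example:mailto:bob@old-employer.example
    ATTENDEE;CN=Alice Example:mailto:alice@example.com

If Alice changes her address, every stored event still points at the old one; there is no stable ID to key on.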