
Show HN: High-precision date/time in SQLite

alberth
15 replies
2d4h

Does this handle the special case of timezone changes (and local time discontinuity) that Jon Skeet famously documented?

https://stackoverflow.com/questions/6841333/why-is-subtracti...

And computerphile explains so well in their 10-min video:

https://www.youtube.com/watch?v=-5wpm-gesOY

---

I learned long ago never to build my own date/time or encryption libraries. There are endless edge cases that can bite you hard.

(Which is also why I'm skeptical when I encounter new libraries of either kind.)

sltkr
14 replies
2d4h

This library doesn't deal with the notion of local time at all. It's all UTC-based times, possibly with a user-supplied timezone offset, but then the hard part of calculating the timezone offset must be done by the caller.

I do think the documentation could be a little clearer. The author talks about “time zones” but the library only deals with time zone offsets. (A time zone is something like America/New_York, while a time zone offset is the difference to UTC time, which is -14400 seconds for New York today, but will be -18000 in a few months due to daylight saving time changes.)
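
To make the zone-vs-offset distinction concrete, here's a minimal Go sketch (Go only because its standard time package makes the two easy to compare); resolving America/New_York into -14400 or -18000 is exactly the part this extension leaves to the caller:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // A fixed offset knows nothing about DST; a named zone does.
        fixed := time.FixedZone("UTC-4", -4*60*60)
        nyc, err := time.LoadLocation("America/New_York")
        if err != nil {
            panic(err)
        }

        summer := time.Date(2024, 8, 15, 12, 0, 0, 0, time.UTC)
        winter := time.Date(2024, 1, 15, 12, 0, 0, 0, time.UTC)

        _, offSummer := summer.In(nyc).Zone()  // -14400 (EDT)
        _, offWinter := winter.In(nyc).Zone()  // -18000 (EST)
        _, offFixed := winter.In(fixed).Zone() // -14400, DST or not

        fmt.Println(offSummer, offWinter, offFixed)
    }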

Someone
7 replies
2d

It's all UTC-based times

Not even that. UTC has leap seconds, which this code doesn’t handle (FTA: “The calendrical calculations always assume a Gregorian calendar, with no leap seconds”)

It copies that from the golang time package, which makes the same claim (https://pkg.go.dev/time)

That makes life a lot simpler for the implementer, but doesn't that mean you can only reliably use these two libraries for computing either with durations or with moments in time, not both together? The moment you start mapping these times to real-world clocks and adding durations to them, you run the risk of getting small inconsistencies (up to about half a minute, at the moment).
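
To illustrate the kind of inconsistency I mean, here is a sketch using Go's time package (which the extension reportedly follows): the leap second inserted at the end of 2016 simply does not exist in this model, so the computed duration comes out one SI second shorter than what a physical clock measured.

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // 2016-12-31 23:59:60 UTC was a real leap second, but it has no
        // representation here, so the difference comes out as exactly 1h
        // even though 3601 SI seconds elapsed on a physical clock.
        before := time.Date(2016, 12, 31, 23, 0, 0, 0, time.UTC)
        after := time.Date(2017, 1, 1, 0, 0, 0, 0, time.UTC)
        fmt.Println(after.Sub(before)) // 1h0m0s, not 1h0m1s
    }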

doctorpangloss
4 replies
2d

Another POV is, why build for the SQLite "ecosystem" at all?

devmor
3 replies
1d22h

Probably because SQLite is one of the most (if not THE most) widely used database implementations for IoT software in the world.

That’s like asking why someone made a package for javascript.

ErikBjare
2 replies
1d20h

SQLite is the most widely used database in the world. Full stop.

raggi
1 replies
1d17h

After a few things like hosts files, which exist on every major operating system, including Windows.

vsuperpower2021
0 replies
1d15h

You can find dozens of SQLite databases on a single phone, if you want to be pedantic.

nottorp
1 replies
1d20h

The calendrical calculations always assume a Gregorian calendar, with no leap seconds.

That could drift a bit if you can represent a 580-million-year interval (with millisecond precision), couldn't it?

nalgeon
3 replies
2d3h

Thanks for the suggestion! True, only fixed offsets are supported, not timezone names.

alberth
2 replies
2d1h

@nalgeon

Do you plan to address the use cases in the SO post? Or, asked differently: what is the intended use case of this library?

I tried to recreate it on your site (which is very cool, by the way, in letting the code run in the browser), and it seems to fail and give the wrong time difference.

  select time_compare(time_date(1927, 12, 31, 23, 58, 08, 0, 28800000), time_date(1927, 12, 31, 23, 58, 09, 0, 28800000));
Results in an answer of '1', which is incorrect.

Please don't take my comments as being negative or unappreciative; this is super difficult stuff, and anyone who tries to make the world an easier place should be thanked for that. So thank you.

----

EDIT: this post explains why the answer isn't "1"

https://stackoverflow.com/questions/6841333/why-is-subtracti...

nalgeon
0 replies
2d1h

I appreciate your comments, and thank you for trying out the extension.

This query returns -1 (minus one, not one), which seems correct to me. The first date is before the second:

    select time_compare(
      time_date(1927, 12, 31, 23, 58, 08, 0, 28800000),
      time_date(1927, 12, 31, 23, 58, 09, 0, 28800000)
    );

    -1

kaoD
0 replies
2d

As discussed in the top-level comment, this library has no concept of timezones (only offsets) so the SO link does not apply. The time rollback only happened in Asia/Shanghai.

akira2501
1 replies
1d23h

A time zone is something like America/New_York

It's US/Eastern. Paul Eggert can call this a "deprecated compatibility time" all he wants, but "Eastern Time Zone" is the official name of the time zone as maintained by the civil timekeeping authority.

michaelcampbell
0 replies
1d6h

If IANA is fine with it, that's fine with me too.

zokier
13 replies
2d5h

I just wish people would stop using the phrase "seconds since epoch" (or equivalent) unless that is exactly what they mean.

I wonder what

    select time_sub(time_date(2011, 11, 19), time_date(1311, 11, 18));
returns?

ralferoo
11 replies
2d5h

Why do you wish that?

I can think of a few plausible reasons, but the only one that is really significant is "what epoch"? In the case of UNIX-based systems and systems that try to mimic that behaviour, that is well defined. But as you haven't said what your complaints are, it's hard to provide any counterpoint or justification for why things are as they are.

time_date(1311, 11, 18)

That isn't defined in the epoch used by most computer systems, so all bets are off. Perhaps it'll return MAX_INT, MIN_INT, 0, something that's plausible but doesn't take into account calendar reforms that have no bearing on the epoch being used, or perhaps it translates into a different epoch and calculates the exact number of seconds, or anything else. One could even argue that there are no valid epochs before GMT/UTC, because it was all just local time before then.

But of course, you can argue either way about whether negative values should be supported. Exactly 24 hours before 1970-01-01 00:00:00 UTC could reasonably be expected to be -86400; on the other hand, "since" strongly implies positive only.

Other people might have entirely different epochs for different reasons, again within the domain it's being used, that's fine as long as everyone agrees.

Or did you have some other objection?

zokier
10 replies
2d3h

The problem with "seconds since epoch" expression is that almost always it doesn't mean literally seconds since epoch, but instead some unix-style monstrosity. And it's annoying that you need to read some footnote to figure out what exactly it means; it's annoying that it is basically a code-phrase that you just need to know that it's not supposed to be taken literally.

ralferoo
9 replies
2d3h

it doesn't mean literally seconds since epoch, but instead some unix-style monstrosity

That "unix-style monstrosity" is literally seconds since the UNIX time epoch, which is unambiguously defined as starting on 1970-1-1 0:00:00 UTC.

Or it would have been, had leap seconds not been forced upon the world in 1972, at which point yes, arguably it's no longer "physical earth seconds" since the epoch but "UNIX seconds" where a day is defined as exactly 86400 UNIX seconds.

In retrospect, it'd have been better if a UNIX second had been exactly an SI second, with the leap seconds accounted for by the tz database, but that didn't exist until over a decade after the first leap seconds were added, so probably everybody thought it was easier just to take the pragmatic option and skip the missing seconds, exactly the same way the rest of the world was doing.
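
A quick Go check of what "86400 UNIX seconds per day" means in practice (a sketch; 27 leap seconds have been inserted since 1972, yet none of them show up in the arithmetic):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Because leap seconds are skipped, consecutive UTC midnights are
        // always exactly 86400 apart in Unix time, even across 2016-12-31,
        // which had 86401 SI seconds.
        d1 := time.Date(2016, 12, 31, 0, 0, 0, 0, time.UTC).Unix()
        d2 := time.Date(2017, 1, 1, 0, 0, 0, 0, time.UTC).Unix()
        fmt.Println(d2-d1, d2%86400) // 86400 0
    }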

I'm still not sure if that's what your complaint is about, as I don't know of any time system, defined any other way, that handles this correctly if you ask for the time difference in seconds between a time before and a time after a leap second.

Maybe a better question would be: what do you think would be a better way of defining a representation of a date and time, and that would allow for easy calculations and also easy transformations into how it's presented for users?

edflsafoiewq
8 replies
2d2h

You literally did the exact thing GP is complaining about.

fluoridation
6 replies
2d1h

It's unclear if that's what they're complaining about, but if it is, the reason people just say "seconds since epoch" instead of saying what they actually mean is precisely because no one wants to get into the weeds of leap seconds. POSIX timestamps are a convenient format and they represent an idealized "second" of constant length that's in a uniform correspondence with the rotation of the Earth. Most likely if you're using such a representation you won't care that 1723740711 is not literally and exactly 1723740711 seconds since 1970-01-01T00:00:00Z and the error of a few seconds will not even be noticed.

edflsafoiewq
5 replies
2d

No, that is literally exactly what they just complained about.

And it's annoying that you need to read some footnote to figure out what exactly it means; it's annoying that it is basically a code-phrase that you just need to know that it's not supposed to be taken literally.

Even to the point of deferring the real explanation to a secondary paragraph.

ralferoo
3 replies
1d19h

It seems you're arguing based on the assumption that real time is exactly 365x24x60x60 seconds every year or exactly 366x24x60x60 seconds on leap years. It's not.

The problem is that we have a very precise definition of a second in terms of atomic physics (a fixed count of periods of the radiation from a caesium hyperfine transition). The real issue is arguably that this is over-defined.

There's a good case that a second is really what it was historically, up until 1967: 1/60th of a minute, which is 1/60th of an hour, which is 1/24th of a day. That's what UNIX seconds are: 86400 of them in one day. And we have a pretty good idea what a solar day is; we've been building calendars on it for thousands of years.

But if you want to base your times on the caesium transition, you can do that; you just have to accept that it no longer corresponds neatly to a solar day. The length of a solar day fluctuates a little in either direction, which we largely just ignore because, over time, it mostly cancels out. Personally, I don't think leap seconds should ever have been introduced: over the past several decades, the earth has been rotating "slower" than our idealised 86400-second day, so we've added leap seconds. But more recently it's been rotating "faster", and we're at the point where we may need a negative leap second. Maybe we should have just left it alone and over a longer period it'd all have averaged out anyway.

But what's interesting is that apart from the meddling with leap seconds, we've decided that a "typical day" has exactly 86400 seconds where a second is some constant time, even though that isn't true of the reality of our planet. Some days are too short when defined this way, some days are too long. But on average, this 86400 seconds is pretty much right.

And arguably, any day that needs a leap second isn't "wrong"; the problem is that we over-defined the second before we realised the length of the solar day isn't constant. I wouldn't advocate redefining the second yet again, because having a constant-length second is incredibly useful for defining all the other derived SI units. But with that usefulness comes the need to be aware that it's not the same as the traditional timekeeping second.

But in any case, except for leap seconds, all the world's time systems agree on 86400 seconds per day. So, can you make the case for why you think UNIX time in particular is a problem? And what would you rather have instead?

da_chicken
2 replies
1d14h

Because sometimes when you measure time you want to know what fraction of the day it is, and sometimes when you measure time you want an objective measure of the passage of time.

Quite often, we want to use something originally recorded using the former to calculate the latter. It is most convenient to have a second that is of fixed duration. Which is kind of exactly why the tz database exists in the first place.

Except it's made more complicated because most systems that use unixtime also use NTP, and that means they employ smearing because essentially nothing in computing supports 23:59:60 or 23:59:61 or repeated seconds of 23:59:59. So on the day of the leap, the recorded time for events doesn't match standard time. Which is why the unsmear library exists (among others, probably).
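
For anyone unfamiliar with smearing, here's a rough sketch of the idea (illustrative only, loosely following the linear 24-hour noon-to-noon smear that public NTP services have documented; real implementations differ in detail):

    package main

    import (
        "fmt"
        "time"
    )

    // smearOffset returns how much a linear 24-hour leap smear has slowed
    // the clock at time t, for a positive leap second ending at leapEnd:
    // zero before the window, a full second after it, ramping linearly in
    // between, so 23:59:60 never has to be displayed.
    func smearOffset(t, leapEnd time.Time) time.Duration {
        start := leapEnd.Add(-12 * time.Hour)
        end := leapEnd.Add(12 * time.Hour)
        switch {
        case t.Before(start):
            return 0
        case t.After(end):
            return time.Second
        default:
            elapsed := t.Sub(start)
            return time.Duration(float64(time.Second) * float64(elapsed) / float64(24*time.Hour))
        }
    }

    func main() {
        leapEnd := time.Date(2017, 1, 1, 0, 0, 0, 0, time.UTC) // leap second at the end of 2016
        t := time.Date(2016, 12, 31, 18, 0, 0, 0, time.UTC)    // 6h into the smear window
        fmt.Println(smearOffset(t, leapEnd))                   // 250ms
    }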

Note that TAI (International Atomic Time) truly is the number of actual seconds elapsed since it was first synchronized in 1958. That is what is used to define UTC, and it's currently about 37 seconds ahead of UTC.

All that is to say... Calling unixtime "seconds since epoch" is a forgivable sin in terms of the practicalities of communication, but it's not really defensible as a matter of being a factual description of reality. The truth is that the new definition of a second was agreed upon before Unix came along, and when we're measuring time in seconds we don't typically care about the solar day or sidereal day. Further, there is no practical way to construct a computer or clock (barring a sundial) that supports the original dynamic definition of time divisions. I can't even imagine how relativistic time with GPS satellites would have to work. It would be the longitude problem all over again.

fluoridation
1 replies
1d3h

it's not really defensible as a matter of being a factual description of reality

Is it a factual description of reality? "Seconds since epoch" is an almost entirely abstract idea, given that neither seconds nor epochs exist in the universe. The only way it's connected to reality is that time moves forwards, so "since" has meaning. So it seems to me that someone who says "seconds since epoch" can choose to give the words any meaning they like as long as everyone understands what is meant.

So to me this is as relevant a complaint as saying that the special case of graphs shouldn't be called "trees" because branches in real trees sometimes rejoin. It's a metaphor. We're dealing with entirely human-made concepts with barely any input from the real world, we can use any words we like.

da_chicken
0 replies
4h23m

It's only as abstract as "meters of distance". Which is to say, not in any practical sense until we introduce general or special relativity. The fact that it's synthetic and not a natural unit doesn't mean it's abstract or variable. It means it's arbitrary. It could be any value, but it must remain static.

That's also why we don't use the king's body dimensions as the standard of measurement anymore. We don't need to recalibrate the entire nation's standard of "foot," "inch," "cubit," or whatever when the king grows. That's not useful.

Just because the inch started as the width of an adult man's thumb doesn't mean the purpose of the inch is to define thumbs, or that it should always follow from that. That is not the goal of the measuring unit. It's the same for seconds.

That's why I said: sometimes you want to know what time of day it is, and sometimes you want to know how long something took to occur. That is to say, a datetime and a timespan. Now, you could use a different unit of measurement for those two, but in practical terms it's stupid.

fluoridation
0 replies
2d

I mean, if they're referring specifically to leap seconds or to some other obscure complexity of dealing with real time.

ralferoo
0 replies
2d1h

Again, the same question: what would be a better solution?

nalgeon
0 replies
2d4h

If the result exceeds the maximum value that can be stored in a Duration, the maximum duration will be returned.
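
For comparison, Go's time package (whose documentation the sentence above closely mirrors) saturates in the same way; a quick sketch of that 1311-to-2011 example:

    package main

    import (
        "fmt"
        "math"
        "time"
    )

    func main() {
        // ~700 years is far more than the ~292 years an int64 of
        // nanoseconds can hold, so Sub returns the maximum Duration
        // instead of overflowing.
        old := time.Date(1311, 11, 18, 0, 0, 0, 0, time.UTC)
        newer := time.Date(2011, 11, 19, 0, 0, 0, 0, time.UTC)

        d := newer.Sub(old)
        fmt.Println(d == time.Duration(math.MaxInt64)) // true
        fmt.Println(d)                                 // 2562047h47m16.854775807s (~292 years)
    }
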
davidhyde
7 replies
2d6h

I think it’s important to be explicit about whether or not signed integers are used. From reading the document it seems that they may be signed but they could not be. If they are signed then you could have multiple bit strings that represent the same date and time which is not great.

jagged-chisel
3 replies
2d6h

Definitely signed - “use negative duration to subtract”

But bit pattern is an issue internal to the library. If you can find a bug in the code, certainly point it out and offer a fix if it’s in your skillset.

sigseg1v
1 replies
2d5h

I think the negative number here refers to the number of days/etc. to subtract (e.g. add negative days to subtract, not supply a negative date).

However, at the same time it seems to indicate that it stores data using SQLite's built-in number type, which to my understanding does not support unsigned. Secondly, the docs mention you can store a range of 290 years at nanosecond precision, which if you calculate it out works out to about 63 bits of information, suggesting a signed implementation.
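
Back-of-envelope check of that estimate (a sketch; assuming 365.25-day years):

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        // 290 years of nanoseconds just fits under 2^63, so a signed
        // 64-bit integer covers roughly ±292 years.
        const nsPerYear = 365.25 * 86400 * 1e9
        fmt.Printf("290 years = %.3e ns\n", 290*nsPerYear)              // ~9.15e+18
        fmt.Printf("int64 max = %.3e ns\n", float64(math.MaxInt64))     // ~9.22e+18
        fmt.Printf("int64 max = %.1f years\n", math.MaxInt64/nsPerYear) // ~292.3
    }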

tyingq
0 replies
2d5h

Yes, it's signed: https://www.sqlite.org/datatype3.html

  Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
  # some omitted...
  INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.

gcr
0 replies
2d5h

Subtraction of unsigned negative values still works just fine because of two's complement.

(uint8)(-3) is 253, for example, and (uint8)5-(uint8)253 = (uint8)8, corresponding to 5 - (-3)
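
The same thing in Go, for anyone who wants to see it run (uint8(-3) would be rejected as a constant conversion, hence the int8 step):

    package main

    import "fmt"

    func main() {
        // Two's-complement wraparound: -3 reinterpreted as a uint8 is 253,
        // and 5 - 253 wraps modulo 256 back to 8, the same as 5 - (-3).
        a := uint8(5)
        b := uint8(int8(-3)) // 253
        fmt.Println(b, a-b)  // 253 8
    }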

kaoD
2 replies
2d

multiple bit strings that represent the same date and time

How so?

davidhyde
1 replies
1d23h

You’re right, whether or not the integers are signed has nothing to do with the issue above. Unsigned integers have the same issue.

Here is an example for signed integers.

These represent zero time but have different representations in memory:

    Seconds:  2, Nanoseconds: -2,000,000,000 (fits in a 32-bit number), Time: zero seconds
    Seconds: -2, Nanoseconds:  2,000,000,000, Time: zero seconds

And here is an example for unsigned integers:

    Seconds: 1, Nanoseconds: 0, Time: 1 second
    Seconds: 0, Nanoseconds: 1,000,000,000, Time: 1 second
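
The usual fix is to normalize the pair so the nanosecond part always lands in [0, 1e9); then equal instants have equal representations. A sketch (Go's time package does something similar internally):

    package main

    import "fmt"

    // normalize reduces a (seconds, nanoseconds) pair to a canonical form
    // with 0 <= nsec < 1e9, so equal instants compare equal bit-for-bit.
    func normalize(sec, nsec int64) (int64, int64) {
        const nsPerSec = 1_000_000_000
        sec += nsec / nsPerSec
        nsec %= nsPerSec
        if nsec < 0 {
            nsec += nsPerSec
            sec--
        }
        return sec, nsec
    }

    func main() {
        fmt.Println(normalize(2, -2_000_000_000)) // 0 0
        fmt.Println(normalize(-2, 2_000_000_000)) // 0 0
        fmt.Println(normalize(0, 1_000_000_000))  // 1 0
    }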

kaoD
0 replies
1d22h

Thanks, but I'm not gonna pretend that was my point. Dumb question from me, I just forgot the context that time was a pair of integers and was utterly confused, haha. You're spot on!

quotemstr
4 replies
2d3h

Related tangent: databases should track units. If I have a time column, I should be able to say a column represents, say, durations in float64 seconds. Then I should be able to write

    SELECT * FROM my_table WHERE duration_s >= 2h
and have the database DWIM, converting "2h" to 7200.0 seconds and comparing like-for-like during the table scan.

Years ago, I wrote a special-purpose SQL database that had this kind of native unit handling, but I've seen nothing before or since, and it seems like a gap in the UI ecosystem.

And it shouldn't just be for time. We should have the whole inventory of units: mass, volume, information, temperature, and so on. Why not? We can also teach the database to reject mathematical nonsense, e.g.

    SELECT 2h + 15kg -- type error!
Doing so would go a long way towards catching analysis errors early.
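
The database-level version doesn't exist, but the same error-catching can be approximated with static types at the application layer; a hypothetical Go sketch (DurationS and Mass are made-up names for illustration):

    package main

    import (
        "fmt"
        "time"
    )

    // Distinct types stand in for the column units described above.
    type DurationS float64 // seconds
    type Mass float64      // kilograms

    func main() {
        d := DurationS((2 * time.Hour).Seconds()) // "2h" -> 7200.0 seconds
        m := Mass(15)

        fmt.Println(d >= 7200) // true: like-for-like comparison in seconds

        _ = m
        // fmt.Println(d + m) // compile error: mismatched types DurationS and Mass
    }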

n_plus_1_acc
2 replies
2d1h

What about leap seconds?

quotemstr
1 replies
2d1h

The leap second mechanism amounts to a collective agreement to rewrite chronological history. It's like a git rebase for your clock. Everyone (almost) in practice does math as if leap seconds never happened, and the consequent divergence from physical time ends up not mattering.

SonOfLilit
0 replies
1d20h

... no?

If we add a leap second at the end of 2024, nothing in 2024 gets rewritten. Only the meaning of future pointer expressions like "12 pm on January 2nd 2025" changes. When I want exactly 48 hours after 12 pm on Dec 31, I use a leap-second-independent time representation. But since I usually want the same thing everyone else calls 12 pm on Jan 2, I usually use a representation that gives me that.

And I, among many, take meticulous care to do my date math (for a bank core system) only in ways that naturally support leap seconds.

out_of_protocol
4 replies
2d4h

Why not go the golang route: a unix timestamp in nanoseconds, stored as a signed int64? Sure, you can't cover millions of years with nanosecond precision, but do you really need to?

commodoreboxer
2 replies
2d4h

With that precision and size, you can only cover the years from 1678 to 2262, which strongly limits your ability to represent historical dates and times.
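
Those limits fall straight out of the int64 range; a quick Go sketch:

    package main

    import (
        "fmt"
        "math"
        "time"
    )

    func main() {
        // An int64 of nanoseconds spans about ±292 years around the epoch.
        minT := time.Unix(0, math.MinInt64).UTC()
        maxT := time.Unix(0, math.MaxInt64).UTC()
        fmt.Println(minT) // 1677-09-21 00:12:43.145224192 +0000 UTC
        fmt.Println(maxT) // 2262-04-11 23:47:16.854775807 +0000 UTC
    }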

out_of_protocol
0 replies
2d1h

represent historical dates and times.

With nanosecond precision? Just decide what you want to do beforehand. I bet even a datetime doesn't make much sense for that time period; a bare date would suffice. Also, you'll likely need a location, calendar system, etc., since real dates were not that standardized back then.

azornathogron
0 replies
2d1h

If you're representing dates back into the 1600s you need to keep in mind that calendar maths and things like "was this year a leap year" become more complicated. The Gregorian calendar was introduced in the 1500s but worldwide adoption took a long time - for example, the UK didn't adopt it until the 1700s. So you've got more than a century where just having "a date" isn't really sufficient information to know when something happened, you'll need to also know what calendar system that date is in.

Overall, this means if you're representing historical dates I would question whether a seconds-since-epoch timestamp representation is what you want at all, regardless of range and precision.

Edit: yes, you can kinda handle this as part of handling timezones, but still, it's complicated enough that you may want to retain more or different information if you're displaying or letting users enter historical dates.

nalgeon
0 replies
2d3h

Storing a unix timestamp as nanoseconds is not Go's style, but you can do just that with this extension:

    select time_to_nano(time_now());
    -- 1722979335431295000

mynameisash
4 replies
2d5h

I find the three different time representations/sizes curious (e.g., what possible use case would need nanosecond precision over a span of billions of years?). More confusing is that there's pretty extreme time granularity, but only a ±290-year range with nanosecond precision for time durations?

nalgeon
2 replies
2d5h

It works very well for me and thousands of other Go developers. That's why I chose this approach.

g15jv2dp
0 replies
2d4h

There's no reason it wouldn't "work", the question is "why". Having such precise dates obviously comes with some compromises (e.g., the representation is larger, or it's variable depending on the value which comes with additional complexity, etc.). So surely there must be some pros to counterbalance the cons. "Because it's what Go does" is an answer, but I don't know if it's a convincing one.

bongodongobob
0 replies
2d2h

Nice. Smoking cigarettes works for me and millions of others, but it's still stupid and will take years or decades off your life.

michaelt
0 replies
2d

> what possible use case would need nanosecond precision over a span of billions of years?

Once you've decided you're using nanosecond precision, a 64-bit representation can only cover 584 years which ain't enough. You really want at least 2 more bits, so you can represent 2024 years.

But once you're adding on 2 bits, why not just add on 16 or even 32? Then your library can cover the needs of everyone from people calculating how it takes light to travel 30cm, to people calculating the age of the universe.
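
Rough numbers behind that (a quick sketch; 365.25-day years):

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        // Years of nanosecond precision per bit width.
        const nsPerYear = 365.25 * 86400 * 1e9
        for _, bits := range []int{64, 66, 96, 128} {
            fmt.Printf("%3d bits: %.4g years\n", bits, math.Exp2(float64(bits))/nsPerYear)
        }
        // Prints ~584.5, ~2338, ~2.511e+12 and ~1.079e+22 years respectively;
        // the last two comfortably cover the age of the universe.
    }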

That's how I imagine the design decisions went, anyway :)

Of course you can't really provide sub-second accuracy without leap-second support, and what does pre-human-civilisation leap-second support even mean?

lifeisstillgood
2 replies
2d3h

This is a sort of lazy Ask HN, but in your experience, what is more useful / valuable: nanosecond representation, or years outside the nano range of something like 1678-2200?

I don't do "proper" science, so the value of nanoseconds seems limited to very clever experiments (or some financial trade tracking that is probably even more limited in scope).

But being able to represent historical dates seems more likely to come up?

Thoughts?

rokkamokka
0 replies
2d3h

A bit like asking if a hammer or a screwdriver is more useful. It depends on the work

cyberax
0 replies
2d3h

Historical dates, for sure.

Simply reducing the precision to 10ns will provide enough range in practice.

cryptonector
2 replies
2d3h

I so wish that SQLite3 had an extensible type system.

funny_falcon
1 replies
2d

As a smallish PostgreSQL contributor I can only say: NO, DON'T DO THIS!!!!

An extensible type system is one of the worst things that can happen to database end-user performance. You can no longer short-cut anything in query parsing and optimization: you must check the type of every operand, find the correct operator implementation, find the correct index operator family/class, and much more, all by querying the system catalog. Input and output of values also go through functions stored in the system catalog. You can't even answer "select 1" without consulting the system catalog.

There should be a sane set of built-in types plus a struct/JSON-like way of composing them. That is what most DBs do, except PostgreSQL. And I strongly believe it is the right way.

cryptonector
0 replies
1d23h

you must check the type of every operand, find the correct operator implementation, find the correct index operator family/class, and much more, all by querying the system catalog.

Not with static typing.

The problem with PG is that it's not fully statically typed internally. SQLite3 is worse still, naturally. But a statically typed SQL RDBMS should be possible.

simontheowl
0 replies
2d5h

Very cool - definitely an important missing feature in SQLite.