
More Memory Safety for Let's Encrypt: Deploying ntpd-rs

cogman10
48 replies
20h18m

This seems like a weird place to be touting memory safety.

It's ntpd, it doesn't seem like a place for any sort of attack vector and it's been running on many VMs without exploding memory for a while now.

I'd think there are far more critical components to rewrite in a memory safe language than the clock synchronizer.

jaas
30 replies
19h49m

I'm the person driving this.

NTP is worth moving to a memory safe language but of course it's not the single most critical thing in our entire stack to make memory safe. I don't think anyone is claiming that. It's simply the first component that got to production status, a good place to start.

NTP is a component worth moving to a memory safe language because it's a widely used critical service on a network boundary. A quick Google for NTP vulnerabilities will show you that there are plenty of memory safety vulnerabilities lurking in C NTP implementations:

https://www.cvedetails.com/vulnerability-list/vendor_id-2153...

Some of these are severe, some aren't. It's only a matter of time though until another severe one pops up.

I don't think any critical service on a network boundary should be written in C/C++, we know too much at this point to think that's a good idea. It will take a while to change that across the board though.

If I had to pick the most important thing in the context of Let's Encrypt to move to a memory safe language it would be DNS. We have been investing heavily in Hickory DNS but it's not ready for production at Let's Encrypt yet (our usage of DNS is a bit more complex than the average use case).

https://github.com/hickory-dns/hickory-dns

Work is proceeding at a rapid pace and I expect Hickory DNS to be deployed at Let's Encrypt in 2025.

dheera
26 replies
18h44m

Why are C and C++ all of a sudden unsafe? Did I miss something?

What is safe now? JavaScript? PyTorch?

ghshephard
21 replies
18h30m

One of the major drivers (if not the driver) for the creation of Rust was the fact that C is not a memory-safe language.

This has been known for decades, but it wasn't until 2010 that a serious attempt at a new memory-safe systems language got traction - Rust.

https://kruschecompany.com/rust-language-concise-overview/#:....

dheera
16 replies
18h25m

How is C not memory safe? If I access memory I didn't allocate the OS shuts the program down. Is that not memory safety?

(Unless you're running it on bare metal ...)

koolba
5 replies
17h20m

If I access memory I didn't allocate the OS shuts the program down.

The real problem is when you access memory that you did allocate.

dheera
4 replies
16h13m

So we need a new flag for gcc that writes zeros to any block of allocated memory before malloc returns, not a new language.

saagarjha
0 replies
10h5m

We have that already. There are still other problems that exist.

kelnos
0 replies
15h5m

That wouldn't make it safe. It would just make it crash in a different way, and still be vulnerable to exploitation by an attacker.

KMag
0 replies
15h8m

You'd probably want an alternative libc implementation rather than a compiler flag.

However, calloc everywhere won't save you from smashing the stack or pointer type confusion (a common source of JavaScript exploits). Very rarely is leftover data from freed memory the source of an exploit.

BlobberSnobber
0 replies
7h59m

If only the very competent people that decided to create Rust had thought of asking you for the solution instead...

Have a little humility.

kortilla
4 replies
18h5m

This is not a question for HN at this point. It’s like asking why SQL query forming via string concatenation of user inputs is unsafe.

Google it, C memory boundary issues have been a problem for security forever.

TeMPOraL
3 replies
13h18m

It’s like asking why SQL query forming via string concatenation of user inputs is unsafe.

To be honest, that's a surprisingly deep question, and the answer is something I've yet to see any developer I've worked with understand. For example, did you know that SQL injection and XSS are really the same problem? And so is using template systems[0] like Mustache?

In my experience, very few people appreciate that the issue with "SQL query forming via string concatenation" isn't in the "SQL" part, but in the "string concatenation" part.

--

[0] - https://en.wikipedia.org/wiki/Template_processor

cowsandmilk
0 replies
11h15m

Really, the problem is in combining tainted input strings with string concatenation. If you have certain guarantees on the input strings, concatenation can be safe. That said, I still wouldn’t use it since there are few guarantees that future code wouldn’t introduce tainted strings.

codetrotter
0 replies
13h11m

The problem that causes it, in both cases, is that a flat string representation mixes code and data. And therefore escaping (such as not allowing quotes in the user input to terminate quotes in your query, or not allowing html tags to be opened or closed by user input, nor allowing html attributes to be opened or closed by the user input, etc) is needed.

IshKebab
0 replies
9h2m

In my experience, very few people appreciate that the issue with "SQL query forming via string concatenation" isn't in the "SQL" part, but in the "string concatenation" part.

Really? To me it's pretty obvious that not escaping properly is the issue, and therefore the same issue applies wherever you need escaping. I don't think I've ever heard anyone say that SQL itself was the problem with SQL injection. (Although you certainly could argue that - SQL could be designed in such a way that prepared statements are mandatory and there's simply no syntax for inline values.)
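To make the difference concrete, here's a minimal sketch of concatenation vs. parameterized queries (using Python's sqlite3 purely for illustration; the table and values are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

evil = "nobody' OR '1'='1"

# String concatenation: attacker-controlled text becomes SQL code.
leaked = db.execute(
    "SELECT secret FROM users WHERE name = '" + evil + "'"
).fetchall()
print(leaked)  # [('hunter2',)] -- the WHERE clause was rewritten

# Parameterized query: the value stays data, never code.
safe = db.execute(
    "SELECT secret FROM users WHERE name = ?", (evil,)
).fetchall()
print(safe)  # [] -- no user is literally named "nobody' OR '1'='1"
```

The parameterized form is the SQL analogue of "mandatory prepared statements": there's simply no way for the value to reach the query parser as syntax.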

patmorgan23
0 replies
16h58m

I think you have a misunderstanding of what memory safety is.

Memory safety means that your program is free of memory corruption bugs (such as buffer overflow or underflow bugs) that could be used to retrieve data the user isn't supposed to have, or to inject code/commands that then get run in the program's process. These don't get shut down by the OS because the program is interacting with its own memory.

kelnos
0 replies
15h6m

No, that's a security vulnerability waiting for someone to exploit.

hot_gril
0 replies
18h16m

That's not what memory safety refers to.

pjmlp
3 replies
11h9m

I would consider that serious attempts have been made since 1961; however, all of them failed in the presence of a free beer OS, with free beer unsafe compilers.

The getting traction part is the relevance part.

quectophoton
2 replies
4h10m

Yeah, all previous attempts at making such a language lacked two things:

1. They didn't have a big, well-known, company name with good enough reputation to attract contributors.

2. They didn't have brackets.

Success came from traction, traction came from appeal, and appeal was mostly down to those two things. Nothing else was new AFAIK.

yosefk
0 replies
27m

No shared mutable state in an imperative language is common, as are memory-safe languages with performance close to C's? I didn't see the latter in the language shootout benchmarks; in fact, Rust is much closer to C than the next closest thing.

steveklabnik
0 replies
18m

I take a slightly different view of this, though I do not deny that those things are important. #1 in particular was important to my early involvement in Rust, though I had also tinkered with several other "C replacement" languages in the past.

A lot of them simply assumed that some amount of "overhead," vaguely described, was acceptable to the target audience. Either tracing GC or reference counting. "but we're kinda close to C performance" was a thing that was said, but wasn't really actually true. Or rather, it was true in a different sense: as computers got faster, more application domains could deal with some overhead, and so the true domain of "low level programming languages" shrunk. And so finally, Rust came along, being able to tackle the true "last mile" for memory unsafety.

Rust even almost missed the boat on this up until as late as one year before Rust 1.0! We would have gotten closer than most if RFC 230 hadn't landed, but in retrospect this decision was absolutely pivotal for Rust to rise to the degree that it has.

runiq
0 replies
12h17m

Tcl

pjmlp
0 replies
11h10m

They have been unsafe from their very early days. Multics got a higher security score than UNIX thanks to PL/I, C.A.R. Hoare addressed C's existence in his 1980 Turing Award lecture, and Fran Allen made similar remarks back in the day.

dequan
0 replies
18h42m

All of a sudden? They've been unsafe for decades, it's just that you had less of a choice then.

alkonaut
0 replies
6h50m

Because you can cast a pointer to a number and back again. Then you can stuff that value into index [-4] of your array. More or less.

twothreeone
2 replies
19h21m

It continues to astonish me how little people care (i.e., it triggers the $%&@ out of me). I really appreciate the professionalism and cool rationale when faced with absolute ignorance of how shaky a foundation our "modern" software stack is built upon. This is a huge service to the community, kudos to you and many others slowly grinding out progress!

vmfunction
1 replies
9h33m

Lol, shaky indeed. A business person once said, "can you imagine if mechanical engineers (like auto makers) behaved like software engineers?"

Seems no digital system is truly secure. Moving foundational code to memory safe seems like a good first step.

twothreeone
0 replies
24m

That's because there is no such thing as "truly secure", there can only be "secure under an assumed threat model, where the attacker has these specific capabilities: ...". I agree that software engineering is getting away with chaos and insanity compared to civil or other engineering practices, which have to obey the laws of physics.

oconnor663
7 replies
20h0m

it's been running on many VMs without exploding memory for a while now

Most of the security bugs we hear about don't cause random crashes on otherwise healthy machines, because that tends to get them noticed and fixed. It's the ones that require complicated steps to trigger that are really scary. When I look at NTP, I see a service that:

- runs as root

- talks to the network

- doesn't usually authenticate its traffic

- uses a bespoke binary packet format

- almost all network security depends on (for checking cert expiration)

That looks to me like an excellent candidate for a memory-safe reimplementation.

cogman10
6 replies
19h52m

runs as root

ntpd can (and should) run as a user

talks to the network

Makes outbound requests to the network. For it to be compromised, the network itself or a downstream server needs to be compromised. That's very different from something like hosting an http server.

doesn't usually authenticate its traffic

Yes it does. ntp uses TLS to communicate with its well-known locations.

uses a bespoke binary packet format

Not great but also see above where it's talking to well known locations authenticated and running as a user.

It's a service that to be compromised requires state level interference.

woodruffw
1 replies
18h45m

I don’t think NTP uses TLS by default. The closest equivalent I can find online is NTS, which was only standardized in 2020 (and for which I can’t find clear adoption numbers).

(But regardless: “the transport layer is secure” is not a good reason to leave memory unsafe code on the network boundary. Conventional wisdom in these settings is to do the “defense in depth” thing and assume that transport security can be circumvented.)

wbl
0 replies
13h39m

The network is just a bunch of strangers in a meet me room.

kortilla
0 replies
18h2m

Ntp does not use TLS widely. It’s also a UDP protocol which makes it subject to spoofing attacks without a compromised network (no protection from kernel network stack verifying TCP sequence numbers).

franga2000
0 replies
19h18m

It's a service that to be compromised requires state level interference

For servers it's definitely harder (though not only state-level), but another big issue could be client-side. NTP servers can be set by DHCP, so the admin of any network you connect to could exploit such a bug against you. And once you have code execution on a desktop OS, all bets are off, even if you're not under the primary UID.

It's not the most important threat vector, but it also doesn't seem as difficult as some of the other system services to rewrite, so I'd say it was a good first step for the memory-safe-everything project.

adolph
0 replies
6h39m

> doesn't usually authenticate its traffic

Yes it does. ntp uses TLS to communicate with its well-known locations.

My knee-jerk reaction is that TLS is not authentication. After skimming through the relevant spec [0], it is interesting how NTP uses TLS.

[NTS-KE] uses TLS to establish keys, to provide the client with an initial supply of cookies, and to negotiate some additional protocol options. After this, the TLS channel is closed with no per-client state remaining on the server side. [0]

0. https://datatracker.ietf.org/doc/html/rfc8915

luma
5 replies
20h14m

It's present on loads of systems, it's a very common service to offer, it's a reasonably well-constrained use case, and the fact that nobody thinks about it might be a good reason to think about it. They can't boil the ocean but one service at a time is a reasonable approach.

I'll flip the question around, why not start at ntpd?

cogman10
4 replies
19h59m

I'll flip the question around, why not start at ntpd?

Easy: because there's loads of critical infrastructure written in C++ that's commonly executed on pretty much every VM and exposed in such a way that vulnerabilities are disastrous.

For example, JEMalloc is used by nearly every app compiled in *nix.

Perhaps systemd which is just about everywhere running everything.

Maybe sshd, heaven knows it's been the root of many attacks.

hedora
2 replies
16h16m

There’s a nice pure-rust ssh client/server already.

Systemd should just be scrapped. This week's wtf: "systemd-tmpfiles --purge" intentionally changed its behavior to "rm -rf /home". Confirmed not-a-bug. There are dozens of other comparable screwups in that stack, even ignoring the long list of CVEs (including dns and ntp, I think). Rust can't fix that.

I haven’t heard of any issues with jemalloc, though that seems reasonable (assuming calling rust from C doesn’t break compiler inlining, etc).

yjftsjthsd-h
0 replies
14h47m

There’s a nice pure-rust ssh client/server already.

Prod ready, audited, non-buggy in actual use? And if so, do you have a link so I can start test-deploying it?

kelnos
0 replies
15h3m

For example, JEMalloc is used by nearly every app compiled in *nix.

JEmalloc is used by very very very few apps compiled for *nix. That's a conscious decision that an app developer needs to make (and few would bother without specialized needs) or a distro packager needs to shoehorn into their package builds (which most/all would not do).

rnijveld
0 replies
10h58m

I do think that memory safety is important for any network service. The probability of something going horribly wrong when a network packet is parsed in a wrong way is just too high. NTP typically does have more access to the host OS than other daemons, with it needing to adjust the system clock.

Of course, there are many other services that could be made memory safe, and maybe there is some sort of right or smart order in which we should make our core network infrastructure memory safe. But everyone has their own priorities here, and I feel like this could end up being an endless debate of whatabout-ism. There is no right place to start, other than to just start.

Aside from memory safety though, I feel like our implementation has a strong focus on security in general. We try and make choices that make our implementation more robust than what was out there previously. Aside from that, I think the NTP space has had an under supply of implementations, with there only being a few major open source implementations (like ntpd, ntpsec and chrony). Meanwhile, NTP is one of those pieces of technology at the core of many of the things we do on the modern internet. Knowing the current time is one of these things you just need in order to trust many of the things we take for granted (without knowledge of the current time, your TLS connection could never be trusted). I think NTP definitely deserves this attention and could use a bunch more attention.

lambdaone
0 replies
19h22m

NTP is a ubiquitous network service that runs directly exposed to the Internet, and that seems to me like a good thing to harden. Making NTP more secure does not stop anyone else from working on any other project.

astrobe_
0 replies
1h36m

Agreed. It is a "good old" binary protocol, so the many gotchas of text protocols are not there.

akira2501
32 replies
21h25m

Why does your ntpd have a json dependency?

orf
16 replies
21h8m

Would you rather it had a JSON dependency to parse a config file, or yet another poorly thought out, ad-hoc homegrown config file format?

amiga386
4 replies
20h35m

Poorly thought out, ad-hoc homegrown config file format, please. Every time.

1. Code doesn't change at the whims of others.

2. The entire parser for an INI-style config can be in about 20 lines of C

3. Attacker doesn't also get to exploit code you've never read in the third party dependency (and its dependencies! The JSON dependency now wants to pull in the ICU library... I guess you're linking to that, too)

4. Complexity of config file formats are usually format-independent, the feature-set of the format itself only adds complexity, rather than takes it away. To put it another way, is this any saner...

    {"user":"ams","host":"ALL","runas":["/bin/ls","/bin/df -h /","/bin/date \"\"","/usr/bin/","sudoedit /etc/hosts","OTHER_COMMANDS"]}
... than ...

    # I may be crazy mad but at least I can have comments!
    ams ALL=/bin/ls, /bin/df -h /, /bin/date "", /usr/bin/, sudoedit /etc/hosts, OTHER_COMMANDS

All the magic in the example is in what those values are and what they imply, the format doesn't improve if you naively transpose it to JSON.

An example of an NTP server's config:

    # I can have comments too
    [Time]
    NTP=ntp.ubuntu.com
    RootDistanceMaxSec=5
    PollIntervalMinSec=32
    PollIntervalMaxSec=2048

If you just want key-value pairs of strings/ints, nothing more complex is needed. Using JSON is overdoing it.
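The "tiny parser" claim holds up. Here's a hedged sketch, in Python rather than C for brevity, of a parser for exactly the section/key=value format above (function name is made up):

```python
def parse_kv(text):
    """Parse a minimal INI-style config: comments, [sections], key=value."""
    conf, section = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            conf[section] = {}
        else:
            key, _, value = line.partition("=")
            conf[section][key.strip()] = value.strip()
    return conf

example = """
# I can have comments too
[Time]
NTP=ntp.ubuntu.com
RootDistanceMaxSec=5
"""
print(parse_kv(example))
# {'Time': {'NTP': 'ntp.ubuntu.com', 'RootDistanceMaxSec': '5'}}
```

Note what it deliberately doesn't handle: quoting, escaping, inline comments, multi-line values. The moment a format needs any of those, the "20 lines" estimate stops being honest.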

whytevuhuni
1 replies
12h27m

I once saw an .ini for a log parser:

    [Alarm]
    Name=Nginx Errors
    Pattern="[error] <pid>#<tid>: <message>"
The thing worked. Without any errors. And yet it took:

    Pattern="[error] <pid>
..and then considered the rest of the line a comment. It didn't even error on the fact that the quotes were not closed.

Hand-rolling config formats is hard.

IshKebab
0 replies
8h58m

Yeah try adding a git alias with quotes... I ended up reading the source code to figure out wtf it was doing.

vlakreeh
0 replies
17h37m

1. I can pin my json parser dependency and literally never update it again

2. And how many times have we seen 20 lines of C backfire with some sort of memory safety issue.

3. First off, I'd go out on a limb and say the number of attacks via a well-established (or even a naive) Rust JSON parsing library is dwarfed by the number of attacks via ad-hoc config parsers written in C with some overlooked memory safety issue.

4. Usually being the key word, tons of adhoc config formats have weird shit in them. With json (or yaml/toml) you know what you're getting into and you immediately know what you're able and unable to do.

kelnos
0 replies
13h31m

I feel like using the incomprehensibly error-prone and inscrutable sudoers format as an example kinda argues against your point.

(I do agree that JSON is a terrible configuration file format, though.)

akira2501
3 replies
20h36m

yet another poorly thought out, ad-hoc homegrown config file format

OpenBSD style ntpd.conf:

    servers 0.gentoo.pool.ntp.org
    servers 1.gentoo.pool.ntp.org
    servers 2.gentoo.pool.ntp.org
    servers 3.gentoo.pool.ntp.org

    constraints from "https://www.google.com"

    listen on *
I mean, there's always the possibility that they used a common, well known and pretty decent config file format. In this particular case, this shouldn't be the thing that differentiates your ntpd implementation anyways.

IshKebab
2 replies
8h59m

That config file perfectly illustrates the point. There's no need for it to be custom, requiring me to waste time learning its syntax, when it could just be JSON or TOML. Honestly I would even take YAML over that, and YAML is the worst.

xorcist
1 replies
8h3m

You still have to learn the syntax even if it is expressed in json or yaml. Perhaps stating the obvious, but not every json object is a valid ntp configuration.

The configuration object will always and by definition be proprietary to ntp. Expressing it as plain text allows for a trivial parser, without any of the security implications of wrapping it in a general-purpose language ("should this string be escaped?", "what should we do with invalid utf8?").

The simpler format has survived over thirty years, is trivial to parse by anyone, and does not bring any dependencies that need maintaining. That should count for something.

IshKebab
0 replies
3h38m

Sure you have to learn how to configure things but you don't have to learn basic syntax like "how do I escape a string".

The fact that it has survived tells you nothing other than it's not so completely awful that someone went through the pain of fixing it. That doesn't mean it is good. There are plenty of awful things that survive because replacing them is painful due to network effects. Bash for example.

cozzyd
2 replies
16h1m

JSON is a terrible configuration format since it doesn't support comments.

orf
1 replies
8h53m

Ok. Do you want to now add anything relevant to the comment you’re replying to?

cozzyd
0 replies
6h10m

As a user, I always prefer the bespoke configuration file format, provided it has comments explaining what each configuration option does.

patmorgan23
1 replies
16h39m

Why isn't there a decent parser in the standard library? More than 50% of programs will probably touch json at this point.

tialaramex
0 replies
10h29m

Rust's stdlib is (at least notionally) forever. Things in the standard library (including core and alloc) get deprecated but must be maintained forever, which means that "at this point" isn't enough.

In 2003 those programs would have used XML, in 1993 probably .INI files. Are you sure that despite all its shortcomings JSON is the end of history? I don't believe you.

If you want "at this point" you can, as software does today, just use a crate. Unlike the stdlib, if next week Fonzie files are huge and by 2026 "nobody" is using JSON because Fonzie is cool, the JSON config crate merely becomes less popular.

itishappy
1 replies
21h4m

It uses TOML for configuration.

orf
0 replies
18h7m

Cool, that's why this is a hypothetical question.

danudey
11 replies
21h19m

This is a good question to ask, especially in the age of everything pulling in every possible dependency just to get one library function or an `isNumeric()` convenience function.

The answer is that there is observability functionality which provides its results as JSON output via a UNIX socket[0]. As far as I can see, there's no other JSON functionality anywhere else in the code, so this is just to allow for easily querying (and parsing) the daemon's internal state.

(I'm not convinced that JSON is the way to go here, but that's the answer to the question)

[0] https://docs.ntpd-rs.pendulum-project.org/development/code-s...

motrm
10 replies
20h54m

If the pieces of state are all well known at build time - and trusted in terms of their content - it may be feasible to print out JSON 'manually' as it were, instead of needing to use a JSON library,

  print "{"
  print "\"some_state\": \""
  print GlobalState.Something.to_text()
  print "\", "
  print "\"count_of_frobs\": "
  print GlobalState.FrobsCounter
  print "}"
Whether it's worth doing this just to rid yourself of a dependency... who knows.

syncsynchalt
3 replies
20h39m

Even better to just use TSV. Hand-rolling XML or JSON is always a smell to me, even if it's visibly safe.

hackernudes
1 replies
20h12m

Do you mean TLV (tag-length-value)? I can't figure out what TSV is.

FredFS456
0 replies
20h8m

Tab Separated Values, like CSV but tabs instead of commas.

masklinn
0 replies
11h15m

Hand-rolling TSV is no better. The average TSV generator does not pay any mind to data cleaning, and quoting/escaping is non-standard, so what the other side will do with it is basically Russian roulette.

Using C0 codes is likely safer at least in the sense that you will probably think to check for those and there is no reason whatsoever for them to be found in user data.
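The column-shift failure is trivial to demonstrate (a Python sketch, values made up):

```python
fields = ["ok", "has\ttab", "last"]

# Naive TSV: an embedded tab silently shifts every later column.
row = "\t".join(fields)
print(row.split("\t"))  # ['ok', 'has', 'tab', 'last'] -- four columns, not three

# The generator has to clean or escape the data itself; TSV won't do it.
cleaned = "\t".join(f.replace("\t", " ") for f in fields)
print(cleaned.split("\t"))  # ['ok', 'has tab', 'last']
```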

Gigachad
2 replies
15h29m

This looks like the exact kind of thing that results in unexpected exploits.

Spivak
1 replies
14h31m

Hand rolled JSON input processing, yes. Hand rolled JSON output, no.

You're gonna have a hard time exploiting a text file output that happens to be JSON.

comex
0 replies
12h8m

You're gonna have a hard time exploiting a text file output that happens to be JSON.

If you’re not escaping double quotes in strings in your hand-rolled JSON output, and some string you’re outputting happens to be something an attacker can control, then the attacker can inject arbitrary JSON. Which probably won’t compromise the program doing the outputting, but it could cause whatever reads the JSON to do something unexpected, which might be a vulnerability, depending on the design of the system.

If you are escaping double quotes, then you avoid most problems, but you also need to escape control characters to ensure the JSON isn’t invalid. And also check for invalid UTF-8, if you’re using a language where strings aren’t guaranteed to be valid UTF-8. If an attacker can make the output invalid JSON, then they can cause a denial of service, which is typically not considered a severe vulnerability but is still a problem. Realistically, this is more likely to happen by accident than because of an attacker, but then it’s still an annoying bug.

Oh, and if you happen to be using C and writing the JSON to a fixed-size buffer with snprintf (I’ve seen this specific pattern more than once), then the output can be silently truncated, which could also potentially allow JSON injection.

Handling all that correctly doesn’t require that much code, but it’s not completely trivial either. In practice, when I see code hand-roll JSON output, it usually doesn’t even bother escaping anything. Which is usually fine, because the data being written is usually not attacker-controlled at all. For now. But code has a tendency to get adapted and reused in unexpected ways.
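For the string case, "handling all that" looks roughly like this sketch (Python; the json module is used only to check the result, and the function is illustrative, not any particular library's API):

```python
import json

def escape_json_string(s):
    """Escape a string for embedding in hand-rolled JSON output."""
    out = []
    for ch in s:
        if ch == '"':
            out.append('\\"')
        elif ch == "\\":
            out.append("\\\\")
        elif ord(ch) < 0x20:
            out.append("\\u%04x" % ord(ch))  # control chars must be escaped
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'

tricky = 'he said "hi",\tthen left\n'
blob = '{"msg": %s}' % escape_json_string(tricky)
# A real parser round-trips it with no injection:
print(json.loads(blob))  # {'msg': 'he said "hi",\tthen left\n'}
```

Skip the quote escaping and `tricky` could close the string and inject its own keys; skip the control-character escaping and the output is simply invalid JSON.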

fiedzia
1 replies
20h35m

If the pieces of state are all well known at build time - and trusted in terms of their content

...then use a library, because you should not rely on the assumption that the next developer adding one more piece to this code will magically remember to validate it against the JSON spec.

maxbond
0 replies
19h46m

No magic necessary. Factor your hand-rolling into a function that returns a string (instead of printing as in the example), and write a test that parses its return value with a proper JSON library. Assert that the parsing succeeds and that the extracted values are correct. Ideally you'd use a property test.

mananaysiempre
0 replies
20h27m

That’s somewhat better than assembling, say, HTML or SQL out of text fragments, but it’s still not fantastic. A JSON output DSL would be better still—it wouldn’t have to be particularly complicated. (Shame those usually only come paired with parsers, libxo excepted.)

rnijveld
2 replies
11h54m

I don’t think our dependency tree is perfect, but I think our dependencies are reasonable overall. We use JSON for transferring metrics data from our NTP daemon to our Prometheus metrics daemon. We made this split for security reasons: why have all the attack surface of an HTTP server in your NTP daemon? That didn’t make sense to us. So we added a read-only unix socket to our NTP daemon that, on connect, dumps a JSON blob and then closes the connection (i.e. doing as little as possible), which is then usable by our client tool and by our Prometheus metrics daemon. That data transfer happens to use JSON, but could have used any format. We’d be happy to accept pull requests replacing it with something else, but given budget and time constraints, I think what we came up with is pretty reasonable.
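The "accept, dump, close" shape is simple enough to sketch (Python; field names are made up, and the demo uses a localhost TCP socket where the real daemon uses a unix socket, so this is only the shape of the design, not ntpd-rs code):

```python
import json, socket, threading

def serve_once(listener, state):
    # Accept one connection, write the whole JSON blob, close.
    # No request is ever read: the endpoint is dump-only by construction.
    conn, _ = listener.accept()
    conn.sendall(json.dumps(state).encode())
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
state = {"uptime_s": 42, "peers": 4}  # hypothetical metrics
t = threading.Thread(target=serve_once, args=(listener, state))
t.start()

# The client (metrics exporter) just reads until EOF and parses.
client = socket.create_connection(listener.getsockname())
blob = b""
while chunk := client.recv(4096):
    blob += chunk
t.join()
print(json.loads(blob))  # {'uptime_s': 42, 'peers': 4}
```

Because the daemon never parses anything from the socket, the attack surface on that boundary is close to nil.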

stavros
1 replies
11h2m

If you're only dumping a string, couldn't you replace this dependency with some string concatenation?

rnijveld
0 replies
9h56m

Probably, but we still need to parse that string on the client side as well. If you’re willing to do the work, I’m sure we would accept a pull request for it! There are just so many things to do in so little time, unfortunately. I think reducing our dependencies is a good thing, but our JSON parsing/writing dependencies are so commonly used in Rust, and the way we use them hopefully prevents any major security issues, so I don’t think this should be a high priority for us right now compared to the many things we could be doing.

ComputerGuru
19 replies
20h47m

Unlike say, coreutils, ntp is something very far from being a solved problem and the memory safety of the solution is unfortunately going to play second fiddle to its efficacy.

For example, we only use chrony because it’s so much better than whatever came with your system (especially on virtual machines). ntpd-rs would have to come at least within spitting distance of chrony’s time keeping abilities to even be up for consideration.

(And I say this as a massive rust aficionado using it for both work and pleasure.)

syncsynchalt
12 replies
20h32m

The biggest danger in NTP isn't memory safety (though good on this project for tackling it), it's

(a) the inherent risks in implementing a protocol based on trivially spoofable UDP that can be used to do amplification and reflection

and

(b) emergent resonant behavior from your implementation that will inadvertently DDOS critical infrastructure when all 100m installed copies of your daemon decide to send a packet to NIST in the same microsecond.

I'm happy to see more ntpd implementations but always a little worried.

timmytokyo
8 replies
19h40m

I really wish more internet infrastructure would switch to using NTS. It addresses these kinds of issues.

1over137
6 replies
16h29m

Never heard of it. Shockingly little on wikipedia for example.

denton-scratch
2 replies
9h17m

I hadn't heard of NTS until a Debian upgrade quietly installed ntpsec. It seems to now be the Debian default.

rlaager
1 replies
3h49m

ntp has been replaced by ntpsec in Debian. (I am the Debian ntpsec package maintainer.) By default, NTPsec on Debian uses the NTP Pool, so no NTS. But NTPsec does support NTS if you are running your own server and supports it opt-in on the client side.

As far as I know, the Debian “default” is systemd-timesyncd. That is what you get out of the box. (Though, honestly, I automate most of my Linux installs, so I don’t interact with a stock install very often.) AFAIK, systemd-timesyncd does not support NTS at all.

Doing NTS on a pool would be quite complicated. The easy way is to share the same key across the pool. That is obviously not workable when pool servers are run by different people. The other way would be to have another out-of-band protocol where the pool NTP servers share their key with the centralized pool NTS-KE servers. Nobody has built that, and it’s non-trivial.

tialaramex
0 replies
6m

Ah, not quite: I think pooling would be rather easier than you've thought. There are Let's Encrypt people here, but let me explain what you'd do to have N unrelated machines which are all able to successfully claim they are some-shared-name.example.

Each such machine mints (as often as it wants, but at least once) a document called a Certificate Signing Request. This is a signed (thus cannot be forged) document but it's public (so it needn't be confidential) and it basically says "Here's my public key, I claim I am some-shared-name.example, and I've signed this document with my private key so you can tell it was me who made it".

The centralized service collects these public documents for legitimate members of the pool and it asks a CA to issue certificates for them. The CA wants a CSR, that's literally what it asks for -- Let's Encrypt clients actually just make one for you automatically, they still need one. Then the certificates are likewise public documents and can be just provided to anybody who wants them (including the NTP pool servers they're actually for which can collect a current certificate periodically).

So you're only moving two public, signed, documents, which isn't hard to get right, you should indeed probably do this out-of-band but you aren't sharing the valuable private key anywhere, that's a terrible idea as well as being hard to do correctly it's just unnecessary.
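A minimal sketch of the two public artifacts with the openssl CLI (the shared name comes from the comment above; the key algorithm and file names are my own choices, and a real pool member would keep `member.key` strictly local):

```shell
# Each pool member generates its own private key; this never leaves the box.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out member.key

# Mint a CSR claiming the shared pool name, signed with that key.
openssl req -new -key member.key \
  -subj "/CN=some-shared-name.example" \
  -out member.csr

# Anyone (e.g. the centralized pool service) can check the CSR's
# self-signature before forwarding it to a CA for issuance.
openssl req -in member.csr -verify -noout
```

Only `member.csr` and the resulting certificate ever need to move between machines, which is the point: both are public, signed documents.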

codetrotter
1 replies
13h17m

Yeah. Seems it doesn’t even have its own article there.

Only a short mention in the main article about NTP itself:

Network Time Security (NTS) is a secure version of NTPv4 with TLS and AEAD. The main improvement over previous attempts is that a separate "key establishment" server handles the heavy asymmetric cryptography, which needs to be done only once. If the server goes down, previous users would still be able to fetch time without fear of MITM. NTS is currently supported by several time servers, including Cloudflare. It is supported by NTPSec and chrony.
westurner
0 replies
11h7m

"RFC 8915: Network Time Security for the Network Time Protocol" (2020) https://www.rfc-editor.org/rfc/rfc8915.html

"NTS RFC Published: New Standard to Ensure Secure Time on the Internet" (2020) https://www.internetsociety.org/blog/2020/10/nts-rfc-publish... :

NTS is basically two loosely coupled sub-protocols that together add security to NTP. NTS Key Exchange (NTS-KE) is based on TLS 1.3 and performs the initial authentication of the server and exchanges security tokens with the client. The NTP client then uses these tokens in NTP extension fields for authentication and integrity checking of the NTP protocol messages that exchange time information.

From "Simple Precision Time Protocol at Meta" https://news.ycombinator.com/item?id=39306209 :

How does SPTP compare to CERN's WhiteRabbit, which is built on PTP [and NTP NTS]?

White Rabbit Project: https://en.wikipedia.org/wiki/White_Rabbit_Project

rnijveld
0 replies
10h41m

I'm afraid this is a pretty common sentiment. NTS has been out for several years already and is implemented in several implementations (including our ntpd-rs implementation, and others like chrony and ntpsec). Yet its usage is low and meanwhile the fully unsecured and easily spoofable NTP remains the default, in effect allowing anyone to manipulate your clock almost trivially (see our blog post about this: https://tweedegolf.nl/en/blog/121/hacking-time). Hopefully we can get NTS to the masses more quickly in the coming years and slowly start to decrease our dependency on unsigned NTP traffic, just as we did with unencrypted HTTP traffic.

jaas
0 replies
19h28m

ntpd-rs support NTS, I agree it would be great if more people used it!

rnijveld
2 replies
11h15m

I agree that amplification and reflection are definitely worries, which is why we are working towards making NTS the default on the internet. NTS prevents a server from responding to spoofed packets, and at the same time makes sure that NTP clients can finally start trusting their time instead of hoping that there are no malicious actors anywhere near them. You can read about it on our blog as well: https://tweedegolf.nl/en/blog/122/a-safe-internet-requires-s...

One thing to note about amplification: it has always been something that NTP developers are especially sensitive to. I would say, though, that protocols like QUIC and DNS carry far greater amplification risks. Meanwhile, our server implementation ensures that responses are never bigger than the requests that initiated them, so no amplification is possible at all. Even if we allowed bigger responses, I cannot imagine NTP responses being much bigger than two or three times their related request, whereas I've seen numbers for DNS all the way up to 180 times the request payload.
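The size rule described above can be sketched as a simple guard (a toy illustration of the anti-amplification idea, not ntpd-rs's actual code):

```python
def clamp_response(request: bytes, response: bytes) -> bytes:
    """Refuse to amplify: never send more bytes than were received.

    A real server would build the response to fit rather than check
    after the fact; this sketch only shows the invariant.
    """
    if len(response) > len(request):
        raise ValueError("response larger than request; refusing to amplify")
    return response

# A standard 48-byte NTP mode 3 request fits a 48-byte mode 4 response,
# so an amplification factor above 1.0 never occurs:
assert clamp_response(b"\x00" * 48, b"\x17" * 48) == b"\x17" * 48
```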

As for your worries: I think being a little cautious keeps you alert and can prevent mistakes, but I also feel that we've gone out of our way to not do anything crazy and hopefully we will be a net positive in the end. I hope you do give us a try and let us know if you find anything suspicious. If you have any feedback we'd love to hear it!

dfc
1 replies
3h37m

I cannot imagine NTP responses being much bigger than two or three times their related request.

I think you must be limiting your imagination to ntp requests related to setting the time. There are a lot of other commands in the protocol used for management and metrics. The `monlist` command was good for 200x amplification. https://blog.cloudflare.com/understanding-and-mitigating-ntp...

rnijveld
0 replies
2h36m

Ah right! I always forget about that since we don’t implement the management protocol in ntpd-rs. I think it’s insane that stuff should go over the same socket as the normal time messages. Something I don’t ever see us implementing.

hi-v-rocknroll
3 replies
11h33m

You might be doing too much work at the wrong level of abstraction. VMs should use host clock synchronization. It requires some work and coordination, but it eliminates the need for ntp in VMs entirely.

Hosts should then be synced using PTP or a proper NTP local stratum (just get a proper GNSS source for each DC if you have the funds).

https://tsn.readthedocs.io/timesync.html

Deploy chrony to bare metal servers wherever possible.

xorcist
1 replies
8h12m

This makes sense. The clock is just another piece of hardware to be virtualized and shared among the guests.

But the last time I said that with some pretense of authority, someone shoved a whitepaper from VMware at me that said the opposite. Best practice was stated to be syncing each guest individually with a completely virtual clock.

I'm not sure I agree, but at least I try to be open to the possibility that there are situations I had not considered. If anyone else knows more about this, please share.

yjftsjthsd-h
0 replies
1h27m

a whitepaper from VMware that said the opposite

Did it say why?

rnijveld
0 replies
10h48m

Our project also includes a PTP implementation, statime (https://github.com/pendulum-project/statime/), that includes a Linux daemon. Our implementation should work as well as or even better than what linuxptp does, but it's still early days.

One thing to note though is that NTP can be made just as precise (if not more precise), given the right access to hardware (unfortunately, most hardware that does timestamping only does so for PTP packets). The reason for this precision is simple: NTP can use multiple sources of time, whereas PTP by design only uses a single source. This gives NTP more information about the current time and thus allows it to more precisely estimate what the current time is.

The thing with relying purely on GNSS is that those signals can be (and are in practice) disrupted relatively easily. This is why time synchronization over the internet makes sense, even for large data centers. And doing secure time synchronization over the internet is only practically possible using NTP/NTS at this time. But there is no one-size-fits-all solution for time synchronization in general.

rnijveld
0 replies
12h10m

I would encourage you to take a look at some of our testing data and an explanation of our algorithm in our repository (https://github.com/pendulum-project/ntpd-rs/tree/main/docs/a...). I think we are very much in spitting distance of Chrony in terms of synchronization performance, sometimes even beating Chrony. But we’d love for more people to try our algorithm in their infrastructure and report back. The more data the better.

agwa
0 replies
6h27m

What exactly does "time keeping abilities" mean? If I had to choose between 1) an NTP implementation with sub-millisecond accuracy that might allow a remote attacker to execute arbitrary code on my server and 2) an NTP implementation which may be ~100ms off but isn't going to get me pwned, I'm inclined to pick option 2. Is writing an NTP server that maintains ~100ms accuracy not a solved problem?

nubinetwork
3 replies
20h4m

The problem with ntp isn't the client, it's the servers having to deal with forged UDP packets. Will ntpd ever become TCP-only? Sadly I'm not holding my breath. I stopped running a public stratum 3 server ~10 years ago.

brohee
1 replies
19h47m

When one can build a stratum 1 server for $100, there is very little reason for the continued existence of public NTP servers. ISPs can offer the service to their customers, and any company with a semblance of an IT dept can have its own stratum 1.

ssl-3
0 replies
10h7m

One can build a GPS-backed stratum 1 server for a lot less than $100 in hardware (and I have done so in the past). It was a fun little project for me, but it involved a confluence of skillsets that many [especially smaller] companies may not collectively possess. And even then, it needs to be documented so someone else can work on it, and that maintenance has to actually-happen, and it needs a view of the sky in order for it to chooch, and it also needs redundancy. This takes time. (And if this IT department doesn't work for free, then the cost is very quickly a lot more than $100.)
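As a point of reference, such a GPS-backed stratum 1 is often just gpsd feeding chrony. A hypothetical chrony.conf fragment (device paths, refids, offsets, and the served subnet all depend entirely on the specific hardware and network):

```
# Coarse time-of-day from gpsd's NMEA sentences via shared memory
refclock SHM 0 refid NMEA offset 0.1 delay 0.2

# Precise second edges from the GPS module's PPS output,
# disambiguated ("locked") against the NMEA refclock above
refclock PPS /dev/pps0 refid PPS lock NMEA

# Serve time to the local network
allow 192.168.0.0/16
```

This is exactly the kind of setup that works great until the undocumented tinkered-together box breaks, which is the maintenance point being made above.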

And going full-assed with one or more actually-outside antennas can also be problematic, since it's a Bad Day when (eg) grounding is done improperly and lightning comes by to fuck their shit up.

And ISPs can (and certainly should!) provide decent NTP servers. That's the logical place for shared servers, network-wise. But one's choice of ISP is often limited by geography, and the most viable ISP here has stopped publishing what its NTP servers are -- if they still exist in a form that customers can use, that information isn't published. (It wasn't always this way, and I remember using them years ago when they were more forthcoming. They were shitty NTP servers with high jitter. Good enough for some things, I suppose, but not as good as the members of the public NTP pool tend to be.)

I mean: Many ISPs can't even manage to handle DNS queries quickly. When public servers like 8.8.8.8 and 1.1.1.1 (or the semi-public ones like 4.2.2.1) are faster than the ISP's own servers, then that's a problem. And it's a stupid problem, and it should not happen. But it does happen.

So thus, public NTP servers are useful for many -- including those who have a tinkered-together NTP server with a GPS antenna in a window somewhere, where public servers can be used as backup.

It's good to have options, and it's nice that some organizations provide some options for the greater network.

Faaak
0 replies
19h59m

On the contrary, I'm hosting a stratum 1 and two stratum 2s (at my previous company we offered three stratum 1s) on the NTP pool. It's useful, used, and still needed :-)

_joel
2 replies
20h17m

Reading this reminded me of ntpsec, anyone actually use that?

move-on-by
1 replies
17h25m

Yes, Debian transitioned to NTPsec with bookworm. The ntp package is now just a dummy transitional package that installs NTPsec.

https://packages.debian.org/bookworm/net/ntp

_joel
0 replies
6h22m

Interesting, thanks

NelsonMinar
1 replies
21h21m

I like the idea of NTPD in Rust. Is there anything to read about how well ntpd-rs performs? Would love a new column for chrony's comparison: https://chrony-project.org/comparison.html

Particularly interested in the performance stats, how well the daemon keeps time in the face of various network problems. Chrony is very good at this. Some of the other NTP implementations (not on that chart) are so bad they shouldn't be used in production.

rnijveld
0 replies
12h13m

In our internal testing we are very close to Chrony with our synchronization performance, some of our testing data and an explanation of our algorithm is published in our repository: https://github.com/pendulum-project/ntpd-rs/tree/main/docs/a...

Given the amount of testing we (and other parties) have done, and given the strong theoretical foundation of our algorithm I’m pretty confident we’d do well in many production environments. If you do find any performance issues though, we’d love to hear about them!

xvilka
0 replies
11h29m

BGP probably should be the next.

hoseja
0 replies
7h5m

Free pair of knee-high socks with every cert.