A portion of this code implemented an SMTP client.
If I wanted to root-cause this, the real problem is right there. Implementing protocols correctly is hard, and bugs like the one in the post are common. A properly implemented SMTP client library, like one you would pull off the shelf, would accept text and encode it properly per the SMTP protocol, regardless of where the periods were in the input. The templating layer shouldn't be worrying about SMTP.
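(For illustration, the transparency step a conforming client performs, "dot-stuffing" per RFC 5321 section 4.5.2, is tiny; this is a minimal sketch, and the function name and CRLF handling are mine:)

```ts
// Before sending the DATA section, prefix any line that starts with "." with
// an extra ".", so a lone "." line in the message can never terminate the
// transfer early. The receiving server strips the extra dot back off.
function dotStuff(body: string): string {
  return body
    .split("\r\n")
    .map((line) => (line.startsWith(".") ? "." + line : line))
    .join("\r\n");
}

// The client then sends: dotStuff(body) + "\r\n.\r\n"
```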
The real problem is that SMTP is a "plain-text" protocol that includes in-band signaling. It literally happened because SMTP defined "a line that only contains a single period in it" as a control sequence and not a literal line that only contains a single period in it.
SMTP is an example of an unnecessarily complex design, and the implementation bugs reflect it. SMTP shouldn't be hard for someone to correctly implement by themselves (even though I agree that people shouldn't be re-inventing the wheel).
At some point, all protocols include in-band signalling somewhere: You have to put packets on the line, and those packets are ultimately just a stream of anonymous bytes.
If it wasn’t a period, it would be something else & you’d have to handle that instead.
That's an incredibly reductionistic view of the world that's utterly useless for anything (including actually engineering systems) except pedantry. It's obvious that the level at which you include control information is meaningful and significantly affects the design of the protocol, as we see in the submission. Directly embedding the control information into the message body does not lead to a design that is easy to implement.
Yes, and there are many other design choices that'd be significantly easier to handle.
No, it's not reductionist and pedantic. It's a reminder that there is no magic. Building an abstraction layer that separates control and data doesn't win you anything if, like the people in the article, you then forget it's a thing and write directly to the level below it.
It's very reductionistic, because it intentionally ignores meaningful detail, and it's pedantic because it's making a meaningless distinction.
This is irrelevant. Nobody is claiming that there's any magic. I'm pointing out the true fact that details about the abstraction layers matter.
In this case, the abstraction layer was poorly-designed.
Good abstraction layer: length prefix, or JSON encoding.
Bad abstraction layer: "the body of the email is mostly plain text, except when there's a line that only contains a single period".
There are very, very few problems to which the latter is a good solution. It is a bad engineering decision, and it also obfuscates the fact that there even is an abstraction layer unless you carefully read the spec.
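(A quick sketch of the "good" option, assuming a hypothetical JSON-based message format: the serializer owns all the escaping, so a line containing only "." is just data and can never collide with the framing.)

```ts
// Hypothetical structured encoding: the body is just a JSON value.
const wire = JSON.stringify({
  to: "user@example.com",
  body: "first line\r\n.\r\nthat period line is plain data",
});

// The period line survives the round trip untouched; no special casing needed.
console.log(JSON.parse(wire).body.includes("\r\n.\r\n")); // true
```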
-------------
In fact, the underlying problem goes deeper than that - the design of SMTP is intrinsically flawed because it's a text-based ad-hoc protocol that has in-band signaling.
There are very few good reasons to use a text-based data interchange format. One of them is to make the format self-documenting, such that people can easily read and write it without consulting the spec.
If the spec is complex enough that you get these ridiculous footguns, then it shouldn't be text-based in the first place. Instead, it should be binary - then you have to either read the spec or use someone else's implementation.
Failing that, use a standardized structured format like XML or JSON.
But there's no excuse for the brain-dead approach that SMTP took. They didn't even use length prefixing.
Why not protobufs inside protobufs then?
Exactly! This is an even better phrasing of my point.
From my experience, almost no protocols have in-band signaling. No protocol I’ve ever built has in-band signaling because it’s nuts.
You always know what the next byte means because either you did a prefixed length, your protocol has stringent escaping rules, or you chose an obvious and consistent terminator like null.
If your metadata about the data is in the same channel as the data then you’re doing in-band signalling.
Not in modern times.
The terms hark back to the days of circuit-switched networks, but now that we have largely transitioned to packets, "bands" are an artificial construct on top of packets and applying the term isn't very clear-cut.
The main property of in-band data in the circuit-switched network days was that you could inject commands into your data stream. If we apply that criterion to a modern protocol, then even if you mix metadata and data in the same "band", as long as your data can never be interpreted as commands, "out of band" is an apt description.
See https://en.m.wikipedia.org/wiki/Out-of-band_data
SMTP has stringent escaping rules. The authors of the code in the article were incompetent.
That isn't true at all. In most binary protocols, you put the size of the message in the header. Then any sequence of bytes is allowed to follow. There are no escape sequences.
That’s still in-band signalling: The metadata is in the same channel as the data.
That's not a very useful definition of "in-band signaling". For me, the main difference is between an out-of-band protocol that says:
"The first two bytes represent the string length, in big-endian, followed by that many bytes representing the string text."
and an in-band signalling protocol:
"The string is ended by a period and a newline."
In the second one, you're indicating the end of the string from within the string. It looks simpler, but that's where accidents happen. Now you have to guarantee that the text never contains that control sequence, and you need an escaping method to represent the control sequence as part of the text.
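(To make the contrast concrete, here is a rough sketch of both reads; simplified, e.g. CRLF handling and partial reads are ignored:)

```ts
// Out-of-band framing: the first two bytes (big-endian) say exactly how many
// bytes of text follow. No byte inside the string is special.
function readLengthPrefixed(buf: Uint8Array): string {
  const len = (buf[0] << 8) | buf[1];
  return new TextDecoder().decode(buf.subarray(2, 2 + len));
}

// In-band framing: scan the payload for the terminator. Every byte has to be
// inspected, and text that legitimately contains the terminator needs an
// escaping rule layered on top.
function readDotTerminated(buf: Uint8Array): string {
  const text = new TextDecoder().decode(buf);
  const end = text.indexOf("\n.\n");
  return end === -1 ? text : text.slice(0, end);
}
```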
Why didn't they use actual control sequences instead of text chars, ASCII had plenty of those?
When your debugger is telnet circa 1980, non-printable characters are a liability. (You pay for it later, when people who've never seen "...end with <CRLF>.<CRLF>" botch the implementation.)
Honestly this part with the periods, while unusual, isn’t really complex. The two rules regarding the periods were like a small paragraph of text. I agree with the sibling comment from Temporal: this is purely a “skill issue”, not a protocol issue.
SMTP is that way because running the entire contents of a message through some escape character processing was expensive back then, in the era of 0.25 MIPS machines.
That's why it's a best practice to specify protocols at a very high level (e.g. using cap'n'proto) instead of expecting every random sleep-deprived SDE2 to correctly implement a network exchange in terms of read() and write().
That's why you have to read the specs of the protocol you want to implement. It's a matter of engineering rigour. Brute-forcing until "it works" doesn't cut it.
I think you're missing the fact that experience has taught us repeatedly that separating the protocol definition from the wire format is a good idea. But sure, feel free to ignore the many lessons & blame it on individuals, as if anything could be implemented free from human laziness, error, & economic demands. Ironically, that attitude is itself a lack of engineering rigour as I was taught it: you assume that humans will be lazy, corrupt, & make mistakes, & that the economics of making things as cheap as possible are a real part of engineering & not something to handwave away as "they're not doing real engineering".
To steelman the GP's POV: there are other parts of solutions to problems where similar levels of rigour are required and cannot be filled in by using a preexisting library (state machines for distributed business logic come to mind as an example). Eliminating the need for that here doesn't help that much in general, and might even make things worse, because it gives people less experience with tasks demanding rigour before they tackle ones that are both subtler and harder.
Learning to blindly follow a spec for the purpose of parsing the SMTP wire protocol doesn't make you any better at following the state-machine or distributed business logic specs. It just adds to the overall opportunities for you to make a mistake. This also ignores the fact that the SMTP spec is split across multiple RFCs with no single normative version, which further reduces the odds that you implement it correctly in the first place.
Engineers get better faster because they leverage better tools and build tools to overcome their own shortcomings and leverage their strengths, not by constantly being beat into shape by unforgiving systems.
To be fair, what you and OP said is not an uncommon mentality. It's even shared in a way by Torvalds:
He has similar views about unit tests btw.
I personally would prefer to work with people who are smart & understand systems and have machines take care of subtle details rather than needing to always be 100% careful at all times. No one writing an SMTP parser is at Torvalds' level.
I'm not arguing that this excuses you from being careful or failing to understand things. I'm saying that defensively covering your flank against common classes of mistakes leads to better software than the alternative.
This is a point I agree with, and the fact that I see it mentioned so rarely, that standards are split across multiple RFCs, makes me suspect people don't mention it because they don't know: they never read the RFCs in the first place, and instead try to follow the implementation of some existing program.
It can get tedious and annoying, but I don't think it actually affects the likelihood that you'll implement something wrong. The RFCs link to each other when needed. Also groups of RFCs often get combined and edited into a single revised RFC for simplicity.
This makes me wonder: How could the IETF's approach to standardisation be improved? I'm not sure how to fix this problem without overhauling everything.
Yeah someone should get in their Time Machine and tell those idiot SMTP RFC authors in the 1980s that they should have used a wire format that wouldn’t be invented for another 30 years.
Research is also a real part of engineering. One should not omit it and end up with an SMTP implementation like the one in the article.
Reading is for nerds. Real programmers™ write. And they write code, for the machine, not docs or specs or any other silly stuff intended for interhuman communication.
Ah, yes, the Agile manifesto.
You don’t need to understand SMTP to send email. You just use a library that implements it and lets you pass in an HTML doc.
The real problem isn't the protocol, but the cowboy approach to interacting with it. It's not hard to "accept text and encode it properly per the SMTP protocol", you just need to realize you need to do it in the first place.
There is a multitude of classes of errors and security vulnerabilities, including "SQL injection", XSS, and similar, that are all caused by the same mistake as this case of the missing period[0]: gluing strings together. For example, with SQL queries, the operation of binding values to a query template should happen in "SQL space", not in untyped string space. "SELECT * FROM foo WHERE foo.bar = " + $userData; is doing the dumb thing and writing directly to SQL's serialized format. In correct code (and correct thinking), the "SELECT * FROM..." bit is not a string, it just looks like one. Same with HTML templating[1] - work with the document tree instead of its string representation, and you'll avoid dumb vulnerabilities.
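(A concrete sketch of that distinction, assuming a node-postgres-style client; the table and column names are made up:)

```ts
import { Client } from "pg"; // node-postgres, as one example of a driver with parameter binding

async function findFoo(client: Client, userData: string) {
  // Gluing strings: user input is spliced straight into SQL's serialized text form.
  // "SELECT * FROM foo WHERE foo.bar = '" + userData + "'"  // <- injectable

  // Binding in "SQL space": the template and the value travel separately,
  // so userData is only ever treated as data, never as SQL.
  return client.query("SELECT * FROM foo WHERE foo.bar = $1", [userData]);
}
```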
So, if you want to avoid missing dots in your e-mails, don't inject unstructured text into the middle of SMTP pipeline. Respect the abstraction level at which you work.
See also: langsec.
--
[0] - And therefore should be considered as a single class of errors, IMO.
[1] - Templating systems themselves are thus a mistake belonging to this class, too - they're all about gluing string representations together, where the correct way is to work at the level of language/data structures represented by the text.
This is not universally true.
JavaScript has an amazing feature called tagged template literals, which lets you tag a template string containing interpolations with a function that handles the literal parts and the interpolated values separately. This lets the tag function treat the literals as trusted, developer-written HTML or SQL, and the interpolations as untrusted user-provided values.
Lit's HTML template system[1] uses this to basically eliminate XSS (there are some HTML features like "javascript: " attributes that require special handling).
ex. (a representative sketch using Lit's `html` and `render`; the markup is illustrative):
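```ts
import { html, render } from "lit";

// `name` arrives through an interpolation, so Lit assigns it as text content;
// it is never handed to the HTML parser.
const name = '<script>alert("xss")</script>';
render(html`<p>Hello, ${name}!</p>`, document.body);
// The page shows the literal text; no script element is created.
```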
If `name` is a user-provided string, it can never insert a <script> or <img> tag, etc., because it's escaped. There are similar tags for SQL, GraphQL, etc. Java added a similar String Templates feature in 21.
[1]: https://lit.dev/docs/templates/overview/
Be careful with that "never". A curious and persistent person might discover a bug in the implementation, leading to something like the Log4Shell issue.
Not sure why you are being downvoted here. It's a fair point: properly escaping your data is only one part of the overall security picture, and you should also be strictly validating data at the inputs to your system.
Luckily, for Lit specifically, the "escaping" is done by the browser by setting textContent, so the string literally never passes through the HTML parser. Any string is valid text content, and if you found a bug that permitted unsafe text to be parsed as HTML somehow, it would be a browser bug and a very, very serious one.
But it'd be similar with other template systems. If the interpolation should allow any string, there's really no validation to be done.
For what it’s worth, some markup-first template systems have tried to respect the target format’s structure—Genshi[1] and TAL[2] come to mind, and of course XSLT (see also SLAX[3]). I said “markup-first” so that the whole question isn’t trivialized by JSX.
[1] https://genshi.edgewall.org/wiki/Documentation/xml-templates...
[2] https://zope.readthedocs.io/en/latest/zopebook/AppendixC.htm...
[3] https://juniper.github.io/libslax/slax-manual.html
Sounds like a great idea, until you find that your SMTP library pulls in 5 other libraries as its own dependencies, and those each pull in 3 transitive dependencies of their own, one being some kitchen-sink/toolbox project where only 1% of its code is actually relevant to the dependant and the rest is dead weight, but which pulls in 20 more dependencies for functions that are literally never called in your project. Before you know it, your codebase bloats up by several MB and you get CVE warnings for libraries that you didn't even know existed, let alone that you're using them.
This doesn’t seem like a real issue. Servers can handle a several megabyte executable, and CVE warnings for libraries you aren’t using can be ignored.
In the hypothetical above, you won’t have any way to know which libraries are actually being used unless you read through the source code. Many libraries will transitively include protobuf, but most functions will not call protobuf.
Agreed. Even if you establish that it's not being used today, that doesn't mean that it will continue to be unused after the next few commits land.
And, even though you might not see a way to call into the unused code, an attacker might find a way (XZ Utils).
I agree with this principle in general.
But for SMTP libraries, that's often part of stdlib (Ruby, Python, PHP, ...).
I agree 100%.
The single responsibility principle at its finest.
https://en.wikipedia.org/wiki/Jamie_Zawinski#Zawinski's_Law