I usually love Graydon's posts, but this one -- like many articles about formal methods -- may be misleading for most readers unfamiliar with the field. It's a little like reporting an order of magnitude improvement in transmuting lead into gold without mentioning that there are 29 further orders of magnitude to go before the method is cost-effective in practice.
Does no aliasing make formal verification significantly easier? Absolutely! Does it mean we can expect a practical and cost-effective way to verify real-world programs? Not even remotely. Yes, some programs, circuits, and program components are formally verified every day, but they are the exception rather than the rule; they are very small relative to mainstream software and they are constructed in particularly careful ways. Nice properties that allow local reasoning about some other properties are important, but they do not materially change our ability to assure the correctness of mainstream software with sound methods. A reasonable, real-world program that would take a million years of effort to verify without such properties will take only 100,000 years of effort to verify with them. That is good, and it is even useful in practice because a program that would take a year to verify could now take a month -- making it actually cost-effective -- but the number of such programs is vanishingly small in the grand scheme of things.
A program in a language with no heap, no pointers, no integers -- only boolean variables -- and no loops larger than two iterations (the language is nowhere near Turing complete) cannot be practically verified (by reduction from TQBF). Not in theory, not in practice, and there aren't even heuristic methods that cover many real-world instances of such programs (of course, some such programs could happen to be easily verifiable). Verification can be made to work for some properties (like memory safety) -- we say that we can make these invariants inductive or composable -- but that turns out to be nowhere near good enough for what software needs.
Back in the seventies and eighties, and even the nineties, the hope was that while the "worst case" complexity of verifying programs was known to be intractable, perhaps certain local guarantees and structures employed by programming languages could move us away from the worst case. It has since been proven not to be the case. There remained the hope that the programs people actually write are far enough from the worst case for good heuristic methods to emerge, but even that now seems not to be the case.
Many years ago I gave a talk covering the relevant findings: https://pron.github.io/posts/correctness-and-complexity
The main result is that verification of most interesting properties that we'd like to verify does not compose. I.e. if we can prove a property for components P1...Pn, then proving that property for P1 ○ ... ○ Pn (where ○ is some composition such as a subroutine call or message passing) does not just grow super-polynomially in n (which would be good!), but super-polynomially in the size of each of the components, i.e. it would be just as hard to prove as if nothing had been decomposed. In other words, correctness does not decompose.
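To put the same point a bit more explicitly (my own paraphrase of the complexity results, not a precise theorem statement): if verifying a single component of size m costs T(m) in the worst case, then

    cost(verify(P1 ○ ... ○ Pn))  is not bounded by  poly(n) · T(m);
    in the worst case it behaves like  T(n · m)

i.e. as badly as verifying the whole, undecomposed program of size n·m.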
Yes, some important programs can be soundly verified and various local properties can make that easier, which is useful, but because software is growing much faster than the scale of effective formal verification, and because we've learned more about the complexity of software verification, end-to-end verification of "ordinary" software appears further away today than it did 50 years ago. That is why there's a shift in software-assurance research toward unsound methods. A focus on soundness does not, I believe, help the cause of more correct software, but distracts from techniques that have proven more fruitful.
This gap between "we can verify quite a bit" and "what we can verify is a drop in the ocean" -- both of which are true at the same time -- is something that is often missing from discussions of formal methods.
I think it's reasonable to look forward to verifying that small Rust libraries use 'unsafe' correctly. That alone would be really helpful.
Even if that were the case, you'd be working very hard, wasting your assurance budget, to prove something underwhelming, like memory safety or other similarly low-level properties. After all that hard work you'd only end up at the same point where most of the software world -- which is already using memory-safe languages -- is today. On the other hand, if you were to spend your budget the way they do (or rather, should do) on modern unsound methods, you would likely end up not noticeably worse off than if you did both.
Remember, the goal isn't to prove some technical property, but to find the greatest number of bugs that matter to you -- those you believe are most dangerous for your particular program or library -- within budget. Researchers are often interested in discovering what properties they can verify, but practitioners care about the functional properties they need to verify.
Deductive methods (including type systems) are particularly myopic in that regard precisely because they're so limited in their effectiveness to begin with. People interested in those particular techniques are rightfully very excited when they discover they can do a little bit more than they previously could, even if what they can do is still so far behind what practitioners need (and are getting elsewhere with more practical methods).
In general, the gap between research and what practitioners think it might mean for their industry work is large in this field and greatly misunderstood by practitioners.
I'm not sure why this is underwhelming. You reduce your hardware cost budget by using a low-level language while ensuring you don't suffer from the single greatest source of security vulnerabilities in real programs. These have significant real-world impact.
There are two problems with your statement. First, it's unclear whether what you save pays for what you gain. Second, the contribution of memory unsafety to vulnerabilities is not binary. If you can cheaply reduce memory unsafety by 95%, that doesn't mean that paying a high price for the remaining 5% is worth it. Yes, memory safety is the single greatest source of security vulnerabilities in some languages (it is not in general, but then again most software is already written in memory-safe languages), but once you've already reduced it by 95% it no longer is. (BTW, I'm not even sure that soundly eliminating the 95% is clearly worth it compared to alternatives, but that's a separate discussion.)
There's this obsession with soundness that is undoubtedly interesting to researchers of sound methods, but isn't actually justified by the real-world needs of software (sometimes it is and sometimes it isn't, but when and by how much is something that can only be answered empirically; it isn't something that can be simply asserted). Software correctness is at least as much art as it is science, and ignoring the empirical aspects that make it an art while focusing only on the aspects that are science, because that's where things are clear-cut, doesn't make the empirical aspects disappear. At the end of the day someone has a budget and they need to spend it in the way that yields the most correct software; how that is best done is an empirical question. After all, you started with an empirical finding about memory safety being a leading source of some bugs, but it simply doesn't follow that total memory-safety soundness is the answer to the empirical question we need answered. Soundness is clear-cut and mathematical (and that's also why it's attractive [1]); sadly, software correctness is the very opposite.
We must continue researching sound methods, but if anyone thinks -- knowing what we know now -- that this should be the main focus and the best answer to software correctness then I think they are misjudging the nature of the problem.
[1]: Ironically, people who focus on soundness love proofs, yet they try to skirt around the proofs of the futility of soundness by resorting to empirical -- though untested -- claims.
Here's the problem with this argument: the long history of security vulnerabilities demonstrates that people are complete shite at estimating both the likelihood of security problems, and the severity and scope of any unaddressed security problems.
So the only empirical data that we know to be absolutely reliable is this long history of colossal human failure on estimating this risk. That's why I don't trust your argument one bit, and thus, that's why soundness properties are important, because they take flawed human judgment completely out of the equation.
Which isn't to say that I want soundness proven for all properties no matter the cost, but I do take issue with your outright dismissal of soundness. There are low-hanging fruit like memory safety that handle the vast majority of these problems, and so unquestionably have a great bang for your buck.
But you are again trying to make a non-binary problem into a binary one. We will never be able to soundly verify all the properties we need. That dream is just getting further and further away. 100% correctness is unattainable. So given that, let's say we could only ever hope for 99%. In that case it doesn't matter whether we get rid of 99% of problems using unsound methods or soundly get rid of the cause of 99% of the problems -- the end result is the same. What matters is the cost of getting to that 99% correctness, not the technique.
My choice of technique -- sound or otherwise -- is driven by its cost-effectiveness, not principle. I don't care if you're using a sound technique or an unsound one; I only care about how much it costs to give me the most correctness I need and can get within my budget.
I don't dismiss soundness. I dismiss the focus on soundness out of principle rather than effectiveness. That myopic focus is grounded in the preferences of researchers of sound methods; it is not grounded in the study of software correctness.
I literally said I did not want soundness for all properties no matter the cost, but that soundness for some critical properties is absolutely warranted. The binary for some properties is absolutely there and completely justifiable based on the reasoning I laid out (memory safety in particular).
As a spectator in this thread, I do not see an argument for this conclusion in any of your posts. You're just insisting on it rather than arguing based on data.
So the unbelievably long history of memory safety vulnerabilities is not an argument for distrusting human judgment on the prevalence or scope of possible unsafety in one's program, and thus an argument for memory soundness. This is literally the argument I presented here [1]. In your mind, this is not an argument based on data?
[1] https://news.ycombinator.com/item?id=40378315
It's not an argument for always insisting on eliminating 100% of memory safety bugs via sound methods regardless of budget. That requires a more elaborate argument than just "memory safety bugs are really bad and therefore we should eliminate all of them with certainty".
Firstly, I never said sound methods ought to be used. Secondly, I also never said that memory safety bugs always must be eliminated without qualification, I specifically couched this in a context of security. In that context, ensuring memory safety is absolutely critical. For instance, scientific simulations of the weather or something don't have such considerations.
The whole problem of memory unsoundness is that you cannot predict the scope or severity of unsoundness. They are simply not like other "bugs", so the heuristic arguments the OP presented simply don't work. All other desirable properties derive from memory soundness, so if you cannot rely on it then your system effectively has no guarantees of any kind.
So yes, any system with any kind of attack surface should absolutely ensure memory safety by any means necessary. If you need to use an unsafe language for performance reasons, as per my initial reply, then isolate the unsafety to trusted kernels (like checked arrays) that can be verified by model checking or other proof techniques.
You kind of did say this: "soundness for some critical properties is absolutely warranted". It seemed clear that you were taking memory safety to be one of the "critical properties".
In the real world you cannot rely on anything absolutely. All guarantees are probabilistic. Rust, for example, does not have a formally verified compiler, so you cannot be absolutely sure that your provably memory safe code will actually compile down to memory safe machine code (or indeed to machine code that does what it's supposed to do in any other respect). Does that mean we need to go off and formally verify the Rust compiler because otherwise none of our systems will have guarantees of any kind? If it doesn't, then note that you are making exactly the same kind of cost/benefit analysis that pron is talking about.
It's easy to list lots of things that systems 'should' do. In a realistic scenario there is a budget of time/effort/attention/money.
You can achieve soundness without "sound methods" in the formal sense that you were using the term.
I disagree; there's at least an order of magnitude difference between what we're discussing (probably more). pron is tossing around numbers like eliminating 95% of memory safety bugs, whereas the safety of runtime-checked arrays will eliminate more like 99.99% of memory safety bugs, with the remaining 0.01% allowing for the chance that you made a mistake in your checked-array implementation. There is simply no equating the two.
I think "sound methods" in this context just means methods that guarantee soundness. You can have soundness without sound methods, but you can't know (for sure) that you have it.
A particular technique can only be warranted for practical use by its cost effectiveness, not on principle. If you justify soundness for a property because that property is at the root of 90% of bugs then you should still prefer another technique that reduces 90% of bugs more cheaply. It doesn't matter what technique is used to reduce that amount of bugs.
I understand the argument that nothing but soundness covers the full gamut of mistakes people make in practice around memory safety, but if at the end of the day those mistakes amount to 800 bugs in your program out of 1000, then you should still prefer a technique that reduces 800 bugs -- the same or different ones of equal severity -- for cheaper.
Indeed, there are situations where sound methods are cost-effective, and that is where they tend to be popular.
This is a false equivalence, as not all bugs are the same and so cannot be trivially equated in this fashion. Memory safety violations permit classes of bugs of significantly higher severity than other types of bugs. Your off-hand disclaimer of "the same or different ones of equal severity" doesn't mean anything, because you cannot predict the severity or scope of even trivial memory unsoundness, so your heuristic arguments of preferring lower cost methods simply don't work. A memory safety bug can range from something as simple as "a process crashed" in the best case, to "we lost control of all our servers worldwide and all of our customer data was compromised".
So I would summarize my dispute with your position as follows:
1. Memory safety is NOT an underwhelming property. Arguably, all other properties of your program derive from memory safety.
2. If you have to use an unsafe language that permits violating memory safety, say for performance reasons, and you have any kind of attack surface, then ensuring memory safety is important. Probably more important than most other properties you otherwise think are important due to #1.
3. Heuristic approaches to memory safety do not necessarily entail a reduction in the severity of bugs and/or security vulnerabilities. An argument that you eliminated 80% of trivial memory unsoundness is simply not compelling if it leaves all of the subtle, more dangerous ones in place.
4. The long history of security vulnerabilities is a main driver of the interest in soundness among researchers, contra your claims that this obsession has no association with real-world needs.
I think I'll just leave it at that.
Correct, and I tried hard not to make that equivalence.
This is not known to be true after eliminating 95% of memory safety violations.
Ah! Now that is 100% true, but it cuts both ways. Of course it's a problem that we can't know in advance what would be a severe bug or its probability, but saying we should shift resources into sound methods in areas where they happen to work implicitly makes such a guess. It's alright to acknowledge that measuring effectiveness is hard, but that also means we can't presuppose the effectiveness of something just because it has an easy-to-understand mathematical property. I'm not arguing for or against sound methods; I'm saying that since the question is empirical, it must be answered by empirical means, as that is the only way to answer it.
A pragmatic argument may be that you'd rather risk overspending on correctness than underspending, which essentially means: spend whatever you can on soundness whenever you can have it. The problem is that the world just doesn't work this way, because budgets are limited, and any work that is done to increase confidence in one property must, necessarily, come at the expense of other work that may also have value. There is simply no escape from measuring the cost/benefit of an economic activity that has a non-zero cost.
If I fly on a plane I want to know that the company doing the avionics spent its assurance budget in the way that eliminated most bugs, not that it spent it on ensuring memory safety (when I was working on avionics, we didn't use deductive methods even though they were available at the time precisely because their ROI was low and that's bad for correctness where you want every dollar to do the most good).
Except this is the starting point for most software. If your goal is to get to where most software already is, I would very much characterise it as underwhelming.
Yes, but how much should you invest in getting from 0 safety to 90% vs how much from 90% to 100%?
I agree, but again -- it cuts both ways. If you invest a significant amount of effort in eliminating the last 5% of memory safety violations you cannot claim that it's more effective than spending your effort elsewhere. All I'm saying is that these are empirical questions and the fact that whether something is sound or not has a clear yes/no answer doesn't really help tackle the empirical problem.
This is clearly not true, because the focus on soundness has only decreased over the past 50 years and continues to decrease. In the 70s the view was that soundness was the only way; now there's more research into unsound methods.
And I say that the history of CVEs empirically shows that any important software with an attack surface must ensure memory safety, because no heuristic approaches can be relied upon.
Except you can measure the effectiveness of memory safety in preventing vulnerabilities in domains with an exposed attack surface. Your argument just reduces to an overall focus on bug count, despite acknowledging that 1) most security vulnerabilities are memory safety related, and 2) that memory safety bugs and other types of bugs can't be equated; and your only "escape hatch" is a supposition that a heuristic approach that "eliminates 95% of memory safety violations" probably doesn't leave anything serious behind. Sorry, 40+ years of CVEs does not make this claim reassuring.
It's not the starting point of software that has high performance or low power requirements, or expensive hardware, which is the context in which I took issue with your statement in my first reply. That's why memory safety is not underwhelming in this context.
There's more research into those methods because security costs are still a negative externality, and so a cost reduction focus doesn't factor it into account.
Anyway, I feel we're just circling here.
That no heuristic approaches can be relied upon to eliminate all memory safety violations -- something that is obviously true -- does not mean that eliminating all memory unsafety is always worth it. If it is the cause of 80% of security attacks, reducing it by 90% will make it a rather small cause compared to others, which means that any further reduction is a diminishing return.
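To spell out the arithmetic behind those illustrative numbers (assuming, as the comparison implies, that the reduction applies roughly evenly across attack-causing bugs):

    0.80 × (1 − 0.90) = 0.08

i.e. memory unsafety would then account for roughly 8% of attacks, versus the roughly 20% coming from all other causes combined -- no longer the dominant cause.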
Sorry, this doesn't make any logical sense to me. If X is important because it is a cause of a high number of bugs then reducing the number of bugs caused by X necessarily reduces its remaining importance. You cannot at once claim that something is important because it's the cause of many bugs and then claim that what's important isn't reducing the number of bugs. The number of bugs can't be both relevant and irrelevant.
Yep.
No. And this has been a particular challenge for poor C++. If we prove a property then the property holds, but if we just found and meticulously fixed 846 of the crucial holes in our system, the bad guys exploit #847 to break in just the same. Is there a #848? #849? Maybe.
We as an industry are significantly under-delivering and we ought to take responsibility for that.
I think you're correct about the gap but wrong about what the gap is. WUFFS exists today. Lots of people on HN have written or even are writing codecs in a general purpose programming language and don't even realise that's a terrible idea. "I'm pretty careful" substitutes for a machine checked proof and the results can be seen easily.
That is an unattainable goal for most properties we care about, particularly if they're as complex as "an unauthorised user cannot obtain some information". Even a complete and total elimination of all memory safety bugs and all concurrency bugs won't get you anywhere near that goal (and you can verify that for yourself by looking at common server vulnerabilities). I should have said that the attainable goal is etc..
Actually that's not really true. If you take that as a base and remove ambient authority, then you have capability security which does get you as close to fulfilling that property as is possible to achieve. I'd say that's pretty near that goal.
I disagree, for multiple reasons:
1. You need to verify that all the sensitive data is indeed guarded with the appropriate capabilities.
2. You need to verify that the capability objects are created only after proper authorisation.
3. You need to verify that the capability objects cannot escape their relevant context.
4. You need to verify that the data is not accessible by "extra-program" means, e.g. the file containing the data may be read via a dumb directory-injection attack.
Some simple sound techniques in the programming language can help with 3 and with part of 2, but not with the rest. Indeed, the current #1 top vulnerability [1] is misconfigured or incorrect access control.
[1]: https://owasp.org/www-project-top-ten/
Some background:
1. "An unauthorised user cannot obtain some information" is not a property that can be achieved by any access control system, even in theory, eg. an authorized user can just provide a copy to an unauthorized user. I will outline the only achievable property below.
2. All resources in a capability system are designated by capabilities, just like all objects in a memory safe language are designated by references.
3. You can obtain more capabilities by accessing capabilities contained within resources designated by other capabilities, just like you can obtain more references by accessing fields of objects that contain references.
4. Capabilities intrinsically combine designation with authorization. If a user has a capability to some resource, they are by definition authorized to access it. There is no such thing as a "context" in which a resource may or may not be accessed, you only either have or do not have a capability to it.
5. Capabilities are not ambient in the environment, and so the only path to obtaining capabilities is through the capability set you're given. An example of an ambient authority is the ability to obtain a file handle to any file in a system using an arbitrary file string you can just construct. Another example is being able to convert an arbitrary number into a reference, ie. violating memory safety.
Hopefully it's now clear that there's a formal correspondence between capabilities and memory safe languages that have had ambient authority removed. This does not in any meaningful way restrict our ability to construct programs; at worst we must pass around a few more parameters for things we used to access via ambient authority (like File.Open). Access to resources can now only flow along the reference graph, and security analysis becomes tractable, basically another kind of dataflow analysis.
So the security property that you want to verify is that there is no way for user X to access resource Y if given capability set C. This means that the transitive closure of C doesn't include Y.
There are numerous ways to achieve this, but the most trivial way is to use a typed language and create a type specifically to hold Y, and two capability set types for authorized and unauthorized users, and then you can ensure by simple inspection that a Y never appears in any reference placed in a C (or you can go more precise and only ensure that Y never appears in any return/escape position, thus allowing X to operate on Y's indirectly via some API). The type system is already a dataflow analysis, so this just piggybacks off of that to achieve the property you're after. Note that these are comparatively simple steps compared to getting memory safety to begin with.
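To make that concrete, here's a minimal Rust sketch of the "simple inspection" approach just described (all type and field names are hypothetical):

    // The sensitive resource Y lives in its own type.
    struct Secret(String);

    // Capability set for authorized users: a Secret is reachable from it.
    struct AdminCaps {
        secret: Secret,
        audit_log: Vec<String>,
    }

    // Capability set for unauthorized users: by inspection of the type, no
    // field (transitively) contains or returns a Secret.
    struct GuestCaps {
        audit_log: Vec<String>,
    }

    // Code serving unauthorized users only ever receives a GuestCaps, so the
    // type checker's ordinary dataflow ensures a Secret cannot flow into it.
    fn handle_guest_request(caps: &GuestCaps) -> String {
        format!("{} audit entries", caps.audit_log.len())
    }

The check that no Secret-typed value ever escapes into the unauthorized capability set is exactly the piggybacked dataflow analysis described above.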
That said, I agree that "extra-program" means of undermining the foundation of one's secure system obviously undermine its security properties. This is not really a compelling objection to me, any more than it's an objection to an algorithm's correctness proof that it must assume your computer correctly adds two numbers.
You're thinking like a language designer rather than a black-hat hacker :)
The problem is that data doesn't begin life as a typed object but as a string of bits. I.e. it passes some "IO barrier" between the outside world and the world controlled by the language. At some point, some code must assign those bits to an object, and this code could be vulnerable.
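A tiny hypothetical sketch (in Rust, since that's the context here; the names are illustrative) of where that barrier sits:

    // Outside the barrier: raw bytes from the network or disk.
    // Inside the barrier: a typed value the rest of the program trusts.
    struct UserId(u64);

    fn parse_user_id(raw: &[u8]) -> Option<UserId> {
        // If this validation step is wrong -- it accepts the wrong bytes or
        // confuses one field for another -- everything downstream still sees
        // a perfectly well-typed UserId carrying the mistake.
        let s = std::str::from_utf8(raw).ok()?;
        s.trim().parse::<u64>().ok().map(UserId)
    }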
This is defining whatever the program does as being correct, but that is not what we usually mean by correctness [1]. It's like saying that anyone who has obtained root privileges is, by definition, authorised to have them.
I fully understand the point about what is attainable and what isn't, but this is another example of my point about how some who think about correctness focus on what can be done without properly considering how much what can be done is worth, which is determined entirely by what needs to be done (or what's desired). The thing that I, the application's owner, want is not to have my data protected by a programming language's type system or, say, by means of a fuzzer, but to have my data not leak to people I don't want to give it to. In the case of security, the attacker will always go after the weakest point, whatever it is, and the question is by how much you've reduced the chances of a successful attack and at what cost.
Saying I can't give you what you want but here's what I can give you is perfectly fine, but how much what you can give me is worth to me depends entirely on how much it covers what I want. The value of reducing the chance of a successful attack by 70% is the same regardless of the means by which you achieve it, and then becomes purely a matter of cost.
I fully understand the difficulty of judging probabilities, but you need to understand that reducing them is the sole source of value in correctness. What I want is to lower the probability of some failure as much as possible for my $100 of investment. I don't care how it's done. That means that debates over assurance methods should be grounded in that question of cost and value. What makes an assurance method better or worse than another isn't how sound it is but how much it costs per unit of value, which is fault reduction (obviously the cost isn't linear). Soundness is a technique for controlling value and costs, which may be more or less effective depending on circumstances. The point is that it's a means, not an end.
As a practitioner but also a small-time CS researcher, I don't expect verifying that small Rust libraries use 'unsafe' correctly will "waste my assurance budget".
On the cost side: we don't know yet how hard it will be to formally verify small Rust libraries that wrap unsafe code in safe abstractions. That's partly because we don't even know exactly what the specification is that unsafe Rust code has to satisfy. But once we've settled on that and built good proof assistant tools, I think the work will usually be pretty easy, because most of these libraries are fundamentally quite simple and small, even including all their dependencies.
You correctly identified compositionality as a key issue. Fortunately, verifying correct use of 'unsafe' is inherently compositional: knowing nothing but the code of the library and its dependencies, we must prove that the necessary safety properties hold in all possible contexts.
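For a sense of the scale involved, here's a sketch of the sort of small unsafe-wrapping library in question, along the lines of the standard library's split_at_mut, where the whole safety argument is a few lines of local reasoning that must hold in every calling context:

    // A safe API over an unsafe implementation; verifying correct use of
    // 'unsafe' here means proving the SAFETY comment in all possible contexts.
    fn split_at_mut<T>(slice: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
        let len = slice.len();
        let ptr = slice.as_mut_ptr();
        assert!(mid <= len, "mid out of bounds");
        // SAFETY: mid <= len, so the two ranges are disjoint and in bounds;
        // therefore the two returned &mut slices never alias.
        unsafe {
            (
                std::slice::from_raw_parts_mut(ptr, mid),
                std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
            )
        }
    }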
On the benefit side: proving that libraries do not undermine Rust's safety guarantees is important, and the importance grows with the popularity of the library. And of course those guarantees aren't just "memory safety" but data-race-freedom and the general "mutable-xor-shared" property --- the importance of which is the subject of the OP here!
Yes, it is an inductive property. Sadly, not a very powerful one.
It is only important if you take it as an axiom that all programs must be proven to be memory-safe. That may be an axiom accepted by some Rust fans, but it is obviously not an axiom of software. What software needs are methods that reduce dangerous bugs (where "dangerous" means different things for different programs) in the most cost-effective way. It may well be the case that, say, soundly eliminating 95% of bugs caused by memory unsafety and using the rest of the budget on unsound methods is more effective than spending it on eliminating the last 5% of those particular bugs. Software's needs do not currently justify complete sound memory safety at all costs, so the question of cost is still the most relevant one. There is no justification to focus on one cause of bugs just because it can be soundly eliminated.
Neither of these are worthwhile enough to move the needle. This is precisely what I mean by the myopic focus on what can be done as opposed to what needs to be done. If the gap is 10 orders of magnitude, narrowing it by 1 doesn't have a big impact, unless it were completely free, which it isn't.
Your focus on "the most cost effective way" is fine in principle but in practice, we don't know ahead of time what bugs exist and what their impact is. Experience has shown that the prevalence and impact of memory safety and data race bugs is very difficult to predict, and the impact has very high variance. There are so many examples of memory safety bugs and races that lie dormant for years and suddenly become very important (e.g. because someone malicious discovered them, or because the software started being used in a different context).
Note that these safety bugs are really different from most other kinds of bugs in that their potential impact is unbounded: they can cause the program to behave in arbitrary ways. Logic errors, even stuff like non-UB integer overflow, have impact that is still constrained by the expected semantics of the programming language.
Given that uncertainty, it makes sense to take precautions. If you have cheap tools to eliminate those bugs, do it just in case. That might even be cheaper than trying to figure out ahead of time how important those bugs are going to be over the lifetime of the software (which is often an impossible task anyway).
Ah, you seem to think I proposed using formal verification to eliminate 100% of memory safety bugs no matter what the cost. In fact I didn't say that, and I don't recommend that. There will certainly be cases, typically in very large libraries that use 'unsafe', where the invariants you have to prove are very complicated and fragile and difficult to verify, and therefore the cost outweighs the benefit.
What I'm looking forward to is formalisms and tools to pick the "low hanging fruit" --- as I said, verifying the correctness of small libraries that use 'unsafe'. In small libraries, in general, the invariants and proofs are smaller and proof automation is correspondingly more effective.
It doesn't need to be completely free to have a positive net benefit.
Memory safety is table stakes. If you can't even prove memory safety, your program doesn't really have any well-defined semantics after all.
Which is another way of saying that the modern software world is already using lightweight formal methods in an end-to-end fashion. Type checking is one such use of lightweight formal methods.
Unsound methods can be useful to surface possible bugs quickly and efficiently, much like fuzzing does. But they're not a kind of verification - they provide no guarantee of correctness.
Sure, but only for specific properties that are inductive. Most properties that programmers care about aren't. It is precisely this kind of extrapolation from one property to another that doesn't work.
Correct, but 1. nothing else does either (again, sound formal verification only works in very limited situations -- limited either in the property it guarantees or in the programs it can guarantee it for -- and is falling further and further behind what the software world needs), and 2. unlike for algorithms, a "guarantee" is not and cannot be the bar for real software running on physical hardware, as you cannot guarantee the real behaviour of a software system even if the algorithm is guaranteed to be correct. Unlike an algorithm, a software system is ultimately a physical object, not a mathematical one. To crudely oversimplify, you don't care about any confidence beyond the probability of a bit flip.
In other words, you're pointing out that the current state-of-the-art in the field is not as good as something we don't (perhaps can't) have and don't need to begin with.
Yes but in practice, we keep finding ways of making more such interesting properties "inductive" by placing appropriate restrictions on how the program can be designed. This is how the Rust borrow checker works, at a high level.
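A tiny illustration of the kind of restriction meant here -- the borrow checker turning "no mutation while shared references exist" into something locally checkable:

    fn example() {
        let mut v = vec![1, 2, 3];
        let first = &v[0];   // shared borrow begins
        // v.push(4);        // rejected: cannot mutate while the shared
        //                   // borrow is still live
        println!("{first}"); // shared borrow ends after its last use
        v.push(4);           // now allowed
    }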
Not really, and this is what I meant: an order-of-magnitude improvement is very exciting to the researcher even if in practice there are still 29 to go, but it doesn't make a big impact. Even something as simple as the borrow checker -- which only brings you to about where most of software, with all its bugs, already is -- comes at the cost of only being able to use memory in certain ways. I.e. to make the property inductive, it had to be made coarser (indeed, that is what is always done when looking for inductive properties [1]), which requires more effort on the part of the user.
As I've written above, over the past two or three decades we've only learned more theory showing that things are more difficult than we thought, and at the same time the capabilities of sound verification have fallen further away from the needs of mainstream software. The trajectory of both the theory and the practice of sound methods has been negative. All the while, we've made some nice strides in unsound methods. Tony Hoare started recognising this in the nineties (which somehow shifted his 70s view on the subject), and the trend has continued: yes, there are improvements in sound methods, but the problems become harder faster than those improvements, and the improvements in unsound methods are also coming faster.
[1]: That is seen very elegantly in TLA+, but I digress.
That would require a formal model of how 'unsafe' works, and we still don't have that. Pointer provenance is an especially difficult issue, also affecting C/C++ specification efforts.
Indeed, we don't have that model, and we definitely need it. I don't think it's intractable. We've made a lot of progress. I would like to see more focused effort on this particular problem from the academic community.
P.S.
A focus on soundness does not, I believe, help the cause of more correct software but distracts from techniques that have proven more fruitful in practice (and supported by theory) -- sometimes to the surprise of those who used to think that soundness is the only way, e.g. http://users.csc.calpoly.edu/~gfisher/classes/509/handouts/h.... This happens over and over: there is some important improvement in sound methods while at the same time unsound methods are making much bigger strides toward more correct software in practice. Here's just one very recent example (which isn't even necessarily state-of-the-art in unsound methods): https://antithesis.com
Of course, this is a very complex topic, because there are some specific situations where sound techniques are more powerful and practical.
Which kind of touches on the other problem, which is the difficulty of getting people to even use the easy, low-cost techniques we already know. Here we are talking about SAT solving and formula propagation and how unsafe Rust interacts with those things, and out in the world I experience a moment of shock if I so much as see a login signature that takes a strongly-typed Username and Password instead of one that takes raw strings. (Obviously these type signatures are just complex enough to make my point and would generally have more to them in the real world.) Or at times even just encountering a unit test is enough to surprise me. Programmers need to get really, really tired of fixing the same bug(s) over and over in QA before they finally have the brainwave of adding their first unit test somewhere around the 25,000-line point.
I wish formal methods all the best and I wish I had more time to spend with them, but I'm awfully busy fighting upstream just to keep the unit tests running every time a new developer joins the project. I'm just asking for you to please install the git precommit hook or actually pay attention to the CI or whatever. Please. I'm not asking you to learn a SAT solver. Just keep the tests passing, and if a test broke because it's no longer applicable, please remove it instead of just letting it break for the next couple of years. etc.
In my experience the driver for this is "whoever has the money", not programmers. The programmer who carefully writes tests from line 0 gets laid off/told to speed up while the programmer who hacks together a bunch of janky code with no tests, but demos something cool to the <person with the money> is not laid off.
Only if the programmers are doing it wrong. I add unit tests from day one not because I'm just that disciplined... honestly I think my discipline may even be a touch sub-average compared to a lot of programmers... I add them from day one because they speed me up, starting somewhere around the third week or so (so we're not talking years before it pays off here). I do not understand how programmers are OK with making a change and then only finding out later that they broke something distant that they wouldn't have thought would break. It would be so frustrating. Or decide to refactor something in week 4 without unit test support.
Your janky code guy may beat me in the first month but after that I'm quite likely going to pass him and never look back.
Revealed preference of business is to move fast and break things. ("Any idiot can build a bridge, it takes a professional (an engineer!) to build one that barely stands.")
I grieve with you but it seems the way out of this is through some kind of magic potion cooked of the usual forbidden ingredients of: free and open source, being your own boss, idealism, etc.
It's not a bad idea to strongly type the username and password so that you can't so easily pass in unexpected input to the login / can centralize validation (e.g. username and password can't be more than 100 bytes each to attempt a login). It also lets you implement things like secure erasure of the password once it gets dropped.
Weakly typing that though is a sin. I wish Rust made creating strongly typed wrappers around primitives a bit more ergonomic.
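Something like this newtype pattern, sketched here with hypothetical names (the 100-byte limit is just the example from the parent comment):

    // Newtype wrappers: raw strings can't be passed (or swapped) by accident,
    // and validation lives in exactly one place.
    struct Username(String);
    // A Drop impl on Password could zero the buffer for the "secure erasure"
    // mentioned above.
    struct Password(String);

    impl Username {
        fn parse(s: String) -> Result<Username, &'static str> {
            if s.len() > 100 { return Err("username too long"); }
            Ok(Username(s))
        }
    }

    impl Password {
        fn parse(s: String) -> Result<Password, &'static str> {
            if s.len() > 100 { return Err("password too long"); }
            Ok(Password(s))
        }
    }

    // The signature alone rules out unvalidated or swapped arguments.
    fn login(user: &Username, pass: &Password) -> bool {
        !user.0.is_empty() && !pass.0.is_empty() // placeholder check
    }

It's boilerplate-heavy, which I think is exactly the ergonomics complaint.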
By any chance, can you elaborate a bit on what these unsound methods are and what the current state of the art is? And what does Antithesis do? (Is it a fuzzer?)
What does "unsound method" mean in this context? Doesn't that mean that if it finds a bug, it may not actually be a bug? I.e. it may give false positives?
Hi Mr Presler,
I don't know if you realize that you've preached the exact same argument here on HN a hundred times for at least the past 6 years, without ever adding nuance after the many comments you've received on the topic here. You're not entirely wrong, but you're also far from entirely correct, as half the world interested in the topic has had many chances to show you over the hundreds of times you've made that argument here.
And what's fascinating (besides your relentless preaching) is that the complete absence of any evolution in your position after all this time shows that you haven't learned anything from these discussions with others, nothing.
That really sounds like pathological close-mindedness at this point. It means that everybody is wasting their time when engaging with you, and that's really something you should try and think about.
Regards
Hi. I think that's the normal dynamics when an expert in a particular subject discusses something with laymen. I'm not a researcher, but I have applied a range of formal methods for software verification for about a decade on systems ranging from safety-critical real-time embedded software to distributed databases. I constantly learn from people with a similar or higher level of experience. Even in my comments today you can find me mentioning a technique I only learnt about relatively recently from the people at Antithesis (I'd known about it since FoundationDB, but I didn't know as much about the details of their state exploration heuristics).
It is one of the greatest disappointments of my career that that group is still small; even most researchers don't have much experience in the mainstream application of formal methods, plus they each tend to focus on a narrow range of methods. The vast majority of practitioners care about correctness, but very few explore approaches other than simple testing or the use of simple type systems; even something as basic and as old as non-trivial assertions -- which are a form of formal specification -- is underutilised.
BTW, a complicating factor is that verification techniques often don't extrapolate well from one application domain to another; a method that is effective for cryptography may not be effective for distributed systems and vice-versa.
But this is hubris, or at least arrogance, on your part: this is a very broad subject with many facets (reliability, security [1], language UX, teachability, productivity impact in heterogeneous teams, and so on), and most of the people who've debated with you are in fact often more expert than you in at least one aspect of the topic. Your obstinacy in seeing them as mere “laymen” is what makes you unable to improve your understanding of the big picture, which comes from aggregating the wisdom of experts in topics complementary to yours.
Your inability to do so is likely the reason for your intellectual stagnation.
And this kind of antagonistic posturing is actually a problem for the whole world, because while you are proudly arguing about the “failures of sound methods” and the success of the alternative, the truth is that even if the real-world deployment of TLA+ far exceeds that of Coq, it's still pretty negligible in the grand scheme of things. And arguing that “but distributed systems do use TLA+ all the time” would be as pointless as saying “but Airbus is using Coq to make planes fly”.
In fact, none of the state of the art methods have gained widespread use, and this industry as a whole keeps pumping out tons of broken software every day without any sign of changing course.
The only successful (in terms of adoption) means of reducing the numbers of bugs are the following:
- static typing [2]
- unit testing
- fuzzing
All of them are pretty bad at their job on average, but together they at least avoid a billion dollars' worth of bugs every year, which is still much better than any of the technically superior alternatives that are practically unheard of by most software developers. The jury is still out on what can actually work at scale.
Preaching for years on internet forums that your option of choice is the good option, despite its lasting failure to get mainstream adoption, is as delusional as the posture you are criticizing, and at least those people are trying something instead of wasting their time being patronizing over the internet.
[1] Security is very different from reliability in that it has an adversarial nature: it doesn't matter if a bug is “less likely than hardware failure” on random input if an attacker can trigger it on purpose with high probability. This is, for instance, something you have been consistently overlooking over these years.
TLA+ and Coq (at least as far as their use in software goes) are both examples of formal specification languages offering a deductive proof system (and TLA+ also offers a model checker -- yet another sound method). And I don't think you'd find me selling either one of them in this conversation, in which I mostly focus on the limitations of such sound tools.
Unit tests (depending how you interpret the properties of interest) and fuzzing are indeed probably the most famous examples of unsound methods, and so it sounds like you actually agree with me: it is unsound methods that have been more successful in practice except for the simple properties proven by simple type systems. But no matter what argument you find more or less convincing, you should try to listen and learn -- because it's clear that you don't understand the conversation -- rather than shout from the peanut gallery.
If you find the terminology unfamiliar, you can think of sound methods as "static" ones [1], i.e. they don't execute the program, while unsound methods are usually "dynamic", i.e. require executing the program [2]. Indeed, my point is that "static" and "dynamic" methods are most effective when used in combination, and it is the static methods that are more cost effective for simple properties -- their effectiveness drops as the properties become more "interesting" -- and it is the dynamic methods that are more cost-effective for more complex properties. It sounds like you agree.
Now perhaps you can understand my original point: a certain attribute of a programming language that makes certain sound (static) methods easier when verifying certain classes of properties doesn't help all that much if those methods are so limited to begin with for that class, either in the range of properties for which they're effective or in their scalability. I.e. making a method that, in the relevant situations, is 100x too expensive for your budget only 90x too expensive doesn't have much of an impact on its overall practicality.
[1]: This is more of a large overlap than an equivalence. A human or LLM reviewing code is also a static form of correctness assurance, but it isn't a sound one.
[2]: I won't confuse you with abstract interpretation and model checking, both of which are sound and can be viewed as either static or dynamic depending on your perspective. I consider them static as they don't rely on full concrete executions of the program. Some interesting unsound methods like concolic testing have some "static" parts, but I consider them ultimately dynamic.
I only mentioned Coq vs TLA+ because I've read your blog and I know you have strong taste preferences between those two.
That's because, unlike you, I actually try and understand people I don't agree with and see where they come from and under what perspective their views are indeed interesting, and that's how you learn how to understand complex (not difficult) topics.
But that, my friend, requires that you don't immediately consider yourself superior to your interlocutor, and that you first consider them a peer who has a different perspective on things because they have a different background. That's the inevitable condition for improvement.
Thinking of others as mere peasants is probably more intellectually comfortable, but it's also far less constructive. Needless to say, it's also not a great way to earn respect.
Your lecturing here seems to be making a lot of insulting assumptions unnecessarily. It's probably best to try and avoid giving 'advice' like this.