The Valid method takes a context (which is optional but has been useful for me in the past) and returns a map. If there is a problem with a field, its name is used as the key, and a human-readable explanation of the issue is set as the value.
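For reference, the shape being described looks roughly like this (a sketch reconstructed from the description above; the article's exact definition may differ in detail):

    import "context"

    // Validator is an object that can be validated.
    type Validator interface {
        // Valid checks the object and returns any problems found, keyed by
        // field name, with a human-readable explanation as the value. An
        // empty map means the object is valid.
        Valid(ctx context.Context) map[string]string
    }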
I used to do this, but ever since reading Lexi Lambda's "Parse, Don't Validate," [0] I've found validators to be much more error-prone than leveraging Go's built-in type checker.
For example, imagine you wanted to defend against the user picking an illegal username. Like you want to make sure the user can't ever specify a username with angle brackets in it.
With the Validator approach, you have to remember to call the validator on 100% of code paths where the username value comes from an untrusted source.
Instead of using a validator, you can do this:
    import (
        "errors"
        "strings"
    )

    type Username struct {
        value string
    }

    func NewUsername(username string) (Username, error) {
        // Validate the username adheres to our schema (e.g. no angle brackets).
        if strings.ContainsAny(username, "<>") {
            return Username{}, errors.New("username contains illegal characters")
        }
        return Username{username}, nil
    }
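To make that guarantee concrete, here is a hypothetical call site; RegisterUser and handleSignup are made-up names, not part of the original example:

    // RegisterUser can only ever receive a value that went through NewUsername.
    func RegisterUser(u Username) error {
        // ... u has already been validated ...
        return nil
    }

    func handleSignup(raw string) error {
        u, err := NewUsername(raw)
        if err != nil {
            return err
        }
        return RegisterUser(u) // RegisterUser(raw) would not compile
    }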
That guarantees that you can never forget to validate the username through any code path. If you have a Username object, you know that it was validated, because there was no other way to create the object.

[0] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
Crazy that actually using your type system leads to better code. Stop passing everything around as `string`. Parse them, and type them.
There's a name for this anti-pattern: "Stringly typed"
This term is typically used to refer to things like data structures and numerical values all being passed as strings. I don't think a reasonable person would consider storing a username in a string to be "stringly typed".
The One True Wiki[0] says "Used to describe an implementation that needlessly relies on strings when programmer & refactor friendly options are available."
Which is exactly what's going on here. A username has a string as a payload, but that payload has restrictions (not every string will do) and methods which expect a username should get a username, not any old string.
[0]: https://wiki.c2.com/?StringlyTyped
I don't agree that this example is more "programmer friendly". Anything you want to do with the username other than null check and passing an argument is going to be based directly on the string representation. Insert into a database? String. Display in a UI? String. Compare? String comparison. Sort? String sort. Is it really more "programmer friendly" to create wrapper types for individual strings all over your codebase that need to have passthrough methods for all the common string methods? One could argue that it's worth the tradeoff but this C2 definition is far from helpful in setting a clear boundary.
Meanwhile the real world usages of this term I've seen in the past have all been things like enums as strings, lists as strings, numbers as strings, etc... Not arbitrary textual inputs from the user.
You inherit some code. Is that string a username or a phone number? Who knows. Someone accidentally swapped two parameter values. Now the phone number is a username and you’ve got a headache of trying to figure out what’s wrong.
By having stronger types this won’t come up as a problem. You don’t have to rely on having the best programmers in the world that never make mistakes (tm) to be on your team and instead rely on the computer making guard rails for you so you can’t screw up minor things like that.
I agree on the one hand but empirically I don’t think I have seen a bug where the problem was the string for X ended up being used as Y. Probably because the variable/field names do enough heavy lifting. But if your language makes it easy to wrap I say why not. It might aid readability and maybe avoid a bug.
I would probably type to the level of Url, Email, Name but not PersonProfileTwitterLink.
I’ve refactored a large js code base into ts. Found one such bug for every ~2kloc. The obvious ones are found quickly in untyped code, the problem is in rare cases where you e.g. check truthiness on something that ends up always true.
It definitely is stringly typed. It's just that it's a very normalized example of it, that people don't think of as being an antipattern.
If you want to implement what Yaron Minsky described as "make illegal states unrepresentable", then you use a username type, not a string. That rules out multiple entire classes of illegal states.
If you do that, then when you compile your program, the typechecker can provide a much stronger correctness proof, for more properties. It allows you to do "static debugging" effectively, where you debug your code before it ever even runs.
I don't get what you're on about. The root comment clearly presents a separate struct type. The fact that it happens to contain a single string field is completely irrelevant (what type should an actual username be, a float?). "Stringly typed" is about stringifying non-string values to save typing work, and it is not applicable here in the slightest.
I wasn’t sure who was right. I’ll tie break with https://wiki.c2.com/?StringlyTyped= which pretty much says what you just said
I've also seen it called primitive obsession, which is also applicable to other primitive types like using an integer in situations where an enum would be better.
Definitely used to fall for primitive obsession. It seemed so silly to wrap objects in an intermediary type.
After playing with Rust, I changed my tune. The type system just forces you onto the correct path, and a lot of code becomes boring because you no longer have to second-guess what-if scenarios.
Yeah, modern type systems are game changers. I've soured on Rust, but if Go had the full OCaml type system with match statements I think it would be the perfect language.
A lot of languages certainly don't make it easy. You shouldn't have to make a Username struct/class with a string field to have a typed username. You should be able to declare a type Username which is just a string under the hood, but with different associated functions.
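Roughly this kind of declaration, sketched in Go (the method is illustrative only):

    import "strings"

    // Username is just a string under the hood, but with its own method set.
    type Username string

    // ContainsIllegalCharacters is an example of a function associated with
    // the type rather than with plain strings.
    func (u Username) ContainsIllegalCharacters() bool {
        return strings.ContainsAny(string(u), "<>")
    }

The catch, which comes up further down the thread, is that a conversion like Username(x) still lets callers bypass whatever checks the package provides.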
Sadly enums are too advanced of a concept to be included in Go.
See https://eli.thegreenplace.net/2018/go-and-algebraic-data-typ... for more background.
Bash :(
JSON
TCL
Things have different costs.

Types keep you from making some mistakes, but they also impact your extensibility. Imagine an enum with 4 values, and you want to add a fifth because, ten levels deep, one of the services needs the new value. How does that usually go in strongly typed languages? You go and update every service in between until the new value is properly propagated down to the lowest level that actually needs it.

Now imagine doing the same with strings: you can validate at the lowest level, and the upper levels just pass the value through as-is. If the upper layers have conditionals based on the value, they can still limit their logic to the values they know about.
Why would you need to update code that isn't matching on the value? It just knows it has an X and passes it to a function that needs an X.
If you don't update the code in the intermediate layers, some automated validation based on the enum values will fail, which also drops the request.
As a PHP developer I am frankly disappointed you think that we only do that with strings. I've got an array[1] full of other tools.
1. Or maybe a map? Those keys might have significance I didn't tell you about.
I originally typed out `int` and wanted to do more, but I try to keep my comments as targeted as possible to avoid the common reply pattern of derailing a topic by commenting on the smallest and least important part of it. If I type `string`, `int`, `arrays`, `maps`, `enums`... someone will write 3 paragraphs about enums are actually an adequate usage of the type system, and everyone will focus on that instead of the overarching message.
It's not guaranteed at all; that's where Go's zero values come in. E.g. nested structs, Marshal/UnmarshalJSON magic methods, etc. How do you deal with that?
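For the UnmarshalJSON case, one common approach is to implement json.Unmarshaler so that decoding has to go through the same constructor. A sketch, assuming the Username and NewUsername from the earlier comment:

    import "encoding/json"

    // UnmarshalJSON routes JSON decoding through NewUsername, so a decoded
    // struct can never hold an unvalidated username.
    func (u *Username) UnmarshalJSON(data []byte) error {
        var raw string
        if err := json.Unmarshal(data, &raw); err != nil {
            return err
        }
        parsed, err := NewUsername(raw)
        if err != nil {
            return err
        }
        *u = parsed
        return nil
    }

That still leaves the plain zero value, i.e. a field that is never decoded or set at all.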
Every struct requiring its zero value to be meaningful is probably one of the worst design flaws in the language.
There is no such requirement. Common wisdom suggests that you should ensure zero values are useful, but that isn't about every random struct field – only the values you actually give others. Initialize your struct fields and you won't have to consider their zero state. They will never be zero.
It's funny seeing this beside the DRY thread. Seems programmers taking things a bit too literally is a common theme.
“Just do the right thing everywhere and you don’t have to worry!”
You can’t stop consumers of your libraries from creating zero-valued instances.
Then the zero value is their problem, not yours. You have no reason to be worried about that any more than you are worried about them not getting enough sleep, or eating unhealthy food. What are you doing to stop them from doing that? Nothing, of course. Not your problem.
Coq exists if you really feel you need a complete type system. But there is probably a good reason why almost nobody uses it.
Except for all those times you're the consumer of someone else's library and there's no way for them to indicate that creating a zero-valued struct is a bug.
Again, it's the philosophy of "Just do the right thing everywhere and you don’t have to worry!" Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs.
> Except for all those times you're the consumer of someone else's library and there's no way for them to indicate that creating a zero-valued struct is a bug.
Nonsense. Go has a built-in facility for documentation to communicate these things to other developers. Idiomatic Go strongly encourages you to use it. Consumers of the libraries expect it.
> Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs.
Well, sure. But, like I said, almost nobody uses Coq. The vast, vast, vast majority of projects – and I expect 100% of web projects – use languages with incomplete type systems, making what you seek impossible.
And there's probably a good reason for that. While complete type systems sound nice in theory, practice isn't so kind. Tradeoffs abound. There is no free lunch in life. Sorry.
You don't have to go as far as Coq. Rust manages "parse, don't validate" extremely well with serde.
Go's zero-values are the problem, not any other lack of its type system.
> You don't have to go as far as Coq.
No, you do. Anywhere the type system is incomplete means that the consumer can do something the library didn't intend. Rust does not have a complete type system. There was no relevance to mentioning it. But I know it is time for Rust's regularly scheduled ad break. And while you are at it, enjoy a cool, refreshing Coca-Cola.
> Go's zero-values are the problem
"Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs." has nothing to do with zero-values. It doesn't even really have anything to do with Go specifically. My, the quality of advertising has really declined around here. Used to be the Rust ads at least tried to look like they fit in.
Any language without zero-values (or some equally destructive quality) can do "parse, don't validate". Go cannot. Rust is just an example.
Top of the hour again? Time for another Rust advertisement?
The topic at hand is about preventing library users from doing things the library author didn't intend using the type system, not "what happens if a language has zero-values". Perhaps you are not able to comprehend this because you are hungry? You're not you when you are hungry. Grab a Snickers.
What happens when a language has zero-values is that you can't "parse, don't validate".
Maybe it's time for you to finally try rust? Or any other language without zero-values, since rust seems to irritate you in particular.
This insane perspective of “nothing is totally perfect so any improvements over what go currently does are pointless” whenever you confront a gopher with some annoying quirk of the language is one of the worst design flaws in the golang community hivemind.
Tell us: why do you hold that perspective? It's an odd one. Nobody else in this thread holds that perspective. You even admit it is insane, yet here you are telling us about this unique perspective you hold for some reason. Are you hoping that we will declare you insane and admit you for care? I don't quite grasp the context you are trying to work within.
You manage to present a strawman and produce a No True Scotsman fallacy all at once in this comment thread.
Nobody is suggesting that Coq should be used, so stop bringing it up (strawman). And yes, Coq might have an even stricter and more expressive type system than Rust. But nobody is asking for a perfect type system (no true Scotsman). People are asking to be able to prevent users of your library from providing illegal values. Rust (and Haskell and Scala and TypeScript and …) lets you do this just fine, whereas Golang doesn't.
And personally I would much rather have the compiler or IDE tell me I’m doing something wrong than having to read the docs in detail to understand all the footguns.
My personal opinion is that - even though I’m very productive with Golang and I enjoy using it - Golang has a piss poor type system, even with the addition of Generics.
> People are asking to be able to prevent users of your library to provide illegal values. [...] and Typescript
Typescript, you say?
Hmm. Oh, right, just don't hold it wrong. But "sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs."

Your example doesn't even satisfy the base case, let alone the general case. Get back to us when you have actually read the thread and can provide something on topic.
But that is not an accident, is it? It’s someone very deliberately casting an object. It’s not the same and you probably know it.
It might be an accident. Someone uninitiated may think that is how you are expected to initialize the value. A tool like Copilot may introduce it and go unnoticed.
But let's assume the programmer knows what they are doing and there is no code coming from any other source. When would said programmer write code that isn't deliberate? What is it about Go that you think makes them, an otherwise competent programmer, flail around haphazardly without any careful deliberation?
> The vast, vast, vast majority of projects – and I expect 100% of web projects – use languages with incomplete type systems, making what you seek impossible.
…where, "what GP seeks" is…
> way for [library authors] to indicate that creating a zero-valued struct is a bug
I'd say that's a really low and practical bar, you really don't need Coq for that. Good old Python is enough, even without linters and type hints.
Of course it's very easy to create the equivalent of a zero struct (an object without __init__ called), but do you think it's possible to do that without noticing that you are doing something unusual?
> Good old Python is enough
No, Python is not enough to "...work with a type system where designers of libraries can actually prevent you from writing bugs." Not even typed Python is going to enable that. Only a complete type system can see the types prevent you from writing those bugs. And I expect exactly nobody is writing HTTP services with a language that has a complete type system – for good reason.
Yes, you are quite right that you, the library consumer, can Foo.__new__(Foo) and get an object that hasn't had its members initialized, just like you can in Go. But unless the library author has specifically told you to initialize the value this way, that little tingling sensation should be telling you that you're doing something wrong. It is not conventional for libraries to have those semantics. Not in Python, not in Go.
Just because you can doesn't mean you should.
This is where we arrive at my conclusion that go is not well-suited to implementing business logic!
C++ constructors actually make that guarantee, but they come with other pains.
Lots of languages handle it just fine and don’t need the mess of C++ ctors.
GP is pointing out that go specifically makes it an issue.
What language do you have in mind?
Any language which supports private state: Smalltalk, Haskell, Ada, Rust, …
The issue is DRY often comes to wreck this sort of thing. Some devs will see "Hmm, Username is exactly the same as just a string so let's just use a string as Username is just added complexity".
I've tried it with constructs like `Data` and `ValidatedData` and it definitely works, but you do end up with duplicate fields between the two objects or worse an ever growing inheritance tree and fields unrelated to either object shared by both.
For example, consider a Data struct full of raw strings next to a ValidatedData struct holding the parsed types; a rough version of the pair is sketched below.
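A rough sketch of that kind of pair (field and type names here are illustrative):

    // Data holds raw, unvalidated input.
    type Data struct {
        Username string
        Email    string
    }

    // ValidatedData holds the same information, but as parsed types that can
    // only be constructed through their validating constructors.
    type ValidatedData struct {
        Username Username
        Email    Email // a hypothetical parsed type, analogous to Username
    }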
There's a mighty temptation for some devs to want to apply DRY and zip these two things together. Unfortunately, that can get really messy on these sorts of type changes, and the question of where validation needs to happen gets muddled.

Except Username is not exactly the same as string, and that's important. Username is a subset of string. If they were equivalent, we wouldn't need to parse/validate.
The often misinterpreted part of DRY is conflating "these are the same words, so they are the same", with "these are the same concept, so they are the same". A Username and a String are conceptually different.
DRY is just "Do not repeat yourself". And a LOT of devs take that literally. It's not "Do not repeat concepts" (which is what it SHOULD be but DRC isn't a fun acronym).
Unfortunately "This is the same character string" is all a DRY purist needs to start messing up the code base.
I honestly believe that "DRY" is an anti-pattern because of how often I see this exact behavior trotted out or espoused. It's a cargo cult thing to some devs.
This seems less about DRY and more a story about a hypothetical junior dev making a dumb mistake masquerading as commentary about “DRY purism”.
Man I wish it was just jr devs. I cut jrs a ton of slack, they don't know any better. However, it's the seniors with the quick quips that are the biggest issue I run into. Or perhaps senior devs with jr mentalities
most srs are just jrs with inflated egos and titles
That's why I like to tell people to always remember to stay MOIST - the Most Optimal is Implicitly the Simplest Thing.
When you add complexity to DRY out your code, you're adding a readability regression. DRY matters in very few contexts beyond readability, and simplicity and low cognitive load need to be in charge. Everything else you do, code-style wise, should be in service of those two things.
DRY has nothing to do with readability. The fact that it might help with it is purely coincidental.
DRY is about maintainability - if you repeat rules (behavior) around the system and someone comes along and changes them, how can you be sure the change affected the whole system coherently?
I've seen this in practice: we get a demand from the PO, a more recent hire goes to make the change, the use case of interest to the PO gets accepted. A week later we have a bug on production because a different code path is still relying on the old rule.
Like everything, it depends is the right answer.
In my experience (~20 years) with software development, I've come to believe that people go through a path with patterns, techniques, architectures, and good practices: first dogma, then rejection, and finally acceptance that almost all software development patterns/best practices are just good heuristics, which require experience to apply correctly and to know when to break or bend the rules.

DRY applied as dogma will eventually fail, because it's not a verified mathematical proof of infallible code; it's just a practice that gives good results inside its constraints, and people don't learn the constraints until it explodes in their faces a few times.

Like any wisdom, it's unlikely to be received and understood without the rite of passage of experience.
DRY vs premature optimisation is the landscape most long term devs find themselves in. You can say that FP, OO and a bunch of other paradigms affect this, but eventually you need to repeat yourself. The key is to determine when this happens without spending too much time determining when this happens.
One of the major issues with a lot of the outdated concepts in programming is that we still teach them to young people. I work a side gig as an external examiner for CS students. Especially in the early years they are taught the same OOP content that I was taught some decades ago, stuff that I haven’t used (also) for some decades. Because while a lot of the concepts may work well in theory, they never work out in a world where programmers have to write code on a Thursday afternoon after a terrible week.
It's almost always better to repeat code. It's obviously not completely black and white: even though I prefer to avoid any form of inheritance or mutability, it's not like I wouldn't want you to create a "base" class with "created by", "updated by" and so on for your data classes, and if you have some functions that do universal stuff for you and never change, then by all means use them in different places. But for the most part, repeating code will keep your code much cleaner. Maybe not today or the next month, but five years down the line nobody is going to want to touch that shared code, which is now so complicated you may as well close your business before you let anyone touch it. Again, not because the theoretical concepts that lead to this are necessarily flawed, but because they require too much "correctness" to be useful.
Academia hasn't really caught on though. I still grade first-semester students who have the whole "Animal" -> "duck", "dog", "cat" hierarchy (or whatever they use) drilled into their heads as the "correct way" to do things. Similarly, they are often taught processes other than agile, but are told that agile is the "only" way, even though we've seen just how wrong that is.
I’m not sure what we can really do about it. I’ve always championed strongly opinionated dev setups where I work. Some of the things we’ve done, and are going to do, aren’t going to be great, but what we try to do is to build an environment where it’s as easy as possible for every developer to build code the most maintainable way. We want to help them get there, even when it’s 15:45 on a Thursday that has been full of shit meetings in a week that’s been full of screaming children and an angry spouse and a car that exploded. And things like DRY just aren’t useful.
God no. Stop the copy pasta disease! It's horrible, mindless programming.
When reviewing code, I'm astonished anything was accomplished by copy pasting so much old code (complete with bugs and comment typos).
Incidentally, OOP encourages you to copy a lot. It's just an engine for generating code bloat. Want to serialize some objects? Here's your Object serializer and your overloaded Car serializer and your overloaded Boat serializer, with only a few different fields to justify the difference!
OOP is bad. Copy pasta is bad. DRY is good. All hail DRY, forever, at any cost.
OOP and Dry are compatible! I’ve actually done the thing that the above commenter suggests - create a base object with created on/by so that I never have to think about it. Whether or not you actually care about that, if you implement a descended of that object you’re going to get some stuff for free, and you’re gonna like it!
Countless man-centuries have been lost looking for the perfect abstraction to cover two (or an imagined future with two) cases which look deceptively similar, then teasing them apart again.
For what it's worth, I've always had an easier time combining WET code than untangling the knot that is too-DRY code. Too little abstraction and you might have to read some extra code to understand it. Too much abstraction and no one other than the writer (and sometimes not even them) may ever understand it.
It’s a balancing act, but deletable code is often preferable to purely-DRY-for-the-sake-of-DRY, overly abstracted code.
Yeah, no. Not at all. I imagine that you are taking DRY quite literally and critiquing the most stupid use cases of it, like DRYing calls to Split with spaces into a SplitBySpace.
DRY's goal is to avoid defining behaviors in duplicate, which results in having multiple points in the code to change when you need to modify said behavior. Code needs to be coherent to be "good", by a number of the different quality indicators.
I'm doing a "side project" right now where I'm using a newcomer payment gateway. They certainly don't DRY stuff. The same field gets serialized with camel case and snake case in different APIs, and whole structures that represent the same concept are duplicated with slightly different fields. This probably means that at 15:25 on a Thursday the dev checked in her code, happy because the reviewer never cared about DRY, and now I'm paying the price of maintaining four types of addresses in my code base.
There's a mistake many junior devs (and sometimes mid and senior devs) make where they confuse hiding complexity with simplicity - using a string instead of a well defined domain type is a good example, there is a certain complexity of the domain expressed by the type that they don't want to think about too deeply so they replace with a string which superficially looks simpler but in fact hides all of the inherent complexity and nuance.
It causes what I call the lumpy carpet syndrome - sweeping the complexity under the carpet causes bumps to randomly appear that when squashed tend to cause other bumps to pop up rather than actually solving the problem.
Go now has generics, so I'm confident some smart fellow will apply DRY and make it a generic ValidatedData[type, validator] struct type, with a ValidatedDataFactory that applies the correct validator callback, and a ValidatorFactory that instantiates the validators based on a new validation-rule DSL written in JSON or XML.
...Easy!
Now what? The username is in an unexported field and unusable? I can kind of see what it's going for, but it seems like just a way to add another layer of wrapping and indirection.
It would need a getter here. Probably good to keep it immutable, if you want guarantees that it will never be changed to something that violates the username rules.
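Something like this, say (whether you call it Value, String, or GetValue is a matter of taste):

    // Value exposes the validated username. There is no setter, so the value
    // cannot be changed to something that violates the username rules.
    func (u Username) Value() string {
        return u.value
    }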
Yeah, that's what I figured. I'm not sure if I want the tradeoff of calling .GetValue in multiple places just to save calling validate in maybe 2 or 3 places.

Not to mention I can't easily marshal/unmarshal into it, and next week a valid username will be one that doesn't already exist in the database.

Maybe this approach appeals to people, and I'm hesitant to say "that's not how Go is supposed to be written", but for me this feels like "clever over clear".
The tradeoff is not that you save calling validate, it’s that you avoid forgetting to call validate in the first place, because when you forget to validate, you get a type error.
IMO it’s a little more clear this way:
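Going by the reply below, the idea is roughly to group the related fields in one struct and validate them together in a constructor. A sketch, with all names being guesses:

    import (
        "errors"
        "strings"
    )

    type Signup struct {
        username string
        email    string
    }

    func NewSignup(username, email string) (Signup, error) {
        if strings.ContainsAny(username, "<>") {
            return Signup{}, errors.New("invalid username")
        }
        if !strings.Contains(email, "@") {
            return Signup{}, errors.New("invalid email")
        }
        return Signup{username: username, email: email}, nil
    }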
It lets you write code that is a little more obvious.

I'm not sure I understand. In your example you've grouped related data in a struct and validated that it matches your system's invariants; that feels good to me.
The original example was more "wrap a simple type in an object so it's always validated when set", which looks beautiful when you don't show the needed getters in the example nor all the Get call sites, as opposed to the 1 or 2 New call sites. All in the name of "we don't want to set the username without validation", but without private constructors Username{"invalid"} can be invoked, the validation circumvented, and I'm not convinced the overhead we paid was worth it.
The countless bugs I've had to deal with and all the time I've lost fixing these bugs caused by people who forgot to validate data in a certain place or didn't realize they had to do so proves to me that the overhead of calling a get on a wrapper type is totally worth it.
I value the hours wasted on diagnosing a bug far more than the extra keystrokes and couple of seconds required to avoid it in the first place.
No, you've achieved an illusion of that, as now you're spending hours discovering where a developer forgot to call NewUsername and instead called Username{"broken"}. I can't see the value of this abstraction in Go.
They can’t because value is not exported. They must use the NewUsername function, which forces the validation.
In my opinion, this pattern breaks when the validation must return an error and everything becomes very verbose.
Oh, that's true about it being unexported. I hadn't considered that.
I always understood "parse don't validate" a bit differently. If you are doing the validation inside of a constructor, you are still doing validation instead of parsing. It is safer to do the validation in one place you know the execution will go through, of course, but not the idea I understand "parse don't validate" to mean. I understand it to mean: "write an actual parser, whatever passes the parser can be used in the rest of the program", where a parser is a set of grammar rules for example, or PEG.
I'm not a Haskell developer, so it's possible that I misunderstood the original "Parse, Don't Validate" post.
Why would that be considered validation rather than parsing? From the original post:

> Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output.

That's the key idea to me.
A parser enforces checks on an input and produces an output. And if you define an output type that's distinct from the input type, you allow the type system "preserve" the fact that the data passed a parser at some point in its life.
But again, I don't know Haskell, so I'm interested to know if I'm misunderstanding Lexi Lambda's post.
Parse don't validate means that if you want a function that converts an IP address string to a struct IpAddress{ address: string } you don't validate that the input string is a valid IP address then return a struct with that string inside. Instead you parse that IP into raw integers, then join those back into an IP string.
The idea is that your parsed representation and serializer are likely to produce a much smaller and more predictable set of values than may pass the validator.
As an example there was a network control plane outage in GCP because the Java frontend validated an IP address then stored it (as a string) in the database. The C++ network control plane then crashed because the IP address actually contained non-ASCII "digits" that Java with its Unicode support accepted.
If instead the address was parsed into 4 or 8 integers and was reserialized before being written to the DB this outage wouldn't have happened. The parsing was still probably more lax than it should have been, but at least the value written to the DB was valid.
In this case it was funny Unicode, but it could be as simple as 1.2.3.04 vs 1.2.3.4. By parsing then re-serializing you are going to produce the more canonical and expected form.
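A minimal Go sketch of that idea, using the standard library's net/netip (the function name is mine): parse the input into a structured address and store the re-serialized, canonical form rather than the caller's original bytes.

    import "net/netip"

    // canonicalIP rejects anything that doesn't parse as an IP address and
    // returns the canonical string form produced by our own serializer.
    func canonicalIP(input string) (string, error) {
        addr, err := netip.ParseAddr(input)
        if err != nil {
            return "", err
        }
        return addr.String(), nil
    }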
Thanks for that explanation! I hadn't appreciated that aspect of "parse, don't validate," before.
But even with that understanding and from re-reading the post, that seems to be an extra safety measure rather than the essence of the idea.
Going back to my original example of parsing a Username and verifying that it doesn't contain any illegal characters, how does a parser convert a string into a more direct representation of a username without using a string internally? Or if you're parsing a uint8 into a type that logically must be between 1 and 100, what's the internal type that you parse it into that isn't a uint8?
Personally I don't think I would have used the phrase "parse don't validate" for something like a username. It isn't clear to me what it would mean exactly. I generally only think of this principle for data that has some structure, not so much a username or a number from 1-100.
IP address would be about the minimum amount of structure. Something else would be like processing API requests. You can take the incoming JSON and fully parse it as much as possible, rather than just validate it is as expected (for example drop unknown fields)
Just for the sake of example, your internal representation might start from 0, and you just add 1 whenever you output it.
Your internal type might also not be a uint8. Eg in Python you would probably just use their default type for integers, which supports arbitrarily big numbers. (Not because you need arbitrarily big numbers, but just because that's the default.)
Perhaps "normalize" or "canonicalize" is more appropriate. A parser can liberally interpret but I don't take it to imply some destructured form necessarily. There are countless scenarios where you want to be able to reproduce the exact input, and often preserving the input is the simplest solution.
But yes, usually you do want to split something into its elemental components, should it have any.
You can use new types with validation too. In fact the approaches seem to be duals.
Parse, don't validate: take an untrusted string and return a distinct ParsedString.

Validate, don't parse: take a distinct UnvalidatedString and return a plain string once it passes.

The problem is that the latter pattern "fails open." If anyone on the team forgets to define an untrusted string as UnvalidatedString, the data skips validation.
If you default to treating primitive types as untrusted, it's hard for someone to accidentally convert an untrusted type to a trusted type without using the correct parse method.
The dual problem would be any function which forgets to accept a ParsedString instead of a string can skip parsing.
Both cases appear to depend on there being a "checkpoint" all data must go through to cross over to the rest of the system, either at parsing or at UnvalidatedString construction.
The difference is that if string is the trusted type, then it's easy to miss a spot and use the trusted string type for an untrusted value. The mistake will be subtle because the rest of your app uses a string type as well.
The converse is not true. If string is an untrusted type and ParsedString is a trusted type, if you miss a spot and forget to convert an untrusted string into a ParsedString, that function can't interact with any other part of your codebase that expects a ParsedString. The error would be much more visible and the damage more contained.
I think UnvalidatedString -> string also kind of misses the point of the type system in general. To parse a string into some other type, you're asserting something about the value it stores. It's not just a string with a blessing that says it's okay. It's a subset of the string type that can contain a more limited set of values than the built-in string type.
For example, parsing a string into a Username, I'm asserting things about the string (e.g., it's <10 characters long, it contains only a-z0-9). If I just use the string type, that's not an accurate representation of what's legal for a Username because the string type implies any legal string is a valid value.
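For concreteness, a parser enforcing exactly those constraints might look like this, reusing the Username struct from earlier in the thread; the regexp is just one rendering of "under 10 characters, a-z0-9 only":

    import (
        "fmt"
        "regexp"
    )

    var usernamePattern = regexp.MustCompile(`^[a-z0-9]{1,9}$`)

    // ParseUsername returns a Username only for strings that satisfy the
    // constraints, so holding a Username asserts those properties.
    func ParseUsername(s string) (Username, error) {
        if !usernamePattern.MatchString(s) {
            return Username{}, fmt.Errorf("invalid username %q", s)
        }
        return Username{value: s}, nil
    }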
The example also assumes that everything is like a 'ParsedString' that contains a copy of the original untrusted value inside.
I’ve found it hard to apply this pattern in Go since, if Username is embedded in a struct, and you forget to set it, you’ll get Username’s zero value, which may violate your constraints.
Why? You can easily call NewUsername inside NewAccount, for example, and just return the error. Or did I misunderstand?
Because go doesn’t have exhaustiveness checking when initialising structs. Instead it encourages “make the zero value meaningful” which is not always possible nor desirable. I usually use a linter to catch this kind of problem https://github.com/GaijinEntertainment/go-exhaustruct
I like this but in the examples would volume be calculated by width/length rather than being set?
But if you then create a constructor / factory method for that struct, not setting it would trigger an error. This is one of the problems with Go and other languages that have nil or no "you have to set this" built into their type system: it relies on people's self-discipline, checked by the author, reviewer, and unit tests, and ensuring there's no problem like you describe takes a lot of diligence.
Just do `type Username string` and replace the struct with that.

The problem there is that you lose the guarantee that the parser validated the string value.
A caller can just say `Username("whatever")` and skip NewUsername entirely. You also get implicit conversions in ways you probably don't want: an untyped string constant can be assigned or passed where a Username is expected, with no conversion at all.

That's true, I did not think of that.
If you do that, people outside the package can also do Username(x) conversions instead of calling NewUsername. Making value package private means that you can only set it from outside the package using provided functionality.
This is annoying to translate later. At least also include some error code string that is documented somewhere and isn't prone to change randomly.
I mean, you may end up just wanting something like:
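A sketch of that shape (the Reason field is a guess based on what follows):

    // UsernameError gives callers a concrete type to check for, while still
    // carrying a human-readable reason for logs and messages.
    type UsernameError struct {
        Reason string
    }

    func (e UsernameError) Error() string {
        return "invalid username: " + e.Reason
    }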
And the reason can be "username cannot be empty" or "username may not contain '<'" or something like that.

This is fine for lots of different cases, because it's likely that your code wants to know how to handle "username is invalid", but only humans care about why.
I have personally never seen a Go codebase where you parse error strings. I know that people keep complaining about it so it must be happening out there—but every codebase I’ve worked with either has error constants (an exported var set to some errors.New() value) or some kind of custom error type you can check. Or if it doesn’t have those things, I had no interest in parsing the errors.
I write mostly frontends. Sometimes the APIs I talk to give back beautiful English error messages - that I can't just show to the user, because they are using a different language most of the time. And I don't want to write logic that depends on that sentence, far too brittle.
Right—I think the “error code” here is going to be the error type, i.e., UsernameError, or some qualified version of that.
It’s not perfect, but software evolves through many imperfect stages as it gets better, and this is one such imperfect stage that your software may evolve through.
Including a human-readable version of the error is useful because the developers / operators will want to read through the logs for it. Sometimes that is where you stop, because not all errors from all backends will need to be localized.
But surely this is just another way of doing validation and not fundamentally "parsing"? If at the end you've just stored the input exactly as you got it, the only parsing you're potentially doing is in the validation step and then it gets thrown away.
Implementation-wise, yes, but the interface you're exposing is indistinguishable from that of a parser. For all your consumers know, you could be storing the username as a sequence of a 254-valued enum (one for each byte, except the angle brackets) and reconstructing the string on each "get" call. For more complex data you would certainly be storing it piecewise; the only reasons this example gets a pass are 1) because it is so low in surface area that a human can reasonably validate the implementation as bug-free without further aid from the type checker, and 2) because Go's type system is so inexpressive that you can't encode complex requirements with it anyway.
The validation is not completely thrown away, since the type indicates that the data has been validated. I understand "parsing" as applying more structure to a piece of data. Going from a String to an IP or a Username fits the definition.
I push my team to use this pattern in our (mostly Scala) codebase. We have too many instances of useless validations, because the fact that a piece of data has been "parsed"/validated is not reflected in its type using simple validation.
For example using String, a function might validate the String as a Username. Lower in the call stack, a function ends up taking this String as an arg. It has no way of knowing if it has been validated or not and has to re-validate it. If the first validation gets a Username as a result, other functions down the call stack can take a Username as an argument and know for sure it's been validated / "parsed".
This is a good design pattern, but be wary of doing validation too early. The design pattern allows you to do it as early or late as you like, but doesn't tell you when to do it. Often it's best to do it as part of parsing/validating some larger object.
See Steven Witten's "I is for Intent" [1] for some ideas about the use of unvalidated data in a UI context.
[1] https://acko.net/blog/i-is-for-intent/
I read through that piece and strongly disagree with the premise that their insight is somehow at odds with leaning into the type system for correctness.
The legitimate insight that they have is that anchoring the state as close as possible to the user input is valuable—I think that that is a great insight with a lot of good applications.
However, there's nothing that says you can't take that user-centric state and put it in a strongly typed data structure as soon as possible, with a set of clearly defined and well-typed transitions mapping the user-centric state to the derived states.
Edit: looks like there was discussion on this the other day, with a number of people making similar observations—https://news.ycombinator.com/item?id=39269886
A text file and an abstract syntax tree can both be rigorously represented using types, but one is before parsing and other is after parsing. The question is which one is more suitable for editing?
Text has more possible states than the equivalent AST, many of which are useful when you haven't typed in all the code yet. Incomplete code usually doesn't parse.
This suggests that drafts should be represented as text, not an AST.
And maybe similarly for drafts of other things? Drafts will have some representation that follows some rules, but maybe they shouldn't have to follow all the rules. You may still want to save drafts and collaborate on them even though they break some rules.
In a system that's not an editor, though, maybe it makes sense to validate early. For a command-line utility, the editor is external, provided by the environment (a shell or the editor for a shell script) so you don't need to be concerned with that.
My Go is rusty, do you mean not exporting the type "Username" (ie username) to avoid default constructor usage?
In Go, capitalized identifiers are exported, whereas lowercase identifiers are not.
In the example I gave above, clients outside of the package can instantiate Username, but they can't access its "value" member, so the only way they could get a populated Username instance is by calling NewUsername.
Conceptually equivalent to the ancient arts of private constructors and factory methods.
Which (in Java) were then abstracted away in... interesting annotations.
the fact that this is some special “technique” really shows how far behind Go’s type system & community around typing is
But copy-pasting the same lines of code in literally every function is the Golang Way.
It makes code "simpler".
This is a variation on one of my favorite software design principles: Make illegal states unrepresentable. I first learned about it through Scott Wlaschin[1].
[1]: https://fsharpforfunandprofit.com/posts/designing-with-types...
Related:
Parse, don't validate (2019) - https://news.ycombinator.com/item?id=35053118 - March 2023 (219 comments)
Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=27639890 - June 2021 (270 comments)
Parse, Don’t Validate - https://news.ycombinator.com/item?id=21476261 - Nov 2019 (230 comments)
Parse, Don't Validate - https://news.ycombinator.com/item?id=21471753 - Nov 2019 (4 comments)
So far I like the commonly used approach in the Typescript community best:
1. Create your Schema using https://zod.dev or https://github.com/sinclairzx81/typebox or one of the other many libs.
2. Generate your types from the schema. It's very simple to create partial or composite types, e.g. UpdateModel, InsertModels, Arrays of them, etc.
3. Most modern Frameworks have first class support for validation, like Fastify (with typebox). Just reuse your schema definition.
That is very easy, obvious and effective.
Encapsulation saves lives.
AKA 'Value Object' from DDD or a similar 'Quantity' accounting pattern. Another angle is that this fixes 'Primitive Obsession' code smell.