I would have presumed that security-minded people, which includes those who work in tech, would not so easily give away their genome, and that most of 23andMe's customers are a slice of the general population. But then I read about things like WorldCoin and that people who go to startup parties jump at the chance to give away scans of their retinas and I'm befuddled. Why would anyone willingly do that?
I'm familiar with security (I keep a copy of Applied Cryptography on my shelf for "fun reading") and tech, here's a copy of my whole genome: https://my.pgp-hms.org/profile/hu80855C Note it's a full human genome, far more data than a 23&Me report. You can download the data yourself and try to find risk factors (at the time, the genetic counsellors were surprised to find that I had no credible genetic risk factors).
Please let me know in technical terms, combined with rational argument, why what I did was unwise. Presume I already know all the common arguments and have evaluated them using my background knowledge (which includes a PhD in biology, extensive experience in human genome analysis, and years of launching products in tech).
I've been asking people to come up with coherent arguments for genome secrecy (given the technical knowledge we have of privacy, both in tech and medicine) and nobody has managed to come up with anything that I hadn't heard before, typically variations on "well, gattaca, and maybe something else we can't predict, or insurance, or something something".
1) You can be subject to discrimination based on your ethnicity, race, or health-related factors. That's especially a problem when the data leaks at scale, as in 23andMe's case, because that motivates the development of easy-to-search databases sold in hacking forums. The data you presented here would be harder to find, but that's not the case with mass leaks.
2) It's a risk for anything that's DNA-based. For example, your data can be used to create false evidence for crimes irrelevant to you. You don't even need to be a target for that. You can just be an entry in a list of available DNA profiles. I'm not sure how much DNA can be manufactured based on full genome data, but with CRISPR and everything I don't think we're too far away either. You can even experience that accidentally because the data is out there and mistakes happen.
3) You can't be famous. If you're famous, you'd be the target of an endless torrent of news based on your DNA bits. You'd be stigmatized left and right.
4) You can't change your DNA, so when it's leaked, you can't mitigate future risks that don't exist today. For example, DNA-based biometrics, or genome simulation to the point where someone can create an accurate lookalike of you. They're not risks today, but that doesn't mean they won't be tomorrow.
There are also additional risks involved based on the country you're living in. So, you might be living in a country that protects your rights and privacy, but it's not the case with the others.
You forgot an important one: Your ancestors, descendants, siblings, and cousins share much of the same DNA but did not consent to its release. All of the above risks apply to them as well. I'd be most concerned about insurance companies using genetic family history to deny coverage.
I'm not too worried about it because it's never a 100% overlap. Even my brother and I share only ~50% DNA. It gets way sparser for more distant relatives.
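To put rough numbers on "way sparser": the expected fraction of DNA two relatives inherit from shared ancestors halves with every parent-child step between them. Below is a minimal sketch, assuming the standard coefficient-of-relationship rule (sum (1/2)^n over each connecting path of n parent-child links). The `relatedness` helper and the `PATHS` table are my own illustration, and the results are averages, not what any particular pair actually shares. This is also a different measure from the raw sequence similarity between any two humans.

```python
from fractions import Fraction

def relatedness(path_lengths):
    """Expected shared fraction: sum of (1/2)**n over each path of n parent-child links."""
    return sum(Fraction(1, 2) ** n for n in path_lengths)

# Hypothetical lookup table: one entry per path through a common ancestor.
PATHS = {
    "parent/child": [1],
    "full siblings": [2, 2],     # one path via each shared parent
    "half siblings": [2],
    "first cousins": [4, 4],     # one path via each shared grandparent
    "second cousins": [6, 6],    # one path via each shared great-grandparent
}

for name, paths in PATHS.items():
    print(f"{name:15s} {float(relatedness(paths)):.4f}")
```

So a sibling's published genome says a lot about you on average, a second cousin's only about 3%, though even that can be enough to narrow down a family tree.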
About insurance companies, they're legally forbidden to use such data.
Great training set to check the results of other factors, then use those to infer.
Moreover "legally forbidden" means jack faeces unless you can point to people who had convictions recorded and went to jail. Otherwise we're merely discussing business conditions & expenses.
I mean, of course but that’s applicable to all regulations, isn’t it? Yes, they can be violated, but what else do we have?
If you keep things secret they can't be used in a regulation breach by people who don't know those things.
We have /that/.
Theft is illegal and you lock your house, and that regulation is a serious one. The idea we have nothing but regulation is absurd in the extreme.
This is completely false. Any two random humans have more than 99% overlap by virtue of being the same species. It's even higher for brothers. We also share around 90% DNA with cats, dogs and elephants.
https://www.amacad.org/publication/unequal-nature-geneticist...
This doesn't make sense. If they were equal, you'd be the same person except for environmental differences. Many applications don't need equal DNAs. E.g.
https://youtube.com/watch?v=KT18KJouHWg
This is a very weak argument. There's a long history of companies doing illegal things, and even if it's illegal today it doesn't mean it'll be illegal tomorrow.
I think it was clear that @sedatk was referring to the 1% that separates him from other human beings, not the 99% that separates him from trees.
Why do you think people are entitled to have genome data on you? The morality is flipped. Privacy is recognized as a core, natural right. Others have to prove their onus for wanting your biological data. Trusting others is a moral and character weakness, because you have no guarantees as to how that data will be used. Or more specifically, what new ways to analyze and take advantage of that data will emerge.
I think actuaries will care an awful lot about this data and could use it to negatively influence your risk factor, and thus insurance premiums.
I think if your prior includes "trusting others is a moral and character weakness" then I don't think it's useful for us to discuss this topic further.
As for actuaries, in the US, the GINA law prevents health insurance companies from using this data. I think legal protection is much more important than attempting to hide my DNA.
I agree, if you can't justify trust with reason then it's hard to trust your argument that relies on trust. Trust can be broken, and your stance doesn't address that concern.
While I hold privacy in high regard, your standpoint on trust is pretty extreme.
With your own "trust can be broken", you could conclude that you should distrust "with reason" (hey, it was broken) — basically, flipping it is an equally sound stance.
As a rule, I trust people, keep private stuff not easily aggregated (eg. I might talk some stuff over lunch, but will not email it to the person so they have it on record), and I am quick to distrust people once they fail me. Legal protections do matter, because they discourage misuse of unintended data sharing.
The law could change, allowing the usage of your data without your consent.
Where is it stated exactly that privacy is a core, natural right? Not in the Constitution, though the 4th suggests it. It’s not part of the natural order, I don’t think (most stuff is out in the open). I’m not saying I think privacy is bad or people deserve to have their info out in the open, I just don’t understand why people feel such a right to it, or where governance — natural or man-made — dictates it.
They could also use it to positively influence my risk factor.
Sure, if you don't believe in any of the potential negative scenarios, anything goes. You could also post your full name, SSN, DOB, address, etc. here if you are secure in the knowledge that no harm could ever come of it.
I think what they're saying is that name (probably not), SSN (almost definitely), DOB (maybe?) and address (probably) have known, confirmed risks. There are current ways that bad actors can abuse that information.
Genome is still pretty theoretical, except getting caught for committing crimes.
I just checked, and using my True Name (https://en.wikipedia.org/wiki/True_Names) I can easily find my DOB, prior addresses and phone numbers, and using that information, it's likely I could make a reasonable guess for the SSN.
it's likely I could make a reasonable guess for the SSN.
It is? I mean then why are we bothering to protect anything, this shit is all super available for any given person.
SSNs are fairly predictable: if you know region of birth and DOB you can get awfully close, for a wide range of the population.
https://www.pnas.org/doi/10.1073/pnas.0904891106
Konerding's 12th law, amended: "There is no bit of pseudonymized data which cannot be de-anonymized by a sufficiently motivated MIT grad student" (not entirely joking; see https://archive.nytimes.com/bits.blogs.nytimes.com/2015/01/2...)
The question is, what are the potential negative scenarios.
I think we already know for sure that posting a combination of full name, SSN, DOB, and address is a reliable way to provide scammers with the necessary information to commit fraud.
That's not the same risk because 23andme also has name, address, email.
One risk if you have PII+genome is that a technically sophisticated entity can determine if you've physically been in a location. Also with an extensive PII+genome database they could find your family, for example for blackmail purposes.
Another risk is that a health insurance provider could deny you based on potential health issues they find in your genome.
Yes, but technically sophisticated entities can also use methods that require less effort.
https://xkcd.com/538/
That's your defense? You asked for actual risks and when shown real, plausible ones recede into XKCD quotes. Clearly just a spoiler.
What real, actual risks which I didn't already know about have been shown in this thread?
The point is that while you can use DNA to identify people in most cases, sufficiently motivated adversaries have more effective, cheaper, lower-technology approaches that they will use first.
Like with many things, the issue is the aggregation of data on many individuals (a database), and easy accessibility of your individual data on request (discoverability and processing).
Me shouting my sensitive private details in a crowded bar is entirely different from putting them on my webpage. There's even a difference between writing them down on a napkin or shouting them out.
Technically, even without PII an adversary could determine that you have been in a physical place, they just wouldn't know what to call you.
For one thing, this leaks a portion of the genome of your relatives, which is a clear breach of their privacy. Whether you personally deem it sensitive or not, genetic data is meant to remain confidential.
I don't believe making my genome available, which contains similarity to my relatives, is a breach of their privacy.
I think part of my point is that DNA, by its nature, simply cannot remain confidential, and that thinking we can keep it that way is just going to lead to inevitable disappointment.
First, some people extend your argument from DNA to everything and say "I believe that privacy in the modern world is unrealistic"; that doesn't make the argument applicable to the rest of us.
Second, whether DNA can or cannot remain confidential is yet to be seen, but feasibility is certainly orthogonal to whether it ought to be, which is the point at hand.
Third, whether you believe it's a breach of privacy to leak part of your relatives' DNA is beside the point. It's their decision to make, since it's their personal data and deemed confidential under most privacy frameworks, and therefore a breach.
To your first point: Yes, I generally extend my argument to more or less everything in the modern world. Put your garbage out on the street: reporters can rifle through it looking for evidence.
To your second point: we already know DNA can't remain confidential (there is no practical mechanism by which even a wealthy person could avoid a sufficiently motivated adversary who wanted to expose their DNA). That's just a fact, we should adjust our understanding based on that fact.
Most important: sharing my genomic information with the world is not a breach of any privacy framework I'm aware of and subject to (US laws). Do you have a specific framework or country in mind?
Fully agree with you here. I can understand why people argue "We must do everything possible that no human being ever finds out anything medical-related about another human being, ever".
But that is a value judgement, and I believe it is one that comes at a great cost to society- I wouldn't be surprised if >50% of the cost of medical care is directly or indirectly due to this attitude, and that medical progress has been slowed immensely for the same reason.
If we could make medical data more open, it would greatly benefit the vast majority of people. OF COURSE it is true that some smaller number of other people/patients are helped by the existing medical secrecy system. I fully admit this is a trade-off, where we have to decide what values are more important.
(source: Am medical doctor)
This is disgusting. You want people knowing the maladies they got treated, and how?
There's the old saying of knowledge being power. If you want this information about people being spread, then you're advocating having power over these people over that information.
It takes very little imagination to see how humans would misuse this data.
it's a tradeoff
I'm disgusting for "people having power over other people", you're disgusting for the graveyard of dead people due to the status quo system.
I'm gonna start making clones of you.
I'm fine with that, but merely having my genome sequence doesn't enable you to do that.
Wasn't your original argument that they could easily get your genetic material (to figure out the genome from) anyway?
Would a bunch of your cells be sufficient at some point in the near future? (I know progress is being made to turn any cell into a reproductive cell, but that's still not exactly the same thing, but it's on that exact path)
You still might not mind a bunch of your clones though, so I don't think that's much of an argument.
Generally, being pseudo-anonymous is what allows open and free discussion (but lots of vitriol too).
While genetic information is not yet understood well enough by masses to be abused in stereotyping and rejecting and — indeed — "cancelling", there is a huge potential to do so. This especially holds true for gender, racial, national differentiation, genetic disease potential and health profiling — all accessible through a full genome (even if some of the indicators are not with 100% confidence). Lots of this can also be used to start linking genome data to an actual person (helped with data from other contexts), which is where it starts to become risky according to known risk profiles.
Unsurprisingly, someone who is likely a white male (I could have checked using your genome too, but loading up your profile above confirms that) with "no credible genetic risk factors" is a lot less concerned about opening up their genome to the public: you are unlikely to get discriminated against. With that said, even you can get potentially ignored for your privilege: even I just engaged in that — somewhat discounting a part of your experience/claim because you are a white male. Part of that is also education: your extensive experience in the field allows you to make an educated choice. Many can't attain that much knowledge before they decide whether to share their genome or not.
This opens up a question similar to that entire face recognition fiasco: how will the unprivileged be affected when models are trained, and research is done, mostly on the privileged?
So the question is how do we ensure enough anonymity to make everyone happy to contribute to the world knowledge, but reduce chances of linking data back to actual people? I know nebula.org is doing something of the sort (though mostly just guaranteeing that they will remove the data at your request, and not share it without your permission), but we could have one genome produce a bunch of part-genomes, still allowing causation/correlation research, but none of them having the full picture.
That would disable some of the groundwork research (is there a correlation/causation only visible in the full genome or larger part of it?), so it's a tricky balance to find.
And finally, I always like to make this choice a bit personal: how would you feel about your child being linked to a criminal case due to your genome being publicly available?
One non-theoretical risk is that you or a relative leaves DNA on the scene of a crime you didn't commit (or?), and this makes you a suspect. This is also assuming a real identity is tied to the DNA.
So let's assume you committed to publishing your genome in advance regardless of result. Sounds like you spun the barrel and dry snapped to demonstrate that Russian roulette is safe for everybody.
Tell us about how differing views on this to yours would influence opinion about your products you've launched in tech given your extensive experience in human genome analysis. Not at all?
This really may not be a case of being unable to understand something one's paycheck depends on not understanding at all but we can't know that yet.
I am a security engineer. When I signed up for 23andme, I assumed with certainty that it would be hacked and all data leaked at some point. I balanced that with the value of knowing potentially important health/genetic bio markers.
In the end, I valued knowing these bio markers above the privacy of my genome. The former is actionable and I can use it to optimize my health and longevity; the latter is of vague value and not terribly exploitable outside of edge-case threat models.
Exactly my thoughts.
I'd be more upset if a combination of my name and email/phone number got leaked than if my DNA was made publicly available.
Why would you be upset if your name+phone combo was leaked? Mine is all over the internet, so I wonder why you feel it would be bad.
I simply don't want to deal with spam or scams. If I'm exposing my contact details it would be a separate set that is dedicated to dealing with communication coming from the public.
Why? You can change your phone number and your name. Good luck with doing so with your DNA.
And that is exactly why they can be changed - because they're valuable details that can be used to track someone down. Your DNA is easily obtainable and is not used in any meaningful way that would affect your life if it was exposed.
Phone numbers are an increasingly important identifier. Sucks to lose one.
Q: Is it a HN thing to be (obsessively?) interested in health and longevity?
Dying is a natural process. Sorry.
I don't really care whether it's natural or not. Maybe if you ever have an NDE you will understand.
It's a human thing. Not all humans, but many.
Avoiding dying, as best one can, is also a natural behaviour.
We fight all sorts of natural processes. Most common forms of death from a couple of centuries ago are solved. Our average lifespan has increased dramatically. We fly around in planes, travel to space, grow fruit out of season and build giant cities.
As a species, we're excellent at working around or ignoring what's "natural".
In retrospect, how do you so far value the utility of the data you got? Did you take any actions based on them, do you think you will be doing so in the future?
Luckily I had no severe biomarkers. Some minor ones, but nothing I didn't know already. I loved learning about my ancient ancestry, though (ie migratory patterns 300k years ago.)
On balance, was the utility worth the cost (of a breach)? Probably not, because I found no major actionable issues. But if I did find severe biomarkers, it would have been worth it. So I do still think I made the right choice.
Or the reality is, if someone wants your dna they will follow you around and grab a coffee cup.
Yes, yours specifically, but what if I want like 200,000 people so I can find one that has a DNA profile similar to mine, who could serve as an escape-goat or victim?
Maybe I want to steal a kidney, or a child that could reasonably pass as my own?
There are already literally entire databases of millions of people's DNA freely available for scientific research.
Not with names and contact information I assume?
If you were smart enough to hack 23andMe to get genetic data to find a specific person, you'd be smart enough to reconstruct identities from publicly available data. You'd just have to cross-reference public anonymous databases with public non-anonymous ones. Both of which exist, and are free.
So far, the only real use-case for doing this is people trying to identify criminals from just DNA.
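For anyone wondering what "cross-reference" means in practice, here is a minimal sketch of a linkage attack on quasi-identifiers. Everything in it is an assumption for illustration: the field names, the `link_records` helper, and the toy records are made up, and real datasets would need fuzzy matching and far more care. The idea is just that a record with no name attached is re-identified when its (zip, birth year, sex) combination matches exactly one named record elsewhere.

```python
from collections import defaultdict

def link_records(anonymous, identified, keys):
    """Return {sample_id: name} for anonymous records whose quasi-identifiers
    match exactly one identified record."""
    index = defaultdict(list)
    for person in identified:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = {}
    for record in anonymous:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # only an unambiguous match re-identifies someone
            matches[record["sample_id"]] = candidates[0]
    return matches

# Entirely made-up toy data.
anonymous_db = [
    {"sample_id": "S1", "zip": "02139", "birth_year": 1980, "sex": "M"},
    {"sample_id": "S2", "zip": "94110", "birth_year": 1975, "sex": "F"},
    {"sample_id": "S3", "zip": "10001", "birth_year": 1990, "sex": "F"},
]
public_db = [
    {"name": "Alice Example", "zip": "94110", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1980, "sex": "M"},
    {"name": "Carol Example", "zip": "10001", "birth_year": 1990, "sex": "F"},
    {"name": "Dana Example", "zip": "10001", "birth_year": 1990, "sex": "F"},
]

linked = link_records(anonymous_db, public_db, keys=("zip", "birth_year", "sex"))
print(linked)
```

In this toy run S1 and S2 get names attached, while S3 stays ambiguous because two named people share the same key. The more attributes you can join on, the fewer ambiguous cases remain.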
You realize this data is often available for purchase or eventually publicly leaked, right? You don't have to be "smart enough" to do the hacking to benefit from it.
In the US, the bad actor here is much more likely to be insurance companies who can tune their secret algorithms to make sure no one with a gene tied to an illness which blooms later in life can get affordable health care.
In the US, health insurers can only price based on age, location, and tobacco use. Setting health insurance premiums or denying coverage based on any health-related factors has been illegal for over a decade, and changing that would be totally unviable politically.
However, it's a significant risk for other types of insurance including life, disability, and long term care.
Just because it's illegal, doesn't mean health insurance companies don't find loopholes, and consider fines when they get caught as the cost of doing business. See this series of articles[1] for some of their criminal shenanigans.
It's more than likely that they would use genetic data to deny insurance, and then settle the cases in court if they happen to get sued, which statistically is probably a rare occurrence.
[1]: https://www.propublica.org/series/uncovered
They are denying claims. If they are going to do that, why would they condition it on genetics (vs just denying anything they think they can)?
The paranoia about insurance and genetics is that they simply refuse to do business with high risk customers.
Unless this is an online joke I don't get, I think you mean "scapegoat".
Seems to be the same thing.
"The concept comes from an ancient Jewish ritual described in the Bible, specifically in Leviticus 16. During the Day of Atonement (Yom Kippur), two goats were chosen: one to be sacrificed and the other to be sent into the wilderness, symbolically carrying away the sins of the community. This second goat was called the "Azazel" or the "scapegoat".
Over time, the term "scapegoat" evolved to have a more general meaning in English. It came to refer to a person or group that is unjustly blamed for the problems or misfortunes of others, reflecting the original ritual in which the goat was symbolically burdened with the sins of others before being sent away. "
The same people believed in crypto-currency, infinite growth, social media and many other things. At least 23andMe provided actual value, to some at least.
What I find strange is that 23andMe did not automatically delete data after 30 days, or at the very least take it offline, only to be available on request. Notify people that their results are available and inform them that the data will be available for 30 days after the first download. This is potentially really sensitive data and based on 23andMe's response, they seem to be aware of that fact. So why would they keep the data around? That seems fairly irresponsible and potentially dangerous to the company.
What actual value did 23andMe and similar services offer in the first place?
Quenching someone's curiosity about where their ancestors are from? Do we even know how accurate it is at doing that?
I was adopted. I have no idea who my biological parents were or what genetic risks I might have inherited from them. When the doctor asks "Has anyone in your family ever had <fill in the blank>?" I have no answer to those questions without a genomic test.
Ancestry data, but also health markers. I.e. you're probably going to get macular degeneration, Tay-Sachs and cervical cancer.
Once I enabled the social graph thing I was immediately hounded by distant relatives who I assume want to chop me up for parts.
The police have closed a few cold murder cases based on adjacency (once Parabon got their hands on samples), so it must be pretty accurate.
Anecdotally, my profile told a radically different story about our ancestry than my family's vague lore led me to believe. 23andMe's data made way more sense.
If you go back in time, 23andMe was founded to collect genetic data with the goal of using that data to improve the health condition of humanity.
Over time it became clear that 23andMe's data set had limited predictive ability for health for a number of technical reasons (previously, dahinds, one of their statistical geneticists, has defended the quality of their predictions on HN, you can search for his comments. I suspect he can no longer comment on HN because of 23&Me's security debacle).
However, around that same time, 23&Me's dataset turned out to be excellent for ancestry analysis. It's generally considered fairly accurate (not just 23&Me: the entire process of ancestry through SNP genotyping works really well).
I never did 23&Me but my dad did- and he learned he has children all around the US (half brothers and sisters of mine) from some samples he provided some 45+ years ago. Both my dad and those people gained value from making that connection. It's interesting because my dad had already done most of the paper research (including going to SLC to visit the Mormon archives) to identify our obvious ancestors, and these relatives would never have shown up.
I just wanted to confirm my connection to royalty because I've always felt, y'know... special
Locating secret/hidden family is kinda nice.
Their service is selling you a dashboard over your genetic data that’s continually updated for new gene correlation studies and ancestry matches. It’s not really the one and done “Promethease” style analysis service you’re thinking of.
They will NOT delete your data even if you request a full account deletion, so surely they aren't interested in voluntarily deleting it.
It's all in the fine print. The labs will keep the genetic information as well as at least your DOB and sex for at least 10 years (CLIA requirements), and 23andMe will keep your identifying information (such as your email address) and account deletion request ID for some undefined period of time. Yes, this will remove some links (and the birthday paradox works in the user's favor), but this is certainly not a full and complete removal.
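A back-of-the-envelope way to see the birthday point: if the only identifiers left are DOB and sex, how likely is it that someone else in the database has the same pair as you? This is a rough sketch under loud assumptions: keys are treated as uniformly distributed, the key space of 80 birth years times 365 days times 2 sexes is my own guess, and `prob_key_is_shared` is a hypothetical helper. It computes the per-user version (does anyone match me?) rather than the classic any-two-people birthday problem.

```python
def prob_key_is_shared(n_users, n_keys):
    """Chance that at least one of the other n_users - 1 people has the same key
    as you, assuming keys are uniform over n_keys possible values."""
    return 1.0 - (1.0 - 1.0 / n_keys) ** (n_users - 1)

# Hypothetical key space: 80 birth years * 365 days * 2 sexes.
N_KEYS = 80 * 365 * 2

for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10,} users: {prob_key_is_shared(n, N_KEYS):.3f}")
```

With millions of records, almost every (DOB, sex) pair is shared by many people, so those two fields alone don't single you out. The genetic data stored next to them is a different story, since that is unique on its own.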
You didn't need to supply accurate information, this isn't a bank here with any validation of your identity.
You can at least change your name. You can't change your DNA, so when companies start selling that data it will be easy to detect when you give out fake information.
The only missing piece is a way to scan your DNA as part of a login form.
What good is my DNA without a real identity attached to it?
Idk, it probably has some value. But my point was that it's going to be difficult to prevent your real identity from becoming attached to your DNA forever. The moment your real (DNA, identity) pair leaks from a credible source, your privacy is permanently and retroactively ruined.
So if 23andMe leaked a fake name with your DNA, it's out there in the hands of advertisers/scammers/governments/etc. From now on, anyone who gets access to your DNA will be able to build up data on you, and all it will take is a single leak/sale from a credible source to make it accurate.
(...but in truth, I have no idea what "DNA data" looks like, or if it's even possible to use it for targeting...)
If someone else is leaking a credible ID/DNA combo, it doesn't matter whether or not I did 23andMe. And credible identification is actually kinda hard.
It will be a cold day in hell before I ever submit to dna analysis of this nature.
That doesn't stop my family from doing so, but I sure as hell will never.
So they've basically done it for you. The primary sensitive information is predisposition to hereditary disease. That's the same for you and your siblings.
I understand that but I can't control them so I must draw the line where I'm able.
I'm befuddled that anyone thinks Sam Altman is the least bit trustworthy after WorldCoin.
There is a difference between genomic data and biometric data: biometric data has known potential exploit vectors. So, with a picture of your retina, a sophisticated adversary could potentially reproduce your retina to allow access to some secure facility.
Genomic data doesn't have the same risk factors--at least at the moment. I think that the point many are trying to make here is that there may be risk vectors available at some point in the future that aren't known now. A couple of theoretical examples:
* You had to give a blood sample rather than other biometric data like a retina scan.
* Spoofing DNA evidence. That would be very/prohibitively expensive/difficult at the moment, but I suppose could become as easy as 3d printing at some point in the future.
Poor and desperate people don't have the luxury of thinking about these first-world privacy issues. There's a reason Altman and co. launched it where they did.
That explains the WorldCoin but not 23andme, people voluntarily paid for that so they couldn't have been that poor.
The long term premise of WorldCoin is to not store retina scans in any way, and scanning stations in the US already do not do so.
'long term premise'
I know someone who is very security-minded, but he was born to parents displaced by a war, and they didn't know where they came from (their adoptive parents only knew a region, and not for sure). At the time, it was an easy way for him to learn something about his heritage. His curiosity was satisfied.
Maybe they accept the possibility that they die one day?
I was 24 in 2015 and not in tech or as security minded as I am now when I received the test as a Christmas present. Obviously now I wouldn’t have dared do it, but it’s too late. Lacked the foresight at the time.
What's the implication here, that tech people should know better? I just don't care a ton about my privacy. At least that makes me not a hypocrite for working at a company that profits from user data (like many tech ones do).
Is this actually happening, or is that just what the stories say?
Well, in the case of WorldCoin, I think there are still some pretty significant questions about why they made Africa a prominent launch market (well, there are some reasons), but in some places they repeatedly increased incentives until they were offering people there up to a month's income to give their scans. That might not be a lot of money to a big startup, but it is telling that they had to offer that much to get some people to "opt" in.