Google Drive misplaces months' worth of customer data

Not your drive, not your data.

Isn't this like saying "not your house, not your possessions"? But landlords cannot just take or misplace my stuff, especially if I pay rent and have some rights.

What rights do you have in this case?

Google Drive allows you to upload, submit, store, send and receive content. As described in the Google Terms of Service, your content remains yours. We do not claim ownership in any of your content, including any text, data, information, and files that you upload, share, or store in your Drive account. The Google Terms of Service give Google a limited purpose license to operate and improve the Google Drive services — so if you decide to share a document with someone, or want to open it on a different device, we can provide that functionality.

From their terms of service, they deleted YOUR data. Thus the rights you'd have when anyone destroys your stuff apply. Hmmm, might even be criminal... I'm not a lawyer though, so I don't know shit.

If Google can delete it, it's not really *your* data is it?

https://blog.google/products/photos/storage-policy-update/

They say expressly in the ToS when it will be removed. If it doesn't meet any of those conditions, then they contractually cannot delete it.

But saying that "because they can, they can" is silly. I "can" (as in able to) break into your house and steal your shit. By that logic, if I can do it, it wasn't trespassing or theft.

"But saying that 'because they can, they can' is silly."

They can because users agreed to their terms of service --- and by doing so, relinquished some of their ownership rights.

Did you read them? The Drive addendum very clearly states that users DO NOT give up any ownership rights.

https://www.google.com/drive/terms-of-service/archived/

   The total liability of Google, and its suppliers and distributors, for any 
   claims under these terms, including for any implied warranties, is limited to 
   the amount you paid us to use the services (or, if the subject of the claim 
   is the free service, to supplying you the services again).

In other words, they can delete *your* data any time they want and claim it was an accident. If you don't like it, you can sue for your money back. If you're using the free tier, you can expect $0.

You can't sign away your rights, no matter what a contract says. In some places, tort still applies.

Like I said, I'm not a lawyer, but just because there is a limit on liability, does not mean there is a limit on damages and tort.

"a limit on liability, does not mean there is a limit on damages and tort."

This is exactly what Google intends it to mean. Good luck convincing a judge otherwise.

In some places, the fact there is no signed contract and no exchange took place (free tier) means there is no liability --- the user received everything they paid for.

Say I ask you to hold my phone and say “don’t blame me if it breaks”, and we agree. If you then deliberately smash my phone, you have caused me damage and broken the terms of the contract. In this imaginary case, there was an implied agreement that you would hold the phone, but you didn’t. You smashed it. You aren’t protected by your liability agreement because it no longer applies.

This same thing applies here. Intellectual property has value. Google agreed to hold that value and not delete it. They stopped holding the property and smashed it. Their liability clause no longer works because they broke the contract. It doesn’t matter if money changed hands or not. Damage is damage.

I’m not a lawyer, I don’t know shit.

Almost every program you run can delete your data. Does that make it not your data?

Quick rule of thumb --- it's not really your data if the only backup is someone else's hands.

This needs some elaboration.

Scenario A: I take a photo. There is no backup. Is this my data?

Scenario B: I have multiple independent backups of a document. Google deletes the main copy off my computer against my will. Is that "not my data" because Google deleted it? Does the deletion not count because I have a backup? Third option?

"But landlords cannot just take or misplace my stuff, especially if I pay rent and have some rights."

They sure can. Years ago I had a shitty landlord clear an apartment I'd moved 95% of my belongings from but before the term of my lease had ended. Now you may have some legal recourse, but the cost/time associated with litigating a matter like that tends to outweigh the remedy.

Someone can also rob your house if you own it. Now you may have some legal recourse, but...

The "legal recourse" is what defines "ownership".

If someone can take or destroy your stuff *without* legal recourse, then you don't really "own" it.

Okay, and how does this connect to home ownership?

You have even less recourse against the average burglar than the average landlord.

"You have even less recourse against the average burglar than the average landlord."

???

Where I live, you can literally kill the average burglar for breaking into your home. That is pretty much the ultimate recourse.

If you're not home and the police can find your burglar, you can legally press charges against him/her.

You can physically stop a landlord from illegally dumping your items too.

I'm talking about legal recourse after the fact. You can win both cases, at your own expense, but the burglar is much more likely to be judgement-proof.

If a crime was committed, the state generally assumes the prosecution --- i.e. no expense to you.

You can't get blood out of a rock but you can incarcerate it.

That's free if it's big enough to get their attention, and the culprit is known.

But I don't want free revenge, I want my stuff/money back. An actual remedy.

"I want my stuff/money back."

As the Rolling Stones pointed out years ago (maybe before you were born), "You can't always get what you want" --- yadda, yadda, etc., etc..

It is not just about giving you what you want. The fact that *legal* retribution of some sort applies if someone takes or destroys your stuff is a defining characteristic of "ownership".

Interesting that you cut the line there, because control over personal possessions usually goes under the "need" category.

Anyway I'm still not sure what point you were originally trying to make, because defeating a landlord in court and getting paid back is at least as good of a proof of "ownership".

Both situations have big flaws in the legal recourse, but you definitely have it in both situations.

Though I prefer the one where I get compensation.

"They sure can. Years ago I had a shitty landlord clear an apartment I'd moved 95% of my belongings from but before the term of my lease had ended. Now you may have some legal recourse, but the cost/time associated with litigating a matter like that tends to outweigh the remedy."

The existence of legal standing is what defines the property rights. If you're going to move the goalposts to "you don't own it unless you have legal standing and the means to pursue legal recourse", then you might as well say that only the extremely wealthy have any property rights at all.

It’s like that, but works differently depending on the country. If you live in a decent country with good laws and rules, then you have rights. If you live in a third world country, then you are out of luck. Google is a third world country on the internet in terms of “user’s rights”.

But you control the access to the house. Your landlord won't enter your house and rearrange or remove your furniture.

You put the analogy there, then broke it down - which should answer your own question. They are not the same, it's not like saying that at all. Once again HN misses the point on the quest for the perfect comparison.

Only because there are strong legal protections for residential tenants. Those laws don't apply to your relationship with Google.

I'm sure the commenter did not mean this literally.

They probably meant trusting a cloud provider can have consequences.

No, it definitely doesn't sound like that and it's not about rights. Your landlord cannot go and remodel your apartment and ruin your possessions without your approval of the changes they are about to make.

If I pay for the storage, then it is my drive and my data.

The article didn't say anything about the free or paid GDrive accounts. If it is a free account, then that is up for debate. If it impacts the paid consumers, they paid for the service and it is their data.

do you have a link to the legal decisions that confirm this is the case and establish what rights the customer actually has in this relationship?

What rights do the contracts that a paid customer signs with Google say they have over the data?

From what I see in the free terms of service: "You retain ownership of any intellectual property rights that you hold in that content. In short, what belongs to you stays yours." which is neat.

You have the rights to your data, but there doesn't seem to be any stated obligation for Google to keep that data or make it accessible to you.

"You have the rights to your data, but there doesn't seem to be any stated obligation for Google to keep that data or make it accessible to you."

On a paid account, clearly that's part of the contracted service being provided. There is absolutely an obligation to provide the service you took payment for.

But no, in general in our society, gratis products aren't required to carry warranties, for obvious reasons (no one would provide them if so).

"On a paid account, clearly that's part of the contracted service being provided. There is absolutely an obligation to provide the service you took payment for."

yeah, unless there is a statement sort of like "We reserve the right to terminate accounts in the case of violation of terms of service as determined by our automated systems"

in which case you would need to go to court and get a determination that clearly that is wrong and they can't do that, in general in American society that's how things work.

That seems like a digression, though, since obviously none of these putative Drive customers report being notified of a TOS violation. I didn't say "service providers must provide service under all circumstances", I just said they had to follow their contracts, which demand provision of service under whatever terms the parties agreed on.

But if I go to drive.google.com, it quite clearly says "My Drive".

Indeed - Google refer to it as My Drive. They're making it clear it's not Your Drive!

Getting Smokey Bear vibes from this

https://www.youtube.com/watch?v=wX1x7pfH8fw

I’m going to laugh when they say that that’s the name of the feature. A “my drive”

Check mate!

Anecdotal, but with Google Photos, I've observed data loss to the point where I no longer trust the service with anything valuable.

Photos I know I've taken are missing. There are periods of time when I walked around a city 10+ years ago taking a large volume of photos, like when I first moved to Seattle. Going back to that window of time in Google Photos, I only have a handful of photos from that walk.

There are also partially corrupted photos from many years back. Photos that only partially render, or render as noise.

Luckily, all of these are random low-value photos from my youth. They aren't core memories of my kiddos growing up or anything. I'm glad I discovered the data loss in that window of time, and not by losing photos of my kiddos' birthdays.

As a Google Photos user, this is really fucked up. I primarily use Google Photos to manage my kids' pictures as well. What did you move to?

https://mega.io

The quality of their software is vastly superior to Google Drive. They have mobile, desktop, and CLI apps on Windows/Apple/Linux. Been using them for a year without issues.

This company is owned by Kim Dotcom, isn't it? Are you ok with entrusting him anything of value?

It was started by him but he is no longer involved. Even if he was still running the company I would trust him infinitely more than Google or Microsoft. Also I like that they are based in New Zealand.

Their software is well engineered and is open source. https://github.com/meganz

The solution isn't to move. Every storage mechanism is going to have some amount of failures. The solution is redundancy. Store your valuable files on multiple media. I know a lot of people that have both Google Photos and iCloud - that works okay. You can also schedule backups to a NAS or some other cloud storage.

The failures being reported in the comments here are unacceptable. Better to have redundancy across services that are more reliable with better customer support. Choose companies that specialize in file storage (Mega, Dropbox), not a shitty advertising company.

Personally I have turned off google photos and had turned off apple photos when I had an iphone for a couple of years.

Instead I use sync.com (yes, the UI is horrible and it's slow, but it works) and my Synology NAS, which mirrors to a Synology NAS at my father's house (we mirror each other's photos).

Basic rule of thumb, if it's free they don't care.

You want to pursue uncorrelated redundancy more than you want a singular alternative.

People who came up in the cloud era were promised the false (impossible) idea that large companies could and would eradicate your risk of data loss if you just paid them a few dollars a month.

That's not a thing. If you really can't tolerate some loss, you need to coordinate something more robust across a few different and dissimilar options. And even then you'll still have risk.

That said, even just making sure you keep local copies of your archive or print your treasured favorites can go a long way towards greater reliability without a lot of extra effort.

This is not for everyone, but I host my family photos myself, most recently with this: https://piwigo.org/. I have been doing this since 2007 (started on a different software, called "gallery". Was able to migrate from gallery2 to gallery3 and now piwigo), and so far no major issues. Advantage: I can easily share photos with family, no need for iCloud, Facebook, or indeed any service- they just need a web browser on their desktop computer or phone.

There really isn't any single service you can move to, to be "safe". This is why 3-2-1 backup schemes exist for important data, although they have morphed a bit with cloud usage. The basic idea is to be robust to any single point of failure (at minimum, but extra redundancy costs). So if your laptop or your google photos blows up, you still have everything.
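For what it's worth, the classic reading of 3-2-1 (at least three copies, on at least two different kinds of media, with at least one offsite) is easy to write down as a check. This is only a rough sketch: the `Copy` type, its field names, and the example locations are all made up for illustration, and "medium" is whatever granularity you decide counts as an independent failure domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data. All names and values here are illustrative."""
    name: str
    medium: str    # e.g. "laptop-ssd", "nas-hdd", "cloud"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """True if there are >= 3 copies, on >= 2 distinct media, with >= 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical setups: a laptop plus one cloud service, then the same with a NAS added.
laptop_and_cloud = [
    Copy("laptop", "laptop-ssd", offsite=False),
    Copy("google-photos", "cloud", offsite=True),
]
with_nas = laptop_and_cloud + [Copy("home-nas", "nas-hdd", offsite=False)]

print(satisfies_3_2_1(laptop_and_cloud))  # False: only two copies
print(satisfies_3_2_1(with_nas))          # True
```

Note the check says nothing about whether the copies are actually in sync or restorable, which is the part that tends to bite in practice.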

If you have things you really care about not losing, you need a backup strategy.

I now back up valuable files to three regions in S3, three regions in Google Cloud, and offline physical storage I keep on a shelf in my office.

Everything in my family dropbox - I make sure all of them can fit on both my and my wife's laptops. With backblaze backup. And a backup to a local external hard drive. I also keep a checksum of each file that I regularly... check.

I have backup both to Google Photos and iCloud.

The probability of both failing and/or both deciding to just delete my account because I look "suspicious" is very small. But I _have_ had iCloud mysteriously not being able to retrieve old photos.

I have Photos set to download all the files locally, and those get backed up with Time Machine and Backblaze.

People forget even S3 will lose files. The reliability is high, sure, but it’s still sub 100%.

Has S3 ever lost a file yet? I know that the reliability isn't 100% as that isn't possible in the real world. But I've never heard a report of a file that was confirmed as stored being lost, and I was under the impression that this was in the category "bound to happen eventually, but hasn't happened yet".

S3 is designed for 11 9s of durability. What this means is that they model a "correctly functioning system" (i.e. replication and/or erasure coding) and the durability guarantee is about HW failures only. However, such models do not (& really cannot) account for the existence of bugs or introduction of new ones. That's a huge part of why S3 doesn't really do a whole lot of feature development (well that + it's hard to maintain a 20-year-old codebase).

Also, we're talking about Google Drive here which isn't GCS (Google's S3 competitor) but a higher-level product layered on top of GCS but with its own bookkeeping / ACLs etc. Thus there's more room for error. My hunch is that the data is permanently lost.

Additionally, S3 stores an enormous amount of data such that probabilistically they're bound to lose something to HW failure. 2 years ago, S3 stored 100 trillion objects [1]. With 11 9s of durability annually you'd expect to lose about 1,000 objects a year (10^14 objects times a 10^-11 annual loss rate). The saving grace is that most objects aren't accessed (maybe not ever again) & they detect & correct durability errors on access to ensure that accessed objects definitely aren't lost. So while they "admit" to 4 objects, that's likely an undercount because I wouldn't expect them to regularly check if all 100 trillion objects are accessible because of how long that would take.

[1] https://www.zdnet.com/article/aws-s3-storage-now-holds-over-...
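For anyone who wants to check the durability arithmetic themselves, here is a back-of-envelope sketch. It assumes "11 9s" means each object is independently lost with probability 10^-11 per year, which is the usual reading but still a simplification; `expected_annual_losses` is just an illustrative helper, not anything AWS publishes.

```python
def expected_annual_losses(object_count: float, nines: int) -> float:
    """Expected objects lost per year, assuming each object is independently
    lost with probability 10**-nines per year (a modelling assumption)."""
    annual_loss_probability = 10.0 ** -nines
    return object_count * annual_loss_probability


objects_stored = 100e12  # 100 trillion objects, the figure quoted above

# Rounded because floating-point multiplication may be off in the last digit.
print(round(expected_annual_losses(objects_stored, nines=11)))  # 1000
```

Under those assumptions the expectation at this scale is on the order of a thousand objects a year from hardware failure alone, before counting bugs or operator error, which the durability model does not cover.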

In December of 2012 AWS admitted that four files were lost: two completely deleted and two more truncated.

Probably not in the way you're thinking.

But with cloud systems you can always lose your files when a hacker gets access to your root account and holds it hostage, or you lose the phone with the 2fa secret on, or your payment doesn't go through and you miss the reminders, or your account is wrongly closed for abuse by a malfunctioning bot.

My concern about my own backup strategy (which, for Photos, is exactly the same as yours) is that I don't have a system in place that notifies me if files get corrupted or randomly deleted, and I'm not sure how I would go about implementing something like that. Any ideas?
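One low-tech way to get that kind of notification is a checksum manifest: hash every file, save the result, and on the next run report anything that disappeared or whose hash changed. The sketch below is one way to do it, not a finished tool; the function names are made up, it assumes the files are reachable as a local directory, and the manifest file should live outside the directory being scanned so it does not hash itself.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large photos need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report files that vanished or whose contents changed since `old`."""
    missing = sorted(set(old) - set(new))
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"missing": missing, "changed": changed}


def check_and_update(root: Path, manifest_file: Path) -> dict[str, list[str]]:
    """Compare root against the saved manifest, then save the fresh one."""
    current = build_manifest(root)
    previous = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    manifest_file.write_text(json.dumps(current, indent=2))
    return compare(previous, current)
```

Run `check_and_update` from cron or a scheduled task and send yourself a message whenever either list is non-empty. Photos should essentially never change once taken, so any "changed" entry deserves a look; for folders you edit on purpose you would need to separate intended edits from corruption, which this sketch does not attempt.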

s3 losing a file is a statistical anomaly and would get a lot more than a blog post and support deflection

Never had iCloud lose any of my data and I've been using it since not long after it launched, but now I'm a little concerned.

With photos I do keep them all local to my MacBook, but not documents. Maybe I should.

iCloud is not very reliable for me. It did mysteriously delete photos and files. I always use it just as a secondary backup since then, never as a primary backup

Tangentially related, if you've ever had the pleasure of migrating away from Google Photos/Drive with Google Takeout, prepare to spend some time fixing photo metadata in your library and re-embedding exif data with exiftool. Takeout strips out the exif data into a non-standard JSON sidecar, as opposed to something more standard/well supported like XMP.

It's unfortunately hostile for non-technical users that care about their photo metadata, which I assume is most people since it includes data such as creation timestamps and location. It's not too bad to script this if you're savvy (and careful), but otherwise you'll have to pay for a third-party tool[1]

[1]: https://metadatafixer.com/
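For the scripting route, a rough sketch of the idea: read the sidecar and build an exiftool command that writes the timestamp and location back into the image. The field names `photoTakenTime.timestamp` (Unix seconds) and `geoData.latitude`/`geoData.longitude` are assumptions about the Takeout sidecar layout, as is treating a 0.0/0.0 location as "no location"; check them against your own export. The function only builds the argument list and does not run anything, and it writes the time as UTC, which may not match the local time the camera originally recorded.

```python
import json
from datetime import datetime, timezone


def exiftool_args(sidecar_json: str, image_path: str) -> list[str]:
    """Build (but do not run) an exiftool command from one Takeout sidecar.

    The sidecar field names used here are assumptions, not a documented schema.
    """
    meta = json.loads(sidecar_json)
    args = ["exiftool"]

    taken = meta.get("photoTakenTime", {}).get("timestamp")
    if taken is not None:
        when = datetime.fromtimestamp(int(taken), tz=timezone.utc)
        args.append("-DateTimeOriginal=" + when.strftime("%Y:%m:%d %H:%M:%S"))

    geo = meta.get("geoData", {})
    lat, lon = geo.get("latitude"), geo.get("longitude")
    # Assumed convention: 0.0/0.0 means the photo had no location attached.
    if lat is not None and lon is not None and (lat, lon) != (0.0, 0.0):
        args += [
            f"-GPSLatitude={abs(lat)}",
            f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
            f"-GPSLongitude={abs(lon)}",
            f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}",
        ]
    return args + [image_path]


# Hypothetical sidecar contents and file name, for illustration only.
sample = json.dumps({
    "photoTakenTime": {"timestamp": "1500000000"},
    "geoData": {"latitude": 47.6, "longitude": -122.3},
})
print(exiftool_args(sample, "IMG_0001.jpg"))
```

Passing the resulting list to `subprocess.run` would apply it. Try it on a copy of the library first, since matching sidecars to images is its own source of mistakes.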

Worse than that, videos shared with you are reencoded to 1080p with crap bitrate. IIRC the JSON sidecar is also missing for these. Doesn't matter if you save them to your library. You can watch them in original quality and see all of the metadata on photos.google.com but they don't get taken out.

"Tangentially related, if you've ever had the pleasure of migrating away from Google Photos/Drive with Google Takeout, prepare to spend some time fixing photo metadata in your library and re-embedding exif data with exiftool. Takeout strips out the exif data into a non-standard JSON sidecar, as opposed to something more standard/well supported like XMP."

I heard this claim before but I was never able to reproduce it. Does this only happen with the "Storage Saver" option enabled and/or when metadata has been manually changed? At least with the "Original Quality" option my takeout data seems to exactly match the files that were originally uploaded.

I do a takeout quarterly and stash the data someplace safe. I've been burning the data to M-DISC blu-rays for a while now as a deep archive copy, they seem pretty reliable.

I also now have multiple apps syncing all the photos from my phone so they're stored in a couple different cloud photo locations on top of that.

I do this too, and I include par2 files on each disc. You sacrifice some space on each disc but the incremental cost of a few more discs is minimal.

This has been happening to me across all google products. Emails - random emails are just gone. Sometimes I can see evidence of them from body of replies. Maps - I am a HUGE user of maps, starring thousands of places as I planned travel over the years. Randomly 1-50% of my stars just won't be there. Sometimes they come back. Photos - same issue as OP in this thread, random clusters of photos (mostly 10+ year old ones) just gone. There are a lot of support threads out there of people reporting the exact problem with no real fixes too.

I have the same with starred places on google maps. Sometimes they're there, other times they're MIA. Super annoying.

Worth noting that archived photos don't show up in most places, including the timeline. Not saying it's not buggy, but if you haven't already specifically checked the archived photos section, it's worth checking.

Good callout. Yes, I've checked and they are in fact missing, not just hidden.

I have a weird gap on icloud, and the photos are on google photos thankfully, and other periods it’s opposite

they are both supposed to have all

dangerous

Anecdotally, I know a handful of my photos that were posted via Google+ are gone after that was turned down. I have been able to find all my other photos that were uploaded through Google Photos.

I noticed a while ago the UI would just show some photos for a day, and I'd have to click the down-arrow at the top right of the group of pictures for a date to get all the pictures of that date.

I did not notice direct data loss yet, but I noticed other behavior that made me reconsider how much I trust it.

For example in the past I had several occurrences of duplicate photos where each of the two instances has exactly the same bytes as the other one. Usually that should have gotten deduplicated, but it hasn't. What's even stranger is that deleting one of the two pictures also deleted the other one. Re-uploading the original photo usually made both duplicates show up again. The way how I was able to get rid of those duplicates was to wait a day or so until I re-upload the data and to attempt that process several times. At some point only a single photo showed up.

Also those duplicates only showed up in the regular photo view. Even if both of the duplicate instances were added to an album viewing the album would only show the photo once.

For photos I back up using Dropbox. Then my Synology fetches the photos from Dropbox.

Google photos only has the low quality photos.

This is why I run a personal Google Workspace account which has regular (hourly) backups managed by a self-hosted CubeBackup instance. It's quite a nice system: I get daily summaries from CubeBackup via email, and I'm able to back up all the Google Drive/Email/Contacts/etc. data to both my local Synology NAS and Backblaze B2.

I have absolutely no affiliation with CubeBackup, but I highly recommend it (and a personal Google Workspace account) for anyone who uses the Google app suite. It costs me $5/yr and is worth every penny.

https://www.cubebackup.com/

$5/yr + hosting, which is way more than $5 unless you have a terabyte going unused in a box somewhere

I assume that "self-hosted" meant precisely that they own the box. A hard drive is under $10/terabyte.

Ah I assumed it was self-hosted but in the cloud somewhere

Self hosted means not in the cloud.

I was about to ask where you're seeing such cheap hard drives, but then I checked diskprices.com and saw that yes, there are indeed some drives for <$10/TB (current lowest price is $8.332/TB, and there are only 10 drives that are <$10/TB; I'm only looking at new drives).

I mean, I think my point is still valid for a $20/TB drive. It's a one-time cost, not monthly, and you're unlikely to exceed 1TB with Google Drive data.

Yes I host on a very cheap Hetzner VPS (which I use for other self-hosted Docker images as well) so it’s always up. The storage location is not on the VPS itself, it’s S3 and B2 compatible, so you can use AWS S3, Cloudflare R2, Backblaze B2, whatever.

I use Backblaze which is quite cheap and reliable: https://help.backblaze.com/hc/en-us/articles/360037814594-B2....

Google storage for a personal account is 15 GB if I'm not mistaken, why would you need 1 TB?

"I highly recommend it (and a personal Google Workspace account)"

I can't recommend against this enough, and I work at Google. Workspace accounts don't work with so many services. At least YouTube TV, Nest cameras, Google Home sharing, Google Play family accounts, Android parental controls, and Opinion Rewards are broken in my experience, probably a lot more.

It's a shame, especially for people who've had a Workspace account for many years, and have a bunch of email, documents, and media tied up in it.

I wish Google would let us convert a personal Workspace account to a regular Gmail account.

I do too. Importantly, they need to let you merge a Workspace account into the existing Gmail account you had to set up to use some services.

The cause of this is that Workspace accounts have a different ToS so they cannot simply convert one user type to another.

I don't see why not, if you're agreeing to both ToS and to the conditions of the migration. Most people who have these problems are either the maintainer of the Workspace domain or family members. It wouldn't be a problem to get additional sign offs.

I know a little more about the situation, but can't really comment on it here.

I use YouTube TV with mine and that works fine, but I do agree with the general sentiment. Many, many things are unsupported or broken.

It works great for me. I recommend it. I’m not sure what working at Google has to do with recommending or not.

I don’t use any of the non-core services because I expect them to be killed off soon anyway. For contacts, email, docs, and drive it works great. YouTube as well.

I have a “regular” Gmail account in case I need to use any niche services that don’t work with my Workspace account.

Overall the trade off is well worth it.

Have you tried a restore?

Yes, probably once every 2-3 months. There’s a great interface for restoring specific files.

I keep the version history to infinite, so even if files are corrupted or lost on newer backups I can always reference the old ones, back to day 0. Backblaze is great for this as the storage price is negligible.

I think it works in a similar way to Tarsnap or something like Duplicati, but it’s a lot easier to manage and plugs in nicely to all of the Workspace account (not just Drive).

Thank you, that is great to hear.

I was going to say, if Google is silently losing and corrupting files then backing up from there is just duplicating the failure. But perhaps his backup scheme accounts for this unreliability.

That's why I'm wondering if they've tried. It may well be that the restore shows that even accounts that don't appear to be affected have lost files simply because very few people have an exact accounting of what it is that they've stored.

I have pretty much all of my stuff in a subversion repo and it serves both as backup and local copy checked out on multiple machines. The only exception is the business stuff (because it is shared with multiple people) and we use Google for that. But an inspection of Google take-out showed that it's nice to have the files but you won't be doing much with them outside of Google unless you download them one-by-one in a format that isn't tied to the Google cloud. And that is very tricky and time consuming.

For Google Drive files, you don't need a special account or service to back up to Synology and B2, which I also do. I use the Synology Cloud Sync service to automatically sync all Google Drive files to the synology, then another Cloud Sync task to backup the Synology to B2. Works great.

Synology is indeed great, but CubeBackup is a nice alternative for me as it runs on a VPS. I’m often away from home for long periods due to business travel, and I like being able to remotely keep the backups working even if the NAS goes down or is turned off.

Synology NAS can already do this without any third-party products via the ActiveBackup application. Supports Google Workspace and MS 365. It’s free, assuming you already own the Synology NAS. It replicates data from Mail, Contacts, Calendar and Drive within seconds or minutes, keeps a history of all changes and offers a nice timeline view to show and export data at any point in time.

"CubeBackup is a nice alternative for me as it runs on a VPS. I’m often away from home for long periods due to business travel, and I like being able to remotely keep the backups working even if the NAS goes down or is turned off."

Thanks for the recommendation! I'll check it out, curious on maintenance cost though as I don't really want a project.

1 - How much time was setup? 2 - How many times did it break in the last year? 3 - How many of the summaries were actionable?

My main concern with active auto backup is maintenance time; my expectation is I'm busy and my personal backup would take me weeks to fix if and only if I notice there is an issue, and the summaries are alert fatigue.

I feel while this situation is avoidable, you will still lose data from your last successful backup and anything less than "just works" isn't viable for anything important vs manual duplication.

I have two copies of some files (local and Drive) and probably a few that are Drive-only despite having local sync. If my computer died at the same time as Google lost the data, I would expect some data loss, but the cost of categorical prevention is very high.

That being said, if it is only $5 and is actually touchless,

Setup was about 15 minutes getting the Docker image to run on my VPS, and it’s broken zero times in two years. The summaries are very basic and just let you know how much was backed up and if there were any errors.

With cost of storage pretty low and AI being so good, I wonder if having a home based thing for everyday use, especially photos (Sort, tag, Memory creation, junk detection etc.) will become easy enough that cloud usage declines.

There are still tech hurdles yes, but NAS + a decent GPU for photos, email organization, assistant is going to be a killer business.

I use Synology which has its own Photos app. Honestly, it’s there right now. I prefer doing my own tagging but it’s able to automatically group things just fine.

I got a second Synology for Hyperbackup and send one to the other for backups. Seems to work well.

I do the same thing for Drive, but with rclone (which is free as in speech and beer). rclone makes it trivial to keep a mirror on backblaze and locally, which has come in quite handy a few times.

I've elevated to rclone fanboy since I discovered crypt: https://rclone.org/crypt/

Wow, Crypt is awesome.

I’ve never looked into rclone although I’ve heard much about it. Looks very cool. I didn’t know it supported Google Drive.

One nice thing about CubeBackup is that it handles all of the Workspace account, not just the Drive file storage, but also the contacts, emails, etc. all of which is equally important to me.

Is there something similar for Google Accounts (not Google Workspace)? I do it manually: I have the every 3 months automatic backup and I upload it manually to S3, but I'd totally pay for a service that does this automatically providing my S3 keys (or similar).

i have a synology NAS. It backs up my google account. For google photos it doesn't work.

So I pay for dropbox. The dropbox app backs up the photos and the synology fetches them from dropbox.

This happened to our business about a year ago. Many random files went missing from GDrive. Eventually many turned back up (usually after a few weeks or months), but in the root folder instead of the folder the files were actually organized in. Google support told us we were crazy.

As an aside, we are looking at moving from Google + Slack + Notion to OneDrive + Teams + Loop. What is appealing is the ability to collaborate more directly on files and “Loop Components” directly in Teams. But we’ve been waiting for 2 weeks now for support to help us enable Loop, because the instructions in the Microsoft docs aren’t working. They are working on it, which makes them better than Google in our experience, but it has been too slow. Maybe we have to upgrade to premium support?

I wish I could go back to google drive even with the file loss issues. OneDrive appears to be a sort of different view into Sharepoint's file management system, and boy is it painful to use.

God, why are the choices so bad.

I wouldn't trust Google with anything important these days. But Teams/SharePoint/OneDrive is so bad.

I don't think that selfhosting is a great option for corporate, but Microsoft and Google are just shit.

> I don't think that selfhosting is a great option for corporate, but Microsoft and Google are just shit.

Why? When corporations weren't capable of managing credit cards securely, PCI was created and all the lazy businesses were told to either do this correctly, or outsource it to someone who will.

When corporations weren't capable of managing infrastructure, Cloud providers created an out and told all the lazy businesses that they could do this correctly. Businesses believed the hype, fired their on-prem administrators, and still can't do this thing correctly.

So, if you can't be bothered to do something correctly, and you can't pay to get it done correctly, maybe you are not fit to be operating.

> So, if you can't be bothered to do something correctly, and you can't pay to get it done correctly, maybe you are not fit to be operating.

The amount spent on Microsoft suggests a willingness to blow money on the problem.

But Google having no real support, always shutting down projects, and now losing a lot of people's Drive data and refusing to acknowledge it makes me reluctant to use them.

Teams is janky in other ways, such as Microsoft forcing Sharepoint everywhere, and now we have a chat program that attempts to silo data as much as possible, so we have a lot of project documentation that isn't always accessible or discoverable.

If you have alternative suggestions, I am open to them.

We’re perfectly willing to pay for it. An employee seat costs about $200pm of various licenses and subscriptions, there’s a lot of money to make.

> still can't do this thing correctly

> and you can't pay to get it done correctly

Am I reading this right? You're blaming the business that is purchasing a service, for the provider's inability to provide it?

Feels like a good opportunity for someone to build the “do one thing and do it well” file storage on top of B2. Backblaze has a simple approach that stores 30% recovery data next to the files (and self-heals) while also storing customer data in multiple offsite locations.
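To make "recovery data that self-heals" concrete, here is a minimal sketch using single XOR parity. This is a deliberately simpler stand-in and not Backblaze's actual scheme. Real storage systems generally use stronger erasure codes that survive several lost shards at once. The function names are made up for illustration.

```python
def xor_parity(shards):
    """Return the byte-wise XOR of equally sized shards."""
    if not shards:
        raise ValueError("need at least one shard")
    size = len(shards[0])
    if any(len(s) != size for s in shards):
        raise ValueError("all shards must be the same length")
    parity = bytearray(size)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(shards, parity):
    """Rebuild the single shard marked None from the survivors plus parity."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing shard")
    survivors = [s for s in shards if s is not None]
    # XOR-ing the survivors with the parity cancels them out,
    # leaving exactly the bytes of the lost shard.
    rebuilt = xor_parity(survivors + [parity])
    repaired = list(shards)
    repaired[missing[0]] = rebuilt
    return repaired
```

With three data shards and one parity shard the overhead is a third of the data, and any one shard can be lost and rebuilt. Losing two at once is unrecoverable with this scheme, which is why production systems pay for more sophisticated codes.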

Build the ability for integrations (with customer-visible audit logs), inter-user sharing, and proper permissions and you have the modern equivalent of the shared drive from the old days.

And don’t worry about quoting XKCD #927 at me. I know the realities of a project like this.

I have a friend who lost data with Backblaze’s backup product. It can always happen. Don't trust a single provider or software. Backups should not be “integrated”, they should be as much apart as possible.

I’m very interested in the “receipts” (meaning: the story and the paper trail to verify the story) if you’re able to share

Yev here -> we work pretty well with Movebot (https://community.movebot.io/hc/en-us/articles/360001557095-...) where you can set up a Gdrive to B2 sync/backup. And Rclone can do that as well (https://www.backblaze.com/docs/cloud-storage-integrate-rclon...) so there's a few ways to do this.

Dropbox seems to still work fine in my experience.

We did the Notion + Discord thing for years and it worked well and felt modern.

Because Microsoft has a document monopoly via their shitty proprietary format that they make sure no competitor can effectively import/edit. And Google is just shitty as usual, but they're a giant free default.

I wonder if with all this antitrust stuff going on we'll ever see regulation in the form of required interoperability and/or data portability? Instead of trying to break up companies, block mergers, etc, I feel like simple technical solutions would be faster and more effective.

> why are the choices so bad

I've seen a couple reasons:

One, often these products are made by companies where the products are not their primary revenue source. The products are often managed as independent business cases. That means they have a budget, timelines, expected revenue figures/costs. It's common for a product to get a low level of investment that only meets the needs of adding new features [to meet sales figures] without ensuring quality. If it were a startup flush with VC money, they could invest all they have into the product, but at an enterprise, it's often the opposite.

Two, often these products are actually acquisitions of startups. You may not know this, but startups tend to churn out some horrifying, janky code just to get themselves off the ground. Buying one of these often leaves you with a huge mess on your hands. Combine that with a lack of investment or cost-cutting, or some of the lead product people leaving, and the product gets worse. Then try to integrate different products, and you're really integrating different messes.

Three, it's genuinely hard to create groupware products that are both high-quality and useful. They're often complex and need to interoperate with one another, yet are built by separate teams. And because they're complex, they each suffer from the standard problems that happen to software products (many books written about them). But the people managing and creating them fall into the same old pitfalls, because software product development is not required to avoid them. Bad management and bad engineering are common, and these products in particular are no exception.

Four, they're actually difficult things to build and sell. If they were easy, there'd be more competition. There tend to be "alternatives", but not with the same features.

Sharepoint is killing productivity.

It's so sad that Microsoft does not see how they ruin productivity of big orgs.

Sharepoint broke Excel - you cannot link between shared documents easily.

They even broke the windows taskbar, which automatically combines things together. They assume that people working in an office have only 1 email open and 1 spreadsheet open, while most have 20 and don't know how to turn off this shitty UI.

> Sharepoint broke Excel - you cannot link between shared documents easily.

I actually found a way where I work now: I made a shortcut to the sharepoint folder from teams that appears in the file explorer, and I can then link the document.

Of course this only works for me and not if I share the file.

Can't wait for the merge of the desktop and web excel to finally break it all...

I also overwrote an important document today because the idiotic auto save was turned on, I used the document as a template for a new one without checking.

Luckily I had a copy of the important one in my Documents folder on the local computer which they recommend you not to do since you might have data loss!

Yes, I hate office 365...

The thing is that the documents are shared out by default and it simply craps out when you have flows like:

Source1 + source2 -> summary report -> consolidated report

(With like 20 source files)

They've finally conceded and re-implemented the original task bar method of "let's actually label the icons in your taskbar so you have some idea of what you're actually clicking on before you click it".

You have to re-enable it in taskbar settings, but it's FINALLY there to be enabled again.

I use a program called StartIsBack to try and return to a more "classic" shell experience, but Windows has done such a good job at tying so much to their shell that from what I hear it's close to impossible to do certain things anymore.

That’s very fair. I totally recognize that Microsoft has a lot of half-baked me-too products to round out the suite. We are going to do a full evaluation before switching (if Loop ever gets enabled) to see if, despite the limitations, this improves collaboration, or if it just creates as many headaches.

The only thing more painful than using OneDrive is talking to the salesperson who sold it to your company.

I recently set up rclone to do nightly backups of my entire OneDrive to Dropbox, I feel pretty safe with this in terms of reliability, I find it unlikely that both services will catastrophically fail like this at the same time.

Would recommend rclone to anyone, it has worked pretty consistently so far.

Dropbox constantly scans your data and can easily block your whole account permanently. It's hard to recommend it unless you encrypt your data.

I remember OneDrive doing this too, where photos of parents' children taking a bath etc. were flagged as child porn. Unfortunately unless you encrypt your data locally, privacy is one of the tradeoffs in using these services.

This happens with Apple iCloud as well. Staff member had their iCloud photo collection locked because of this. Was able to regain access but I would definitely caution anyone storing all family photos online without another backup option.

they don't, but they were planning to: https://www.wired.com/story/apple-photo-scanning-csam-commun...

They canceled their plans for client-side scanning. They do scan content on their servers. Therefore whether your data in iCloud Photos is scanned depends on whether Advanced Data Protection is enabled or not. It’s disabled by default. Enabling ADP will turn on E2E encryption and disable account access via iCloud.com.

According to NCMEC Apple does not do any proactive scanning of photos.

Last year, while Meta’s Facebook and Instagram submitted a combined 26 million reports, Google 2 million, and TikTok nearly 300,000, Apple submitted 234. The year before that: 160.

Apple isn’t a social media company, so this is hardly a direct comparison. That said, WhatsApp, which is end-to-end encrypted, scans unencrypted content such as profile and group photos for CSAM. It provided over 1 million cybertips to NCMEC in 2022; Microsoft sent in 110,000; Dropbox, nearly 46,000; and Synchronoss, the cloud storage provider for Verizon, over 30,000.

https://archive.is/AyuCq

I'd venture to say Apple probably has a "less false positives" policy than the others. I can't say whether or not they do or do not scan, but if they review incidents with humans and not automated this could be why. They probably know flagging / disabling / reporting accounts incorrectly has a high cost on user satisfaction.

You’re thinking Google https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&c...

yeah, cryptomator is the way to go for dropbox for windows in my experience at least for a FOSS solution (all the others I've tried were comparatively flakey though less memory intensive...)

It wasn't really a conscious choice on my part, I had to get the subscription since I needed to collaborate on some files with people who were using it, and it forces you to add shared folders to your own Dropbox if you want to use them for collaboration, causing them to count towards your quota which is very annoying.

So I thought I might as well use it as a backup location if I have to get this subscription. I don't really expect privacy from these services, I guess it's a tradeoff.

That's why you set up Cryptomator

Does rclone allow you to do some kind of diff or are you literally copying the full contents of OD into DB?

I'm interested in doing something similar for our GD into DB or Backblaze but not if it's a full copy each day - we have 400GB in GD

It does diff, and can be configured to use various diff methods (checksum, modified datetime). I use it with S3 and to diff with checksum it needs to do an expensive, slow metadata read - so I use `--use-server-modtime` which only copies files newer than the remote S3 file time. https://rclone.org/s3/
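To illustrate the tradeoff between those two diff methods, here is a minimal sketch of a sync planner. It is not rclone's actual logic, just the general idea: a modtime comparison needs only metadata, while a checksum comparison needs the content (or a stored hash) on both sides. `FileInfo`, `needs_copy` and `plan` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class FileInfo:
    size: int
    mtime: float       # seconds since the epoch
    data: bytes = b""  # only inspected when a checksum comparison is requested

def sha256(data):
    return hashlib.sha256(data).hexdigest()

def needs_copy(src, dst, method="modtime"):
    """Decide whether `src` should be copied over `dst` (None means missing)."""
    if dst is None:
        return True
    if method == "modtime":
        # Cheap: metadata only. Copy when the source is newer.
        return src.mtime > dst.mtime
    if method == "checksum":
        # Expensive: compares content, so it catches changes modtime misses.
        return src.size != dst.size or sha256(src.data) != sha256(dst.data)
    raise ValueError(f"unknown method: {method}")

def plan(source, dest, method="modtime"):
    """Return the sorted paths that a sync pass would transfer."""
    return sorted(path for path, info in source.items()
                  if needs_copy(info, dest.get(path), method))
```

A file whose content changed without its timestamp moving forward is skipped by the modtime method and caught by the checksum method, which is the price of the faster comparison. For 400GB that mostly sits unchanged, either method should transfer only a small fraction per night.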

Amazing, thanks for confirming. Will dive into the docs properly

"I recently set up rclone to do nightly backups of my entire OneDrive to Dropbox ..."

I hope it is interesting / useful for you to learn that you can maintain a workflow like this without ever installing rclone.

rclone is already built into the rsync.net platform so you can use it to orchestrate transfers like the one you describe:

  ssh user@rsync.net rclone sync s3:bucket dropbox:whatever

... using only ssh.

Similarly, I recently set up rclone to mount Google Drive and restic to back it up, à la 3-2-1.

rclone is a fantastic tool. I’ve done a few data migrations the other way, from NAS to M365 (for reasons) and it’s been mostly straightforward every time, even coping quite adeptly with request rate limiting by OneDrive.

I experienced different and in some ways more pernicious data loss with Google Drive several years ago: I was writing a novel, and Drive simply erased random paragraphs of text. Just poof, gone. I could find neither rhyme nor reason for it, either. Some had been edited in after the fact, while others were present in the original draft. All told, it equated to around 8K words over the course of the novel.

Granted, you usually want to pare down your words during the editing process, but I'd rather do it manually rather than have some malfunctioning computer system prune random blocks :)

I sent a support email, but it will surprise no one to learn I never heard anything back. Needless to say, I haven't trusted Google Drive since.

Using google docs (or any web based office apps) to write a novel (or do any other serious "heavy" work) is a terrible choice for a writer to be fair.

Can you enlighten us as to an ideal choice?

vim

Out of the interest of brevity, I didn't want to go into my full setup. However:

1. I write locally and have both manual and automatic backups (local and remote)
2. I upload drafts online for beta and alpha readers to leave comments
3. My group used to use Google Docs for this but stopped once I discovered the data loss

Happily, because local is the source of truth, I didn't lose any of my writing. I did, however, lose some feedback, and reconciling things was a pain.

On the face of it, I can definitely see the upside: "the cloud" is typically synonymized with safety ("your data is backed up on our secure servers", etc.), not to mention the fact that you can edit your data from any of your devices logged into Google. Heck, you don't even need to install Office anymore: you just go to https://docs.google.com and it's all there.

I do agree with you, though: one of the most important rules for computing in general is to keep multiple backups of everything. A while back, a friend of mine made a habit of emailing herself drafts of her work, and kept multiple copies on her computer. It did save her once or twice.

That’s not a fair statement at all. A writer isn’t a computer expert, and shouldn’t have to be. Google Docs specifically positions itself as a product that takes care of all the data handling for you, and they market that fact quite heavily.

"but it will surprise no one to learn I never heard anything back"

(surprisingly?) ... I have colleagues that, even in 2023, will express genuine (not sarcastic) shock when a megacorp doesn't reply or provide decent support. I've only got a few, but the refrain will be something like "wow, I'm kinda surprised google's cloud support doesn't get back to us - sometimes for days. they've always had such a good reputation for support... i hope this is a blip".

It's as if they've only ever read marketing/sales leaflets, then really are genuinely surprised when things are 'normal' (meaning poor performance, non-responsive support, bad/wrong documentation, broken SDKs, data loss, etc).

Well, everybody assumed that was the norm for non-paying customers. It's surprising when you're paying to be ignored.

My point, though, was that there are still folks who assume "support" for everything (paying/non) is just "going to be good" ("because it's google/ms/etc - they have a reputation! they wouldn't want to jeopardize it!"). I realize I'm talking about a small number of folks, but even after these folks have been in the industry 10-15 years, their ... naivete(?) - is surprising.

And... I'm not even generally surprised when I get no support on paid for services. Indeed, I'm generally shocked when I get good and timely support from a paid service. Now... that's less shocking when it's "small startup X"; we've all seen small companies get big/acquired and the very thing that got them there - good service/support - is what is thrown out to start squeezing out profits. Staff get overworked, pissed off, leave, quality plummets, people move to the next small startup service, and it starts over again.

> they've always had such a good reputation for support

I'm genuinely curious as to what group they have a good reputation for support with.

People who have not yet had a problem.

A trick I've used a few times, in similar situations: Suggest that they look back in their notes / emails / whatever, find the specific MegaCorp support rep. who was so helpful the prior time, and ask that person to help.

OOC, was this a google doc or a docx file uploaded?

I paid for Google support. That way you get a human tell you to just try it again, rather than a computer.

Same, I have had years of photos in Google Photos simply go missing when they transitioned from Picasa to photo albums

This is pervasive on browser based O365 apps like PPT

It will literally delete a sentence immediately after you write it

Infuriating and absolutely no fixes - this is a core issue with write priority and where “canonical” data lives.

How do people even notice that their data has gone missing? I can barely find my own documents in the train wreck that is the Google Drive UI. Any document written by anyone else in the organization can generally be considered lost unless you have a direct link sitting in an email or bookmark.

It's really hard to make a file browser in 2023 with only about 100k employees, give them a bit of a break. Some things are just hard!

I'm not defending google or saying they haven't done anything wrong here, but it is insanely hard to make Google drive. I very much doubt Google drive has a staff of anywhere near 100k employees and it's probably the largest file browser ever made. It's an engineering marvel. Can Google do better? Absolutely. But, it's objectively wrong to say there's nothing hard about building something like Google drive.

I think this is like looking at the Burj Khalifa and asking how hard it can be to make a building. It's not just a building, it's the tallest building ever built.

If you've made it and released it to the public (and it has paid tiers!) then you better have made it right. This is the case for both buildings and data storage systems.

Nothing I said disagrees with your statement. In fact, I specifically said they could do better. My point is that it's wrong to say google drive is easy to build or maintain.

Yes, not disagreeing with you. :)

Can't have a pro in motion without a promotion point for it.. and it looks so good in the presentations. Usability? Function? Screw it.. it must look good in ads. Make all the things screenshot usable only.

I mount it using their desktop client and use voidtools' everything to search for files.

Works reasonably well, though Google seems intent on breaking stuff every so often. Also Google hasn't figured out how to make it stop creating desktop.ini files everywhere yet. I'd say it makes them look like amateurs, but amateurs tend to be better at making software.

You can blame a different megacorp for the desktop.ini files: those are artifacts from windows explorer (I’m assuming you’re using a windows machine with the google drive mounted)

Partially yeah, but that's usually a system file, which isn't supposed to be visible. Google drive for some reason decides to make it visible, which is damn annoying.

This is a bit like Google drive adding two files named '.' and '..' to each folder in unix.

> How do people even notice that their data has gone missing?

They need the data and look for it and it's not there, but they remember they put it there?

At my job if a log file or data recording is too big for jira, but need to be attached to a bug report we sometimes store them on google drive. And then we link it from the ticket. If much later we get back to it and the link is 404 that would make us suspicious. If someone remembers the link working before that would make it clear it is not just a copy paste error with the url.

Or you open the meeting notes doc and it has notes from May 2023 at the top, or you sort by last edit and it says May, etc.

> How do people even notice that their data has gone missing?

That’s what I want to know too. Try to reconcile the cloud sync systems to make sure you aren’t missing anything. It’s basically impossible.
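It is less impossible if you keep your own record of what should be there. A minimal sketch, assuming the synced folder is fully available on local disk (with files-on-demand, hashing would force every file to download): snapshot path-to-hash on a schedule and compare against the previous snapshot. The function names are hypothetical.

```python
import hashlib
import os

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = digest.hexdigest()
    return manifest

def compare(previous, current):
    """Report paths that vanished, appeared, or changed between snapshots."""
    return {
        "missing": sorted(set(previous) - set(current)),
        "added": sorted(set(current) - set(previous)),
        "changed": sorted(path for path in previous.keys() & current.keys()
                          if previous[path] != current[path]),
    }
```

Anything in "missing" that you did not delete yourself is worth a look, and the manifest needs to be stored somewhere other than the service being checked.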

I also go crazy when I see MS touting OneDrive as a backup. I’m pretty sure moving (because of files-on-demand) my data to OneDrive where I can’t verify they aren’t losing stuff is not a backup. Add the risk of being banned and it becomes unconscionable IMO.

If it's Docs/Sheets/Slides then I'd notice. If not...I don't really store anything else on Google Drive, just files shared with me.

For non-images I suffer the same problem. But for images I have yet to find a service that lets me search the heap better than Google Photos. It was amazing and was pre-AI craze. "My driveway in the summer." "<Child A> and <Child 1> on a dock" etc. I don't have to catalogue anything anymore. Apple has the same concept but it never works right.

Guess at root cause:

Some replication process at Google had fallen behind by 6 months (and presumably didn't have monitoring/alerting), and someone noticed and in trying to fix it they forced that replica to take mastership (meaning the users now see the 6 month old data).

Since the replicas presumably now have conflicting changes, re-merging the two is going to require a lot of code be written to smartly merge the data, and some users are going to permanently lose data (where they edited an old version of a document for example, and those edits cannot be automatically rebased onto the new version)

> and some users are going to permanently lose data

Why couldn't you write code to let users compare and choose which version of the data they want to keep?

You could... But writing and deploying a diffing tool for every different gsuite application is probably 6 months of work for a whole team... There are so many corner cases.

What will the users do for 6 months waiting for their data to return?

"let users compare" doesn't require diffing tools. Worst case put the versions next to each other.

That's an excellent description of a basic diffing tool.

A tool that cannot parse the files is not a diffing tool.

> Worst case put the versions next to each other.

For some files (anything encrypted, for example) this may actually be the best you could do.

Dropbox had this behavior ten years ago, so it's kind of inexcusable for Google today to be just mindlessly overwriting.

I guess there may be a bevy of different file formats, and a diff for a spreadsheet and an image would require building tooling, just from those examples. I was thinking of creating an application where the front end is more than just a file browser to stored file types but the interface to read/edit the content as well.

You don't even need to go that deep -- just give me a filetype-based icon, file name, and last modified date, and that should be enough for me to choose. And if they want to be simple, include the option to always keep the most recent version of the file (which is going to be the right answer in many cases).
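Something like the following is all that would take. This is a minimal sketch, assuming each replica can be listed as name to version metadata. `Version` and the helper names are hypothetical, and it treats a differing modified time as a conflict, which is a simplification (a real system would also compare content).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    name: str
    filetype: str
    modified: float  # seconds since the epoch
    replica: str     # which copy this version came from

def find_conflicts(replica_a, replica_b):
    """Pair up versions of the same file that differ between two replicas."""
    conflicts = []
    for name in sorted(replica_a.keys() & replica_b.keys()):
        a, b = replica_a[name], replica_b[name]
        if a.modified != b.modified:
            conflicts.append((a, b))
    return conflicts

def keep_most_recent(conflicts):
    """Apply the 'always keep the newest version' policy to every conflict."""
    return {a.name: max(a, b, key=lambda v: v.modified) for a, b in conflicts}

def describe(conflict):
    """Render the minimum a user needs to pick a side by hand."""
    return [f"[{v.filetype}] {v.name} (modified {v.modified:.0f}, {v.replica})"
            for v in conflict]
```

`describe` is the manual path (type, name, date, pick one) and `keep_most_recent` is the "just do the obvious thing" checkbox. Neither needs to parse file contents, so it works for encrypted or opaque files too.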

Ideally Google will resolve it without the customer needing to do the work.

That seems like one of the more unlikely causes, among the almost infinite variety of causes that are consistent with "several people said they can't find some files".

To me the most likely one by far is the deletion was actually commanded, for example by a flaw in the program they use to sync, or by malware.

A few users have reported that documents they deleted months ago have reappeared too. And files they moved long ago have unmoved.

I am still skeptical that users are correctly describing the behavior of a multi-party synced filesystem, or that Drive for Windows is bug-free. You seem to think it's a flaw on the backend but I think there are many other possible causes.

Or some backend data storage system experienced transient errors and occasionally returned a corrupt pointer to a storage location.

Low frequency data loss until it reaches alerting threshold of 0.1%

Wasn't Google starting to delete old files? My guess would be that something screwed up in that process.

This is a royal fuck up. Anyone using any g cloud service had better be suspicious when using their services.

Amazing how some folks can't get a job at google if they can't invert a binary tree. Perhaps hire folks with relevant skills rather than folks who can solve 2 leetcode mediums in under 45 mins?

This comment smells very much like you think you are The One. Or that The One is out there somewhere [1].

Bugs happen and this is a serious failure that Google needs to fix. But the idea that incompetents are doing the development or that Google hires puzzle solvers over technical talent is just not true.

1. https://rachelbythebay.com/w/2018/12/21/env/

> ...that Google hires puzzle solvers over technical talent is just not true.

Yeah, they do: https://twitter.com/mxcl/status/608682016205344768?lang=en

I’m quite familiar with that story. Google doesn’t disclose to candidates why they were rejected. I didn’t interview him so have no insights, but ability to get along with others still matters. But even if he was rejected for the reasons he thinks he was, one story does not reveal how the tens of thousands of interviews Google conducts each year actually go and how hiring decisions are made.

I have been a part of hundreds of hiring decisions at Google, and there is much to hate about the process. But leetcode over qualifications is not one of them.

> ...there is much to hate about the process.

That's a signal that's trying to tell you something.

> But leetcode over qualifications is not one of them.

A lot of people hate leetcode and the state of interviews at tech companies (17k people liked that tweet), but it looks like you do not.

Tech is probably the only field where you have to study subject matter unrelated to the job you actually end up doing.

> I have been a part of hundreds of hiring decisions at Google

I am curious - I heard that Google is getting rid of its team matching process[1] :

The rationale is that the current system outputs a lot of false positives and false negatives.

The rumor was that google would do "generalist" (read that as "leetcode") style interviews and then put the candidate in a team matching process, where teams would review the candidate's (who ostensibly got an offer) package, and decide whether or not to bring them on the team.

But then a lot of candidates would get stuck in the team matching phase bc they couldn't find a team to take them on. Machine learning engineers would be presented to teams that needed front end UI devs, which were clearly a mismatch.

Basically, the generalist interview was creating a pool of candidates that passed the generic leetcode interview, but it was producing poor matches for what teams at google really needed.

Is that true?

[1] https://www.teamblind.com/post/Google-just-got-easier-no-mor...

> ...there is much to hate about the process.

That's a signal that's trying to tell you something.

Indeed, and what it is telling me isn't what your points are about. I'm fairly vocal internally about certain changes I believe need to be made. But I'm also not really in a position to do anything about it but suggest changes.

There have been non-generalist interviews for over ten years, although not for every team. This problem has been addressed several times over the years. I think the post above is about the most recent round. But Cloud and many other orgs have always had their own interview style.

Generally speaking, the more senior the candidate, the less generalist the interview seeks to be. And the more specialized the role, the less generalist the interview seeks to be. Some interviewers haven't gotten the memo, and many situations end up being, say, 50% generalist, and 50% specific role. But plenty also go purely role-based, especially in areas like security or UX design.

However, one thing Google strongly seeks is someone not wedded to this or that technology. If you think of yourself as, say, a Ruby-on-Rails developer, then you are probably a bad fit for most roles at Google, even the ones that involve Ruby-on-Rails. Google expects you to be able to pick up whatever knowledge you need to do your job, whether that is learning a new framework on the fly, or debugging a system you aren't familiar with. It's not exactly a "generalist", but more like someone who can learn tools and techniques easily.

From what you described, Google needs to work a great deal on its recruiting practices, bc right now it is an incoherent mess.

The consequences seem to manifest into serious technical problems, like losing customer data on its cloud storage service.

I think you meant to link to https://rachelbythebay.com/w/2018/04/28/meta/

That's not actually how it works, that's an ollllld viral article. Source: worked at google for 7 years, 2.8 gpa dropout from state school economics

EDIT: upvotes for reality-detached rant, upvotes for "you think you are The One" -2 for a polite concise reality.

My friend, don’t take this the wrong way, but you need to hear this.

It would benefit you to take the downvotes as constructive criticism. Your opinions may need recalibration based on feedback.

Oh, you think developers still have that kind of power... Management has made sure to not allow any developer to have any kind of power. Not even naming their own variable the way they want it. All big and small decisions are taken by managers that don't even understand coding. They are the ones that push for hard deadlines and then lose months deciding what to do next, for example. The crappy software trend has been rising at the same pace as the stripping of power from developers.

This is likely very paranoid behaviour but I recommend downloading all your Google data from time to time: takeout.google.com

This works fine for smaller accounts, but on larger accounts it seems to be regularly failing (based on my own experience and based on other postings and reports that I found online).

Exporting my Google Photos fails consistently, even after lots of attempts. Out of well over 10 export attempts this year, maybe a single one succeeded. I have a few hundred GB of data stored on that account. I also currently have a support ticket with Google open on that issue, but after initial follow-ups haven't received a response in a couple of months now.

That said my current approach for backing up things is to upload an "age" encrypted version of the data from Google Takeout to Wasabi. Once uploaded I run a script that shows me the diff between the data sets (so that I can ensure that no old data went missing that shouldn't have gone missing) before I delete older data. Probably not the most optimal approach though. Might be better to just set up some versioning layer on top of Wasabi and to keep deleted or modified data forever.

I was trying to set up a similar thing. I already do the google takeouts every few months (~400GB, I don’t have issues with export though), but so far have been storing all of them.

How do you do the diff between the old and new encrypted versions? Do you encrypt and upload the takeout .tar.gz files, or do you extract first and then encrypt?

My personal Internet connection is a bit too slow to re-upload all the data, and my vserver doesn't have enough disk space to temporarily store it all, so I pretty much do everything in a streaming fashion. I use a Firefox extension that gives me the wget command (which includes cookies, etc.) when triggering a local download from Google Takeout, then I patch that command to stream to stdout. The stream is first (tee-)piped to a Python script that decompresses the data on the fly and dumps the hashes for each file into a log, and it also goes to "age" for encryption and then to s3cmd for uploading the encrypted data to Wasabi.
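For anyone wanting to try the hashing step, here is a minimal sketch of what such a script could look like. It is not the commenter's actual script: the function name, the SHA-256 choice, and the `<sha256>  <path>` log format are all assumptions for illustration, and it assumes the Takeout export is a .tgz stream.

```python
import hashlib
import sys
import tarfile


def hash_tar_stream(fileobj, out):
    """Write one "<sha256>  <path>" line per regular file in a gzipped tar stream.

    Hypothetical helper: name and log format are illustrative assumptions.
    """
    # Mode "r|gz" reads strictly sequentially, so it works on a pipe
    # and never needs the whole archive on disk.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, etc.
            digest = hashlib.sha256()
            extracted = archive.extractfile(member)
            for chunk in iter(lambda: extracted.read(1 << 20), b""):
                digest.update(chunk)
            out.write(f"{digest.hexdigest()}  {member.name}\n")


# To use it in a pipe, call:
#   hash_tar_stream(sys.stdin.buffer, sys.stdout)
```

Because the archive is read in stream mode, each member has to be consumed in order, which is exactly what the loop does. Memory use stays at roughly one chunk regardless of archive size.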

For the comparison I pretty much only use the logged hashes which allow me to figure out if any hashes (and associated files) are missing in the new version of the backup. This isn't a perfect solution yet as a few things aren't detected. For example Google Takeout bundles mails in mbox files and I currently don't check for missing mails. It would be better to convert the mbox files to a Maildir first so that the comparison can be done on a per-mail basis.
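A rough sketch of that kind of comparison, assuming each log line looks like `<sha256>  <path>` (that format and both function names are my assumptions, not something stated above). It flags every file from the old backup whose content hash does not appear anywhere in the new one, so a pure rename is not reported but a deleted or modified file is.

```python
def load_hash_log(lines):
    """Parse "<sha256>  <path>" lines into a {path: digest} dict."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        digest, _, path = line.partition("  ")
        entries[path] = digest
    return entries


def missing_from_new(old, new):
    """Return old paths whose content hash is absent from the new backup."""
    new_digests = set(new.values())
    return sorted(path for path, digest in old.items() if digest not in new_digests)
```

Usage would be something like `missing_from_new(load_hash_log(open("old.log")), load_hash_log(open("new.log")))`, reviewing the result before deleting the older backup. The file names here are placeholders.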

This is absolutely not paranoid and everyone should keep local copies of any data that is important to them in any way.

If you aren't backing up and restoring your data, you aren't storing your data.

100% I do this yearly at tax time when adding a bunch of new docs.

Can anyone recommend a Google Drive backup to something like AWS S3 Glacier or other long-term storage provider? I was running BorgBase but had some financial difficulties and couldn't pay for it, plus copying 2TB of data from home at around 40Mbit/s isn't great.

Ideally something that pulls from google directly without involving my home computer would be best.

Maybe an rclone script that runs from AWS. Package it in a Docker image and run it from a Lambda function for pennies per invocation. Schedule it to run nightly.

https://rclone.org/s3/

https://rclone.org/drive/

https://docs.aws.amazon.com/lambda/latest/dg/images-create.h...

https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-...
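A very rough sketch of what the Lambda side might look like, assuming the container image ships the rclone binary plus a config file with a Drive remote and an S3 remote. The remote names, bucket, environment variable names, and config path are all hypothetical placeholders. Two caveats I'm fairly sure of: Lambda caps a single invocation at 15 minutes, so a first full copy of a large Drive won't fit (small nightly increments might), and Lambda's filesystem is read-only outside /tmp, so rclone may not be able to write refreshed OAuth tokens back to a config stored elsewhere.

```python
import os
import subprocess


def build_rclone_command(source, dest, config_path):
    """Build an rclone copy command (copy never deletes from the destination)."""
    return [
        "rclone", "copy", source, dest,
        "--config", config_path,
        "--fast-list",
        "--transfers", "4",
        "-v",
    ]


def handler(event, context):
    """Lambda entry point, meant to be triggered by a nightly schedule."""
    # All three environment variable names and defaults are hypothetical.
    command = build_rclone_command(
        os.environ.get("BACKUP_SOURCE", "drive:"),
        os.environ.get("BACKUP_DEST", "s3:backup.bucket.name/"),
        os.environ.get("RCLONE_CONFIG_PATH", "/tmp/rclone.conf"),
    )
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tail of rclone's output in the Lambda error.
        raise RuntimeError(f"rclone failed: {result.stderr[-2000:]}")
    return {"status": "ok"}
```

Using `copy` rather than `sync` is deliberate here: if files vanish from Drive, the copies already in the bucket stay put.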

My script, though I run locally

  rclone -v copy /source/location s3:backup.bucket.name/ --transfers 1 --copy-links --filter-from /config/syncfilter.txt --max-delete 20 --stats 5m --stats-log-level NOTICE --update --use-server-modtime --fast-list

My Dockerfile includes

  RUN curl https://rclone.org/install.sh | bash

And I've configured the S3 remote to use Glacier

  [s3]
  type = s3
  provider = AWS
  ...
  storage_class = GLACIER

Interesting! Thank you, I think I will give this a try soon :)

Google Takeout will export to Dropbox, ad hoc or on a cadence (into the /Apps/Google Download Your Data/ path). Backblaze B2 is ~$6/month per TB, so depending on your storage requirements, at some point it will be more cost effective to have a pair of Synology NAS devices: one local, one as an offsite backup, with backups shipped between them. These same NAS devices can back up directly to a variety of targets [1].

[1] https://kb.synology.com/en-uk/DSM/help/HyperBackup/data_back...

(no affiliation with any above entity)

Why don’t you give your friend an 8tb to keep at his house and rsync?

Is there any word if this is actively syncing deletions to local storage? I have my google drive synced to multiple computers which I assumed was enough...

I was hit by this. It looks like my laptop had not been syncing correctly for a couple months, then GDrive overwrote my local “My Drive” folder with the old cloud copy.

I was going to say, from what I've seen it looks like this kind of problem: a problem with the desktop sync, rather than Google's cloud storage itself.

The sync problem feels both more likely and worse: if the files never touched Google's servers and the sync client went and deleted them, there'd be nothing Google could do.

That sounds painful. Are you using the official Google drive app?

It's certainly not enough if you assume there can ever be an error in the backend which Google is unable to recover from.

Be me, self host my own cloud drive. No fees, no surprises, infinitely scalable, insanely fast local backups.

I'm sure everyone reading Hacker News can cobble something together in a weekend or two or knows someone who can.

What should the rest of the world do?

(Also, my hat's off to you that you're seeing "no surprises." So you never discovered your backup machine has a flaky wifi card or bad Ethernet jack? Never had bit-rot or errors on your backing store, or are you just not detecting them until it's too late? You're testing your local backups regularly to make sure there isn't a script error that means none of those bits can actually be restored? And are you backing up offsite so when your residence burns down you still have your backups?

Lots of us are "backing up" using ad-hoc bespoke solutions that we will discover were broken only when we need them.)

What should the rest of the world do?

Educate and help those who do not know? I helped set up my cousin’s NAS and gave it a subdomain.

What I meant by no surprises is: the backend and front end stay the same forever, short of me manually updating them. My server and its apps will not suddenly lose functionality or have cruft added because someone in Mountain View deems so.

Key to making this work is keeping things simple. Meaning no container or any burying of my data under any layer of abstraction. My files are laid bare on the platter, making it very portable because any application that does file system operation can immediately operate on my files. Unlike the clusterfuck that is owncloud.

Every quarter I do an incremental offline backup of all the drives in the NAS, to address all the issues you mentioned.

https://www.google.com/appsstatus/dashboard/

Nothing showing on service status currently

Do you look outside to get the weather from 5,000 miles away?

There is no rational basis for trusting a first-party status page.

I’ve been self hosting images with https://github.com/immich-app/immich and I‘m really happy. Decent mobile apps, even has good face recognition and H/W accelerated video transcoding. Can be served either with Tailscale or, as I did, with a reverse proxy on a cheap Hetzner box.

Good backups are a must obviously, I have a backup server on a separate VLAN that syncs ZFS snapshots daily, and daily incremental encrypted backups to Google Drive w/ Duplicati. I‘ll have to reconsider Gdrive now obviously.

Does it have any native app? The default mobile web UI seems a bit too overcrowded by other elements rather than pictures (at least on Firefox/Android). I'm using Photoprism currently and looking for some alternative that has a better face detection system and an integrated "X years ago" automatic slideshow thing (I don't care about the "splish splash", "out in the wild", "magic hour" compilations that GPhoto automatically does, just the basic X years ago thing, maybe when the picture has tagged people).

Yes, native apps are available and are quite good. Sometimes I get hangs after first opening the iOS app, but only briefly. Quite handily it has an option to allow self signed certificates, to make within-LAN use easier. I‘m binding my Immich domain to a local IP with unbound and don’t need to bother with letsencrypt internally that way. Only the hetzner box needs the certificate (obviously).

The quality of face tagging surprised me, it uses one of the open source algos by Microsoft.

MatthewSt reports that he has a fix

Let’s see the instructions on that link.

2. Install an old version of Google Drive (there's a version 82/83 floating around on the internet).

This is awful advice. If you have a link, share it. Considering the number of people this is affecting, chances are increasing that the version you find “floating around on the internet” is malware.

Considering the number of people this is affecting, chances are increasing that the version you find “floating around on the internet” is malware.

The windows installer at least is digitally signed to prevent this issue.

Apps and installers on macOS are also signed but there’s always a way to go around it. The simplest method is for the malware author to re-sign it, since effectively no one will verify the signature is from the correct source. But even then you can always convince people to disable System Integrity Protection to install crap. Despite it being a convoluted process, never underestimate what someone who is not tech savvy but is desperate can do. They will install anything and jump through any hoop the malware author walks them through with nice screenshots.

I doubt there’s no way to trick people the same way on Windows. Perhaps I’m wrong, and if I am I’d welcome learning how that system works.

This is why it's good to have Cryptomator set up and to figure out how to use rsync, rclone, or whatever. I still have to figure out how to implement it, but anything with the G is already on shaky ground. You're better off with Dropbox or literally anyone else.

Does this affect iCloud?

The iCloud is built by a different company than Google Drive

I thought Apple used GoogleCloud/GCP or whatever?

All lights are green: https://www.google.com/appsstatus/dashboard/

No recent/active incident for Google Drive is shown:

https://www.google.com/appsstatus/dashboard/summary

https://www.google.com/appsstatus/dashboard/products/VHNA7p3...

What a shame. The mighty Google is after all in the same league as some random LLC providing cloud storage for $1.99/mo.

Discussion earlier today: https://news.ycombinator.com/item?id=38427864

Thanks! Macroexpanded:

Google Drive files suddenly disappeared - https://news.ycombinator.com/item?id=38427864 - Nov 2023 (281 comments)

Use an open source system. You can choose to back up to a self-hosted version, too. https://federated.computer.

You can also lose your own files!

I suggest users look into CubeBackup; it’s great for local backups and versioning of G Suite data. I’ve been using it for three or four years now and couldn’t be happier. (The dev is very quick to reply and helpful.) Still just $5 per year per user (I’m not sure if it supports free Gmail accounts, but it definitely supports G Suite users).

Additionally, it does not help that the search giant’s search functionality for Drive and Gmail is so incredibly poor (more so Gmail; I have found Drive search has improved). For example, you still cannot do AND searches in Gmail properly. You also cannot do wildcard searches of any kind (and no regex or partial regex).

One other note (and I haven’t read every reply here yet): other than the anecdotal evidence of Drive data loss, one way to 100% prove Drive data loss is occurring is if someone has an example of data not showing up on Drive while that same data is present in a CubeBackup local backup (as CubeBackup supports versioning, and its only source of data is directly from G Suite).

This would prove the data was on Drive at one point but is now gone, along with the date and time it was retrieved and saved locally via CubeBackup. (Additionally, one can show that it wasn’t deleted via user action by looking at the logs in G Suite admin, or I think CubeBackup would log a delete as well, assuming you’re retaining those CubeBackup logs.)

(To be clear, I’m simply a user of CubeBackup. I have no interest in or affiliation with them.)

I've been convinced for years that Google Docs loses things. I both had files vanish and have lost edits - one time I could demonstrate it with a screenshot of part of a doc that later reverted to an earlier version.

Other people at work agree with me. Google support insists we're nuts.

I only use it when other internal folks start collaborative docs, otherwise I do not trust it. Files don't randomly disappear from my local filesystem (and we test our backups).

Use takeout to make snapshots to compare.

If I wanted to indefinitely back up every image my wife's and my phones take, without me having to think about anything other than an annual bill, what services would people suggest? Currently Google does this perfectly for us... except for the part that maybe it's not actually doing that after all, and the fact that it's Google and they're seemingly less competent by the hour.

Apple is the only alternative I'm aware of.

Edit: Just to be very very very very clear. I don't want to have to do ANYTHING but pay a bill.

An ssd big enough and syncthing?

Take photo. Enjoy them, they are temporary. No-one will give a fuck in 25 years.

speak for yourself.

My partner has been experiencing emails coming and going in big chunks - months and years at a time - but if we search support forums all we can find are "support agents" blaming the individuals complaining about this phenomenon. It's gaslighting all the way down. Even when it's actively happening, checking moment to moment and having another few months vanish, and they'll still tell you "you must be deleting things, stop deleting them."

When did this start happening and which time period is it affecting? Since November 21, my friend has had all her Gmail emails since May 2023 go missing. It matches the time period mentioned in the article but this is Gmail rather than Drive.

This is yet another reason to do backups so that you can perform restores. Testing your backups is very important to do. Relying on any service has a risk of something extremely bad occurring.
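To make "testing your backups" concrete, here is one minimal way to check a test restore, sketched under assumptions: you restore the backup into a scratch directory and compare it file by file against the live data. The function names are made up for illustration, and files that changed since the backup ran will legitimately show up as differences.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            digests[path.relative_to(root).as_posix()] = digest.hexdigest()
    return digests


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored tree."""
    original = tree_digests(original_dir)
    restored = tree_digests(restored_dir)
    return sorted(p for p, d in original.items() if restored.get(p) != d)
```

An empty list from `verify_restore` means every file in the original tree came back byte-for-byte identical.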

This is why I always have my device backup to both GDrive and OneDrive.

My solution: everything (phone, laptop, etc.) syncs with Nextcloud on my home server, which then backs up encrypted snapshots to Backblaze using Restic. Also occasional manual backups to hard drives at locations other than my house. Works great, no dependence on "free" cloud storage.

This is why you always maintain backups. Do not depend on data just being there

I was tracking my portfolio from various brokers in a spreadsheet about once a month for years. One day that spreadsheet disappeared. I still don't know if it was a mistake on my part or something else. Today's news reinforces my suspicion that it was on Google's side.

An important reminder, I rely on this to have quick access to official documents.. time to backoff.

I've an RPI with a big old spinning metal disk external HDD attached, anybody recommend a good workflow for automatic bidirectional sync between GD and my thing?

Nacho keys, nacho cyber

Yet another reminder that nobody should be storing anything only in the cloud. If you must use the cloud, always keep your own copy of all data you put there.

"Scary cloud things" like this are why I moved off of Dropbox and Onedrive years ago and got to a new normal of "syncthing all my main devices together" for file sync, then have Backblaze backup my machine that holds the linux ISOs etc. that I'm not necessarily wanting to have redundant storage space used for them at that time.

Now, the failures are my own fault, but at least I can do offline backups to HDDs and BD-Rs, and I don't have to worry about any of the cloud services (TM) messing with my data and me having little recourse.

Yes, I could do separate backups of the cloud things too, but at that point, that just adds 'cloud service' as an option to the first part of the equation:

cloud device storage / local device sync (on and off-site) / NAS

+ offline on-site backups as HDDs & BD-Rs

+ online off-site backups

Doing syncthing with local device sync is cheaper than cloud device storage (recurring costs) and NAS (fixed costs / hardware maintenance)

About 5 years ago I just stopped trusting the cloud for my personal memories; those photos are just too valuable to lose and I do not trust these companies. Running a home server is easy and cheap. 18TB drives are inexpensive now. And a few times a year I'll make a full copy and stash it at my mom's house and in a safety deposit box.

edit - love foldersync pro for my Android. does SFTP/SMB/webDAV and all major cloud services over wifi or 5g.

every night it transfers everything off my phone to my server which is also on parity

I've always found it sound to backup cloud storage. And this is an example why.

Anyone else having issues with Google Drive on macOS?

This week I've been noticing I've been unable to open files on my Mac that are stored in Google Drive. I can open them via the web, so I thought it was just the Mac software until I saw this post.