Awesome!
Anyone using this long term or in production who can comment on how it's been working?
I see TRIM is supported. Is RETRIM supported as well (or whatever is needed during drive optimization to release areas that didn't get TRIMmed the first time due to a full command queue)?
Could this serve as an effective NTFS replacement with data parity for those who don't like ReFS?
How mature is it compared to ZFS on Windows?
ReFS with Storage Spaces already serves this purpose and is integrated and fully supported.
From what I’ve heard, BTRFS has a crazy long list of defects where it’ll lock up or corrupt data if you so much as look at it wrong.[1]
Using something that is unreliable at best on its native OS shoehorned into Windows is madness. Fun for a lark, sure, but I would never ever entrust this combination with any actual data.
[1] “It works for me on my two disk mirror” is an anecdote, not data.
The list cannot be crazy long if Synology uses it for their NASes.
Synology uses a hybrid BTRFS+mdadm arrangement specifically to deal with reliability problems with BTRFS RAID: https://kb.synology.com/en-us/DSM/tutorial/What_was_the_RAID...
Which is kind of the point. BTRFS only has issues with RAID5/6 configurations. Using it as a filesystem for a single disk or partition should be totally fine.
Anecdotally, this is untrue.
Personally, BTRFS is the only filesystem that has ever caused me any data loss or downtime. I was using a single disk, so it should have been the perfect path. At some point the filesystem got into a state where the system would hang when mounting it read/write. I was able to boot off of a USB stick and recover my files, but I was unable to get the filesystem back into a state where it could be mounted read/write.
At work, we used to run BTRFS on our VMs as that was the default. Without fail, every VM would eventually get into a state where a regular maintenance process would completely hang the system and prevent it from doing whatever task it was supposed to be doing. Systems that wrote more to their BTRFS filesystems experienced this sooner than ones that didn't write very much, but eventually every VM succumbed to it. In the end, the server team had to rebuild every VM using ext4.
I know that anecdotes aren't data, but my experience with BTRFS will keep me from using it for anything even remotely important.
Unfortunately you got what you paid for! :) No one in the Linux world appears to be seriously investing in engineering a robust and reliable filesystem, with e.g. correctness proofs. We have only hobby projects.
Facebook literally uses it in production. There are plenty of insults we can use, but hobby project is not one of them.
Facebook presumably uses xz in production too and that is a hobby project (as we all recently found out). My understanding is that development of Btrfs was not sponsored by any company and was entirely a "community effort". It certainly would explain why it's perpetually unfinished.
I honestly find it weird when I hear about companies like Facebook and Synology using it.
Facebook could easily work around failures; they've surely got every part of their infrastructure easily replaceable, and probably automated at some level. I'm sure they wouldn't tolerate excessive filesystem failures, but they definitely have the ability to deal with some level of it.
But Synology deploys thousands of devices to a wide variety of consumers in a wide variety of environments. What's their secret sauce to make BTRFS reliable that my work's commercial Linux distribution doesn't have? Surely there's more to it than just running it on top of md.
Maybe in the years since I was burned by it things have greatly improved. Once bitten, twice shy though - I don't want to lose my data, so I'm going to stick to things that haven't caused me data loss.
At work, this all happened on a commercial Linux distribution which we do pay for. As far as I recall, their support was unable to resolve the issue, hence rebuilding all those VMs. I’m not on the server team, so I don’t know many details, but I was affected by this issue and it caused a lot of grief across the organization.
So no, I don’t think we got what we paid for.
Are you sure btrfs is supported in production by your commercial Linux distribution? I would be surprised if this is true. Red Hat and Ubuntu do not support it.
It was at the time; it may not be now.
Everything I've read about btrfs's RAID5/6 deficiencies is that it can't tolerate sudden losses of power (a.k.a. the write hole problem), which I think is fine so long as you are aware of it and implement appropriate safety measures such as a UPS or APU.
And besides, if you are doing RAID you are probably concerned with the system's uptime, which means you will likely have implemented such measures anyway.
Note that, yes, I'm aware most home users either aren't aware (nobody RTFMs) or are too lazy/cheap to buy a UPS from Office Depot. So perhaps btrfs is warning people to save them from themselves.
A UPS will not do much to improve reliability against sudden power loss. At least here in Europe, it is much more likely that a PSU or other component fails than that the power line is suddenly interrupted.
And lost writes are a problem that all filesystems have. I recommend reading the paper "Parity Lost and Parity Regained" by Krioukov et al. (USENIX FAST '08).
Kernel panic too...
I know, but that's only one important part of what a file system does. If the file system were otherwise totally broken, they wouldn't use it.
Notably, Synology's agent-based backup software requires BTRFS but will not back up BTRFS.
https://kb.synology.com/tr-tr/DSM/help/ActiveBackup/activeba...
While it may well be correct that the quote is in fact an anecdote, the following is also an anecdote: ‘From what I’ve heard, BTRFS has a crazy long list of defects where it’ll lock up or corrupt data if you so much as look at it wrong.’
Witnessing defects means that they exist; witnessing no defects does not mean they don't.
If that’s the case, prove the giant Flying Spaghetti Monster doesn’t exist.
I think you’ve misread the parent comment. Witnessing no FSM does not mean there is no FSM.
Assuming FSM is referring to defects, witnessing no defects increases confidence that no defects exist in the observed state.
Conversely, witnessing defects does not itself prove defects exist if the test cases were not scientific; it increases the confidence that defects exist, but there is some probability that an unrelated fault (bad RAM, a kernel error, hardware failure, solar flares) could have caused the issue.
But there’s also a lot of evidence to suggest Btrfs has had a lot of defects resolved in recent years, so it’s worth noting that as time moves forward, the number of existing defects, and likely the rate at which new ones are introduced, should decrease.
I should add I’ve had minimal skin in this game until yesterday. I chose btrfs for two systems for snapshot support, but that’s in addition to regular backups on another host, because it’s silly to trust any single compute node regardless of file system.
No. It is referring to the Flying Spaghetti Monster, but it is an analogy for anything, including defects. This is a discussion about epistemology, not filesystems.
Replace ‘defects’ with ‘miracles’ and/or ‘science’ depending on which makes more sense.
Yeah, an interesting scenario is that many people compare the btrfs behaviour to "never had issues with extfs", when in practice it's "extfs couldn't have told me about this issue even if it existed".
Made the mistake of using btrfs for a Hadoop cluster at university in kernel 4.x times, after reading that SLES uses it and after reading an interview on LWN with someone important - I think the maintainer at the time - who deemed it stable. This must have been 10 or 12 years ago, and it was a wild ride - crashes, manual recovery on 200 machines using clusterssh to get the partitions to mount again. Got out-of-disk-space errors on a 16 TB RAID 1 (which is not a real RAID 1) at 5% usage - lots of sweat I'd rather have avoided. Should have just used ext4 in hindsight.
For me, I decided not to touch it anymore after that experience. I'm sure there is a name for that bias, but I don't care. Got burned badly. Lots of people probably had similar experiences, and that's where this is coming from. Reading the mailing list archives from that time might also be useful to convince yourself that it was more than an anecdote.
I’m not disputing any discourse relating to the factual in/correctness of the anecdote; I’m pointing out that gp is providing an anecdote while disputing anecdotes that they don’t agree with.
Provide actual data that’s recent. Linux 4.x was what, 10 years ago? Cars are substantially safer now than they were 10/20/50 years ago, so who’s to say your experience with the file system wouldn’t be different today?
Same could be said about cars: why ever buy a [insert brand] again after you've been burned by its reliability or other issues?
You probably just don't, as the alternatives are good and plenty.
Which cannot be said about Linux file systems that support metadata+data checksums and repair, though. As far as I'm aware, the only file systems which could realistically be used are btrfs and ZFS (bcachefs looks promising but isn't there yet). ZFS is not even part of the kernel, so you have to compile it yourself and hope it actually compiles against your kernel, given the API changes.
True dat.. :-)
Just wanted to point out that it is "normal" for people to avoid the thing that did not work out for them in the past.
What changed with respect to car safety compared to 2014? If anything, the recent trend of putting every control in touchscreen interfaces has made cars less safe.
It's unfortunately a very common anecdote over the last 10 years (and a similar experience to my own). And to be honest, it's a red flag for how this critical system component is being developed.
Nope. It works perfectly on both my striped (RAID 0) and mirrored (RAID 1) arrays.
Thanks.
I tried ReFS when it first came out and it was terribly slow (with data parity on), and Storage Spaces was obscure to set up and manage. Has the landscape improved?
On WS2022 without patches I noticed that Storage Spaces was only queueing one IO per NVMe device. With current patches queuing is fixed and performance is much better. I think this was fixed sometime in 2023. I’m pretty sure both NTFS and ReFS were affected.
Ah, that would explain the absurdly bad I/O performance I was seeing in Azure VMs that had the new NVMe virtual disk controllers!
I had spoken with some of the teams involved and they were rather cagey about the root cause, but at least one person mentioned that there were some fixes in the pipeline for Windows Server 2022 NVMe support. I guess this must have been it!
One place where ReFS is rather decent is "reflinks" - copies share the same underlying blocks, so identical data is stored once and the "copies" are just references to it until one of them is modified.
That is rather useful in backup systems.
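For anyone curious what that looks like at the API level: ReFS exposes block cloning through FSCTL_DUPLICATE_EXTENTS_TO_FILE. A rough sketch in C - the paths are made up, the single whole-file clone is illustrative only (offsets/lengths generally need to be cluster-aligned, the target must be pre-sized, and large files may need to be cloned in chunks):

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical files on the same ReFS volume. */
        HANDLE src = CreateFileW(L"D:\\backup\\base.vhdx", GENERIC_READ,
                                 FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
        HANDLE dst = CreateFileW(L"D:\\backup\\clone.vhdx",
                                 GENERIC_READ | GENERIC_WRITE, 0, NULL,
                                 CREATE_ALWAYS, 0, NULL);
        if (src == INVALID_HANDLE_VALUE || dst == INVALID_HANDLE_VALUE)
            return 1;

        LARGE_INTEGER size;
        GetFileSizeEx(src, &size);

        /* The target must already be large enough to hold the cloned range. */
        SetFilePointerEx(dst, size, NULL, FILE_BEGIN);
        SetEndOfFile(dst);

        /* Share the source's extents with the target; nothing is copied
           until one of the two files is modified (copy-on-write). */
        DUPLICATE_EXTENTS_DATA d = {0};
        DWORD bytes = 0;
        d.FileHandle = src;
        d.SourceFileOffset.QuadPart = 0;
        d.TargetFileOffset.QuadPart = 0;
        d.ByteCount.QuadPart = size.QuadPart;

        if (!DeviceIoControl(dst, FSCTL_DUPLICATE_EXTENTS_TO_FILE,
                             &d, sizeof(d), NULL, 0, &bytes, NULL))
            fprintf(stderr, "clone failed: %lu\n", GetLastError());

        CloseHandle(src);
        CloseHandle(dst);
        return 0;
    }

Presumably this is how backup products produce "full" copies of previous images that take almost no extra space on disk.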
XFS also supports reflinks, amongst other things, and is way older than ReFS and hence considered out of beta (which ReFS isn't, by me).
I don't trust data to ReFS yet - it's a fun project that will no doubt prove itself one day. For now, Windows boxes run NTFS and Linux runs ext4 or XFS.
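On the Linux side, a reflink copy on XFS (or btrfs) is just the FICLONE ioctl - the same thing cp --reflink=always does under the hood. Minimal sketch, with made-up paths:

    #include <fcntl.h>
    #include <linux/fs.h>    /* FICLONE */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical files on the same XFS (or btrfs) filesystem. */
        int src = open("/data/base.img", O_RDONLY);
        int dst = open("/data/clone.img", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (src < 0 || dst < 0) {
            perror("open");
            return 1;
        }

        /* Make dst share src's extents; blocks are only duplicated
           once either copy is written to (copy-on-write). */
        if (ioctl(dst, FICLONE, src) < 0) {
            perror("FICLONE");   /* EOPNOTSUPP on ext4, EXDEV across filesystems */
            return 1;
        }

        close(src);
        close(dst);
        return 0;
    }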
What happens if the very important directory you copied 11 times (just to be sure) ends up sharing the same blocks and doesn't actually get duplicated as you expected? And now that shared block gets corrupted...
Back in the day, if I copied (geophysical air) survey data 11 times and put all the copies in the same walk-in fireproof safe (in the hangar), that offered no real additional security in the event of a direct hit by an aircraft and explosion while the door was open.
If you're going to make 11 copies, they have to go to different physical locations - different devices at least, geographically different places to be sure - or it's pointless.
In this instance, block de-duping on a single device makes sense; expecting multiple copies on the same device (with or without duplicate block reuse) to offer any additional safety does not.
ReFS only got put back into normal Windows 11 a few months ago. That's a good sign for the future, but it was looking bad for a long time.
Also, if you turn on data checksums, my understanding is that it will delete any file that gets a corrupted sector, and you can only override this behavior on a per-file basis. Unless this changed very recently?
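For what it's worth, the per-file override being described is (as far as I can tell) the enforcement flag on integrity streams: Get-FileIntegrity / Set-FileIntegrity in PowerShell, or FSCTL_GET/SET_INTEGRITY_INFORMATION underneath. A sketch that keeps checksumming but turns off enforcement for one file - the path is hypothetical, and treat the exact failure semantics as my assumption rather than gospel:

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical file on a ReFS volume with integrity streams enabled. */
        HANDLE h = CreateFileW(L"E:\\data\\important.db",
                               GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        DWORD bytes = 0;

        /* Read the file's current checksum algorithm and flags. */
        FSCTL_GET_INTEGRITY_INFORMATION_BUFFER info = {0};
        DeviceIoControl(h, FSCTL_GET_INTEGRITY_INFORMATION, NULL, 0,
                        &info, sizeof(info), &bytes, NULL);

        /* Keep checksums, but stop failing I/O when they don't match. */
        FSCTL_SET_INTEGRITY_INFORMATION_BUFFER set = {0};
        set.ChecksumAlgorithm = info.ChecksumAlgorithm;
        set.Flags = FSCTL_INTEGRITY_FLAG_CHECKSUM_ENFORCEMENT_OFF;

        if (!DeviceIoControl(h, FSCTL_SET_INTEGRITY_INFORMATION,
                             &set, sizeof(set), NULL, 0, &bytes, NULL))
            fprintf(stderr, "set integrity failed: %lu\n", GetLastError());

        CloseHandle(h);
        return 0;
    }

The PowerShell equivalent would be something like Set-FileIntegrity -FileName E:\data\important.db -Enforce $False.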
Oh, is it no longer exiled to Windows Pro for Workstations? This feature comparison chart still lists it there:
https://www.microsoft.com/en-us/windows/business/compare-win...
For what it’s worth, regular Windows 10 & 11 Pro (and other editions maybe?) have supported reading and writing ReFS this whole time. It’s just the option to create a new volume that’s been disabled.
It still sort of is, but you can create a Dev Drive, which is based on ReFS.
Weirdly, it's possible that this version could be more stable/reliable/safe than the Linux version, since it's apparently a wholly independent reimplementation. I suppose it depends on whether BTRFS's problems stem from the underlying data format or the actual code as written for the Linux driver.
Advising ReFS is a little bit insane though. I would certainly not entrust it with my data either.
ReFS is terrible. We have seen so many customers lose data on ReFS that I started strongly advising everyone against using it.
One example: If you (accidentally or on purpose) attach a ReFS disk or LUN to a newer Windows version, it will be silently upgraded to a new ReFS version without any feedback (or chance to prevent it) for the user. No way of attaching the disk to an older Windows version afterwards. But that is not the real problem. The real problem is that the upgrade runs as a separate (user-space) process. If this process crashes, or your PC crashes or reboots while it runs, your data is gone. And there is no feedback on how long it still has to run (we've seen multiple days on large volumes).
So yeah, maybe avoid ReFS for a few more years...
One time I accidentally ran a Visual Studio build in a btrfs git clone rather than on my main NTFS drive. By the time I noticed and cancelled the build, there were two folders with an identical name but different contents, and I had to delete the same folder name twice. I'd say the driver has issues with concurrency.
I once ran a `git clone` from WSL1 on the C: drive, and tried to build a C++ project in VS. It complained that "EXAMPLE.H" was not found. An "example.h" file did exist in the repo, and my code asked for "example.h". Turns out WSL1 set some obscure bit not known in Win32 land (but enforced by NTFS) that makes the file names case-sensitive, while VS's path normalisation expects a case-insensitive file system. Perhaps this was related to your issue?
On a separate occasion, I also hit that issue (but worse). I once marked an NTFS folder as case-sensitive to help root out all case-mismatch bugs (to get a C++ project eventually building on Linux), but then Visual Studio and CMake started spitting out "file not found" errors even for the correct case! I had somehow produced a "cursed" folder that could not be used for building code until I copied (not moved) its contents to a regular case-insensitive NTFS folder.
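In case it saves someone the copy next time: the bit in question is NTFS's per-directory case-sensitivity flag. "fsutil file queryCaseSensitiveInfo <dir>" shows it and "fsutil file setCaseSensitiveInfo <dir> disable" clears it (details and required permissions may vary by build). Programmatically it looks roughly like this - a sketch, assuming a recent Windows 10+ SDK that declares FileCaseSensitiveInfo, and the directory path is made up:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical directory that WSL may have marked case-sensitive. */
        HANDLE h = CreateFileW(L"C:\\src\\myrepo", FILE_READ_ATTRIBUTES,
                               FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                               NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        /* FileCaseSensitiveInfo reports whether name lookups in this
           directory are case-sensitive (the flag WSL sets). */
        FILE_CASE_SENSITIVE_INFO info = {0};
        if (GetFileInformationByHandleEx(h, FileCaseSensitiveInfo, &info, sizeof(info)))
            printf("case-sensitive: %s\n",
                   (info.Flags & FILE_CS_FLAG_CASE_SENSITIVE_DIR) ? "yes" : "no");

        CloseHandle(h);
        return 0;
    }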
I have run this casually on my main machine for a few years now. I have a Windows partition, a Linux partition (btrfs on LUKS), and a third btrfs partition where I keep my files.
I don’t use it often, but when I do I don’t even notice it. It’s as if Windows could just natively read btrfs all along. This was without any “advanced” usage beyond simply accessing, modifying, or deleting files.
Heads up: installing both WinBtrfs and OpenZFS on Windows may cause problems:
"Win OpenZFS driver and WinBtrfs driver dont play well with each other"
https://github.com/openzfsonwindows/openzfs/issues/364