WizTree is famously almost 50x faster than WinDirStat (on normal Windows NTFS drives) by reading the Master File Table (MFT) instead of walking the tree to measure each file.
WizTree isn't open-source like WinDirStat but "free as in beer" with optional donations.
There's also a fork of WinDirStat patched to read the MFT but I don't know anyone who's tried it: https://github.com/ariccio/altWinDirStat
What's the downside of just reading the MFT? Why doesn't Microsoft do it in file explorer, and why wouldn't every tool use it instead of walking through the file system? Maybe there's no downside but it's such a huge speed boost that it would be weird to not use it otherwise, right?
Along with the reasons others have mentioned, it would also bypass any filter driver in the file system stack (Windows has the concept of a stack of filter drivers that can sit in front of the file system or hardware) and would also ignore any permissions (ACLs) on who can see those files. There’s no way they can credibly use this technique outside of say something from SysInternals: it violates the security and layering of the operating system and its APIs.
Is there a Linux equivalent for those "filters"? I'm a bit clueless about win32 and NT sadly enough...
Would that mean that there's no way to "scope" the MFTs?
Edit: That also makes sense, since if I got it right they aren't necessarily supposed to be consumed by userspace programs?
I guess that's why those tools always ask for admin access and basically all perms to the FS.
It's a bit sad that the user gets exposed to a much slower search and FS experience even if the system underneath has the potential to be as fast as it gets. And I don't think ReFS is intended to replace NTFS (not that it's necessarily more performant anyways)
There is no equivalent on Linux. That's why Linux has no online antivirus scanners (scanners that scan the file as it's opened) while this is a basic feature of every antivirus program on Windows.
Linux has device mappers (dm-crypt, dm-raid and friends). But those sit below the file system, emulating a device. Windows' file system filter drivers sit above the file system, intercepting API calls to and from the file system. That's super useful if you want to check file contents on access, track where files are going, keep an audit log of who accessed a file, transparently encrypt single files instead of whole volumes, etc. But you pay the price for all that flexibility in performance.
What are the APIs related to this named?
IO Minifilter drivers are the modern version: https://learn.microsoft.com/en-us/windows-hardware/drivers/i...
Sure there is, you're talking about fanotify.
https://man7.org/linux/man-pages/man7/fanotify.7.html
https://lwn.net/Articles/339399/
It even lets you block the access until the scan/decision is made.
I believe they're approximately equivalent to FUSE
Filters are vaguely similar to things like mountpoints overlaying portions of the filesystem. E.g. in Linux you might have files in /d1/d2/{f1,f2,f3} in the root filesystem but you also have a mountpoint of a 2nd filesystem on /d1/d2 that completely changes the visibility / contents of d2. Filter drivers can do similar things (although they are not actually independent mountpoints).
Maybe stating the obvious, but if the security can be violated that easily, it's not very secure.
You need admin permissions to read the MFT on Windows. The traditional security model of both Windows and Linux assumes that the kernel is a security barrier between system and unprivileged user, and between different unprivileged users. An admin being able to bypass security restrictions isn't traditionally seen as a problem.
Indeed, only in very recent history has the admin/root user/owner been seen as a threat to the system and the system employs defenses against them. I'm hoping that trend reverses because I really hate the direction things are going.
There are pretty good reasons to do that. We've been really lax in what is allowed to run as root/admin when in reality, those permissions should only be used when doing things like reading the MFT or snooping on all the network traffic with Wireshark. It should not be required to run as root/admin in order to install most software because installing software is a very common thing to do.
Even if you want more control over your system, I still think technically capable people would be better served by having a separate administrator account from your normal day-to-day account which you have to explicitly log into (so no UAC prompts, you need to go onto that other account and then you get the UAC prompt). Unfortunately, I think most Desktop OSes are still too unusable with this sort of workflow due to how much software insists on admin for installation.
I largely agree. I think what makes the "the user is a threat" model so difficult to me is that there is a lot of truth to it. Users often don't know enough to make good decisions.
I really like your idea of logging in separately, such that it isn't something you're going to do cavalierly. That seems like a great compromise to me! I fully agree that we way overuse admin and really don't need it for the majority of things.
Then it doesn't violate the security of the OS, if you need to be an admin to do it.
Reading the MFT directly requires Administrator permissions, and doing it correctly means reimplementing support for every nook and cranny of NTFS including things like hard links, junction/reparse/mount points, sparse files, etc.
Spacemonger uses the MFT and doesn't require Administrator privileges
Is this the Spacemonger you are talking about https://web.archive.org/web/20121126062443/http://www.sixty-...
It does not say anything like that in the FAQ and I don't remember it being fast.
Yes that one. Just use it and see. It's blazing fast.
Just learned that it's open source now https://github.com/seanofw/spacemonger1
Been using the portable version of 1.4 for decades after first coming across it in some PC magazine or something like that many years ago. Not terribly pretty, but it does what I need and it still works.
I thought you meant the $15 utility from Stardock, but if not then I'm fairly confident it's not reading the MFT.
https://github.com/seanofw/spacemonger1/blob/6a41c012534b170...
It's still interesting that they got it to work as fast and precise as they did.
It uses FindFirstFile etc https://github.com/seanofw/spacemonger1/blob/6a41c012534b170...
AFAIR MFT access requires Administrator/SYSTEM rights and there is absolutely no way to read it as a regular user.
The only workaround (used by Everything by VoidTools) is to install a service that runs with the needed rights and have the GUI communicate with it.
You call that a workaround but it’s basically the best possible situation security-wise. If this didn’t work securely then it wouldn’t be possible to implement disk defragmenter or even explorer. It’s so core to Windows NT’s security model that I wouldn’t call it a workaround.
You do similar things even with more modern stacks - assign a permission to an application and grant permissions to the application to the user.
The only real concern is that Windows NT permissions are not as granular as they could be.
For objects, Windows NT permissions are ridiculously granular; e.g. GENERIC_WRITE can be mapped to a half-dozen separately settable type-specific flags, depending on the object type (file, named pipe, etc.). It’s too granular for even an administrator to make sense of, arguably, and the documentation is somewhere between bad and nonexistent. (The UI varies from decent, like the ACL editor you can access from e.g. Explorer, to “you can’t make this shit up”, like SDDL[1].)
For subjects, the situation is not good, like on every other conventional OS. You could deal with that by introducing a “user” for each app, as on Android. But I’m not aware of any attempts to do that (that would expose this mechanism in a user-visible way).
(Then there’s the UWP sandbox, which as far as I can tell is built with complete disregard of the fundamental concepts above. I don’t think it’s worth taking seriously at this time.)
[1] https://learn.microsoft.com/en-us/windows/win32/secauthz/sec...
I have no idea if there’s a granular object permission that could give access to the MBR of a disk. I’ve thankfully never had to dig that deep into Windows internals.
I’ve had to work with SDDL before to set up granular permissions for WMI monitoring on a whole lot of computers and my god, did it make me love the Cloud and Linux. I can’t emphasize enough how much the unintuitiveness of setting these permissions creates systemic over-privileging.
> What's the downside of just reading the MFT?
One possible reason is that it isn't a published part of the filesystem's external interface, and the format is not guaranteed to be static between versions or even point releases (though in reality, while the behaviours may be officially undefined, they are unlikely to change significantly).
Also, it requires admin elevation to access. Anything running elevated is a potential security concern as it can access much else too.
> Why doesn't Microsoft do it in file explorer
Not sure, but it could be because that would be seen as an unfair advantage so to avoid anti-trust allegations they would have to publish the format and make stability guarantees for it, so others could use it as easily/safely. That, and the reasons above & below too.
> and why wouldn't every tool use it instead of walking through the file system?
Largely because walking the filesystem works for all filesystems, local and remote, so you cover everything with one tree walk implementation. Implementing a tree-walk over the MFT data where available is extra work to implement and support for one filesystem, and not many care enough, or are not aware of the potential speed benefit at all, for it to be a huge selling point such that all toolmakers feel compelled to bother.
I am not going to pull every document, but the MFT structure is documented and published. I am uncertain what you mean by "external interface".
"About 9,810 results (0.04 sec)"
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C11&q=mft...
Moreover it is documented by Microsoft itself: https://learn.microsoft.com/en-us/windows/win32/devnotes/mas...
Though all the sub-pages of that state things like “[This structure is valid only for version 3 of NTFS volumes; it may be altered in future versions.]” — while it is true that any API could see breaking changes in future, this suggests that you should expect them, so I'd not call it supported in the same sense of the main file/directory access APIs which I would not expect to see breaking changes in (additional properties & functionality yes, but not existing things changing behaviour).
A lot of people talking about the details does not constitute official documentation, though.
You can find a lot of articles talking about SQL Server's DBCC IND and DBCC PAGE, but that isn't official documentation: they are essentially internal functions, not supported, and could change or go away entirely despite having been around for many versions (as they have in Azure). Similarly there are articles talking about sys.dm_db_database_page_allocations which sort-of does the job of DBCC IND, but again this is not officially documented & supported.
> I am uncertain what you mean by "external interface".
I meant the published interface. Maybe "supported API" would have been a better phrase to use?
Though as pointed out below, there is at least some official documentation on the MFT structure.
It's probably also racy to access the raw MFT while there are concurrent programs creating new files (or deleting files). That complication can be avoided by using the ordinary OS directory iteration primitives.
Yep, but then the performance gains are completely discarded. The easiest solution is to take a snapshot with VSS, which is both fast and makes a quiesced copy of $MFT. From there, one could monitor FS changes if they wanted to have live updates.
>What's the downside of just reading the MFT? Why doesn't Microsoft do it in file explorer, and why wouldn't every tool use it instead of walking through the file system?
One disadvantage is that you can't read the MFT of network shares or device emulators presenting "virtual drive letters" to the OS.
The typical (and slower) Win32 API functions FindFirstFile()/FindNextFile() used to iterate through the file structure work at a higher level of abstraction so they work on more targets that don't have an NTFS MFT. Indeed, if you point WizTree to an SMB network share, it will be a lot slower because it can't directly read the MFT.
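For comparison, the portable tree walk looks roughly like the sketch below. It uses Python's `os.scandir`, which as far as I know is implemented on top of FindFirstFileW/FindNextFileW on Windows, so it is a fair stand-in for the "slow path" being discussed. `tree_size` is a hypothetical helper name, not anything from WinDirStat or WizTree, and it sums logical file sizes only.

```python
import os


def tree_size(root: str) -> int:
    """Sum the logical sizes of regular files under root by walking the tree."""
    total = 0
    stack = [root]                      # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            total += entry.stat(follow_symlinks=False).st_size
                    except OSError:
                        continue        # entry vanished or is unreadable
        except OSError:
            continue                    # directory missing or access denied
    return total
```

The same code works on a local disk, a network share, or any other mounted filesystem, which is exactly the property the MFT shortcut gives up. It also respects permissions: directories the caller cannot list are simply skipped.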
It's conceivable that Microsoft developers could have programmed Windows Explorer differently to have an optimized code path of reading MFT for local disks and then fall back to slower FindFirstFile()/FindNextFile() for non-MFT disks. Maybe that adds too much complexity and weird bugs. I notice that most of the 3rd-party "Win Explorer replacement" utilities also don't read MFT.
Surely this would have been worth doing, even if it meant flushing out bugs elsewhere.
With RAM sizes now, it's curious why any OS wouldn't just cache some or all of the metadata for some local volumes on a block basis rather than incur the greater resource usage of transforming disk data into different structures, and then caching and tracking individual entries.
More MFT goodness: the file search tool Everything (https://www.voidtools.com/)
I want to like Everything but every time I start it up it takes 30 sec to 1 minute to update its index
Try Everything 1.5a - an "alpha" version with many improvements, in development for years but inexplicably hidden away on their website. Never experienced any instability.
https://www.voidtools.com/forum/viewtopic.php?t=9787
Wow! Shocked that this is the first I've heard of this given that I've been using Everything for years now. Thanks for the link.
Love everything, but I had no idea there was an update... I'll have to try it right away, thanks!
Uninstall, re-install as a service which may now be default.
Better as a service too because the GUI doesn't need to request admin rights.
You can set it to run on startup or as a service so it updates the index in the background.
You should not be starting it when you want to search. You should open it when you log in, and leave it in the tray. It will do a full index on launch then subscribe to filesystem notifications to keep itself up to date for as long as it’s open.
Do that and it’s alarmingly fast and responsive except for the minute or two right after launch.
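The "index once, then apply notifications" pattern described above can be sketched like this. It is only an illustration of the idea, not how Everything is actually implemented: `NameIndex`, its methods, and the event kinds "created", "deleted" and "renamed" are all hypothetical, and a real tool would receive change events from the OS rather than from manual calls.

```python
import os
from typing import List, Optional


class NameIndex:
    """In-memory set of paths: one full scan, then incremental updates."""

    def __init__(self) -> None:
        self.paths = set()

    def build(self, root: str) -> None:
        # The expensive part: one full pass over the tree at startup.
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                self.paths.add(os.path.join(dirpath, name))

    def apply(self, kind: str, path: str, new_path: Optional[str] = None) -> None:
        # The cheap part: each notification touches only one entry.
        # Simplification: deleting or renaming a directory does not
        # update the entries beneath it.
        if kind == "created":
            self.paths.add(path)
        elif kind == "deleted":
            self.paths.discard(path)
        elif kind == "renamed":
            if new_path is None:
                raise ValueError("renamed events need new_path")
            self.paths.discard(path)
            self.paths.add(new_path)
        else:
            raise ValueError("unknown event kind: " + kind)

    def search(self, needle: str) -> List[str]:
        # Case-insensitive substring match on the file name only.
        needle = needle.lower()
        return sorted(p for p in self.paths
                      if needle in os.path.basename(p).lower())
```

Searching never touches the disk, which is why queries feel instant once the initial scan is done, and why closing the program throws that work away.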
Contrasting seemingly all the other responses to this, I use it the same way you do (only opening it when needed) and I'm fine with the delay: even at its slowest rebuilding the index and searching is faster than the in-built windows Search.
It should be starting at boot if you installed it as a service, so the indexing will be done then. After that opening and searching is instant.
I am building an advanced filemanager (FileNinja) for Windows with full integrated everything search & query. you have the option of saving bookmarks to virtual folders that consist of everything searches. Instant directory sizes, tags, custom file descriptions for ntfs. Anyone interested? https://youtu.be/JREufgkf5pk?si=sP05UCOrskpX8OTq
Do you have a git repo to follow?
You can check out the following sites https://github.com/sandeberger, http://thefile.ninja or my homepage at https://kodar.ninja. The project is not open source.
haha I like the voiceover, the video is fun
I'm interested! Great marketing video by the way, a good example of using AI-powered voiceovers to level up the one-man-marketing polish capabilities.
So that's why Everything is so fast. Nice.
essential tool
It's crazy how the Windows Search Indexer still doesn't use MFT.
It doesn't even bloody support network drives so there's no such reason.
WizTree also understands things like OneDrive and Dropbox, and knows that files "stored in the cloud" aren't taking up any disc space -- WinDirStat thinks my drive is 140% full.
What about hard links?
WizTree and WinDirStat will both double count hard links. I have a 12TB hard drive holding "17TB" because of sparse files and hard links. Windows file manager properties agree with WizTree and WinDirStat as far as space used. I think the file manager looks for free space and calculates that separately, while WizTree and WinDirStat are just adding up used space.
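The hard link double counting can be avoided in a tree walk by remembering which underlying files have already been counted. A minimal sketch, assuming the pair (`st_dev`, `st_ino`) identifies a file uniquely (on Windows, Python fills these from the volume serial number and file index, as far as I know). `tree_size_dedup` is a hypothetical helper, and it only addresses hard links, not sparse files.

```python
import os
import stat
from typing import Tuple


def tree_size_dedup(root: str) -> Tuple[int, int]:
    """Return (naive_total, deduplicated_total) in bytes for files under root."""
    naive = 0
    unique = 0
    seen = set()                            # (device, inode) pairs already counted
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue                    # file vanished or is unreadable
            if not stat.S_ISREG(st.st_mode):
                continue                    # skip symlinks and special files
            naive += st.st_size             # what a simple scanner adds up
            key = (st.st_dev, st.st_ino)
            if key not in seen:             # count each underlying file once
                seen.add(key)
                unique += st.st_size
    return naive, unique
```

The gap between the two totals is roughly the "extra" space a naive scanner reports. Which directory a shared file gets attributed to is arbitrary here (whichever name is seen first), which may be one reason tools just count every name.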
You’ve got me interested but I’m finding it quite annoying that WizTree doesn’t actually have pictures of the software UI on the website. At least not under any of the obvious links I’ve checked.
If you want to see screenshots of any piece of software, just search the name of the software on your favorite search engine and go to ‘images’.
(This might seem obvious, but it took me a long time to realize, hence why I’m passing the tip on.)
Which is enough for me to not use it because WinDirStat still only takes a minute. Cool software though.
Exactly this.
Didn't try AltWinDirStat, but did try FastWinDirStat.
The thing is, FastWinDirStat uses a licensed proprietary component. No problem for me, but the author did have some back and forth with another user on GitHub.
Seems FastWinDirStat's license doesn't match with using a closed source library, or something...
As for its actual functioning, it does as it says. Works much faster than WinDirStat
Looks like a pretty clear violation of the WinDirStat license. They took WinDirStat which is GPL, linked it with some other proprietary code and distributed the result.
(They could have been clear-ish (with caveats) by distributing only the source code and let the users do the compiling and linking, similarly to how you could download ZFS and build it into Linux. But you mustn't distribute the result further.)
having used both on my pc, can attest to the speed claims. wiztree has yet to demonstrate annoying freeware/donationware pop ups during daily use.
FileLocator Pro is a good search tool that also uses the MFT.
WizTree is no longer free for commercial use.
I believe version 3.38 was the last version that is completely "free as in beer" with optional donations.
Thanks for this; it is incredibly faster. Never heard of it before.
SpaceSniffer is a much easier to use tool.
I'm a big Filelight fan. It used to not work well on NTFS volumes: it would miss files flagged Archived. Has that been solved?
Seeing a description directly in the README for the folders in the repo and their contents makes me really happy. I wish more projects would do that.
I wish there was a duplicate file finder that used the MFT scan to pre-process the data instead of the FS tree walk