
Show HN: Dut – a fast Linux disk usage calculator

montroser
15 replies
16h12m

Nice work. Sometimes I wonder if there's any way to trade away accuracy for speed? Like, often I don't care _exactly_ how many bytes the biggest user of space takes, but I just want to see some orders of magnitude.

Maybe there could be an iterative breadth-first approach, where first you quickly identify and discard the small unimportant items, passing over anything that can't be counted quickly. Then with what's left you identify the smallest of those and discard, and then with what's left the smallest of those, and repeat and repeat. Each pass through, you get a higher resolution picture of which directories and files are using the most space, and you just wait until you have the level of detail you need, but you get to see the tally as it happens across the board. Does this exist?

lenkite
4 replies
14h11m

Wish modern filesystems maintained usage per dir as a directory file attribute instead of mandating tools to do this basic job.

hsbauauvhabzb
2 replies
13h33m

This is an excellent point and I wholeheartedly agree!

masklinn
1 replies
13h25m

Is it? That would require any update to any file to cascade into a bunch of directory updates amplifying the write and for what? Do you “du” in your shell prompt?

Not to mention it would likely be unable to handle the hardlink problem so it would consistently be wrong.

IsTom
0 replies
8h37m

> That would require any update to any file to cascade into a bunch of directory updates amplifying the write and for what?

You can be a little lazy about updating parents with this and have O(1) updates and O(1) amortized reads with an O(n) worst case (same as now anyway).

nh2
0 replies
12h59m

CephFS does that.

You can use getfattr to ask it for the recursive number of entries or bytes in a given directory.

Querying it is constant time; updates propagate with a few seconds' delay.

Extremely useful when you have billions of files on spinning disks, where running du/ncdu would take a month just for the stat()s.
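
For the curious, querying that from C is just a getxattr(2) call. A minimal sketch, assuming a CephFS mount and that the ceph.dir.rbytes virtual xattr returns its value as an ASCII decimal string (which is how getfattr displays it):

    #include <stdio.h>
    #include <sys/xattr.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : ".";
        char buf[64];
        /* recursive byte count of everything under the directory */
        ssize_t n = getxattr(path, "ceph.dir.rbytes", buf, sizeof(buf) - 1);
        if (n < 0) {
            perror("getxattr");
            return 1;
        }
        buf[n] = '\0';
        printf("%s: %s bytes (recursive)\n", path, buf);
        return 0;
    }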

201984
3 replies
16h1m

Thanks!

What you described is a neat idea, but it's not possible with any degree of accuracy AFAIK. To give you a picture of the problem, calculating the disk usage of a directory requires calling statx(2) on every file in that directory, summing up the reported sizes, and then recursing into every subdirectory and starting over. The problem with doing a partial search is that all the data is at the leaves of the tree, so you'll miss some potentially very large files.

Picture if your program only traversed the first, say, three levels of subdirectories to get a rough estimate. If there was a 1TB file down another level, your program would miss it completely and get a very inaccurate estimate of the disk usage, so it wouldn't be useful at all for finding the biggest culprits. You have the same problem if you decide to stop counting after seeing N files, since file N+1 could be gigantic and you'd never know.
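
As a concrete picture of that walk, here is a minimal single-threaded sketch (my own illustration, not dut's actual code): statx() every entry, add up the allocated blocks, and recurse into subdirectories. Hard-link deduplication and most error handling are left out.

    #define _GNU_SOURCE
    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Sum of allocated bytes under `path`, resolved relative to `dirfd`. */
    static long long walk(int dirfd, const char *path)
    {
        long long total = 0;
        int fd = openat(dirfd, path, O_RDONLY | O_DIRECTORY);
        if (fd < 0)
            return 0;
        DIR *d = fdopendir(fd);
        if (!d) {
            close(fd);
            return 0;
        }
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
                continue;
            struct statx stx;
            if (statx(fd, e->d_name, AT_SYMLINK_NOFOLLOW,
                      STATX_TYPE | STATX_BLOCKS, &stx) != 0)
                continue;
            total += (long long)stx.stx_blocks * 512;  /* actual disk usage */
            if (S_ISDIR(stx.stx_mode))
                total += walk(fd, e->d_name);          /* recurse */
        }
        closedir(d);                                   /* also closes fd */
        return total;
    }

    int main(int argc, char **argv)
    {
        printf("%lld bytes\n", walk(AT_FDCWD, argc > 1 ? argv[1] : "."));
        return 0;
    }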

montroser
2 replies
14h57m

Yeah, maybe approximation is not really possible. But it still seems like if you could do say, up to 1000 stats per directory per pass, then running totals could be accumulated incrementally and reported along the way.

So after just a second or two, you might be able to know with certainty that a bunch of small directories are small, and then that a handful of others are at least however big has been counted so far. And that could be all you need, or else you could wait longer to see how the bigger directories play out.

geertj
1 replies
14h36m

You would still have to getdents() everything, but this way you may indeed save on stat() operations, which access information stored separately on disk; eliminating those would likely help uncached runs.

You could sample files in a directory or across directories to get an average file size and use the total number of files from getdents to estimate a total size. This does require you to know if a directory entry is a file or directory, which the d_type field gives you depending on the OS, file system and other factors. An average file size could also be obtained from statvfs().

Another trick is based on the fact that the link count of a directory is 2 + the number of subdirectories. Once you have seen the corresponding number of subdirectories, you know that there are no more subdirectories you need to descend into. This could allow you to abort a getdents for a very large directory, using eg the directory size to estimate the total entries.

olddustytrail
0 replies
6h41m

> Another trick is based on the fact that the link count of a directory is 2 + the number of subdirectories.

For anyone who doesn't know why this is, it's because when you create a directory it has 2 hard links to it which are

    dirname
    dirname/.
When you add a new subdirectory it adds one more link which is

    dirname/subdir/..
So each subdirectory adds one more to the original 2.
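
A sketch of how a scanner could exploit this (my own illustration, nothing from dut): stat the directory, and once st_nlink - 2 subdirectories have been seen, every remaining entry must be a non-directory. One caveat: some filesystems (btrfs, for instance) report st_nlink of 1 for directories, so real code needs a fallback.

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    static void scan(const char *dir)
    {
        struct stat st;
        if (stat(dir, &st) != 0 || st.st_nlink < 2)
            return;                        /* fall back to a full scan instead */
        nlink_t subdirs_left = st.st_nlink - 2;

        DIR *d = opendir(dir);
        if (!d)
            return;
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
                continue;
            if (subdirs_left == 0) {
                /* everything from here on is a plain file, symlink, etc. */
                printf("file: %s\n", e->d_name);
                continue;
            }
            if (e->d_type == DT_DIR) {     /* d_type may be DT_UNKNOWN on some FSes */
                printf("dir:  %s\n", e->d_name);
                subdirs_left--;
            } else {
                printf("file: %s\n", e->d_name);
            }
        }
        closedir(d);
    }

    int main(int argc, char **argv)
    {
        scan(argc > 1 ? argv[1] : ".");
        return 0;
    }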

mos_basik
2 replies
14h23m

Something like that exists for btrfs; it's called btdu. It has the accuracy/time trade-off you're interested in, but the implementation is quite different. It samples random points on the disk and finds out what file path they belong to. The longer it runs, the more accurate it gets. The readme is good at explaining why this approach makes sense for btrfs and what its limitations are.

https://github.com/CyberShadow/btdu

renewiltord
0 replies
13h32m

Damn, `ext4` is organized differently entirely. You can't get anything useful from:

    sudo debugfs -R "icheck $RANDOM" /dev/nvme1
    sudo debugfs -R "ncheck $res" /dev/nvme1
and recursing. That's a clever technique given btrfs structs.

jszymborski
0 replies
14h12m

That's so cool.

BeeOnRope
2 replies
16h0m

This seems difficult since I'm not aware of any way to get approximate file sizes, at least with the usual FS-agnostic system calls: to get any size info you are pretty much calling something in the `stat` family and at that point you have the exact size.

fsckboy
1 replies
15h43m

i thought files can be sparse and have holes in the middle where nothing is allocated, so the file size is not what is used to calculate usage, it's the sum of the extents or some such.

BeeOnRope
0 replies
15h37m

Yes, files can be sparse but the actual disk usage information is also returned by these stat-family calls, so there is no special cost to handling sparse files.
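
A quick way to see both numbers from the same stat call (a throwaway illustration, assuming /tmp is on a filesystem that supports sparse files):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/tmp/sparse-demo";
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        ftruncate(fd, 1 << 30);              /* 1 GiB hole, nothing allocated */
        close(fd);

        struct stat st;
        stat(path, &st);
        printf("apparent:  %lld bytes\n", (long long)st.st_size);
        printf("allocated: %lld bytes\n", (long long)st.st_blocks * 512);
        unlink(path);
        return 0;
    }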

robocat
13 replies
10h22m

> but I don't like how unintuitive the readout is

The best disk usage UI I ever saw was this one: https://www.trishtech.com/2013/10/scanner-display-hard-disk-... The inner circle is the top level directories, and each ring outwards is one level deeper in the directory hierarchy. You would mouse over large subdirectories to see what they were, or double-click to drill down into a subdirectory. Download it and try it - it is quite spectacularly useful on Windows (although I'm not sure how well it handles terabyte-sized drives - I haven't used Windows for a long time).

Hard to do a circular graph in a terminal...

It is very similar to a flame graph? Perhaps look at how flame graphs are drawn by other terminal performance tools.

bmicraft
0 replies
8h14m

Also Filelight (KDE)

krackers
2 replies
9h57m

DaisyDisk on mac does that. Also it's blazing fast, it seems to even beat "du" so I don't know what tricks they're pulling.

supernes
1 replies
9h45m

I think they're reading some of the info from Spotlight metadata already collected by the OS for indexing, but I could be wrong.

radicality
0 replies
1h44m

That’s probably it. It’s likely powered by whatever thing gives you quick directory sizes in Finder after you do View Options (cmd+j), and select “Show All Sizes”. I have that setting always on for all directories and pretty sure it’s cached as it’s fast.

ajnin
2 replies
7h44m

I don't like radial charts because the outer rings have a larger area, which makes it look like files deep in the hierarchy take more space than in reality. And also it leaves the majority of screen space unused.

I prefer the more classic treemap view, my personal favorite being the classic version of SpaceMonger but it's Windows only and very slow.

roelschroeven
1 replies
2h39m

WizTree also uses a treemap view and is very fast. It's also Windows-only though.

OskarS
1 replies
8h26m

I've used graphical tools very similar to this, and I always come back to this:

   du -h | sort -rh | less
(might have to sudo that du depending on current folder. on macos, use gsort)

You just immediately see exactly what you need to delete, and you can so quickly scan the list. I'm not a terminal die-hard "use it for everything" kinda guy, I like GUIs for lots of stuff. But when it comes to "what's taking up all my space?" this is honestly the best solution I've found.

pnutjam
0 replies
16m

I like to use:

    du -shx *
I used to pipe that to:

    | grep G
to find anything gig sized, but I like your:

    | sort -rh
Thanks!

rlue
0 replies
10h7m

That’s called a ring chart or a sunburst chart.

miew
11 replies
12h17m

Why C and not Rust or even Zig?

Galanwe
4 replies
12h15m

Why Rust or Zig and not C?

nottorp
3 replies
11h31m

Better for the author's resume, if they want to make it hype driven.

Also some nebulous "being more secure". Never mind that this tool does not have elevated privileges. You gotta watch out for those remote root exploits even for a local only app, man.

_flux
2 replies
7h12m

I mean you could extract an archive you've downloaded to your filesystem and said archive could have funky file names and then you use this tool..

But I suppose it's not a very likely bug to have in this kind of tool.

nottorp
1 replies
6h35m

Or “they” could enter your house at 2 am, drug you and hit you with a $5 wrench until they get access to your files :)

gsck
0 replies
3h49m

Never understood this line of thought. Not everything needs to be super secure. Not everything is going to be an attack vector. No one is going to deploy this onto a production server where this program specifically is going to be the attack vector.

Memory safety is cool and all, but a program that effectively sums a bunch of numbers together isn't going to cause issues. Worst case, the program segfaults.

estebank
2 replies
1h36m

> it would be more difficult to make raw syscalls in Rust

Would you like to expand on this? Is it because of type conversions that you'd have to do?

> I use[1] flexible array members to minimize allocations

I was under the impression that FAM was a non-standard extension, but alas it is part of C99.

From what I'm seeing you have intrusive list where each `entry` points to the previous and next element, where the path itself is a bag of bytes as part of the entry itself. I'm assuming that what you'd want from Rust is something akin to the following when it comes to the path?

    struct Entry<const NAME_LEN: usize> {
        ..
        mode: Mode,
        name: InlineName<NAME_LEN>,
    }

    struct InlineName<const NAME_LEN: usize> {
        value: [u8; NAME_LEN],
    }

201984
1 replies
54m

For syscalls, I would have needed to either pull in dependencies or write FFI bindings, and neither of those options are appealing when I could simply write the program in Linux's native language.

For the FAM, your example looks like it requires a compile-time constant size. That's the same as hardcoding an array size in the struct, defeating the whole point. Short names will waste space, and long ones will get truncated.

estebank
0 replies
25m

> For the FAM, your example looks like it requires a compile-time constant size. That's the same as hardcoding an array size in the struct, defeating the whole point. Short names will waste space, and long ones will get truncated.

You made me realize that the const generics stabilization work hasn't advanced enough to do what I was proposing (at least not in as straightforward way): https://play.rust-lang.org/?version=nightly&mode=debug&editi...

Those are const arguments, not const values, which means that you can operate on values as if they didn't have a size, while the compiler does keep track of the size throughout.

I'd have to take a more detailed look to see if this level of dynamism is enough to work with runtime provided strings, which they might not be, unless you started encoding things like "take every CStr, create a StackString<1> for each char, and then add them together".

estebank
0 replies
1h47m

This is not an appropriate question to ask, though I see it sometimes in these threads. "Because the author wanted to" is a good enough reason for them to write a program in C. It being a new project written in C can also be a good enough reason for you not to use it: dust already exists and is written in Rust, and you can use that instead.

craftoman
0 replies
9h17m

Because C is like a cult Toyota Supra with twin turbo and Rust or Zig is like another cool boring Corvette roadster.

shellfishgene
8 replies
13h20m

What I need is a du that caches the results somewhere and then does not rescan the 90% of dirs that have not changed when I run it again a month later...

kstrauser
7 replies
13h1m

And it would know they did not change without scanning them because how?

svpg
4 replies
12h47m

It could hash the contents of a dir. Along the lines of git

legends2k
1 replies
12h32m

And hashing something requires reading all of its data. I think just reading the file sizes would actually be faster on some file systems and never slower on any.

mort96
0 replies
11h11m

Faster in all file systems I'd guess, stat is fast, opening the file and reading its contents and updating a checksum is slow, and gets slower the larger the file is.

Galanwe
1 replies
12h32m

Except hashing requires... reading.

There is not much to be done here. Directory entries are just names; there are no guarantees that the files were not modified or replaced.

The best you could do is something similar to the strategies of rsync, rely on metadata (modified date, etc) and cross fingers nobody did `cp -a`.

shellfishgene
0 replies
12h6m

I would be fine with the latter, the program could display a warning like "Results may be inaccurate, full scan required" or something.

I guess I'm just annoyed that for Windows/NTFS really fast programs are available but not for Linux filesystems.

shellfishgene
1 replies
7h21m

Maybe it could run in the background and use inotify to just update the database all the time, or at least keep track of what needs rescanning?

shellfishgene
0 replies
7h14m

Thinking about this some more, does this system not already exist for the disk quota calculation in the kernel? How does that work? Would it be possible for a tool to scan the disk once, and then get information about file modifications from the system that's used to update quota info?

INTPenis
7 replies
11h34m

Reminds me of someone's script I have been using for over a decade.

    #!/bin/sh
    du -k --max-depth=1 "$@" | sort -nr | awk '
         BEGIN {
            split("KB,MB,GB,TB", Units, ",");
         }
         {
            u = 1;
            while ($1 >= 1024) {
               $1 = $1 / 1024;
               u += 1
            }
            $1 = sprintf("%.1f %s", $1, Units[u]);
            print $0;
         }
        '

mshook
4 replies
10h53m

I don't understand the point of the script, it's nothing more than:

  du -h --max-depth=1 "$@" | sort -hr

out-of-ideas
1 replies
10h45m

`-h` is not available in all `sort` implementations

BlackLotus89
0 replies
8h22m

Even the busybox port has it. The only sort implementation I know of that doesn't have -h is toybox (I guess older busybox implementations are missing it as well), but I've been using -h for well over a decade and have seldom found it missing.

dotancohen
0 replies
10h12m

The point is that it is faster.

INTPenis
0 replies
4h48m

I found this online a long time ago, and it's been with me across BSD, Macintosh and Linux. So I can't say why it is that way, and I didn't know about sort -h before today.

matzf
0 replies
10h52m

Any particular reason for doing the human readable units "manually"? `du -h | sort -h` works just fine.

kelahcim
0 replies
9h46m

I will definitely try this one and compare with my daily stuff

`du -s -k * | sort -r -n -k1,1 -t" "`

Neil44
5 replies
9h51m

On Windows I always used to use Windirstat but it was slow, then I found Wiztree which is many orders of magnitude faster. I understand it works by directly reading the NTFS tables rather than spidering through the directories laboriously. I wonder if this approach would work for ext4 or whatever.

Filligree
1 replies
7h59m

NTFS is pointlessly slow, so bypassing the VFS provides a decent speedup in exchange for the ridiculous fragility.

Linux doesn’t have the same issue, and I’d be quite concerned if an application like this needed root access to function.

luma
0 replies
2h0m

I think you underestimate how much of a speedup we're talking about: it can pull in the entire filesystem in a couple seconds on a multi TB volume with Bs of files. I have yet to see anything in the linux world (including the OP) that comes anywhere near this performance level via tree walking.

utensil4778
0 replies
49m

> it works by directly reading the NTFS tables rather than spidering through the directories

Maybe I'm just ignorant of linux filesystems, but this seems like the obvious thing to do. Do ext and friends not have a file table like this?

jeffbee
4 replies
15h36m

It's also interesting that the perf report for running dut on my homedir shows that it spends virtually all of the time looking for, not finding, and inserting entries in dentry cache slabs, where the entries are never found again, only inserted :-/ Great cache management by the kernel there.

ETA: Apparently the value in /proc/sys/vm/vfs_cache_pressure makes a huge difference. With the default of 100, my dentry and inode caches never grow large enough to contain the ~15M entries in my homedir. Dentry slabs get reclaimed to stay < 1% of system RAM, while the xfs_inode slab cache grows to the correct size. The threads in dut are pointless in this case because the access to the xfs inodes serializes.

If I set this kernel param to 15, then the caches grow to accommodate the tens of millions of inodes in my homedir. Ultimately the slab caches occupy 20GB of RAM! When the caches are working the threading in dut is moderately effective, job finishes in 5s with 200% CPU time.

201984
3 replies
15h23m

Are you referring to the kmem_cache_alloc calls in the profile? If so, that's all in kernel space and there's nothing I can do about it.

https://share.firefox.dev/3XT9L7P

jeffbee
2 replies
15h21m

No, see how your profiles have `lookup_fast` at the leaves? Mine has `__lookup_slow` and it is slow indeed.

201984
1 replies
15h2m

I just saw your edit. You have WAY more stuff under your home directory than I do. I only have ~2.5M inodes on both my laptop drives combined. The difference in the buff/cache output of `free` before and after running `dut` is only 1 GB for me.

Also, TIL about that kernel parameter, thanks!

jeffbee
0 replies
32m

Yeah I have a TB of bazel outputs in my cache directory. Unfortunately automatically deleting old bazel outputs is beyond the frontier of computer science and has been pushed out to future releases for 6 years and still going: https://github.com/bazelbuild/bazel/issues/5139

jepler
3 replies
16h36m

did you consider using io_uring? if not, was there a reason other than portability?

201984
2 replies
16h29m

io_uring doesn't support the getdents syscall, so there's no way to traverse the filesystem with it. I considered using it for statx(2) to get the disk usage of each file, but decided not to because (a) it would be complicated to mix normal syscalls and io_uring and (b) perf showed the kernel spending most of its time doing actual work and not syscall boilerplate.

nh2
0 replies
12h51m

Are you sure the perf output isn't misleading?

E.g. memory accesses might show up as slower due to CPU caches being flushed when switching between user and kernel space.

I would be extremely interested in a quick (standalone?) benchmark of e.g. 1M stats with vs without uring.

Also https://github.com/tdanecker/iouring-getdents reports big uring speedups for getdents, which makes it surprising to get no speedups for stat.

If uring turns out fast, you might ignore (a), just doing all the getdents first and then all the stats afterwards, since getdents is a "batch" syscall covering many files anyway, but stat isn't.
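
If someone wants to try that benchmark, a rough liburing sketch of the "all getdents first, then all stats" idea might look like this (my own illustration, not dut code; batch_statx, the paths array, and the assumption that the whole batch fits in one submission queue are all mine):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <sys/stat.h>

    /* Submit one statx SQE per path, then reap completions and sum up
     * the allocated bytes. */
    static long long batch_statx(const char **paths, unsigned n)
    {
        struct io_uring ring;
        struct statx stx[n];
        long long total = 0;
        unsigned submitted = 0;

        if (io_uring_queue_init(n, &ring, 0) != 0)
            return -1;

        for (unsigned i = 0; i < n; i++) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            if (!sqe)
                break;                     /* queue full; keep the sketch simple */
            io_uring_prep_statx(sqe, AT_FDCWD, paths[i], AT_SYMLINK_NOFOLLOW,
                                STATX_BLOCKS, &stx[i]);
            io_uring_sqe_set_data(sqe, &stx[i]);  /* completions may be out of order */
            submitted++;
        }
        io_uring_submit(&ring);

        for (unsigned i = 0; i < submitted; i++) {
            struct io_uring_cqe *cqe;
            if (io_uring_wait_cqe(&ring, &cqe) != 0)
                break;
            struct statx *done = io_uring_cqe_get_data(cqe);
            if (cqe->res == 0)
                total += (long long)done->stx_blocks * 512;
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return total;
    }

(Link with -luring; running a plain statx loop over the same paths gives the baseline to compare against.)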

jepler
0 replies
15h46m

I appreciate the explanation!

notarealllama
2 replies
16h58m

If this accurately shows hidden stuff, such as docker build cache and old kernels, then it will become my go-to!

nottorp
0 replies
11h33m

Uh, even basic du "shows hidden stuff" accurately doesn't it?

dot files are just a convention on unix.

201984
0 replies
16h48m

As long as it has permissions, it totals up everything under the directory you give it including names that start with a '.'. It won't follow symlinks though.

kccqzy
2 replies
17h17m

I'm away from my Linux machine now, but I'm curious whether/how you handle reflinks. On a supported file system such as Btrfs, which I use, how does `cp --reflink` get counted? Similar to hard links? I'm curious because I use this feature extensively.

201984
1 replies
16h50m

I've actually never heard of --reflink, so I had to look it up. `cp` from coreutils uses the FICLONE ioctl to clone the file on btrfs instead of a regular syscall.

I don't handle them specifically in dut, so it will total up whatever statx(2) reports for any reflink files.

vlovich123
0 replies
14h46m

You’ll probably end up with dupes (and removing these files won’t have the effect you intend) but I don’t know that there’s a good way to handle and report such soft links anyway.

nh2
0 replies
13h6m

fts is just wrapper functions.

You cannot get around getdents and the stat family of syscalls on Linux if you need file sizes.

201984
0 replies
6h52m

Those are single threaded, so they would have kneecapped performance pretty badly. 'du' from coreutils uses them, and you can see the drastic speed difference between that and my program in the README.

inbetween
2 replies
2h3m

I often want to know why there is sudden growth in disk usage over the last month/week/etc, i.e. what suddenly takes up space. In those cases I find myself wishing that du and friends would cache their last few runs and offer a diff against them, easily listing the new disk-eating files or directories. Could dut evolve to do something like that?

beepbooptheory
0 replies
1h48m

  du[t] > .disk-usage-"`date +"%d-%m-%Y"`"
And then use diff later?

imiric
2 replies
6h45m

I've been using my own function with `du` for ages now, similar to others here, but I appreciate new tools in this space.

I gave `dut` a try, but I'm confused by its output. For example:

  3.2G    0B |- .pyenv
  3.4G    0B | /- toolchains
  3.4G    0B |- .rustup
  4.0G    0B | |- <censored>
  4.4G    0B | /- <censored>
  9.2G    0B |- Work
  3.7G    0B |   /- flash
  3.8G    0B | /- <censored>
   16G  4.0K |- Downloads
  5.1G    0B | |- <censored>
  5.2G    0B | /- <censored>
   16G    0B |- Projects
  3.2G   42M | /- <censored>
   17G  183M |- src
   17G    0B | /- <censored>
   17G    0B |- Videos
  3.7G    0B | /- Videos
   28G    0B |- Music
  6.9G    0B | |- tmp
  3.4G    0B | | /- tmp
  8.8G    0B | |- go
  3.6G    0B | |   /- .versions
  3.9G    0B | | |- go
  8.5G    0B | | |     /- dir
  8.5G    0B | | |   /- vfs
  8.5G    0B | | | /- storage
  8.5G    0B | | /- containers
   15G  140M | /- share
   34G  183M /- .local
  161G    0B .
- I expected the output to be sorted by the first column, yet some items are clearly out of order. I don't use hard links much, so I wouldn't expect this to be because of shared data.

- The tree rendering is very confusing. Some directories are several levels deep, but in this output they're all jumbled, so it's not clear where they exist on disk. Showing the full path with the `-p` option, and removing indentation with `-i 0` somewhat helps, but I would almost remove tree rendering entirely.

201984
1 replies
6h22m

It is being sorted by the first column, but it also keeps subdirectories with each other. Look at the order of your top-level directories.

  3.2G    0B |- .pyenv
  3.4G    0B |- .rustup
  9.2G    0B |- Work
   16G  4.0K |- Downloads
   17G  183M |- src
   28G    0B |- Music
   34G  183M /- .local
If you don't want the tree output and only want the top directories, you can use `-d 1` to limit to depth=1.

imiric
0 replies
5h46m

Ah, I see, that makes sense.

But still, the second `Videos` directory of 3.7G is a subdirectory of `Music`, so it should appear below it, no? Same for the two `tmp` directories, they're subdirectories of `.local`, so I would expect them to be listed under it. Right now there doesn't seem to be a clear order in either case.

dima55
2 replies
13h37m

Ideas for a better format: do what xdiskusage does.

metadat
1 replies
12h31m

What specifically do you feel xdiskusage does well?

dima55
0 replies
1h23m

It graphically displays the relative sizes of things, and allows you to interactively zoom into any particular subdirectory to see the relative sizes of the things inside it.

trustno2
1 replies
12h8m

Does it depend on linux functionality or can I use it on macos?

Well I can just try :)

Ringz
0 replies
10h15m

From the author:

„Note that dut uses Linux-specific syscalls, so it won't run on MacOS.“

timrichard
1 replies
17h5m

Looks nice, although a feature I like in ncdu is the 'd' key to delete the currently highlighted file or directory.

201984
0 replies
16h41m

This isn't an interactive program, so ncdu would be better for interactively going around and freeing up space. If you just want an overview, though, then dut runs much quicker than ncdu and will show large files deep down in subdirectories without having to go down manually.

teamspirit
1 replies
15h21m

Nice job. I've been using dua[0] and have found it to be quite fast on my MacBook Pro. I'm interested to see how this compares.

[0] https://github.com/Byron/dua-cli

201984
0 replies
15h12m

I benchmarked against dua while developing, and the results are in the README. Note that dut uses Linux-specific syscalls, so it won't run on MacOS.

TL;DR: dut is 3x faster with warm caches, slightly faster on SSD, slightly slower on HDD.

jftuga
1 replies
15h26m

You should include the "How to build" instructions near the beginning of the main.c file.

201984
0 replies
15h16m

Done

hsbauauvhabzb
1 replies
17h14m

This looks handy. Do you have any tips for stuff like queued ‘mv’ or similar? If I’m moving data around on 3-4 drives, it’s common that I’ll stack commands where the 3rd command may free up space for the 4th to run successfully - I use && to ensure a halt on failure, but I need to mentally calculate the free space when I’m writing the commands, since the free space after the third mv will be different from the output of ‘df’ before any of the commands have run.

201984
0 replies
16h39m

I haven't run into a situation like that, but if I did, I'd be doing mental math like you. `dut` would only be useful as a (quicker) replacement for `du` for telling you how large the source of a `cp -r` is.

classified
1 replies
6h52m

I get boatloads of "undefined reference" errors. Where's the list of dependencies?

201984
0 replies
4h38m

The only dependency is a recent Linux C standard library. What are the missing symbols? On older versions of glibc you do have to add -pthread.

IAmLiterallyAB
1 replies
15h9m

I'm surprised statx was that much faster than fstatat. fstatat looks like a very thin wrapper around statx, it just calls vfs_statx and copies out the result to user space.

201984
0 replies
14h50m

Out of curiosity, I switched it back to fstatat and compared, and found no significant difference. Must've been some other change I made at the time, although I could've sworn this was true. Could be a system update changed something in the three months since I did that. I can't edit my post now though, so that wrong info is stuck there.

tonymet
0 replies
1h17m

great app. very fast at scanning nested dirs. I often need recursive disk usage when I suddenly run out of space and scramble to clean up while everything is crashing.

tiku
0 replies
9h35m

Ncdu is easy to remember and use, clicking through, etc. It would be cool to find a faster replacement with the same usage, instead of a new tool with parameters to remember.

tamimio
0 replies
12h56m

Someone, please create a Gdut, a fork that will produce graphs for a quick and easy read; it’s almost impossible to read on small vertical screens.

tambourine_man
0 replies
15h35m

https://dev.yorhel.nl/doc/ncdu2

I wasn't aware that there was a rewrite of ncdu in Zig. That link is a nice read.

sandreas
0 replies
12h37m

Nice work! There is also gdu[1], where the UI is heavily inspired by ncdu and somehow feels way faster...

1: https://github.com/dundee/gdu

rafaelgoncalves
0 replies
13h9m

neat tool, congrats on the release and thank you for this and the analysis/comparison.

nh2
0 replies
12h36m

I don't know why one ordering is better than the other, but the difference is pretty drastic.

I have the suspicion that some file systems store stat info next to the getdents entries.

Thus cache locality would kick in if you stat a file after receiving it via getdents (and counterintuitively, smaller getdents buffers make it faster then). Also in such cases it would be important to not sort combined getdents outputs before starting (which would destroy the locality again).

I found such a situation with CephFS but don't know what the layout is for common local file systems.

mixmastamyk
0 replies
12h57m

Not as featureful, but what I've been using. If you can't install this tool for some reason, it's still useful. I call it usage:

    #!/bin/bash

    du -hs * .??* 2> /dev/null | sort -h | tail -22

mg
0 replies
13h3m

I have this in my bashrc:

    alias duwim='du --apparent-size -c -s -B1048576 * | sort -g'
It produces a similar output, showing a list of directories and their sizes under the current dir.

The name "duwim" stands for "du what I mean". It came naturally after I dabbled for quite a while to figure out how to make du do what I mean.

jmakov
0 replies
13h41m

Would be great to have a TUI interface for browsing like ncdu.

frumiousirc
0 replies
7h7m

dut looks very nice.

One small surprise I found came when I have a symlink to a directory and refer to it with a trailing "/": dut doesn't follow the link in order to scan the real directory. I.e. I have this symlink:

    ln -s /big/disk/dev ~/dev
then

    ./dut ~/dev/
returns zero size while

    du -sh ~/dev/
returns the full size.

I'm not sure how widespread this convention is to resolve symlinks to their target directories if named with a trailing "/" but it's one my fingers have memorized.
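
For what it's worth, the trailing "/" convention is enforced by kernel path resolution itself; even lstat() follows the final symlink when the path ends in a slash. A tiny illustration (assuming ~/dev is the symlink from the example above and that you run this from your home directory):

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;
        if (lstat("dev", &st) == 0)      /* the symlink itself */
            printf("dev  -> %s\n", S_ISLNK(st.st_mode) ? "symlink" : "other");
        if (lstat("dev/", &st) == 0)     /* trailing slash resolves the link */
            printf("dev/ -> %s\n", S_ISDIR(st.st_mode) ? "directory" : "other");
        return 0;
    }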

In any case, this is another tool for my toolbox. Thank you for sharing it.

chomp
0 replies
15h49m

GPLv3, you love to see it. Great work.

bitwize
0 replies
2h54m

Neat, a new C program! I get a little frisson of good vibes whenever someone announces a new project in C, as opposed to Rust or Python or Go. Even though C is pretty much a lost cause at this point. It looks like it has some real sophisticated performance optimizations going on too.

bArray
0 replies
9h24m

I think that 'ls' should also be evaluating the size of the files contained within. The size and number of contained files/folders really does reveal a lot about the contents of a directory without peeking inside. The speed of this is what would be most concerning though.

anon-3988
0 replies
13h9m

I have been using diskonaut; it's fast enough given that it also produces a nice visual output.