The File Filesystem (2021)

RetroTechie
34 replies
22h17m

Useful enough that it should be an OS-level standard feature, imho.

Unix-like OSes allow mounting disk images to explore their contents. But there are many more file formats where exploring files-inside-files is useful. Compressed archives, for one. Some file managers support those, but (imho) the application level is not the best layer for this functionality.

Could be implemented with a kind of driver-per-filetype.

duped
17 replies
21h56m

Really what you'd like to see is a way to write the mount command for each file type (do one thing well) and another command to detect the file type and dispatch accordingly (probably similar to the `file` command), all in user space.
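A minimal sketch of that detect-and-dispatch idea, assuming a small table of magic numbers. The helper tools named here (fuse-zip, ratarmount, archivemount) are ones mentioned elsewhere in this thread; the mapping and the function names `pick_mounter` and `mount_command` are hypothetical, and nothing is actually mounted.

```python
from typing import List, Optional

# Magic-number prefixes mapped to a user-space mount helper.
# The mapping is an illustrative assumption, not an established standard.
PREFIX_MOUNTERS = [
    (b"PK\x03\x04", "fuse-zip"),     # zip local file header
    (b"\x1f\x8b", "archivemount"),   # gzip stream, e.g. a .tar.gz
]

def pick_mounter(header: bytes) -> Optional[str]:
    """Return the helper for the first bytes of a file, or None if unknown."""
    for magic, tool in PREFIX_MOUNTERS:
        if header.startswith(magic):
            return tool
    # POSIX tar archives carry the string "ustar" at offset 257.
    if header[257:262] == b"ustar":
        return "ratarmount"
    return None

def mount_command(path: str, mountpoint: str) -> Optional[List[str]]:
    """Build (but do not run) the command line for the detected file type."""
    with open(path, "rb") as f:
        header = f.read(512)
    tool = pick_mounter(header)
    if tool is None:
        return None
    return [tool, path, mountpoint]
```

The returned list could be handed to `subprocess.run`; each helper stays a separate program, so adding a format means adding one table entry.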

The only thing standing in the way of this today is that MacOS doesn't expose a user space file system API. You can do this on Linux, Windows, and BSDs today.

(No, file provider extensions don't cut it, Apple devs who read this, please give us a FUSE equivalent, we know it exists).

Groxx
15 replies
20h57m

Does https://osxfuse.github.io/ cover this? Or is there some fundamental issue? (beyond "it's not built in")

duped
8 replies
20h33m

Well that requires a kext so it's a nonstarter, and fuse-t uses NFS which is extremely janky and unreliable on MacOS.

The fundamental issue is that macOS doesn't provide an API for this natively.

skissane
7 replies
19h29m

fuse-t uses NFS which is extremely janky and unreliable on MacOS

I was wondering what issues you were talking about, and then I found this - https://github.com/macos-fuse-t/fuse-t/issues/45 - data corruption

The fundamental issue is that macOS doesn't provide an API for this natively.

The API is there, Apple just doesn’t want to give anyone outside of Apple the entitlement that lets them use it. I don’t understand why Apple won’t.

Well, I understand that would require them to document it, ship public headers, and support it for external developers - but why not?

duped
3 replies
16h12m

The API is there, Apple just doesn’t want to give anyone outside of Apple the entitlement that lets them use it.

If no one can call it, it's not an API, it's an implementation detail. And I don't even think it's exposed by headers, just alluded to by people who claim APFS is implemented in user space.

I was wondering what issues you were talking about, and then I found this

Worse than this, it's possible to DoS a Mac with an NFS server just by refusing to reply to a request. That's unacceptable for a user space file system (although FUSE is only kinda better, in that it can force processes that read from the FS into uninterruptible sleep that prevents them from being killed).

Well, I understand that would require them to document it, ship public headers, and support it for external developers - but why not?

Because Apple doesn't give a fuck about developers. Every developer will eventually learn this, but for those that haven't - Apple doesn't want you writing software for their platform, unless you're an Apple employee and on an Apple team paid to do it. It's why their docs suck, it's why to learn anything you need to watch ADC videos instead of read manpages, and it's why all the cool stuff is behind protected entitlements that you can't get or will be limited in using.

skissane
1 replies
16h3m

Worse than this, it's possible to DoS a Mac with an NFS server just by refusing to reply to a request.

I wonder if their SMB/CIFS client implementation has these kinds of issues? It probably gets used more heavily

And I don't even think its exposed by headers

Apple (accidentally?) released some of the private headers for this feature in one of their open source releases: https://github.com/apple-oss-distributions/msdosfs/blob/rel/...

duped
0 replies
14h54m

Maybe? It's kind of hard to tell. It's not exactly easy to write any of these servers from scratch to find out. But I wouldn't be surprised - they want app developers to be using the file provider extension API, which is unsuitable for everyone who isn't making a Dropbox clone.

That link is very interesting. It doesn't smell like any other Apple API as they're exposing a vtable with good documentation comments. It would be interesting to hack with this with SIP disabled to see how it works. I'm especially curious about how mount/unmount work and how the plugin registers itself with the OS, or what application is the client/host.

mike_hearn
0 replies
9h49m

No, it's almost certainly not because they don't give a fuck about developers. They definitely do.

It's much more likely that they want to:

a. Dogfood the API using internal use cases first when they can still make changes to the API without breaking anything. Note that the latest MacOS releases moved some filesystems into userspace using this new API. They probably learned some stuff by doing that.

b. Work out how to protect system stability from crappy userland filesystems. As you point out, bugs in FUSE providers can hang apps.

c. Work out how such an API interacts with their sandboxing system and how to avoid FUSE-style filesystems being used to subvert the sandbox. This is a common source of exploits in FUSE-style systems and is one of the key learnings from GNU/Hurd: UNIX software is written on the assumption that filing systems aren't malicious and invalidating that assumption creates new bug classes.

d. Work out what the most important use cases are and try to ensure those use cases will have a good or at least uniform UX first.

Providing a FUSE-like API is presumably also just not a high priority. By far the most common use case in terms of number of users is the Dropbox use case. FUSE is mostly used for toys and experiments beyond that (like filefs). Those matter and I'm sure there are friendly geeks on the Darwin team who'd like to enable those, but Linux also works for exploration. Certainly Apple management would not be happy about an engineer who decided to enable nerd experimentation but undermined the security system whilst doing so.

And it's worth remembering that you can have root on macOS. It means disabling SIP and adding a kernel boot arg, but that only takes a few minutes and then you can grant apps any entitlements you like:

https://github.com/osy/AMFIExemption

That's no good for people who aren't developers, but most FUSE filesystems are designed for developers anyway.

nine_k
2 replies
17h58m

why don't they

It would make macOS more of a general-purpose OS, would increase the amount of functionality from which third parties would benefit, but Apple themselves would likely not. That would increase the number and variety of tech support requests, ever so slightly but still, and would introduce a few new attack surfaces.

Instead, Apple's strategy is to tighten the macOS more and more, and turn it into a specialist OS completely controlled by Apple, with a few companies like Adobe and Ableton licensing access to its internals.

skissane
0 replies
17h34m

Apple used to be a lot more developer-friendly company. It is part of what got them where they are now - the fact that so many developers use Macs, which in turn encourages business software vendors to support Macs

Stuff like this is of little interest to ordinary users (at least not directly), but appeals to developers

By de-emphasising the developer experience, they are undermining one of the factors that got them to where they are today

samatman
0 replies
1h11m

I've been using OSX since 2003, and developing on it for more than ten years. At no point have I seen anything that it's reasonable to call "tightening macOS", let alone the absurd claim of complete control except for an inner circle of elite companies.

The closest thing would be adding the attestation system, so that unsigned binaries have to be explicitly given permission to run... once. That's a security feature which trades a bit of convenience for a lot of protection, especially for the average user. I have no problem with that sort of thing.

I see this sort of sentiment very frequently from non-users of the operating system, but never from those of us who actually use it. Go figure.

skissane
5 replies
20h42m

Recent macOS versions do have a general purpose built-in API for user mode filesystems. That API is incompatible with FUSE. The big problem is it is undocumented and you need an entitlement from Apple to use it, and Apple won’t give you that.

Apple do have a publicly available API for cloud file systems (Dropbox-style products), but it makes a lot of assumptions which makes it effectively unusable for other use cases.

Then there are third party solutions like osxfuse. These have the problem that they rely on kernel extensions and Apple keeps on making those harder and harder, and is aiming to get rid of them; plus, they are all now proprietary licensed, albeit often with a free license for open source use.

One approach that does work without any kernel extensions or private APIs is to make your user filesystem an NFS server and then mount that. One competitor to osxfuse does that, but it also is proprietary

ranger_danger
4 replies
16h56m

According to the MacFUSE author, their specific approach is not actually undocumented:

Apple has put it in an umbrella called "unsupported" (in the kernel interfaces section) ... either Apple will not take this interface away, and if they do, it will be to provide a better interface

http://preserve.mactech.com/articles/mactech/Vol.23/23.03/Ma...

skissane
3 replies
16h9m

Yes, that’s not what I’m talking about though

MacFUSE uses the kernel mode VFS API

I’m talking about the undocumented user mode filesystem LiveFS/UserFS/com.apple.filesystems.lifs API which was added in Monterey (macOS 12), and since Ventura (macOS 13) is used to implement the OOTB FAT and exFAT filesystem support. Using that requires private entitlements (e.g. com.apple.private.LiveFS.connection) which Apple (thus far) won’t give to anyone else

ranger_danger
2 replies
15h36m

What about the File Provider API?

skissane
1 replies
15h21m

It is designed for the cloud storage use case (Dropbox, Google Drive, etc) - it creates local copies of files and synchs them with remote ones. Not what you want to do in the general case.

skissane
0 replies
1h31m

I take it from the downvotes people think what I said is factually wrong?

If that's the case, I wish someone would point out where I've got it wrong instead of just silently downvoting

stuaxo
0 replies
11h4m

I want stuff to work like ZipMagic did in the early 90s/ early 2000s.

You could cd into zip files, they would act as directories and files at the same time.

I seem to remember Linus saying a file could act like a directory in Linux a long time ago too.

Though I don't think Linux has filters for the filesystem like Windows does so implementation might be more tricky.

frizlab
4 replies
22h3m

Honest question: How is this useful? I don’t see any use-case where this would come in handy.

russfink
0 replies
3h44m

Exporting data to some format would be easy.

crabbone
0 replies
1h41m

We used this in Gitlab CI. Unfortunately, the only way they deal with artifacts is by putting them in Zip files. Cache between builds would thus be stored as a Zip file. However, fully extracting it before each build would sometimes take as much, if not more time than to just build fresh. Mounting a Zip file as a filesystem allows extracting entries on-demand, at the time a file access would've been made. This was a notable speedup in our compilation process.
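A minimal sketch of the on-demand access described above, using Python's standard `zipfile` module rather than a FUSE mount. The archive layout and the `read_member` name are made up for illustration; the point is that a zip's central directory lets a reader pull one entry without unpacking the rest.

```python
import io
import zipfile

def read_member(archive, name: str) -> bytes:
    """Read a single entry; other entries are never decompressed."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(name) as member:
            return member.read()

# Build a small stand-in for a CI cache archive, entirely in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("cache/a.o", b"object a")
    zf.writestr("cache/b.o", b"object b")

data = read_member(buf, "cache/b.o")
```

A mounted zip filesystem does the same thing lazily per file access, which is why it can beat a full extraction when a build only touches part of the cache.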

TimeBearingDown
0 replies
21h21m

You could seed compressed archives of massive text files or similar via BitTorrent while making the contents available to your apps in read-only mode.

RetroTechie
0 replies
19h46m

It allows you to use any tool available for regular files, on the files-in-files as well.

As opposed to extract contents and then work on that (requiring extra steps + disk space). Or be limited to what specialized utilities support.

jraph
3 replies
21h56m

It exists :-)

For zip archives, there are fuse-zip and mount-zip, which are FUSE filesystems.

As an intermediate between OS level and application level, there is the desktop-environment level: gvfs for GNOME and KIO for KDE, but they are compatible only within their own ecosystems.

lambdaxyzw
1 replies
7h31m

Would be nice to have something that integrates with 7z - it supports a lot of weird archive types, including "weird" ones I care about (for example PE files, better known as ".exe files").

russfink
0 replies
3h42m

Or zstd. I have some dd blobs of partitions, the blobs are zstandard-compressed, would like to mount them.

ramses0
0 replies
17m

ratarmount for tar files.

lmm
2 replies
20h14m

This was a core design feature of reiserfsv4, but Linux ultimately refused to merge it, probably not helped by the whole murdering-his-wife thing.

skissane
1 replies
19h24m

This was a core design feature of reiserfsv4, but Linux ultimately refused to merge it

IIRC, because it contained these strange beasts which functioned as both files and directories - i.e. cat would return data, but then you could cd into them and run ls. Linus (among others) didn’t want to permit those violations of the file-directory dichotomy into the Linux kernel.

stuaxo
0 replies
10h54m

Oh that's funny - I remember a much, much earlier Linus mentioning how this would be possible in Linux, I didn't know anyone actually did it.

I think you really should be able to "cd" into any kind of structured data.

xk3
0 replies
2h14m

Compressed archives, for one

You can look inside of archives pretty easily with `lsar` (part of the unar package). It works with disk images like ISO 9660 files too

But yes, especially for nested archives, having deeper OS support would be nice.

kybernetikos
0 replies
20h45m

It's not exactly the same, but nushell provides ways of exploring inside files.

crabbone
0 replies
1h44m

I thought archivemount already did that. Am I missing something?

Anyway, even if that's not what you are looking for, FUSE is a more general mechanism that will allow you to do what you want (well, it seems like, at least) and much more.

Sophira
0 replies
13h36m

This already exists - avfs[0] does this as a FUSE filesystem. It's not the most intuitive to use, but it works, and is extensible.

[0] https://avf.sourceforge.net/

paulgb
15 replies
21h16m

This is really neat, but when I saw the headline I got excited that it was something I have been looking for / considering writing, and I figure the comments here would be a good place to ask if something like this exists:

Is there a FUSE filesystem that runs in-memory (like tmpfs) while mounted, and then when dismounted it serializes to a single file on disk? The closest I can find are FUSE drivers that mount archive files, but then you don't get things like symlinks.

khc
5 replies
20h48m

Does it have to be FUSE? Can't you mount a disk image with loopback?

andrewflnr
2 replies
20h40m

Wouldn't a disk image have a fixed size? It could be a pain to resize.

ranger_danger
1 replies
20h37m

qcow2 is auto-expanding

metadat
1 replies
20h42m

But will the disk image be fully stored in memory? No.. not with loopback. Either that, or it won't be mutable in memory with commit on unmount.

generalizations
0 replies
18h23m

Put the disk image inside a ramdisk and it's in memory. Write a script for saving to physical disk when dismounting, and you're done.

ranger_danger
4 replies
20h38m

I can't think of anything _exactly_ like that, but I think you can get close by just copying some type of image file to /tmp and then moving it to disk when you're done after unmounting.

AgentME
3 replies
19h50m

/tmp isn't stored in memory; it's usually a normal on-disk filesystem that's cleared regularly. You want /dev/shm instead, which is a purely in-memory filesystem on normal Linux systems.
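Whether a given path is memory-backed can be checked rather than assumed. A rough sketch, assuming the Linux `/proc/mounts` format (device, mount point, filesystem type, options); the function name `fs_type_for` is hypothetical, and escaped characters in mount points (such as `\040` for spaces) are not handled.

```python
def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point containing path."""
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # ">=" lets a later mount on the same mount point shadow an earlier one.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type

# Sample text in /proc/mounts format, where /tmp is just part of the root disk.
sample = """\
/dev/sda1 / ext4 rw 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
"""

tmp_type = fs_type_for("/tmp", sample)
shm_type = fs_type_for("/dev/shm/image.qcow2", sample)
```

On a real Linux system the second argument would be the contents of `/proc/mounts`; a result of `tmpfs` means the path lives in memory (possibly spilling to swap).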

throwway120385
1 replies
19h41m

The point they were trying to make is that it doesn't have to be, and it isn't in several of the Linux systems I've used over the years. Assuming that it is is a bad idea.

arjvik
0 replies
16h37m

/dev/shm always is though

Scaevolus
1 replies
20h41m

Not purely in-memory, but something like https://github.com/jrwwallis/qcow2fuse maybe? It's clunky compared to OSX's DMGs, but if you squint it achieves similar ends.

Otherwise you could achieve this with a tmpfs wrapped to serialize to a tarball (preserving symlinks) when unmounted.
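A minimal sketch of the serialize-on-unmount half of that idea, using Python's standard `tarfile` module. The directory here is an ordinary temporary directory standing in for a tmpfs mount, and `save_tree` / `load_tree` are hypothetical names; actually mounting and unmounting the tmpfs is left out. It assumes a POSIX system where symlinks can be created.

```python
import os
import tarfile
import tempfile

def save_tree(src_dir: str, tar_path: str) -> None:
    """Pack the directory into one tar file; symlinks are stored as links."""
    with tarfile.open(tar_path, "w") as tar:
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)

def load_tree(tar_path: str, dest_dir: str) -> None:
    """Unpack a tar file produced by save_tree into dest_dir."""
    with tarfile.open(tar_path, "r") as tar:
        # Newer Python versions offer extraction filters; use one if present.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)

with tempfile.TemporaryDirectory() as work:
    live = os.path.join(work, "live")          # stand-in for the tmpfs mount
    os.mkdir(live)
    with open(os.path.join(live, "real.txt"), "w") as f:
        f.write("hello")
    os.symlink("real.txt", os.path.join(live, "link"))

    image = os.path.join(work, "image.tar")
    save_tree(live, image)                     # what "unmount" would trigger

    restored = os.path.join(work, "restored")  # what "mount" would repopulate
    os.mkdir(restored)
    load_tree(image, restored)
    link_survived = os.path.islink(os.path.join(restored, "link"))
    link_target = os.readlink(os.path.join(restored, "link"))
```

Anything written between mount and unmount exists only in memory, so a crash before `save_tree` runs loses it.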

ranger_danger
0 replies
20h31m

Oh nice, I didn't even know that existed. I've been using qemu-nbd and parted by hand and it gets cumbersome, so this might help a lot. Thanks!

speps
0 replies
20h37m

Closest I found: https://github.com/guardianproject/libsqlfs

The libsqlfs library implements a POSIX style file system on top of an SQLite database. It allows applications to have access to a full read/write file system in a single file, complete with its own file hierarchy and name space. This is useful for applications which needs structured storage, such as embedding documents within documents, or management of configuration data or preferences. Libsqlfs can be used as an shared library, or it can be built as a FUSE (Linux File System in User Space) module to allow a libsqlfs database to be accessed via OS level file system interfaces by normal applications.

hnlmorg
0 replies
12h9m

Why does it have to be in memory?

I’m sure you’re already aware of this, but there are all kinds of very real scenarios that could lead to corrupted data if you’re only flushing the buffer upon unmounting.

Sounds like you’ve got an interesting problem you’re trying to solve though.

yjftsjthsd-h
7 replies
1d

I would gently suggest naming it filefs or something; ffs already means https://man.freebsd.org/cgi/man.cgi?ffs(7)

That said - good idea/approach; seems like an excellent way to cleanly extend the unix approach to structured file formats:)

galkk
2 replies
23h8m

OP’s ffs does not target FreeBSD, and it seems like the referenced system is FreeBSD-only. Claiming naming rights is a stretch here.

yjftsjthsd-h
0 replies
22h32m

FFS predates FreeBSD and is in some capacity supported by all 3 major BSDs. I'm fairly confident that Linux actually supports it through the ufs driver ( https://github.com/torvalds/linux/tree/master/fs/ufs ); whether the use of different names in different places makes it better or worse is an exercise for the reader.

eichin
0 replies
22h41m

That's ahistorical. FFS is the berkeley Fast File System, from BSD 4.2, in 1983.

irusensei
1 replies
2h54m

Yeah please rename it to JFS (JSON File System). Oh wait...

yjftsjthsd-h
0 replies
1h15m

There is a reason I suggested "filefs"; 3 chars isn't really enough to easily be unambiguous.

tambourine_man
0 replies
23h5m

There's another expansion to that acronym that I can think of. I think the joke is implied.

isoprophlex
5 replies
1d

For fuck's sake! Not everything needs to be a file!

(This is a joke. I love the idea and execution.)

vidarh
3 replies
22h41m

Daniel Stenberg, of cURL fame, co-wrote an Amiga editor called FrexxEd where the open buffers were exposed as files in the filesystem.

Meaning you could write any shell script to manipulate an open buffer (not that important, as it also exposed all editor functionality both via IPC via ARexx and via FPL - a C-like scripting language), and that you could e.g. compile without saving. That was very helpful on a system where a lot of people might only have a single floppy drive: being able to keep the compiler in the drive and compile straight from the in-memory version in RAM meant you didn't have to keep swapping floppies (just remember to save before actually trying to run the program - no memory protection...).

salgernon
1 replies
22h4m

Classic MacOS in the 80s had "MPW" macintosh programmers workshop - that treated open text windows as files and selections within the windows as files, so it wasn't uncommon to have a portion of an otherwise documentation file have a "click here and hit enter", which would use the selected text as stdin for some semi-ported unix tool. (no memory protection or multitasking, so true pipelines with backpressure didn't work)

bombcar
0 replies
18h11m

BBEdit has something akin - you can select text and run a Unix command on the selection via temp files. Very useful

fsckboy
0 replies
16h34m

I want a webbrowser that does that, lets me shell-cd into each tab as a directory

mypalmike
0 replies
22h45m

I see what you did there.

amiga386
2 replies
23h23m

All you need now is a giant pile of rules for which revisions to select and you have the unholy demon that is Rational ClearCase

skissane
0 replies
19h31m

I always thought Oracle ADE was a cooler demon. Shame the internal talk about productising it never went anywhere.

TheGlav
0 replies
22h38m

What!? you didn't add a versioned database layer on a server with code stored in clearcase that stored those ClearCase config specs to manage the configuration of your config specs to manage the configuration of your version control system that had your application configuration in it?! How did you even operate? /s

ThinkBeat
2 replies
22h37m

No XML, Excel, PDF or CSV support yet.

IlliOnato
1 replies
15h16m

I guess support for XML would be tricky, because XML is just a way more complex format than the ones already supported. It is still essentially a tree, but with additional structure.

Representing elements and their contents is easy enough. But attributes, comments, processing instructions, entities... And remember, an XML document can include a DTD (it does not have to be in a separate file).

To present it as a file system in a useful, non-convoluted way? I will be very, very interested if it's possible, but not holding my breath.

eyelidlessness
0 replies
14h16m

On the one hand, I can’t help but point out you forgot to mention the other big inherent complexity that would make XML-as-FS a uniquely complex beast: namespaces.

On the other hand, I can’t help but point out that a related technology comes very close to demonstrating how you might map XML to a file system: XPath. Probably the biggest issue would be syntax, and again largely due to namespaces.
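To make the XPath comparison concrete, here is a small sketch using the limited XPath subset in Python's standard `xml.etree.ElementTree`. The document and the `dc` prefix mapping are invented for illustration; note how the namespace has to be supplied separately, which is the part with no obvious filesystem equivalent.

```python
import xml.etree.ElementTree as ET

xml_text = """<lib xmlns:dc="http://purl.org/dc/elements/1.1/">
  <book id="b1"><dc:title>Dune</dc:title></book>
  <book id="b2"><dc:title>Emma</dc:title></book>
</lib>"""

root = ET.fromstring(xml_text)

# Prefix-to-URI mapping; the prefix is local to this query, not to the file.
ns = {"dc": "http://purl.org/dc/elements/1.1/"}

# Path steps read much like directory components...
titles = [e.text for e in root.findall("./book/dc:title", ns)]

# ...but attribute predicates select among same-named siblings.
second = root.find("./book[@id='b2']/dc:title", ns).text
```

Two `book` siblings with the same name are routine in XML but cannot coexist as two directory entries, so any mapping needs some disambiguating scheme on top of the path.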

PMunch
2 replies
11h53m

Oh this is cool! I recently wrapped libfuse in Nim and after porting the 'hello' filesystem example I made one which is more or less exactly this. However, with my version you pipe data in and have to provide a mountpoint; then when it's done it writes the result to stdout. That means you can inline it in a pipe chain but also that you have to make sure to grab the output.

At the moment I'm exploring other stuff which could be made into file systems. I've got a statusbar thing for the Nimdow window manager which allows you to write contents to individual files and it creates a bar with blocks on them as the output. It makes it super easy to swap out what is on your bar which is pretty neat.

Another tool I've made is a music player. It uses libvlc and when given a folder it reads all the media with ID3 tags and sets up folders like 'by-artist', 'by-album', etc. Each file is named as '<track number> - <song title>' and contains the full path to the actual file. To play a song you cat one of these files into 'control/current' and write the word play to 'control/command'. There's a bit more to it than that, like a playlist feature and some more commands, but that's the basic idea. The goal is to have a super-scriptable music player.

lambdaxyzw
0 replies
7h24m

This makes me think, it would be nice if there was an easy built-in way to expose information about a process using the filesystem. Something like "cat /proc/$pid/fs/current_track" to get a name of a current song from a music player, or "ls /proc/$pid/fs/tabs" to list open tabs in my browser (and maybe use this to grab the html or embedded images).

I mean right now it's possible to do this using FUSE, but that's convoluted and nobody does it.

fishyjoe
0 replies
1h40m

Would you mind sharing the Nim code? I've been interested in working with FUSE for a while, and use Nim for a few projects.

No worries if not, I'm just curious!

waldrews
0 replies
15h32m

Could this be used in Windows by exposing a Samba file share from WSL or Docker?

timrobinson333
0 replies
10h4m

It's an interesting idea but I think the usefulness would be greatly enhanced if it could handle json arrays; most needed json structures contain array elements in my experience

secwang
0 replies
18h29m

Reminds me of djb's envdir

sambeau
0 replies
23h46m

Ha. I did this back in 2003. It's surprisingly fast, and makes it simple to do granular locking.

I used it as a per-user database for a web-templating language for a giant web-site building tool.

qazxcvbnm
0 replies
24m

What happens if your JSON key has a slash?
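I don't know how the tool itself handles this. As a sketch of one possible answer, a mapping could percent-encode each key before using it as a path component, and name array elements by index. The function name `json_to_paths` is hypothetical, and this ignores other awkward keys such as the empty string, `.` and `..`.

```python
import json
from urllib.parse import quote

def json_to_paths(value, prefix=""):
    """Yield (path, leaf_value) pairs for a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            # safe="" makes quote() encode "/" as "%2F" too.
            yield from json_to_paths(child, f"{prefix}/{quote(key, safe='')}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from json_to_paths(child, f"{prefix}/{index}")
    else:
        yield prefix, value

doc = json.loads('{"a/b": 1, "tags": ["x", "y"], "nested": {"k": null}}')
paths = dict(json_to_paths(doc))
```

Empty objects and arrays produce no entries in this sketch; a real filesystem would show them as empty directories.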

purple-leafy
0 replies
11h11m

When I saw the title I thought it was a meme.

But wow, what a clever idea. Not sure I'd ever need to reach for it personally as I do most data processing in a higher level language, but I can imagine people can find use cases.

Nice out of the box thinking

planede
0 replies
2h13m

Hmm, this opens the possibility to also commit these files as directory structures. I wonder how this would affect merges and conflicts.

freeney
0 replies
1d

This looks awesome, I need to give it a try asap. I can very well see myself using this to navigate or search inside JSON files

chuckadams
0 replies
22h12m

Neat. Now how about a filesystem that takes a directory of files and exposes it as a single json file? You could call it the Filesystem File, and mount it in the File Filesystem if you wanted...
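The read-only half of that inverse is easy to sketch as a plain function rather than a filesystem. This assumes every file is UTF-8 text and ignores symlinks, permissions and binary data; `dir_to_json` is a hypothetical name.

```python
import json
import os
import tempfile

def dir_to_json(root: str) -> dict:
    """Map a directory tree to nested dicts: directories become objects, files become strings."""
    tree = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            tree[name] = dir_to_json(path)
        else:
            with open(path, encoding="utf-8") as f:
                tree[name] = f.read()
    return tree

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "config"))
    with open(os.path.join(root, "config", "port"), "w", encoding="utf-8") as f:
        f.write("8080")
    with open(os.path.join(root, "name"), "w", encoding="utf-8") as f:
        f.write("demo")
    rendered = json.dumps(dir_to_json(root), sort_keys=True)
```

Every leaf comes back as a string, so a round trip through both directions would need some convention for recovering numbers, booleans and null.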

agumonkey
0 replies
20h44m

All I see is a generic tree walk mechanism, here implemented in folders/files, but in plan9 it's .. plumber ?