
I summarized my understanding of Linux systems

dfc
24 replies
2d8h

My mental model of Linux does not have the CPU/Memory in user space. What am I missing?

vbezhenar
15 replies
2d8h

A userspace program directly uses the CPU and memory (unless you're in a VM). In contrast, your userspace program does not directly access your network device or SSD; it uses kernel routines to access those indirectly.

sophacles
6 replies
2d4h

It doesn't directly access memory. The addresses in your userspace program are not the actual addresses of memory in the RAM sticks - there is a table of mappings that the kernel sets up. When your process asks the kernel for memory, it says "I need 5KB, please put that at address XYZ". The kernel goes and finds 5KB unused, probably at some other address ABC, and creates a mapping in the table that says XYZ translates to ABC. Then the kernel points the CPU's MMU at the table for your process, and switches back to unprivileged mode, letting your process run again. Your process in unprivileged mode issues an instruction to write to the memory at XYZ, but the CPU will translate that instruction to ABC and write there instead.
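Roughly, in C, that request looks like the sketch below (illustrative only; the 5KB size matches the example above, and the address is just a hint the kernel is free to ignore):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 5 * 1024;            /* "I need 5KB" */
        void *hint = (void *)0x70000000;  /* "please put that at XYZ" - a hint only */
        void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        printf("kernel mapped %zu bytes at %p\n", len, p);
        ((char *)p)[0] = 42;  /* this store goes through the MMU's XYZ->ABC translation */
        munmap(p, len);
        return 0;
    }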

VMs (not emulated VMs, but an x86 VM on x86 or an ARM VM on ARM) do something similar - an inaccurate (but useful for the concept) way to think of it is that the CPU does two layers of MMU translation for user processes in a VM.

The kernel code isn't running directly when your program accesses memory, but it sets up the CPU so that the kernel still has control over your memory, and you only have access to what the kernel allows - it's mediated by the kernel.

wrs
5 replies
2d2h

I think the point of the diagram is that the abstractions or “API” that user space gets to use includes memory it can read and write directly, and a CPU to execute instructions. Of course in reality there only “appears to be” memory and a CPU, but that’s why it’s an abstraction. Just like there “appears to be” a filesystem for user space to use, when in reality there’s a block interface to a disk, or wherever you want to draw the line.

sophacles
4 replies
2d

I think what I was getting at was that memory sort of sits in-between.

My instructions are executed directly on the CPU. My file reads and writes are translated from a stream of bytes to block instructions by code in the kernel.

Memory is a weird in-between place, or maybe a third option, since the kernel has to run a bunch of code on my behalf for me to use memory, sort of like the filesystem thing, but I'm using the direct hardware units afterward, sort of like instructions.

vbezhenar
3 replies
1d5h

The CPU is pre-configured by the OS as well, so IMO it's the same as memory. Maybe it would be more appropriate to say that a userspace program directly accesses some parts of the CPU and RAM.

With modern computers, AFAIK even OS does not have absolutely full control over CPU.

sophacles
1 replies
1d2h

But they don't directly access memory. They access an address that may be translatable by the MMU to a physical location in RAM. They may also write to an address that the kernel hasn't allocated a page for yet, but that the kernel has agreed to map into the process' memory. In this case the kernel handles the trap, maps a page of actual RAM (etc), and then the process continues making forward progress.
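You can watch this happen from userspace. A minimal sketch (assuming 4KB pages; getrusage(2) reports minor faults, which is what servicing those traps shows up as):

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    static long minor_faults(void) {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
    }

    int main(void) {
        size_t len = 64 * 4096;  /* 64 pages, assuming 4KB pages */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        long before = minor_faults();
        for (size_t i = 0; i < len; i += 4096)
            p[i] = 1;  /* first touch of each page traps into the kernel */
        printf("minor faults while touching 64 pages: %ld\n",
               minor_faults() - before);
        munmap(p, len);
        return 0;
    }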

vbezhenar
0 replies
9h1m

That's just the way the CPU works. It has nothing to do with kernel or userspace. Kernel code will behave identically.

justsomehnguy
0 replies
23h36m

With modern computers, AFAIK even OS does not have absolutely full control over CPU.

It's even more amusing with Type 2 hypervisors.

akira2501
4 replies
1d22h

Ring 3 is not "directly using the CPU." And mmap is not "directly using the memory."

suprjami
3 replies
1d21h

The hardware ring has nothing to do with "directly using the CPU"; it controls what access level the program has.

Forget virtualisation. Compile a userspace program which just adds numbers into a stack variable. That program is running directly on the CPU in the unprivileged ring.

A userspace program in a VT-x virtual machine is exactly the same.

If those programs attempt privileged access then that access will fail and a trap is raised. That's what the CPU ring controls.

akira2501
2 replies
1d20h

Hardware ring has nothing to do with "directly using the CPU"

Why wouldn't it? Several features are simply not available in ring 3. Several features are configured for you in a way you cannot change. Several instructions will simply fault your program.

which just adds numbers into a stack variable

Yes.. and when you eventually overflow that stack, what happens? How did the stack segment selector get created? Can you change that selector or its attributes? Can you set the stack pointer to any valid memory address you like?

A userspace program in a VT-x virtual machine is exactly the same.

What does an IOMMU do?

If those programs attempt privileged access then that access will fail and a trap is raised.

Right.. so you are not directly using the CPU. You're not even in control of what timeslices are afforded to you by the OS. You are in an exceptionally limited environment most of which you cannot control or alter and much of which you cannot even observe.

The fact that instructions get dispatched according to the system ABI when you run a program is not material to this problem, and in particular, is not at all correctly represented by this diagram.

suprjami
0 replies
1d8h

You don't have administrative access to all of Hacker News. Therefore, you are not really on Hacker News. This is your logic.

Veserv
0 replies
1d18h

You are directly using the CPU; you just do not have full access to the entire CPU. There is no userspace ALU that your numbers get crunched on, there is no userspace register file your working set is stored in (actually, they might do that internally, but logically there is no such distinction). You are in a hotel room. Just because you cannot stomp around in the ducts does not mean you are not directly using the hotel room; you just have limited access to the rest of the hotel.

persnickety
2 replies
2d7h

If you're using a hypervisor, then the userspace program inside the VM is also using the CPU and memory directly. You'd have to do full emulation to avoid that.

Even with full emulation, I'd say memory is being accessed directly, unless you really go out of your way to make it weird.

icedchai
1 replies
2d1h

With full emulation, I'd argue memory access is not direct. Memory access from the emulated system will go through user space code in the emulator. That code may translate it to actual memory access, or perhaps an emulated, memory mapped I/O device like a frame buffer. Either way, there is something in the middle.

You could argue that nothing is direct unless you're running on a bare metal system, no MMU, no page tables. How do you define "direct"?

persnickety
0 replies
1d12h

For the definition I used earlier, the bytes don't change shape in between. That makes MMU and interpreted access direct, and compression indirect.

suprjami
7 replies
2d7h

Nor does mine.

Userspace assembly runs directly on the CPU*, executing in the unprivileged ring. When the userspace program makes a system call by calling a kernel entry function (mapped into the process's address space by the dynamic loader), part of that entry into kernelspace is putting the CPU into the privileged ring; kernel assembly then runs on the CPU.
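A minimal sketch of crossing that boundary from C (using glibc's generic syscall(2) wrapper rather than a hand-written entry, but the effect is the same):

    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        const char msg[] = "hello from the unprivileged ring\n";
        /* Everything up to here ran unprivileged; this one call
           flips the CPU into the privileged ring and back. */
        syscall(SYS_write, STDOUT_FILENO, msg, sizeof msg - 1);
        return 0;
    }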

The process scheduler can stop execution to kick a task off the CPU and switch to another one; depending on the OS and kernel, some things can be kicked off the CPU and some cannot.

Userspace memory allocations are serviced by virtual memory where the page tables track the translation of virtual memory pages into physical memory pages using the MMU.

The kernel is involved during allocation and page fault, but iiuc a regular successful virtual memory access is a hardware operation only.

I don't have a diagram of how this works. Neither processes nor memory are my usual area of the kernel.

You'd be better off reading the x86 version of the XV6 book to learn how this stuff really works. It's really well written and implements enough to be tangibly useful. Reading the code is optional when just learning concepts. Reading the XV6 code will hopefully help you understand the O'Reilly Linux books better, which will hopefully help you understand the actual Linux kernel better.

(*yes I'm aware CPUs don't directly execute assembly anymore, but the microcode guarantees the observable CPU state at any Instruction Pointer matches the expectation as if you were running assembly on a PDP or C64, or close enough for 99.999% purposes and definitely enough for debugging your program in gdb)

Koshkin
5 replies
2d6h

execute assembly

I always thought that assembly was a type of programming language.

ooterness
1 replies
2d3h

Ben Eater's excellent educational video series includes one explaining the difference:

https://www.youtube.com/watch?v=oO8_2JJV0B4

In short, assembly for a given CPU is very nearly one-to-one with the machine language for that CPU. It's not correct to conflate the two, but close enough when speaking informally.

deaddodo
0 replies
1d16h

While it's true that assembler doesn't quite fit into the same class as compiled/interpreted languages, it would be disingenuous to say it's not a programming language. It's simply a very low-level, machine specific one.

It's even blurrier when you consider that most modern assembler dialects have plenty of high level functionality (structs, macros, labels, etc) that do not correlate to machine instructions.

suprjami
0 replies
1d21h

Here I'm referring to assembly as mnemonic for machine code, but yes it would have been more correct to say machine code instead.

pertique
0 replies
2d5h

It is. From my understanding, CPUs execute machine code. Assembly has to be passed through an assembler to get machine code, and that assembler can make other changes as well, so they are not always one-to-one. Written assembly will usually translate very closely to machine code, though.

marcosdumay
0 replies
2d3h

The GP's phrasing is slightly weird, but mixing assembly and machine code in informal speech isn't rare at all.

imetatroll
0 replies
17h42m

Do you happen to have any other suggestions about reading?

I am currently reading "Asynchronous Programming in Rust: Learn asynchronous programming by building working examples of futures, green threads, and runtimes" and there is vague talk about how CPUs process things, but I really would like to know more. I am even curious about what is actually happening in hardware. It seems hard to determine where to start.

I don't have formal education in this field unfortunately.

cookiengineer
23 replies
2d9h

I can recommend taking a look at /proc and /sys, because that will clear up a lot of how things are intertwined and connected.

procfs is what's used by pretty much all tools that do something with processes, like ps, top etc.

The "everything is a file" philosophy becomes much clearer then. Even low-level syscalls and their structs are offered by the kernel as file paths, so you can interact with them by parsing those files as a struct, without actually needing to use kernel headers for compilation.
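For instance, this C sketch does the core of what ps and top do - plain file I/O against procfs, no special API (reading the process's own status here for simplicity):

    #include <stdio.h>

    int main(void) {
        char line[256];
        FILE *f = fopen("/proc/self/status", "r");
        if (!f) { perror("fopen"); return 1; }
        while (fgets(line, sizeof line, f))
            fputs(line, stdout);  /* name, PID, VmRSS, threads, ... */
        fclose(f);
        return 0;
    }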

eBPF and its bytecode VM are a little over the top, but are essential to know about in the upcoming years, because a lot of tooling is moving towards shipping its own BPF modules.

Cloudef
22 replies
2d9h

The everything is a file philosophy becomes much more clear then.

To be honest, "everything is a file" is kind of a lie in Unix. /proc and /sys are pretty much Plan 9 inspirations.

arghwhat
19 replies
2d8h

A more accurate term is that everything is a file descriptor.

The main difference is that plan9 uses read and write for everything, whereas Linux and BSD use ioctls on file descriptors for everything.

vbezhenar
17 replies
2d8h

Everything is a descriptor. When I'm opening a TCP connection, there's no file, so calling it a file descriptor feels wrong.

And at that point, the whole "everything is" turns into nonsense, because yes, everything is a pointer to something, so what.

arghwhat
16 replies
2d7h

There are named files (which have a file path) and anonymous files (which do not). You can see these in /proc/$PID/fd/$FD if you're curious - when the link doesn't start with '/', it's anonymous. Even process memory is just an anonymous file on Linux, and arguably a cleaner one as it operates on proper fds, instead of plan9 where a string "class name" (not a path) is used to access the magical '#g' filesystem.

The difference to plan9 is not the files, but the way plan9 uses text protocols with read/write to ctl files. To open a TCP connection - if memory serves me right - you first write to a ctl file, which creates a new file for the connection. Then you write the dial command to that connection's ctl file, after which you can open the connection file. On Linux, a syscall creates an anonymous file, and everything after that is operations on this anonymous file.
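A small sketch of the Linux side in C - the socket is a file with an fd but no path, which you can see in /proc/self/fd (the "socket:[inode]" form is how Linux renders anonymous fds):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);  /* anonymous file: an fd, no path */
        if (fd < 0) { perror("socket"); return 1; }

        char path[64], target[128];
        snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
        ssize_t n = readlink(path, target, sizeof target - 1);
        if (n >= 0) {
            target[n] = '\0';
            printf("%s -> %s\n", path, target);  /* e.g. socket:[123456] */
        }
        close(fd);
        return 0;
    }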

There's some ideological benefits, but plan9 creates a mess of implicit text protocols, ugly string handling, syscall storms and memory inefficiencies. Their design is pretty much solely a limitation caused by the idea that all filesystems should exist through the 9p protocol, which as a networked protocol cannot share data (fds, structs), only copy (payloads to/from read, write). With the idea that all functionality had to be replaceable and mountable from remote machines, the only possible API became read/write for everything.

I'd argue that fd-using syscalls and ioctls - basically per-file syscalls - are a superior approach to implementing everything-as-a-file.

Cloudef
11 replies
2d6h

Which is superior depends on your use case and needs. Plan9's approach is very powerful whenever you need anything distributed, and makes lots of the boilerplate needed to achieve that basically unnecessary. Linux nowadays is flexible enough for both approaches (in theory; the ecosystem might not be there), and I'm glad user namespaces are a thing.

candiodari
6 replies
2d6h

There's some ideological benefits, but plan9 creates a mess of implicit text protocols, ugly string handling, syscall storms and memory inefficiencies.

On the other hand, Linux ioctls and syscalls have endless binary structs you need to know (and cannot let the compiler reorder fields in), which doesn't make cross-platform development any easier.

arghwhat
4 replies
2d3h

Having to know structs is not really an issue - you also need to know text formats, JSON schemas, what not.

Re-ordering of structs is always forbidden with the binary format being strictly specified, so there's nothing to worry about there. Can't exactly shuffle bytes in a text format either, and plan9 control strings tend to have positional arguments.

The current structs do leave something to be desired though.

candiodari
1 replies
2d1h

Really? The issue is that those structs don't cross-compile correctly if you aren't very careful in how you use them. I've personally written an IRC bot whose select loop worked fine on i386 but not on amd64, until I figured out what was happening to the size of the structure behind it.

arghwhat
0 replies
1d8h

The kernel decided to have a different uAPI on different architectures. It's not just the struct size that might differ, but also the fields available and their possible values. Imagine a JSON field being a numeric constant in one arch, but a string in another. I get why they did it, but it certainly causes... surprises.

I imagine the issue you experienced might have been either that you hardcoded the numeric size instead of using sizeof and then building on a different architecture, or that you redeclared the struct for one architecture, ignoring that you were building for another architecture.

Not to undermine the annoyance or surprise factor of your bug, but I am tempted to say that it falls into generic logic bugs rather than being the fault of any system. Changing to an encoding with dynamic length, for example, would just introduce different classes of logic bugs, like the whole series of possible issues within JSON parsing/serialization and the need for dynamic buffers.

A struct on the other hand only requires referencing the right struct declaration - the only care you need to take is to include and use them. In my opinion, this is the maximum possible convenience for such a system.

akdev1l
1 replies
1d23h

The structured JSON data you mentioned does allow you to "shuffle bytes".

It doesn't matter in which order the client sends {"a": 1, "b": 2}; it's one object with "a" and "b" regardless.

arghwhat
0 replies
1d21h

Do the same to a JSON array.

Position having meaning is a common part of most data representations, not something specific to structs. If you need to, you can also engineer order independence in structs with arrays of type and value unions, but there's no need to as the order has never been a problem.

arghwhat
3 replies
2d6h

Now, proper plan9-style namespaces, that I miss. :)

User namespaces are still a hell of a lot clunkier than "each process inherits its parents' namespace".

Cloudef
2 replies
2d6h

If you set up a user namespace, the child processes will inherit that namespace. The difference is that plan9 is fully built on this idea and isn't multi-user; on Linux you have to opt in to it. It's very useful and underused (mostly by containers). I wanted to ship my AWS Lambdas this way, but sadly AWS Lambda doesn't allow user namespaces.

https://github.com/aws/aws-lambda-base-images/issues/143

moody__
1 replies
2d2h

Plan 9 is very much multi-user; namespaces are actually one component of how it is multi-user. A given terminal may be designed to have only one user on it, but CPU servers are still multi-user multiplexers; that part has not been given up.

Cloudef
0 replies
2d2h

Yes, by multi-user I meant that plan9 doesn't have the unix/linux user model; rather, "multi-user" is done with the namespaces. Container people would be more familiar with the plan9 way.

moody__
1 replies
2d2h

I am not exactly sure where you got the "class name" part from; I've typically referred to those as kernel filesystems or sharp devices. For the record, these kernel filesystems are not technically 9p: they present a very similar interface, but reads and writes to them are not marshalled and unmarshalled through 9p. It is, however, possible to export their files over 9p if one desires; I can import a remote machine's /net stack and use it to announce or dial out. Plan 9 gives us proto-VPNs just by virtue of its design.

There was perhaps a time when having everything in binary ioctls, bound to specifically one device, was a necessary component of reaching reasonable performance, but I don't believe that is the case anymore. Anecdotally, these days everything on Plan 9 feels snappier. We have some benchmarks showing that 9front outperforms Linux on naive pipe I/O and context switches. What Plan 9 misses in micro-optimizations it makes up for with an incredibly consistent and versatile base.

I want to reiterate the benefits of the network transparency by talking about how drawterm works. Drawterm can be thought of as the Plan 9 equivalent of Windows RDP. Internally, drawterm creates routines to expose a /dev/draw, /dev/mouse and /dev/keyboard through whichever native way there is on the target system (macOS, Windows, Linux, etc). It then attaches to the remote system and overlays these files over a namespace. Programs like our window manager rio can then run completely transparently, forwarding not compressed images but individual draw RPC messages. There is no need for any special code on the Plan 9 host side to accommodate drawterm; again, it is something that just falls out of the core design of the system.

Cloudef
0 replies
2d2h

Even on Linux people avoid syscalls because syscalls are slow and bad, so I don't really see the problem with plan9's approach either. Make the common scenario useful, optimize for the special cases (sendfile, io_uring). In fact, read/write lets you batch a bigger amount of data than a single ioctl, so it can actually be more performant.

zozbot234
0 replies
22h25m

Their design is pretty much solely a limitation caused by the idea that all filesystems should exist through the 9p protocol, which as a networked protocol cannot share data (fds, structs), only copy (payloads to/from read, write). With the idea that all functionality had to be replaceable and mountable from remote machines, the only possible API became read/write for everything.

It's not clear to me that 9p itself could not be extended to allow for shared memory. With low-level control over the operating system and rebuilding of existing binaries, distributed shared memory becomes a possibility. (I.e. the existing VM system ought to be enough to implement whatever cache coherence is needed for shared memory over the network.)

MisterTea
0 replies
1d20h

magical '#g' filesystem.

What's magical about the segment(3)[1] device? The '#' devices are kernel file servers. There's no magic.

[1] http://man.9front.org/3/segment

wolletd
1 replies
2d8h

Also, a lot of devices require very specific ioctl() commands to work with and don't provide everything as a file.

For example, you can't set the baudrate of a serial port by writing it to some /proc node.
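Instead it goes through termios, whose functions wrap those ioctls. A minimal sketch (assuming a port exists at /dev/ttyS0):

    #include <fcntl.h>
    #include <stdio.h>
    #include <termios.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/dev/ttyS0", O_RDWR | O_NOCTTY);  /* assumed device path */
        if (fd < 0) { perror("open"); return 1; }
        struct termios tio;
        if (tcgetattr(fd, &tio) < 0) { perror("tcgetattr"); return 1; }
        cfsetispeed(&tio, B115200);  /* the baud rate is an ioctl-backed setting, */
        cfsetospeed(&tio, B115200);  /* not a number you write to a /proc file    */
        if (tcsetattr(fd, TCSANOW, &tio) < 0) perror("tcsetattr");
        close(fd);
        return 0;
    }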

arghwhat
0 replies
2d8h

nit: The ioctl syscall targets files just the same as the write syscall.

Everything being a file, and everything being read/write calls are different things. There's pros and cons.

tmalsburg2
7 replies
2d5h

I learned a lot about this from the book "The design of the Unix operating system" by Maurice J. Bach.¹ It's an old book and many details deviate from actual present-day Linux, but it nonetheless gives a great overview of the key components and ideas.

¹ https://books.google.de/books/about/The_Design_of_the_UNIX_O...

guerrilla
5 replies
2d4h

This is one of my favorite books. A true classic. There are follow-ups in that style for Linux and FreeBSD as well. I think Robert Love wrote the former.

temeya
2 replies
2d3h

And Marshall Kirk McKusick wrote the latter, "The Design and Implementation of the FreeBSD Operating System"

guerrilla
0 replies
2d1h

Thanks, pretty sure that's the one I meant.

deaddodo
0 replies
1d16h

I used this book as a primary reference for OS design (along with The Dinosaur Book) when designing my hobby OS.

The FreeBSD kernel/world is almost exquisite in its engineering simplicity, compared to the sometimes chaotic world of GNU/Linux (in my experience).

bigfatfrock
1 replies
2d2h

Thank you! I was going to ask about the latest Linux variant - are you speaking of Love's "Linux Kernel Development", or "Linux in a Nutshell"?

I've been a primary Linux user for a couple of decades now; I'm not keen on digging into kernel hacking, but I love details like the OP's post.

guerrilla
0 replies
2d1h

The Kernel Development book. It gives a tour. Read the UNIX one first though.

timeforcomputer
3 replies
2d6h

Nice! I want to do something similar and map my understanding of Linux. I find some diagrams on Wikipedia fascinating (example: https://en.m.wikipedia.org/wiki/File:Linux_Graphics_Stack_20..., though that one is more about the user library ecosystem than the kernel and program runtime). These diagrams make me want to learn about each part and be able to comprehend, in principle, what is happening on my machine. Eventually...

Jasper_
2 replies
2d2h

Any diagram by ScotXW on Wikipedia is somewhere between misleading and completely wrong, and they're a constant pain for the Linux graphics community.

If you're curious about the details in this case, ScotXW confuses EGL and OpenGL, the arrows aren't quite right, and the labels aren't quite right either (DRM is labeled "hardware-specific" but KMS isn't? The label for "hardware specific Userspace interface to hardware specific direct rendering manager" is quite weird), and some important boxes are flat out missing. It's nitpicking for sure, but when the diagram goes out of its way to add extremely weird details, it demands nitpicking.

Nobody in the Linux graphics community would draw the diagram like this.

hn_user82179
1 replies
17h53m

I remember when Wikipedia first became popular, there were a lot of warnings that you couldn't trust the information because "anyone could edit it". At least at my level of understanding of whatever I'm reading about, it's been sufficient, and I've never identified something wrong or inaccurate (except perhaps for recent news or hotly debated political topics). This is the first time I've seen that downside of Wikipedia: I use it for understanding things like this and never would've known that the diagram I was learning from was wrong. Thanks for commenting; it's good to know.

timeforcomputer
0 replies
17h4m

The specific article is: https://en.wikipedia.org/wiki/Free_and_open-source_graphics_.... I can understand the "multiple issues" section here; it's super-technical in a certain way compared to what I usually see on Wikipedia (although Wikipedia does get very technical on specific isolated things in e.g. math, non-theory tech pages are usually a summary), but I still found it motivating to dig into Linux. I wouldn't be surprised if it was removed.

I love Wikipedia and procrastinate by reading the articles for everything I'm interested in, so I have found quite a lot of factually incorrect information, and statements of fact which are really philosophical opinions. Usually, though, these problems coexist with a certain change in writing style (loss of formatting, grammatical errors, random capitalizations, change of tone, etc.). I haven't found many problems with content written in the usual "Wikipedia" style, so I assume the hardcore editors who enforce that style also care a lot about factual accuracy. But I don't read enough outside of Wikipedia to know whether everything is correct...

(Actually, now that I think about it, maybe I'm just more likely to agree with things written in the Wikipedia style. But I think the style errors and factual errors are at least somewhat correlated.)

peter_d_sherman
3 replies
2d

A future simple Linux-like (or Unix-like) OS could theoretically be created with only 4 syscalls:

open() read() write() close()

Such a theoretical linux-like or unix-like OS would assume quite literally that "everything is a file" -- including the ability to perform all other syscall/API calls/functions via special system files, probably located in /proc and/or /sys and/or other special directories, as other posters have previously alluded to...

Also, these 4 syscalls could theoretically be combined into a single syscall -- something like (I'll use a Pascal-like syntax here because it will read easier):

    FileHandleOrResult = OpenOrReadOrWriteOrClose(FileHandle: Integer;
                                                  Mode: Integer;
                                                  pData: Pointer;
                                                  pDataLen: Integer);

    if Mode = 1 then open();
    if Mode = 2 then read();
    if Mode = 3 then write();
    if Mode = 4 then close();

FileHandle is the handle for the file IF we have one; that's for read() write() and close() -- for open() it could be -1, or whatever...

Mode is the mode, as previously enumerated.

pData is a pointer to a pre-allocated data buffer, the data to read or write, or the full filename when opening...

(And of course, the OS could overwrite it with text strings of any error messages that occur... if errors should occur...)

pDataLen is the size of that buffer in bytes.

When the Mode is open(), pData contains a string of the path and file to open.

When Mode is read(), pData is read to, that is, overwritten.

When Mode is write(), pData is used to write from.

All in all, pretty simple, pretty straightforward...

A "one syscall Linux or Unix (or Linux-like or Unix-like) operating system", if you will... for simplicity and understanding!

(Andrew Tanenbaum would be pleased!)

Related: "One-instruction set computer" (OISC): https://en.wikipedia.org/wiki/One-instruction_set_computer

zzo38computer
0 replies
1d21h

I had considered that too, but what I had also considered, and what I think is better, is a different single syscall, which is more like an actor model or a capability-based system. (One problem with "everything is a file" as in Plan9 is that the operating system then has to parse file paths every time you want to do any I/O; what I describe below avoids that problem, since you can link directly to objects instead.)

A process has access to a set of capabilities (if it does not have any capabilities, then it is automatically terminated (unless a debugger is attached), since there is nothing for the program to do).

A "message" consists of a sequence of bytes and/or capabilities. (The message format will be system-independent (e.g. the endianness is always the same) so that it works with emulation and network transparency, described below.)

A process can send messages to capabilities it has access to, receive messages from capabilities it has access to, create new capabilities (called "proxy capabilities"), discard capabilities, and wait for capabilities.

Terminating the process is equivalent to a mandatory blocking wait for an empty set of capabilities; discarding all capabilities also terminates the process. A non-blocking wait for an empty set of capabilities means that you wish to yield, so that other processes get a chance to run, before this process continues.

Some further options may be needed to handle multiple locking and transactions, and to avoid some kinds of race conditions, but mostly that is just it.

This is useful for many things, including sandboxing, emulation, network transparency (this can be done by one program keeping track of which capabilities need to be sent across the network link and assigning an index number to each one; the other end then creates a proxy capability for each index number and uses that number when it wants to send back), security with user accounts, etc; the kernel does not need to know about all of these things, since they can be implemented in user code.

Other things (outside of the kernel) can also be implemented in terms of proxy capabilities, and I had ideas about those other parts of the operating system too. For example, it has a hypertext file system (with no file names, but files can contain multiple numbered streams, which can include both bytes and links to other files (either to the current version or to a fixed version; if to a fixed version, copy-on-write is used if the file is modified)), a "foreign links table", a common (binary) data format, a command shell with some similarities to Nushell (but also many differences), the "Extended TRON Code" character set, and details about the workings of the package manager, IME, window manager, etc.

samatman
0 replies
1d21h

You've effectively reinvented 9p here. Which is good!

There are some differences which may interest you: https://9fans.github.io/plan9port/man/man9/intro.html

I think you may find that some of the additional complexity of 9p is necessary, but perhaps not all of it.

richardwhiuk
0 replies
1d22h

That's already kind of how syscalls work - you shove the syscall number in a register, and then trigger an interrupt.
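On x86-64 the interrupt has been replaced by a dedicated syscall instruction, but the shape is the same: number in a register, arguments in registers. A hedged sketch (x86-64 Linux only, GCC/Clang inline asm):

    int main(void) {
        const char msg[] = "raw syscall\n";
        long ret;
        /* write(1, msg, len) by hand: syscall number 1 (SYS_write on
           x86-64) in rax, args in rdi/rsi/rdx, then `syscall` traps
           into the kernel. The legacy 32-bit path used `int 0x80`. */
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "0"(1L), "D"(1L), "S"(msg), "d"(sizeof msg - 1)
                          : "rcx", "r11", "memory");
        return ret < 0;
    }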

knorker
3 replies
2d3h

Whenever I've made notes like this, they've never been useful to my future self or to anyone else.

The only use I've had of this kind of documentation is that the process of writing it, made me understand it better. Basically write-only documentation.

I would call myself a Linux expert, and while I can kinda see what you mean with this diagram, it would not have been useful to me back before I was an expert.

persolb
0 replies
2d3h

It almost resembles mind mapping. It is a useful ‘process’ to figure out what you think/know. And it might be a pretty picture. But it isn’t very useful as documentation.

falserum
0 replies
1d21h

I found it useful. It allowed me to compare whether my idea is similar to the author's.

codelobe
0 replies
1d21h

Usually I would agree. I typically make a "Crash-Course in $PLATFORM" document while keeping notes. These I very commonly reference in order to externalize my memory since it seems to be approaching capacity. I don't care about Ruby on Rails, but once I did, and I can reference my notes if I ever need to touch that platform again.

xwowsersx
2 replies
18h13m

Could someone suggest hands-on resources for learning about kernels, such as a book or series on writing your own kernel? I'd like to gain a deeper understanding of their workings and I think hands-on or project based learning would help.

hnthrowaway0328
1 replies
18h3m

The OSDev wiki (wiki.osdev.org) would help. It's not a book but a website.

xwowsersx
0 replies
18h1m

Taking a look, it has a ton of resources. Thanks!

smitty1e
2 replies
2d10h

I think it needs three areas, not two:

1. User space

2. Kernel

3. Hardware/network

The kernel protects users from hardware, and hardware from users.

topspin
0 replies
2d9h

This is reasonable and correct. I would also have found places in that map for: dcache, block devices, character devices, scheduler, page cache and console/tty/pty. The first two replace "filesystem hierarchy". The second and third are ancient and fundamental classes of UNIX devices.

t1tos
0 replies
2d9h

this is analogous to the fs hierarchy: root protects from the user

richardwhiuk
2 replies
2d6h

I don't understand what the boxes on this diagram are meant to represent.

It feels like an elaborate mechanism to draw something wrong in the hopes people will correct it.

sevagh
0 replies
2d1h

Interview prep.

projektfu
0 replies
2d2h

FWIW, I also don't really understand what the boxes are supposed to represent, given that the arrows represent dependencies like PID <-- process. I thought a PID was an attribute of a process?

To me, a block diagram might show [CPU Scheduler], [Virtual Device Manager], [VFS Manager], [Memory Manager], [Interrupt Handlers], etc...

Of course, my knowledge of Linux internals is limited and perhaps it has a separation of the concept of PID and process where there is a literal dependency.

thesuperbigfrog
0 replies
2d5h

"The Linux Programming Interface" by Michael Kerrisk is one of the best technical resources I have found and used to understand Linux:

https://man7.org/tlpi/

Description from the book's website:

"The Linux Programming Interface (TLPI) describes system programming on Linux and UNIX.

TLPI is both a guide and reference book for system programming:

If you are new to system programming, you can read TLPI linearly as an introductory guide: each chapter builds on concepts presented in earlier chapters, with forward references kept to a minimum. Most chapters conclude with a set of exercises intended to consolidate the reader's understanding of the topics covered in the chapter.

If you are an experienced system programmer, TLPI provides a comprehensive reference that you can consult for details of nearly the entire Linux and UNIX (i.e., POSIX) system programming interface. To support this use, the book is thoroughly cross referenced and has an extensive index."

pjmlp
0 replies
2d3h

UNIX IPC is kind of missing: streams, pipes, message queues, shared memory.

SUN RPC for NFS, yellow pages,...

begueradj
0 replies
2d4h

Why is a UML book listed as a reference?