I also finally learned how signals work from top to bottom, and boy is it ugly. I’ve always felt that this was one of the weakest points in the design of Unix and this project did nothing to disabuse me of that notion.
Would love any resources that go into more detail, if any HN-er or the author himself knows of some!
If you haven't already, I would start with Advanced Programming in the Unix Environment by Stevens
https://www.amazon.com/Advanced-Programming-UNIX-Environment...
It is about using all Unix APIs from user space, including signals and processes.
(I am not sure what to recommend if you want to implement signals in the kernel, maybe https://pdos.csail.mit.edu/6.828/2012/xv6.html )
---
It's honestly a breath of fresh air to simply read a book that explains clearly how Unix works, with self-contained examples, and which is comprehensive and organized. (If you don't know C, that can be a barrier, but that's also a barrier reading blog posts)
I don't believe the equivalent information is anywhere on the web. (I have a lot of Unix trivia on my blog, which people still read, but it's not the same)
IMO there are some things for which it's really inefficient to use blog posts or Google or LLMs, and if you want to understand Unix signals that's probably one of them.
(This book isn't "cheap" even used, but IMO it survives with a high price precisely because the information is valuable. You get what you pay for, etc. And for a working programmer it is cheap, relatively speaking.)
I believe this was the 3rd time I’ve seen this book being recommended this week. It must mean something.
It is a must for anyone serious about UNIX programming.
Additionally one should get the TCP/IP and UNIX streams books from the same collection.
Is the Unix streams book “UNIX System V Network Programming”?
That one is also relevant, yeah.
Although I made a mistake: I was thinking of all of Richard Stevens's networking books, which go beyond plain TCP, UDP, and IP.
https://en.wikipedia.org/wiki/W._Richard_Stevens
Unfortunately, given their CS focus, they are kind of on the expensive side; I read most of them via libraries, eventually getting my own copies.
It's been the standard reference for decades for a reason. I learned from it, too. There's really nothing else quite like it available.
It's well written and full of practical advice and fun to read.
It might mean the Baader–Meinhof effect.
Not positive, but pretty sure that this, and the Unix Network book were golden for us in the 90s when we were writing MUDs. Explained so much about Socket communications (bind/listen/accept,...) Been a long time since I looked at that stuff, but those were fun times.
I believe that's the book I still have on my shelf. IIRC "UNIX Network Programming" and I learned a lot about networking and a lot about how UNIX works reading it cover to cover. I think I learned more from that book than any other.
Mr Stevens replied to something I wrote back in the day. I can't recall if it was a Usenet post or email, but I was over the moon!
Signals are at the intersection of asynchronous IO/syscalls, and interprocess communication. Async and IPC are also weak points in the original Unix design, not originally present. Signals are an awkward attempt to patch some async IPC into the design. They're prone to race conditions. What happens when you get a signal when handling a signal? And what to do with a signal when the process is in the middle of a system call, is also a bit unclear. Delay? Queue? Pull process out of the syscall?
If all syscalls are async (a design principle of many modern OSes) then that aspect is solved. And if there is a reliable channel-like system for IPC (also a design principle of many modern OSes) then you can implement not only signals but also more sophisticated async inter-process communication/procedure calls.
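To make the "Delay? Queue?" question above concrete, here is a minimal sketch, assuming a POSIX system and a single-threaded Python process. The function name `count_deliveries` is made up for illustration. It blocks a standard signal, sends it several times, then unblocks it and counts how often the handler ran.

```python
import os
import signal


def count_deliveries(sig=signal.SIGUSR1, sends=3):
    """Send `sig` to ourselves `sends` times while it is blocked.

    Returns (was_pending, handler_calls). Hypothetical helper for illustration.
    """
    calls = 0

    def handler(signum, frame):
        nonlocal calls
        calls += 1

    previous = signal.signal(sig, handler)
    # Block the signal so the kernel has to hold on to it instead of delivering it.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        for _ in range(sends):
            os.kill(os.getpid(), sig)
        was_pending = sig in signal.sigpending()
    finally:
        # Restoring the mask lets the pending signal through; Python runs the
        # handler before pthread_sigmask() returns.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(sig, previous)
    return was_pending, calls
```

On Linux I would expect `count_deliveries()` to return `(True, 1)`: the signal is delayed while blocked, but the three sends collapse into one pending instance, so standard signals are not queued. POSIX real-time signals behave differently, and this sketch does not cover them.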
As I wrote in some older discussion about UNIX signals on HN, the root problem (IMHO, of course) is that signals conflate three different useful concepts. The first is asynchronous external events (SIGHUP, SIGINT) that the process should be notified about in a timely manner and given an opportunity to react; the second is synchronous internal events (SIGILL, SIGSEGV) caused by the process itself, so it's basically low-level exceptions; and the third is process/scheduling management (SIGKILL, SIGSTOP, SIGCONT) to which the process has no chance to react so it's basically a way to save up on syscalls/ioctls on pidfds. An interesting special case is SIGALRM which is an asynchronous internal event.
See the original comment [0] for slightly more spelled-out ideas on better designs for those three-and-a-half concepts.
[0] https://news.ycombinator.com/item?id=39595904
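The split between "events you can react to" and "process management" is visible from user space. A small sketch, assuming a POSIX system and that this runs in the main thread (`can_install_handler` is a name I made up): it tries to install a handler for a signal and reports whether the kernel allowed it.

```python
import signal


def can_install_handler(sig):
    """Return True if the OS lets this process install a handler for `sig`."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError:
        # The kernel rejects handlers for SIGKILL and SIGSTOP (EINVAL).
        return False
    # Put back whatever was there before, if Python knows what it was.
    if previous is not None:
        signal.signal(sig, previous)
    return True


for name in ("SIGINT", "SIGHUP", "SIGKILL", "SIGSTOP", "SIGCONT"):
    print(name, can_install_handler(getattr(signal, name)))
```

SIGKILL and SIGSTOP are refused, which matches the "no chance to react" category. SIGCONT is a partial exception: a handler can be installed, but the process is resumed regardless of what the handler does.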
At least the first two are also conflated in a typical CPU’s trap/interrupt/whatever-your-architecture-calls-it model, which is what Unix signals are essentially a copy of. So this isn’t necessarily illogical.
SIGHUP and SIGINT have no CPU-level equivalent.
Sure. What I meant is, a CPU’s trap/interrupt mechanism is very often used to signal both problems that arise synchronously due to execution of the application code (such as an illegal instruction or a bus error) and hardware events that happen asynchronously (such as a timer firing, a receiver passing a high-water mark in a buffer, or a UART detecting a break condition). This is not that far away from SIGSEGV vs SIGHUP.
Some things (“imprecise traps”) sometimes blur the difference between the two categories, but they usually admit little in the way of useful handling. (“Some of the code that’s been executing somewhere around this point caused a bus error, now figure out what to do about it.”)
A story about the problem with delivering interrupts to a process in kernel mode in unix:
https://www.dreamsongs.com/RiseOfWorseIsBetter.html
IPC was actually introduced in "Columbus UNIX."
https://en.wikipedia.org/wiki/CB_UNIX
"signalfd is useless" is a good article: https://ldpreload.com/blog/signalfd-is-useless
It goes into the problems with Unix signals, and then explains why Linux's attempt to solve them, signalfd, doesn't work well.
That is a good article. I found myself nodding in agreement while reading it, thinking "Yeah, I've been bitten by that before".
How does Windows handle this? There's still signals, but I believe/was under the impression that signals in Windows are an add-on to make the POSIX subsystem work, so maybe it isn't as broken (for example, I think it doesn't coalesce signals).
Windows has a slightly better concept: Structured Exceptions (https://learn.microsoft.com/en-us/windows/win32/debug/struct...). It is a universal concept to handle all sorts of unexpected situations like divide by zero, illegal instructions, bad memory accesses... For console actions like Ctrl+C it has a separate API which automatically creates a thread for the process to call the handler: https://learn.microsoft.com/en-us/windows/console/handlerrou... . And of course Windows GUI apps receive the Window close events as Win32 messages.
Normal Windows apps don't have a full POSIX subsystem running under them. The libc signal() call is a wrapper around structured exceptions. It is limited to only a couple of well-known signals. MSVCRT does a bunch of stuff to provide an emulation for Unix-style C programs: https://learn.microsoft.com/en-us/cpp/c-runtime-library/refe...
In contrast to Unix signals, structured exceptions can give you quite a bit more information about what exactly happened like the process state, register context etc. You can set the handler to be called before or after the OS stack unwinding happens.
I am such a moron. Every one of those three links above is colored as 'visited' for me.
I have obviously read up on this before and just didn't remember :-(
Unix signals do... a lot of things that are separate concepts imo, and I think this is why there are people who don't like it or take issue with it.
You have SIGSTOP/SIGCONT/SIGKILL, which don't even really signal the process, they just do process control (suspend, resume, kill).
You have simple async messages (SIGHUP, SIGUSR1, SIGUSR2, SIGTTIN, SIGTTOU, etc) that get abused for reloading configuration/etc (with hacky workarounds like nohup for daemonization) or other stuff (gunicorn for example uses the latter 2 for scaling up and down dynamically). There's also in this category bizarrely specific things like SIGWINCH.
You also have SIGILL, SIGSEGV, SIGFPE, etc for illegal instructions, segmentation violations, FP exceptions, etc.
And also things that might not even be good to have as async things in the first place (SIGSYS).
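For the "SIGHUP means reload your config" convention mentioned above, the usual shape is roughly the following. This is a sketch under assumptions, not how any particular daemon does it: `ReloadOnHup` and `load_config` are hypothetical names, and it assumes a POSIX system with the object created in the main thread. The handler only sets a flag, and the main loop does the actual reload.

```python
import signal


class ReloadOnHup:
    """Hypothetical helper: reload configuration when SIGHUP arrives."""

    def __init__(self, load_config):
        self._load_config = load_config  # caller-supplied callable (assumption)
        self._pending = False
        self.config = load_config()
        signal.signal(signal.SIGHUP, self._on_hup)

    def _on_hup(self, signum, frame):
        # Do as little as possible here; the real work happens in poll().
        self._pending = True

    def poll(self):
        """Call from the main loop. Returns True if the config was reloaded."""
        if not self._pending:
            return False
        self._pending = False
        self.config = self._load_config()
        return True
```

A main loop would call `poll()` once per iteration. Keeping the handler down to a flag assignment sidesteps most of the "what is safe to call in a handler" questions, at the cost of reacting only when the loop next comes around.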
---
As an aside, it's not the only approach and there's definitely tradeoffs with the other approaches.
Windows has events, SEH (access violations, other exceptions), handler routines (CTRL+C/CTRL+BREAK/shutdown,etc), and IOCPs (async I/O), callbacks, and probably some other things I'm forgetting at the moment.
Plan 9 has notes which are strings... which lets you send arbitrary data to another process which is neat, but using the same mechanism for process control has, imo, the same drawbacks as *nix, except now they're strings instead of a single well-defined number.
The Windows mechanisms you're mentioning were also added over the course of many, many years. Much of Windows also happened a long time after UNIX signals were invented.
If you're including all that other stuff, it's probably fair to include all of the subsequent development of notification mechanisms on the UNIX side of the fence as well; e.g., poll(2), various SVR4 IPC primitives, event ports in illumos, kqueue in FreeBSD, epoll and eventually io_uring in Linux.
Yeah, it definitely is (especially since SIGIO is a thing :)). Even the Unix signals had more added to them over time (SIGWINCH and friends iirc came from the BSDs).
A lot of the mechanisms are very OS specific but I do think they're good comparisons to have with signals as well.
Except much of this later UNIX development was done by its derivatives, and the results are often available only with some degree of incompatibility among them (or not at all).
There were differences between BSD and SYSV signal handling that were problematic in writing portable applications.
https://pubs.opengroup.org/onlinepubs/009604499/functions/bs...
It's important to remember that code in a signal handler must be reentrant. "Nonreentrant functions are generally unsafe to call from a signal handler."
https://man7.org/linux/man-pages/man7/signal-safety.7.html
reentrancy is not sufficient here - at least that provided by mutex style exclusion. the interrupted thread may have actually been the one holding the lock, so if the signal handler enters a queue to wait for it, it may be waiting quite a while
That's why the word reentrant is used, not thread safe.
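The lock problem described above is easy to reproduce in miniature. A sketch, assuming POSIX, Python 3.8+ and the main thread (`handler_sees_lock_held` is a made-up name). Python runs handlers between bytecodes rather than in true async-signal context, so treat this as an analogy for the C case, not the same thing. The handler does a non-blocking acquire so the demo reports the problem instead of hanging.

```python
import signal
import threading


def handler_sees_lock_held():
    """Raise a signal while holding a lock; record what the handler observes."""
    lock = threading.Lock()
    observed = []

    def handler(signum, frame):
        # A blocking acquire here would never return: the code we interrupted
        # is the lock holder and cannot run again until we finish.
        got_it = lock.acquire(blocking=False)
        observed.append(got_it)
        if got_it:
            lock.release()

    previous = signal.signal(signal.SIGUSR1, handler)
    try:
        with lock:
            signal.raise_signal(signal.SIGUSR1)
            # Give the interpreter a chance to run the handler while the
            # lock is still held.
            for _ in range(100):
                pass
    finally:
        signal.signal(signal.SIGUSR1, previous)
    return observed
```

I would expect this to return `[False]`: the handler finds the lock already taken by the very code it interrupted. Swap the non-blocking acquire for a plain `lock.acquire()` and the program waits forever, which is the "waiting quite a while" case.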
I always felt VMS' mailbox system was much more elegant, but I imagine it's an ugly mess under the surface too.
https://wiki.vmssoftware.com/Mailbox
I like Plan 9's notes: http://man.postnix.pw/9front/2/notify
I wanted to say the exact same thing! I would love to get more details about that.
Would love to read a blog post about that.