I was at Google in 2005 on the other side of the argument. My view back then was simple:
Even if $BIG_COMPANY makes a decision to compile everything with frame pointers, the rest of the community is not. So we'll be stuck fighting an unwinnable argument with a much larger community. Turns out that it was a ~20 year argument.
I ended up writing some patches to make libunwind work for gperftools and maintained libunwind for some number of years as a consequence of that work.
Having moved on to other areas of computing, I'm now a passive observer. But it's fascinating to read history from the other perspective.
In what way would you be stuck? What functional problems does adding frame pointers introduce?
It “wastes” a register when you’re not actively using frame pointers. On x86 that can make a big difference, though with the added registers of x86_64 it’s much less significant.
Wasting a register on comparatively more modern ISAs (PA-RISC 2.0, MIPS64, POWER, aarch64 etc – they are all more modern and have an abundance of general purpose registers) is not a concern.
The actual «wastage» is in having to generate a prologue and an epilogue for each function – 2x instructions to preserve the old frame pointer and set a new one up, and 2x instructions at the point of return to restore the previous frame pointer.
Generally, it is not a big deal, with the exception of the pathological case of a very large number of very small functions calling each other frequently, where the extra 4x instructions per function will fill up the L1 instruction cache «unnecessarily».
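To put rough numbers on the prologue/epilogue point above, here is a minimal sketch. It assumes the four-extra-instructions figure from the comment and treats instruction count as a crude proxy for instruction cache footprint; real encodings differ by ISA and compiler, and `frame_pointer_overhead` is a name made up for this illustration.

```python
def frame_pointer_overhead(body_instructions: int, extra_instructions: int = 4) -> float:
    """Fraction of a function's instructions spent on frame-pointer upkeep.

    Assumption: `extra_instructions` are added per function (2 in the
    prologue, 2 in the epilogue). Instruction count is only a rough
    stand-in for code size.
    """
    if body_instructions < 0:
        raise ValueError("body_instructions must be non-negative")
    total = body_instructions + extra_instructions
    return extra_instructions / total


if __name__ == "__main__":
    # Compare a tiny, a medium, and a large function body.
    for body in (4, 40, 400):
        share = frame_pointer_overhead(body)
        print(f"{body:>4} body instructions: {share:.1%} of the function is frame-pointer upkeep")
```

Under these assumptions a four-instruction function doubles in size, while a 400-instruction function grows by about 1%, which is why only the "many tiny functions" case is singled out as pathological.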
Those pathological cases are really what inlining is for, with the exception of any tiny recursive functions that can't be tail call optimised.
It's not just the loss of an architectural register, it's also the added cost to the prologue/epilogue. Even on x86_64, it can make a difference, in particular for small functions, which might not be inlined for a variety of reasons.
If your small function is not getting inlined, you should investigate why that is instead of globally breaking performance analysis of your code.
Right, but I was asking about functional problems (being "stuck"), which sounded like a big issue for the choice.
It caused a problem when building inline assembly heavy code that tried to use all the registers, frame pointer register included.
I wasn't talking about functional problems. It was a simple observation that big companies were not going to convince Linux distributors to add frame pointers anytime soon and that what those distributors do is relevant.
All of the companies involved believed that they were special and decided to build their own (poorly managed) distribution called "third party code", and having to deal with it was not my best experience working at these companies.
Oh, I just assumed you were talking about Google's Linux distribution and applications it runs on its fleet. I must have mis-assumed. Re-reading... maybe you weren't talking about any builds but just whether or not to oppose kernel and toolchain defaulting to omit frame pointers?
Google didn't have a Linux distribution for a long time (the one everyone used on the desktop was an outdated rpm based distro, we mostly ignored it for development purposes).
What existed was an x86 to x86 cross compilation environment, and the libraries involved were manually imported by developers who needed that particular library.
My argument was about the cost of ensuring that those libraries were compiled with frame pointers when much of the open source community was defaulting to omit-fp.
Would it not be easier to patch compilers to always assume the equivalent of -fno-omit-frame-pointer?
That was done in 2005. But the task of auditing the supply chain to ensure that every single shared library you ever linked with was compiled a certain way was still hard. Nothing prevented an intern or a new employee from checking a library without frame pointers into the third-party repo.
In 2024, you'd probably create a "build container" that all developers are required to use to build binaries or pay a linux distributor to build that container.
But cross compilation was the preferred approach back then. So all binaries had an rpath (a run-time search path for shared libraries) that ignored the distributor-supplied libraries.
Having come from an open source background, I found this system hard to digest. But there was a lot of social pressure to work as a bee in a system that thousands of other very competent engineers were using (quite successfully).
I remember briefly talking to a ChromeOS-related group who were using the "build your own custom distro" approach, before deciding to move to another FAANG.
You do get occasional regressions. E.g., we found an extremely obscure bug involving enabling frame pointers, valgrind, glibc ifuncs and inlining (all at the same time):
https://bugzilla.redhat.com/show_bug.cgi?id=2267598 https://github.com/tukaani-project/xz/commit/82ecc538193b380...
Please name the individuals who are blocking progress on frame pointers. It's such a clear and obvious win that the rest of us should have the opportunity to persuade them. https://news.ycombinator.com/item?id=34660813
The clear and obvious win would have been adoption of a universal userspace generic unwind facility, like Windows has --- one that works with multiple languages. Turning on frame pointers is throwing in the towel on the performance tooling ecosystem coordination problem: we can't get people to fix unwind information, so we do this instead? Ugh.
Yes, although the universal mechanisms that have been proposed so far have been quite ridiculous - for example having every program handle a "frame pointer signal" in userspace, which doesn't account for the reality that we need to do frame unwinding thousands of times a second with the least possible overhead. Frame pointers work for most things, and where they don't work (interpreted code) you're often not that interested in performance.
Yep. That's my proposal.
Yes, it does. The kernel has to return to userspace anyway at some point, and pushing a signal frame during that return is cheap. The cost of signal delivery is the entry into the kernel, and after a perf counter overflow, you've already paid that cost. Why would the actual unwinding be any faster in the kernel than in userspace?
Also, so what if a thread enters the kernel and samples the stack multiple times before returning to userspace? While in the kernel, the userspace stack cannot change --- therefore, it's sufficient to delay userspace stack collection until the kernel returns to userspace anyway.
You might ask "Don't we have to restore the signal mask after handling the profiling signal?"
Not if you don't define the signal to change the signal mask. sigreturn(2) is optional.
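The deferral argument above can be modelled in a few lines. This is a toy simulation, not kernel code: `DeferredUserStackSampler`, its method names, and the `unwind_user_stack` callback are all hypothetical stand-ins. It only shows the bookkeeping: samples taken while the thread is in the kernel are queued, and a single userspace unwind at return-to-user serves all of them, because the user stack cannot have changed in between.

```python
from typing import Callable, List, Tuple

Stack = Tuple[int, ...]


class DeferredUserStackSampler:
    """Toy model: queue samples in the kernel, unwind once on return to user."""

    def __init__(self, unwind_user_stack: Callable[[], Stack]) -> None:
        self._unwind = unwind_user_stack      # hypothetical userspace unwinder
        self._pending: List[int] = []         # timestamps awaiting a user stack
        self.samples: List[Tuple[int, Stack]] = []
        self.unwind_calls = 0

    def on_sample_in_kernel(self, timestamp: int) -> None:
        # Only note that a sample fired; the user stack is frozen while in the kernel.
        self._pending.append(timestamp)

    def on_return_to_user(self) -> None:
        if not self._pending:
            return
        stack = self._unwind()                # one unwind for all queued samples
        self.unwind_calls += 1
        self.samples.extend((ts, stack) for ts in self._pending)
        self._pending.clear()


if __name__ == "__main__":
    sampler = DeferredUserStackSampler(lambda: (0x401111, 0x402222))
    for ts in (100, 101, 102):
        sampler.on_sample_in_kernel(ts)
    sampler.on_return_to_user()
    print(sampler.unwind_calls, len(sampler.samples))
```

Whether this is cheaper in practice than an in-kernel frame pointer walk is exactly what the thread disputes; the sketch makes no performance claim.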
This sounds vastly more complex already than following a linked list. You've also ignored the other cost which is getting the stack trace data out of the program. Anyway I'm keen to see your implementation and test how it works in reality.
Efficient things often end up being more complex and supporting more features than brute-force approaches. Frame pointers have a hard time letting us interpret managed stack frames, for example, and a simplistic atomic-context in-kernel FP walker will stop traversing the stack if it hits a page that happens not to be resident.
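For readers unfamiliar with what "following a linked list" means here, the following is a toy model of a frame pointer walk, including the non-resident-page truncation mentioned above. The memory layout is an assumption (x86_64-like: the saved caller frame pointer at `fp`, the return address at `fp + 8`), the addresses are invented, and `walk_frame_pointers` is a hypothetical name; a real walker reads actual process memory.

```python
from typing import Dict, List, Optional, Set

PAGE_SIZE = 4096  # assumed page size for the toy model


def walk_frame_pointers(
    memory: Dict[int, int],
    fp: int,
    resident_pages: Optional[Set[int]] = None,
    max_depth: int = 64,
) -> List[int]:
    """Follow the saved-frame-pointer chain and collect return addresses.

    memory[fp] is the caller's frame pointer, memory[fp + 8] the return
    address; a frame pointer of 0 ends the chain. If `resident_pages` is
    given, the walk stops at the first frame on a page not in that set,
    mimicking a walker that cannot take page faults.
    """
    trace: List[int] = []
    while fp != 0 and len(trace) < max_depth:
        if resident_pages is not None and fp // PAGE_SIZE not in resident_pages:
            break  # frame lives on a paged-out page: truncate the trace
        trace.append(memory[fp + 8])
        fp = memory[fp]
    return trace


if __name__ == "__main__":
    # Three frames on three different pages; the leaf has the lowest address.
    memory = {
        0x70001000: 0x70002040, 0x70001008: 0x401111,
        0x70002040: 0x70003080, 0x70002048: 0x402222,
        0x70003080: 0,          0x70003088: 0x403333,
    }
    full = walk_frame_pointers(memory, 0x70001000)
    partial = walk_frame_pointers(memory, 0x70001000, resident_pages={0x70001, 0x70003})
    print([hex(a) for a in full], [hex(a) for a in partial])
```

With every page resident the walk returns all three return addresses; with the middle frame's page marked non-resident it stops after the first one, which is the failure mode the comment describes.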
io_uring would be a good candidate --- no-privilege-transmission data flows. Even if you don't want to use it, you can have userspace batch up a few dozen userspace stack collections and flush them to the perf or ftrace event buffer all at once, at regular intervals. Doing so would amortize whatever reporting overhead you have in mind.
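The batching idea above can be sketched as follows. This is an illustration only: `StackSampleBatcher` is a hypothetical class, and `flush_fn` is a placeholder for whatever transport is used (perf buffer, ftrace, io_uring); no real kernel interface is called. The default batch size of 32 is an arbitrary pick for "a few dozen".

```python
from typing import Callable, List, Tuple

Stack = Tuple[int, ...]


class StackSampleBatcher:
    """Buffer stack samples and hand them to `flush_fn` one batch at a time."""

    def __init__(self, flush_fn: Callable[[List[Stack]], None], batch_size: int = 32) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush_fn = flush_fn        # placeholder transport, not a real API
        self._batch_size = batch_size
        self._pending: List[Stack] = []
        self.flush_count = 0

    def record(self, stack: Stack) -> None:
        """Queue one sample; flush automatically when the batch is full."""
        self._pending.append(stack)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any queued samples in a single call to the transport."""
        if not self._pending:
            return
        self._flush_fn(self._pending)
        self._pending = []               # start a fresh list; the old one was handed off
        self.flush_count += 1


if __name__ == "__main__":
    batches: List[List[Stack]] = []
    batcher = StackSampleBatcher(batches.append, batch_size=32)
    for i in range(100):
        batcher.record((0x400000 + i,))
    batcher.flush()                      # e.g. triggered by a periodic timer
    print(batcher.flush_count, [len(b) for b in batches])
```

In this run 100 samples cost four transport calls instead of 100, so any fixed per-report overhead is divided across the batch. The trade-off is latency: samples sit in the buffer until the batch fills or the periodic flush fires.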
Ah, that word "reality", which is the last retort of people who've exhausted their technical arguments.
I propose that a frame pointer daemon be introduced too, for managing the frame pointer signals. We shall modify _start() to open up an io_uring connection to SystemD so that a program may share its .eh_frame data. That way the kernel can still unwind its stack in case apt upgrade changes the elf inode.
Neither of you has identified anything technically wrong with unwinding via signal and neither of you has proposed a mechanism through which we might support semantically informative unwinding through paged-out code or interpreted languages.
Sarcasm is not a technical argument.
I don't need to. Fedora and Ubuntu have already changed their policies to restore frame pointers. As far as I can tell, your proposal is no longer on the table. If you aren't willing to accept the decision, then you should at least understand that the onus is on you now to justify why things need to change.
We have to deal with reality if we want to measure and improve software performance today. The current reality is that frame pointers are the best choice. Brendan's article outlines a couple of possible future scenarios where we turn frame pointers off again, but they require work that is not done yet (in one case, advances in CPUs).
Your argument would be more compelling without the swipe in the final sentence.
Cosmopolitan Libc does frame pointer unwinding once per function call, when the --ftrace flag is passed. https://justine.lol/ftrace/
I think this came off somewhat aggressive. I vouched for the comment because flagging it is an absurd overreaction, but I also don't think pointing out isolated individuals would be of much help.
Barriers to progress here are best identified on a community level, wouldn't you say?
But people, please calm down. Filing an issue or posting to the mailing list to make a case isn't sending a SWAT team to people's homes. It's a technical issue, one well within the envelope of topics which can be resolved politely and on the merits.
What area?