
Who killed the network switch? A Hubris Bug Story

orf
51 replies
19h31m

Hubris is really, really nice. I've spent half an hour reading some of the kernel code and it’s exceptionally clear and well written: a far cry from the ifdef macro soup, two-letter-variable-name-loving, comment-starved C code I’ve seen previously. A good bit of bedtime reading!

I recommend leafing through it: https://github.com/oxidecomputer/hubris/blob/b44e677fb39cde8...

hinkley
45 replies
18h49m

It bothers me deeply how much of C ethos can be boiled down to, "We can't be bothered to learn to type at a reasonable speed"

Disk space for source code hasn't really been a problem for forty years (binaries are a different story), and yet we are still being stingy with variable names.

xorcist
13 replies
8h15m

Identifiers aren't short in C (or any other language) because disk storage is expensive, but because readability suffers with long identifiers.

The example here is very much C-style (short, lowercase, underscores). The list of arguments is called "args" (short and descriptive). It is not called argumentListArray.

It is not a coincidence that math has very short and well-known identifiers. The rate of change in horizontal position is called x'. It is not called posHorizontalChangeRate, simply because the latter is harder to read.

There is, of course, such a thing as having too short identifiers.

repler
11 replies
8h2m

The list of arguments is called "args" (short and descriptive)

You’re proving the original point: it should be named “arguments” because that is what it is.

Saving… 5 bytes? Doing that by naming it “args” instead is exactly OP’s point.

CogitoCogito
10 replies
7h49m

I don’t personally find “arguments” any clearer than “args”. Not sure what’s gained by calling it arguments other than 5 extra characters.

bluGill
7 replies
7h29m

How long have you been programming? You got used to the short name, and now it is a word to you.

krisoft
3 replies
5h39m

I don't think args is the right variable name to pick as an example for this argument. As you say, it is what that thing is called idiomatically. I think it is fine.

It is more about cases like this: your code needs to parse an internationally formatted phone number from a string into a struct (or pick anything else specific to your problem domain). What do you call that function? Do you call it prsphi? (prs for parse, ph for phone number and i for international) Or do you call it parse_international_phone_number? Or maybe international_phone_string_to_phonenumber_struct? (or something similar in CamelCase)

That is where the difference starts to matter. And I for one don't want to read code with many prsphi's around.

cesarb
1 reply
4h8m

Do you call it prsphi? (prs for parse, ph for phone number and i for international) Or do you call it parse_international_phone_number?

That's a false dichotomy; you could call it, for instance, parse_int_phonenum, abbreviating "international" and "number" to the still understandable "int" and "num" (and smashing together "phone number" into "phonenum" since it's a single concept in this code base). It's nearly half the length of your suggestion, while still being understandable at a glance.

krisoft
0 replies
1h8m

Hah :D I was confused about why you would change the signature to parse integers when I clearly said it was parsing strings. Turns out you were not calling it "parse integer" but "parse international". I guess that's a point against that name, then?

bluGill
0 replies
4h33m

If you work with the code enough, prsphi becomes as idiomatic as args. The question is which names should be allowed to become idiomatic in the first place. Arguments is of course common enough across a lot of different projects that you can make a stronger argument that perhaps it should, and in fact args might be better, since nobody thinks of the dictionary definition ("A discussion in which disagreement is expressed; a debate.") when they see args.

I have never worked with phone numbers outside of a class assignment so prsphi is not something I would understand. However if I was in a codebase that worked with phone numbers often I'd probably get to know it and like the shortness.

CogitoCogito
2 replies
5h9m

That same argument applies to the word “argument”. What an “argument” is in the context of a function call is significantly more complex than the shorthand of “argument” vs “arg”.

In any case, I can remember when I started programming, and no, I didn’t have any problem remembering that arg is short for argument. I’d seen acronyms and shortened words all my life before programming, after all. However, I do remember being confused by the concepts of argument and keyword argument in the abstract.

In short, I really think this is making a mountain out of a molehill. If someone doesn’t know what it means the answer is “arg is short for argument” and then you move on.

bluGill
1 reply
4h30m

If someone doesn’t know what it means the answer is

Who do they ask? They have just interrupted someone else's flow to ask. Worse, they might ask the wrong person and interrupt more than one person's flow in getting an answer. There is a high cost to even trivial questions, and the more of them someone needs to ask, the worse.

The point is that there is a balance, and you need to find it.

CogitoCogito
0 replies
4h12m

If someone can’t deal with such totally standard and easily discoverable terminology, they don’t stand a chance. If they can’t ask their coworkers questions when confused, they (and the company) don’t stand a chance. You are far beyond the point of “balance” here.

I take pedagogy very seriously and work very hard for both experienced and inexperienced people to understand what I do, but this is just ridiculous. This is a non-issue.

eigenket
1 reply
7h10m

Someone else gave these same examples but I find them compelling. Do you find atoi, strrchr or srand similarly clear? What about mbstowcs?

CogitoCogito
0 replies
5h14m

I wouldn’t call them similarly clear, no. But that’s not really pertinent to the question of args vs. arguments.

p_l
0 replies
6h20m

Identifiers in C are short because of expectations set by the standard library, which evolved when C had a hard limit of 6 characters for externally visible symbols.

tjoff
10 replies
18h24m

If it bothers you so much, I'm happy to tell you that that is not the reason for it.

spc476
7 replies
17h40m

Could you spend some effort to explain the reason then?

refulgentis
4 replies
16h45m

FWIW, I've written some Real C++: think audio encoders/decoders for voice recognition serving 1B MAUs on the server, and an audio encoder on 2B devices.

Each and every time there were insane excuses that boiled down to this: only a few people wanted to write C++, and they found infinite ways to justify not doing things, like all of us humans do.

My favorite was the "don't write comments because as soon as you do they're out of date."

Even though this is recognizably trollish tripe at the extreme to which it was implemented, I still find it hard to ignore that brainworm in my day-to-day coding.

Retric
2 replies
16h38m

I think that’s a common mutation of the far more useful and therefore relevant “Don’t trust comments” which everyone learns at some point.

citrin_ru
1 reply
11h7m

“Don’t trust people” is a heuristic to avoid being scammed. But it is much nicer to live in a society where you can trust most people. In the same way, it is nice to work with a code base where you can trust comments by default, which is the case for many open source projects.

Retric
0 replies
8h30m

Personally it’s always the programs you can generally trust that end up reinforcing the idea. On average you may be far better off trusting comments but that one time they fail you is all the worse because of that trust.

It’s the same deal with compilers and computer hardware. They work so well that they become a blind spot, and yet I’ve had both fail me.

dspillett
0 replies
7h56m

> My favorite was the "don't write comments because as soon as you do they're out of date."

A better version of that is “if the code and comments disagree, there is a good chance that both are wrong”.

If code and comments get out of sync, at best someone has been careless, and you need to be on the lookout for more carelessness in changes made at the time. At worst changes have been made without fully understanding what is going on, so there is a huge can of edge-case worms about to burst open.

cesarb
1 reply
6h36m

The reason is readability, not disk space or typing speed. When your screen resolution is 24 lines of 80 characters (a common screen resolution back when C became popular), having longer variable names means statements often have to be split over several lines, and less code fits on the screen at the same time. Even today, having shorter lines makes it easier to see several versions of the code side-by-side (for instance, when doing a merge, it's not unusual to see three versions of the code at the same time).

p_l
0 replies
4h8m

The reason was that C compilers had little memory to work with, hence the limit of 6 characters for identifiers.

throwaway11460
0 replies
17h50m

Who cares though? Just use an IDE or Copilot or something.

IggleSniggle
0 replies
14h57m

Ah, yes, we are all elucidated once again. Thanks for the ctx

StressedDev
8 replies
13h28m

C (the language) does not cause people to write unmaintainable or hard to understand code. Programmers who choose to write unmaintainable or hard to understand code are the problem. I have seen great C code, good C code, mediocre C code, etc. It really depends on who is writing this. This is true of all languages.

In short, technology cannot fix people problems.

ninkendo
1 reply
7h6m

C (the language) does not cause people to write unmaintainable or hard to understand code

Well then it’s good that OP didn’t claim that C the language causes people to write such code. They said “The C ethos”, not “The C language.” It’s not about the language’s technical requirements, it’s about what’s idiomatic in a language, how it’s taught, and what style is used by the vast majority of the existing corpus of code written in that language.

Look at the C standard library’s function names, vsnprintf/strdupa/acosh/ftok… Compare them to something like Objective-C at the other extreme, where method and variable names tend to always be fully spelled out, with no abbreviations and a full description of what’s being done (`-[NSString stringByAppendingString:]`, etc.)

Is it due to some technical requirement? Is stringByAppendingString illegal in C because it’s too long? Is strdup illegal in ObjC because it’s too short? Of course not! But why do we see this everywhere so consistently? Why does C have short indecipherable function names and ObjC have such long ones, if the language doesn’t require it?

Because idioms matter. If you’re learning C, you’re learning the way it is typically taught. You’re reading other C code. You’re encouraged to program the way other C programmers program. You’re likely using the standard library a lot. Likewise ObjC.

This means two things:

- Yes, in a very obvious sense, it’s not the language’s fault, it’s the programmers’ fault.

- But also, paradoxically, it is the language’s fault, because a language is not just a set of syntax in a vacuum, it’s also a corpus of existing code, a set of idioms, a community of people, and a way of thinking of things. C absolutely causes people to write hard-to-understand code when viewed through this lens.

Your comment has been written in so many ways in so many threads discussing programming languages, it’s absolutely tiring. Yes, you can write terrible code in any language, and you can write great code in any language (well almost any… probably not brainfuck.) Nobody is arguing that. When we discuss whether one language is more readable than the other, we’re always talking about the qualities of typical code you actually see in a language, and about what style of code that language encourages.

p_l
0 replies
6h22m

It used to be a language limitation, first practical, later codified for portability. Originally (C89) it was 6 characters for anything externally visible, these days (C99) it's 63 ASCII characters for internal identifiers and 31 characters for external identifiers (implementation can allow longer, but does not have to - those are the minimal significant i.e. preserved identifier lengths).
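To make those minimum significance limits concrete, here is a small sketch in Rust (not C compiler code) that checks whether two *different* external identifiers could end up as the same symbol on an implementation that honours only the standard's minimums: C89's 6 characters, where case may also be ignored, versus C99's 31 case-sensitive characters. The helper names `may_collide`, `may_collide_c89_external` and `may_collide_c99_external` are hypothetical, made up for illustration; real toolchains usually keep far more characters than the minimum, which is why the functions say "may".

```rust
/// Returns true if two *different* identifiers could map to the same symbol
/// on an implementation that keeps only `significant` leading characters and,
/// optionally, ignores letter case (as C89 permitted for external names).
fn may_collide(a: &str, b: &str, significant: usize, ignore_case: bool) -> bool {
    // The part of the name the implementation is guaranteed to look at.
    let key = |s: &str| -> String {
        s.chars()
            .take(significant)
            .map(|c| if ignore_case { c.to_ascii_lowercase() } else { c })
            .collect()
    };
    a != b && key(a) == key(b)
}

/// C89 minimum for external identifiers: 6 characters, case may be ignored.
fn may_collide_c89_external(a: &str, b: &str) -> bool {
    may_collide(a, b, 6, true)
}

/// C99 minimum for external identifiers: 31 characters, case-sensitive.
fn may_collide_c99_external(a: &str, b: &str) -> bool {
    may_collide(a, b, 31, false)
}
```

Under the C89 minimum, `parse_int_phonenum` and `parse_international_phone_number` both reduce to `parse_` and may collide, while under C99 they are distinct. That pressure is one plausible reason early library names like `strrchr` fit in six characters.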

For the same reason, Pascal has symbols limited to 10 characters and doesn't preserve case: the original implementation mapped identifiers into ten 6-bit characters packed into a 60-bit word.
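A rough Rust sketch of that packing scheme. The 6-bit code table is an assumption for illustration only (padding 0, letters 1 to 26, digits 27 to 36); it is not the actual CDC character code, and `six_bit` / `pack_identifier` are hypothetical names. The sketch shows the two consequences mentioned above: letters are folded to one case before encoding, so case cannot survive, and names that agree in their first 10 characters pack to the same word.

```rust
/// Assumed 6-bit code for illustration only (NOT the real CDC character set):
/// padding = 0, 'A'..='Z' = 1..=26, '0'..='9' = 27..=36.
fn six_bit(c: char) -> Option<u64> {
    match c.to_ascii_uppercase() {
        u @ 'A'..='Z' => Some(u as u64 - 'A' as u64 + 1),
        d @ '0'..='9' => Some(d as u64 - '0' as u64 + 27),
        _ => None, // no code for anything else in this toy table
    }
}

/// Pack the first 10 characters of `name` into the low 60 bits of a u64,
/// padding short names with 0. Characters past the tenth are never examined.
fn pack_identifier(name: &str) -> Option<u64> {
    let mut chars = name.chars();
    let mut word = 0u64;
    for _ in 0..10 {
        let code = match chars.next() {
            Some(c) => six_bit(c)?,
            None => 0,
        };
        word = (word << 6) | code; // 10 fields x 6 bits = 60 bits
    }
    Some(word)
}
```

For example, `LINECOUNTER1` and `linecounter2` both pack to the word for `LINECOUNTE`, so a compiler comparing packed words would treat them as the same identifier.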

Similarly, some of the oldest Common Lisp functions retain aspects of encoding characters to fit into 36-bit words efficiently.

hnfong
1 reply
7h45m

It depends whether you include the standard library into consideration.

atoi()? strrchr()? srand()?

I’d argue the obtuseness of the standard library function names at least influences the legibility of the programs written against them.

pif
0 replies
3h50m

the obtuseness of the standard library function names

I suppose you are too young to remember that back in the day there was no IDE helping you with autocompletion, thus one character less in the name is one character less to type.

Furthermore, I'm ready to bet you are from the USA and forgot that most of the world (even the programming world) does not speak English natively, so "SeedRandom" is NOT clearer than "srand": it's just as cumbersome, and longer to type when you are reading it character by character from a paper manual.

hnaccount_rng
1 reply
12h8m

That’s a statement that comes up a lot. And while it is technically true, it really isn’t true in practice. We can and do design systems to expose less of the surfaces that represent open knives and more of the safe handle side. It is nearly universally true that the more accessible way is the one that will be taken more often.

To be fair to C, it was designed long before we as a society really appreciated that. So there is a lot of old code whose authors just couldn’t know better. And some (probably few) will even have managed to write readable code! But that doesn’t mean that languages can’t encourage good behavior and discourage bad behavior. (I’m not even sure where I come down wrt Rust here.) And that will, somewhat reliably though not always, end in better or worse code. For example, it’s really hard to write Python code where local control flow isn’t reasonably obvious (since scoping is enforced by whitespace). Not impossible, of course.

pastage
0 replies
9h54m

There are lots of examples of hard-to-understand local control flow in Python; Python is a long way from consistently encouraging good behaviour. When comparing the same type of code, let's say arg parsing, C is usually worse. In Python you can get lost in library hell looking at the details of that code, but in C it's just knowledge of the basic functions of the language, however terse they might be. On average I agree.

pjc50
0 replies
6h44m

This comes up every time someone advocates for C, and it's basically a refusal to learn anything about process and safety culture since the 1950s. "Poka-yoke" was invented in the 1960s. The programming equivalent, the use of type systems and other proof systems to automatically detect or avoid errors, is strongly resisted by C developers, who seem to want to keep writing CVEs.

npteljes
0 replies
4h45m

While that's true, what languages do have is culture: formatting conventions, names of built-in stuff, books that teach the basics, and the expectations of devs who already know the language and whom newcomers are expected to work with. It's often not on the individual programmer, but rather on the complex interactions between programmers.

hughesjj
2 replies
15h51m

I get that we as an industry went a bit overboard when Java + OOP was the in-vogue novelty, with entire sentences as method names, but I'm with you on all the people I see making super-compressed variable names in contexts longer than a for-loop initializer.

And don't even get me started on trying to text to speak your `mCtx` or whatever

OJFord
1 reply
15h27m

And don't even get me started on trying to text to speak your `mCtx` or whatever

I'm not saying TTS is there (I have no idea) but from an ideal point of view, to have it be equivalent, surely it would read 'm context', which then isn't so bad (maybe m is clear from the ..err.. context).

ElFitz
0 replies
10h30m

You could run an LLM pass to "expand" abbreviations and acronyms before running the text through text-to-speech.

Although more expensive, it worked wonders for me with text containing units of measurement (°F, °C, m/s, etc).

dspillett
1 reply
7h47m

> It bothers me deeply how much of C ethos can be boiled down to, "We can't be bothered to learn to type at a reasonable speed"

I can understand it from way back when. People were moving from assembly and other languages where everything was terse by requirement, so variable/function/other names were short out of habit, though even then your assembly should have had comments about what was in each register & why, etc. Also, comments were relatively terse because people were working on small screens and in extreme cases were concerned about the size of code files. None of this is an adequate excuse for doing things wrong now, decades later, but such habits have momentum: people with the habits write books and local style guides, and short names are baked into the standard library, so new people are “infected” by the habits, and so it rolls forward.

cesarb
0 replies
6h46m

Also, comments were relatively terse because people were working on small screens

The standard screen size was 24 lines of 80 columns, and you couldn't expect a reader of your code to have a terminal with higher resolution than that. So the standard is to make your code fit in 80 columns; using longer identifiers means statements will end up taking multiple lines, which means less code can fit on the screen, making it less readable.

Even today, terminal emulators like Gnome Terminal still open by default with 24 lines of 80 columns, so there's still value in making code readable in that resolution (though nowadays, the main advantage is making it easier to read code side-by-side in multiple windows).

anilakar
1 reply
10h25m

Most of the bad habits stem from how C is taught.

It should not be the first language freshmen are taught in universities. All the effort will be wasted on teaching basic programming instead of the actual language and good programming habits.

Pointers and the memory model are core concepts and should be taught early on. The worst programming books I have seen introduce them in the very last chapter, often called "advanced language features" or something similar.

It is not a compiled scripting language for Unix systems either and should not be programmed with a text editor on a remote machine. The programmer should have a proper development environment installed locally.

Valgrind and Clang's sanitizers should be used extensively. There is no such thing as partial credit when it comes to memory correctness.

If a certain QuakeNet IRCop is reading this, thank you. Your uni C course managed to avoid all the pitfalls above.

pif
0 replies
3h45m

should not be programmed with a text editor on a remote machine. The programmer should have a proper development environment installed locally.

"should not", "They should have" ... well, it had to! And they didn't have.

There is no point in trying to forget where C comes from.

postmodest
0 replies
2h43m

It bothers me deeply how much of C ethos can be boiled down to, "We can't be bothered to learn to type at a reasonable speed"

"My computer is a LITERAL TELETYPE, so I will enshrine typographical parsimony in every part of this OS/Language I am creating" -K&R

pif
0 replies
4h43m

Code quality has zero correlation with typing speed.

cdchn
0 replies
13h45m

It's funny because the article even talks about the importance of reducing space for binaries.

akira2501
0 replies
17h31m

how much of C ethos can be boiled down to

A lot of commercial code is like this, regardless of the language; there just happen to be people trying to sell new languages. They desperately want to convince you that this is all the fault of C and its nebulous "ethos," and if you just use their new language, it will all magically go away and possibly improve human rights somehow.

nebula8804
4 replies
15h59m

AI might kill this convention once everyone pipes their old crusty C code through it and all of a sudden all the variables are clean and named in the fashion the user loves best (because the AI has learned exactly the quirks of how a specific coder likes it).

chc4
2 replies
14h30m

AI might also make unicorns real and usher in world peace, while we're wishing for things.

TeMPOraL
1 replies
11h33m

Unicorns are merely an engineering problem tho (specifically, genetic engineering).

dxdm
0 replies
10h25m

Unicorns are merely an engineering problem tho (specifically, genetic engineering).

So is world peace, while we're saying things that contain the word "merely". :)

aitchnyu
0 replies
12h49m

One dev uses an autoformatter (black for Python) for his codebase. When he wants to do side by side diff on a laptop, he reduces line width to 70 characters and formats his entire codebase, works with his diff tool, then reformats back to original width.

aus10d
8 replies
20h32m

The work Oxide is doing is truly amazing

brcmthrowaway
7 replies
19h22m

First Tailscale, now Oxide is the darling of projects 99% of people will never need

pstuart
0 replies
16h43m

I'm assuming that 1% will make their efforts worthwhile. It is impressive regardless of whether or not you're going to use it.

p_l
0 replies
3h45m

I can understand Oxide (though the current VMware apocalypse might make their appeal bigger than expected), but Tailscale?

Tailscale's biggest proposition is that their VPN software is, arguably, the easiest I've ever managed. There are systems that are better integrated in some environments, systems that provide various niche features, etc.

But Tailscale is simplest to use.

maybeben
0 replies
19h14m

I look forward to scoring some beefy Milan boxes with weird rails for cheap on ebay because the firmware only loads OpenSolaris.

breakingcups
0 replies
10h35m

Not only is that not really relevant, but I'd argue Tailscale is actually very useful even to many people who don't feel like they currently "need" it.

aus10d
0 replies
18h42m

You don't have to "need" it to admire the scope of what they created. Hardware AND software engineering at its finest solving some very hard problems.

always2slow
0 replies
19h17m

It's ok to have heroes.

StressedDev
0 replies
13h24m

OK, this comment was not really helpful. The article described some really interesting tech. Trashing tech or people who are excited by it does not really make things better.

rablackburn
7 replies
12h15m

This is one of the best job ads I’ve ever seen. Seamless segue into culture and “btw we’re hiring” at the end :)

This is truly a fantastic post-mortem: even an application-level developer like myself could follow it. Though I am in the middle of Rust in Action at the moment so I was primed for this type of content (:

And of course it’s always a pleasure seeing someone else who comments their code at such a high LoC ratio. Literate programming works!

IshKebab
6 replies
10h59m

Only in America though unfortunately.

steveklabnik
3 replies
5h20m

This is not the case, just to be clear about it. We have a bunch of folks in Canada, the UK, and Europe.

We do ask for some overlap in working hours with the US, which does make it harder the further away you get, but that is the constraint, not “US only.”

vhodges
1 reply
42m

Ah, that's good to know, since it wasn't clear you were able to hire in Canada. As the peer comment says, I am not sure I would make it past a screening round :) (I don't currently do Rust, though I skimmed the tutorial and it looked doable), but working on a hypothetical web console written in Go has appeal.

The other project I'd be interested in working on (likely as an open source, separate project) would be bringing the dx/ux and tooling to home lab level gear.

Is there a Canadian subsidiary or is it a contract employee kind of relationship (if you can comment!)?

steveklabnik
0 replies
25m

but working on a hypothetical web console written in Go has appeal.

Our console is written in TypeScript on the front end and Rust in the back end, as far as I know: I'm not aware of Go being a meaningful part of our stack (other than in dependencies, like CockroachDB).

Is there a Canadian subsidiary or is it a contract employee kind of relationship (if you can comment!)?

I don't personally know the answer to this, being from the US :)

p_l
0 replies
4h5m

I adore the clear rules Oxide has on their webpage regarding scope of where they hire from.

I'll admit that to this day GitHub is a bit of a sore spot for me, given how they publicly said "we hire internationally and remote" but for a long time wouldn't publish that "internationally" meant "here's a small list of countries that this applies to".

Now if only I believed I had a chance to pass the interviews :D

dcre
1 reply
5h23m

Not true. We have team members in Europe and NZ.

“Most of our team are based outside of the Bay Area. We do ask that your workday overlaps with Pacific Time for at least four hours.”

IshKebab
0 replies
25m

That's quite an ask though - it means I'd finish at 8:30pm at the earliest every day which is completely impossible with children.

monocasa
6 replies
20h31m

FWIW, you can support more than 8 regions by treating that hardware more like a soft fill TLB.

packetlost
3 replies
20h11m

I assume TLB is translation lookaside buffer, but what do you mean by "soft fill" here?

Veserv
2 replies
20h6m

They mean software-filled. On a “page fault” (really an illegal memory access), you walk a software data structure to determine whether there should actually be accessible memory there; if so, you “page in” the memory (update the MPU to make it accessible, possibly evicting another entry).

This is how some older chips used to work. Hardware page table walkers can be viewed as just a hardware implementation of such code.
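A minimal sketch of that software-filled scheme in Rust, assuming an MPU with a fixed number of slots (8 in the usage below), round-robin eviction, and a kernel-side list of every region the task may access. `Region`, `SoftFillMpu`, `is_mapped` and `handle_fault` are hypothetical names; a real kernel would program MPU registers from its fault handler rather than update an array, and would then retry the faulting access.

```rust
/// A contiguous memory region a task is allowed to touch (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Region {
    base: u32,
    size: u32,
}

impl Region {
    fn contains(&self, addr: u32) -> bool {
        addr >= self.base && addr - self.base < self.size
    }
}

/// Models the hardware MPU as a fixed number of slots, refilled in software.
struct SoftFillMpu<const SLOTS: usize> {
    slots: [Option<Region>; SLOTS],
    next_victim: usize, // round-robin eviction pointer
}

impl<const SLOTS: usize> SoftFillMpu<SLOTS> {
    fn new() -> Self {
        Self { slots: [None; SLOTS], next_victim: 0 }
    }

    /// Would the "hardware" currently allow an access to `addr`?
    fn is_mapped(&self, addr: u32) -> bool {
        self.slots.iter().flatten().any(|r| r.contains(addr))
    }

    /// Called from the memory-fault handler. If `addr` lies inside one of the
    /// task's regions, load that region into the next round-robin slot
    /// (evicting whatever was there) and return true so the access can be
    /// retried. Otherwise the fault is genuine and we return false.
    fn handle_fault(&mut self, addr: u32, task_regions: &[Region]) -> bool {
        match task_regions.iter().find(|r| r.contains(addr)) {
            Some(&region) => {
                self.slots[self.next_victim] = Some(region);
                self.next_victim = (self.next_victim + 1) % SLOTS;
                true
            }
            None => false,
        }
    }
}
```

With ten regions and eight slots, touching all ten in order leaves the last eight resident; the first two fault again on their next access and get paged back in. That extra fault is the latency cost the replies below weigh against the simplicity of a fixed, static mapping.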

chc4
1 reply
14h10m

I was horrified to learn that PowerPC uses an "inverted page table", which is essentially just a TLB implemented as a hash table. If the requested page isn't resident in the inverted page table, it delivers an interrupt, where the kernel is supposed to walk its own internal page table data structure and populate the hash table entry for the missing page. It was a big surprise coming from x86 or ARM on personal computer kernels, where there is a fully populated page table as the canonical source of mappings and all walking is done in hardware!
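A toy Rust model of the arrangement described above: a fixed-size hash table of virtual-page to physical-frame mappings acts as a cache that the "hardware" probes, and on a miss the "kernel" consults its own canonical table and inserts the entry, evicting a resident one if the bucket is full. The table size, ways per bucket, modulo hash and evict-way-0 policy are all assumptions for illustration; real PowerPC hashed page tables use a specific hash function and bucket layout that this sketch does not model.

```rust
use std::collections::BTreeMap;

// Toy table geometry (assumed, not PowerPC's real layout).
const BUCKETS: usize = 16;
const WAYS: usize = 2;

#[derive(Clone, Copy, Debug)]
struct Entry {
    vpn: u64, // virtual page number
    pfn: u64, // physical frame number
}

struct HashedPageTable {
    buckets: [[Option<Entry>; WAYS]; BUCKETS],
}

impl HashedPageTable {
    fn new() -> Self {
        Self { buckets: [[None; WAYS]; BUCKETS] }
    }

    /// Toy hash: the virtual page number modulo the number of buckets.
    fn bucket_of(vpn: u64) -> usize {
        (vpn % BUCKETS as u64) as usize
    }

    /// The "hardware" side: probe a single bucket, hit or miss.
    fn lookup(&self, vpn: u64) -> Option<u64> {
        self.buckets[Self::bucket_of(vpn)]
            .iter()
            .flatten()
            .find(|e| e.vpn == vpn)
            .map(|e| e.pfn)
    }

    /// The "kernel" side on a miss: install the mapping, evicting way 0 if
    /// the bucket is already full (a deliberately naive policy).
    fn insert(&mut self, vpn: u64, pfn: u64) {
        let bucket = &mut self.buckets[Self::bucket_of(vpn)];
        let way = bucket.iter().position(|e| e.is_none()).unwrap_or(0);
        bucket[way] = Some(Entry { vpn, pfn });
    }
}

/// Translate a page: hash-table hit, or "take the interrupt" and refill from
/// the kernel's canonical page table. `None` means a genuine page fault.
fn translate(hpt: &mut HashedPageTable, kernel_pt: &BTreeMap<u64, u64>, vpn: u64) -> Option<u64> {
    if let Some(pfn) = hpt.lookup(vpn) {
        return Some(pfn);
    }
    let pfn = *kernel_pt.get(&vpn)?;
    hpt.insert(vpn, pfn);
    Some(pfn)
}
```

Three pages that hash to the same bucket are enough to show the effect: the third one evicts the first, and the next access to the first page takes the slow path through the kernel's table again.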

saagarjha
0 replies
12h7m

It does let you do some cute things in the software implementation though. I’d prefer a hardware walk of course but if I can’t have it might as well enjoy things.

robocat
1 reply
19h42m

I would guess they (a) want soft realtime performance, and (b) would not want to introduce something critical that could interfere with debuggability or potentially reliability. I'd never do it unless it was the last choice available. Virtual paging is nasty and I wouldn't want the doubt.

monocasa
0 replies
19h23m

I think they're just trying to keep it simple (which is totally fair, KISS principle is valuable).

You can keep soft real time perf with this scheme, just like you can keep soft realtime perf on any system with a soft fill TLB.

I say all of this as I used to be the lead for a real time kernel that would allow more than 8 regions on a task on a Cortex-M0/M3.

It's pretty easy to hit that amount with a combo of fine grain access (you probably want your RO section different than your TEXT section, you probably want your stack discontiguous with any mapped section, you've got all of your MMIO regions your task cares about which necessitate other MPU regions because the difference between normal memory and device memory is configured by the MPU), combined with breaking some apart to get greater granularity/packing like they talk about here.
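To see how quickly eight regions run out, here is a Rust sketch of the greedy split that an ARMv7-M-style MPU forces on an arbitrary address range: each region's size must be a power of two of at least 32 bytes, and its base must be aligned to that size. `split_into_mpu_regions` is a hypothetical helper. It ignores the subregion-disable bits that real ARMv7-M MPUs offer, so it can overestimate how many regions a range really needs.

```rust
/// Smallest region an ARMv7-M-style MPU can describe.
const MIN_REGION: u32 = 32;

/// Split [start, end) into (base, size) regions where every size is a power
/// of two >= MIN_REGION and every base is aligned to its own size.
fn split_into_mpu_regions(mut start: u32, end: u32) -> Vec<(u32, u32)> {
    assert!(start <= end, "range must not be reversed");
    assert!(
        start % MIN_REGION == 0 && end % MIN_REGION == 0,
        "range must be 32-byte aligned"
    );
    let mut regions = Vec::new();
    while start < end {
        let remaining = end - start;
        // Largest power of two that still fits in what is left...
        let mut size = 1u32 << (31 - remaining.leading_zeros());
        // ...shrunk until the current base is aligned to it.
        while start % size != 0 {
            size >>= 1;
        }
        regions.push((start, size));
        start += size;
    }
    regions
}
```

A nicely aligned 5 KiB range at 0x1000 needs two regions (4 KiB + 1 KiB), but a range from 0x20 to 0x1000, just 32 bytes short of 4 KiB and misaligned at the start, needs seven. One awkwardly placed section can therefore consume most of an eight-region budget before stacks and MMIO are even considered.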

0xfedbee
6 replies
16h45m

Moral of the story: using Rust will not automatically make your code bug-proof. It can actually introduce bugs that are of a totally new breed and hard to debug.

kibwen
1 reply
16h22m

I'm not sure I see where the bug here is due to Rust, or of a totally new breed. And the latter half of the blog post notes that it was fixed less than three hours after first being noticed.

rablackburn
0 replies
12h12m

And to quote the article itself:

> (…) it’s a rare case of a kernel memory access check bug that had no security implications.

My favourite type of kernel bug.

toast0
0 replies
14h9m

This example showed an introduced bug that was easy to debug, though.

nemothekid
0 replies
15h20m

I don't know, the fact that a buggy memory allocator implementation crashed rather than silently worked until it became a CVE seems like a win to me.

eigenform
0 replies
16h15m

I know you're intentionally spinning this as a bad thing, but it's really not.

Instead of worrying about simple and very common classes of bugs that can be solved statically with the help of the compiler, you are free to worry about whatever other non-trivial bugs in your program are remaining.

You're obviously free to waste your own time if you'd like.

StressedDev
0 replies
13h22m

Nope - This bug was not caused by the language. Also, no one has ever claimed Rust is perfect. What people are claiming is it is a good alternative to C/C++ which prevents a lot of common errors like buffer overflows.

dralley
5 replies
18h50m

Why/when would one use Hubris over something like embassy_rs, or vice-versa?

steveklabnik
4 replies
18h24m

Hubris has some particular design constraints that allow it to do interesting things, but also may disqualify it for certain projects. The main one is that everything is static, ahead of time: you set up tasks when you build the OS and the tasks, there's no "spawn new task" sort of APIs.

Another difference is that Hubris does not use async Rust, and embassy does. That may be something you desire either way. We don't believe async Rust is bad (heck it's used further up the stack quite a bit) but it wasn't a fit for this project.

The documentation explains the design goals: https://hubris.oxide.computer/reference/

Doesn't mean embassy isn't good too, it's just different.

StressedDev
1 reply
13h23m

Thank you for the thoughtful reply. I really appreciated it.

steveklabnik
0 replies
12h46m

You're welcome!

MrBuddyCasino
1 reply
10h46m

Hubris does not use async Rust, and embassy does

As much as I understand why you'd want async on an MCU for efficiency reasons, I'm very glad there is now a good option without it.

I wonder what the Hubris team thinks of rtic and its design.

steveklabnik
0 replies
5h14m

I can only speak for myself, but it’s sort of like “embassy without the HAL,” so closer to Hubris in some ways, and further away in others.

I’m sure that other folks would say the same thing here though: also good, just different.

quasarj
3 replies
17h57m

Somebody named an OS Hubris? Oh the.. I can't even say it

k0stas
0 replies
12h49m

And the author of the post is named Cliff L. Biffle, which is one of the best names I have ever heard.

cdchn
0 replies
13h43m

Seems pretty on-brand.

xelxebar
2 replies
8h47m

Tight nonhierarchical integration of the team. This isn’t a Hubris feature, but it’s hard to separate Hubris from the team that built it. Oxide’s engineering team has essentially no internal silos. Our culture rewards openness, curiosity, and communication, and discourages defensiveness, empire-building, and gatekeeping. We’ve worked hard to create and defend this culture, and I think it shows in the way we organized horizontally, across the borders of what other organizations would call teams, to solve this mystery.

This caught my attention. I'd love to hear more about the motivations for crafting such a culture, as well as some particular implementation details. I'm curious whether there are drawbacks to fostering "openness, curiosity, and communication" within an organization. Obviously, some go for more rigid hierarchical systems, and it occurs to me that an org chart can (and probably should) be strategically decided upon, but I am somewhat clueless as to the trade-offs. Does HN have any insight here?

rcxdude
1 replies
8h26m

Hmmmm, the specific values stated here are hard to evaluate, but in general one disadvantage of an organisation without a strongly defined structure is that there is generally still a power structure of sorts. When it's not explicitly defined, it's less open, usually less deliberately chosen, and harder to understand (especially for those who aren't good at social interaction). Its shadowy nature can sometimes allow even more pathological behaviours, and even when it doesn't get particularly bad, it can make co-ordination much harder.

I've experienced this in some companies I've worked for (one was a very large company where there was an explicit power structure of sorts, but it was not particularly stuck to in practice. It was a consultancy which worked on a variety of different projects, and the way you got onto projects was basically by making friends with the people selling and managing them, not so much through any particular official channel. This worked great if you were good at forming the social network needed to do this, it didn't work so well if you weren't). For other examples, there's "The Tyranny of Structurelessness" which is a talk from a feminist who noticed the same kind of thing happen in the organisations she was in, which generally rejected hierarchy as patriarchal, and you can see similar discussions around how this does/doesn't work in Valve, which also doesn't have a particularly well-defined internal structure. Open-source projects can suffer from the same thing (I would characterise some of the rust drama as stemming from a similar problem).

That said, an explicit power structure doesn't need to be hierarchical, even though that's the 'traditional' business organisation. It's possible Oxide's structure is explicit but non-hierarchical. Also, such an approach generally tends to work better at smaller scales than larger ones. Said consultancy is probably the largest company I'm aware of that still mostly operates in a freeform way, and it does have a kind of scaffolding that keeps things roughly in place. And of course, this is more of a spectrum than a dichotomy: even the most rigid of power structures on paper still has a more complex, implicit structure underneath. It's the nature of groups of people.

Don't take this as me believing explicit > implicit in general. This is just me talking about some of the observed disadvantages of less explicit organisation. More explicit power structures have their own problems as well (and there's the related 'seeing like a state'/'legibility' issues there).

bcantrill
0 replies
2h46m

So, when hearing about Oxide's culture, these kinds of objections come up a lot (namely, concerns about shadow structures). There are three important things to know about Oxide: first, while we think autonomy is important, we do have a CEO -- and the CEO is (and has to be) the final single authority in the company. Now, we also don't disagree much as a company, and that leads to the second important thing to know: our hiring process[0][1] is very, very deliberate -- and we are carefully looking for people who will thrive in our environment. And that it's careful and deliberate is not unrelated to the third thing to know about Oxide: our compensation is uniform.[2] (As it turns out, people are very careful when evaluating someone who is a peer rather than a subordinate!)

With respect to other environments that encourage autonomy, that last element -- that compensation is uniform -- tends to set Oxide apart: if an environment encourages both autonomy and stack-based ranking and compensation, it absolutely will create shadow structure. By removing the organizational tumor of variable compensation, Oxide removes the shadow structure that it creates.

[0] https://rfd.shared.oxide.computer/rfd/0003

[1] https://oxide-and-friends.transistor.fm/episodes/hiring-proc...

[2] https://oxide.computer/blog/compensation-as-a-reflection-of-...

scottlamb
1 replies
19h55m

Nice read!

Nit:

    // Order the task's regions in ascending address order.
    //
    // THIS IS IMPORTANT. The kernel exploits this property to do cheaper
    // access tests.
    regions.sort_by_key(|i| region_table.get_index(*i).unwrap().1.base);

I wouldn't put this comment here. It's not just some detail of this function; it's an invariant of the field that all writers have to respect (maybe this is the only one now, but still) and all readers can take advantage of. So I'd add it to the `TaskDesc::regions` docstring. [1]

[1] https://github.com/oxidecomputer/hubris/commit/b44e677fb39cd...
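To make "cheaper access tests" concrete: one way a kernel could exploit sorted, non-overlapping regions is to binary-search for the first candidate region and then walk forward, so a buffer that spans adjacent regions is checked in a single pass. This is a hypothetical sketch, not Hubris's actual `can_access`; the `Region` type, the `writable` flag, and treating a zero-length access as allowed are all assumptions.

```rust
/// Hypothetical memory region a task is allowed to touch.
#[derive(Debug, Clone, Copy)]
pub struct Region {
    pub base: usize,
    pub size: usize,
    pub writable: bool,
}

impl Region {
    fn end(&self) -> usize {
        self.base + self.size // assumes regions never wrap the address space
    }
}

/// Returns true if every byte of `[addr, addr + len)` lies inside some region
/// and, when `need_write` is set, every covering region is writable.
///
/// Precondition (the invariant from the comment above): `regions` is sorted
/// by `base` and the regions do not overlap.
pub fn can_access(regions: &[Region], addr: usize, len: usize, need_write: bool) -> bool {
    if len == 0 {
        return true; // assumption: an empty access touches nothing
    }
    let end = match addr.checked_add(len) {
        Some(e) => e,
        None => return false, // range wraps around the address space
    };
    // Binary search for the first region whose end is past `addr`.
    // This is only meaningful because the regions are sorted and disjoint.
    let start = regions.partition_point(|r| r.end() <= addr);

    let mut cursor = addr; // first byte not yet known to be covered
    for r in &regions[start..] {
        if r.base > cursor {
            return false; // gap: the byte at `cursor` is not in any region
        }
        if need_write && !r.writable {
            return false;
        }
        cursor = r.end();
        if cursor >= end {
            return true;
        }
    }
    false // ran out of regions before covering the whole range
}
```

If the regions were not sorted, `partition_point` would return a meaningless index and the forward walk could miss a covering region, which is exactly why the comment insists on the ordering.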

db48x
0 replies
14h17m

It is nice to have the comment next to the sort though, which is otherwise going to be surprising.

Probably the best thing to do is to make a constructor method for the TaskDesc that sorts the regions to enforce the invariants. The code is evidently getting more complex over time, so packing the complexity up into methods is probably now worth spending a little time on.
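A sketch of the constructor idea: make the field private so the only way to build the struct is through a function that sorts the regions (and, in this version, also rejects overlapping ones). The names are illustrative and not Hubris's real `TaskDesc`; `Vec` is used for brevity, whereas a `no_std` kernel would more likely use a fixed-size array or a build-time-generated table.

```rust
/// Hypothetical, simplified region; not Hubris's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Region {
    pub base: u32,
    pub size: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DescError {
    Overlap { first: Region, second: Region },
}

/// Hypothetical, simplified task descriptor.
#[derive(Debug)]
pub struct TaskDesc {
    regions: Vec<Region>, // private: only `new` can populate it
}

impl TaskDesc {
    /// Builds a descriptor whose regions are sorted by base address and
    /// checked for overlap, so every reader can rely on that ordering.
    pub fn new(mut regions: Vec<Region>) -> Result<Self, DescError> {
        regions.sort_by_key(|r| r.base);
        for pair in regions.windows(2) {
            let (a, b) = (pair[0], pair[1]);
            // Widen to u64 so `base + size` cannot overflow.
            if u64::from(a.base) + u64::from(a.size) > u64::from(b.base) {
                return Err(DescError::Overlap { first: a, second: b });
            }
        }
        Ok(TaskDesc { regions })
    }

    /// Read-only view; callers may assume ascending, non-overlapping order.
    pub fn regions(&self) -> &[Region] {
        &self.regions
    }
}
```

With this shape, the "THIS IS IMPORTANT" note can live on `new` and on the accessor's docs, and readers no longer have to trust every writer to remember the sort.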

retSava
1 replies
9h34m

Does anyone know what they mean by "For instance, we have an internal board that serves as a useful I2C debugging probe and has 8 kiB of RAM and 32 kiB of flash."?

What could that interesting thing be? An inline-I2C sniffer perhaps? Or an interface board to inject i2c commands? Or something else?

syntheticgate
0 replies
3h14m

I believe this is referring to a little board with an STM32G031 part that exposes a PMOD I2C interface; we use it for programming serial numbers into FRUID EEPROMs on the manufacturing line. It runs Hubris, can be controlled by Humility via SWD, and has just enough space to do this function. We have other dev boards using larger STM32 parts and didn't want to consume those for our manufacturing line stations.

Given the size of the part on here, and the limited I/O, it's not very useful outside of this use-case but works great here.

WatchDog
1 replies
9h32m

Refactoring that `can_access` function into its own crate seems excessive. leftPad vibes.

steveklabnik
0 replies
5h12m

If you’re going by “how many LOC are in a translation unit,” then the whole thing has “leftpad vibes.” The kernel itself is 2k lines. There are lots of drivers that are only dozens of lines.

1970-01-01
1 replies
4h47m

random crashes that go away when you add some debugging code are the worst kind of crash, but that was the situation we were in

Oh my

Lord-Jobo
0 replies
4h25m

I can think of very few scenarios that would send me into an 8-hour, fury- and caffeine-fueled debugging session as quickly as that.

It's one of those problems that throws you into a fit of self-doubt and paranoia. You question your basic assumptions, your hardware, your colleagues, and God himself before making any headway, and it often turns out to be someone else's "fault": someone who programmed some core feature of a library 25 years ago in an obscure way, because they never could have imagined the Frankenstein-esque behemoth their code would be embedded in nearly 3 decades later.

sbt567
0 replies
19h50m

I adore whatever the folks at Oxide do, and this is one example of it.

moosingin3space
0 replies
1d15h

This is a fantastic in-depth look at debugging a complex problem, and the fact that the rest of the system remained stable is a testament to the quality of the engineering work that the Oxide team put into this. I'm personally quite inspired by this and plan on applying similar techniques in my day job!