Helios: A distribution of Illumos powering the Oxide Rack

I know they’re ex-Sun, but is there any real technical benefit to choosing not-Linux (for their business value prop)?

I know of the technical benefits of illumos over Linux, but do they actually matter to the customers who are buying these? Aren’t they opening a whole can of worms for ideology/tradition that won’t sell any more computers?

As someone who runs Linux container workloads, the fact that this is fundamentally not-Linux (yes I know it runs Linux binaries unmodified) would be a reason against buying it, not for.

> does that actually matter to the customers who are buying these?

It's not like we specifically say "oh btw there's illumos inside and that's why you should buy the rack." It's not a customer-facing detail of the product. I'm sure most will never even know that this is the case.

What customers do care about is that the rack is efficient, reliable, suits their needs, etc. Choosing illumos instead of Linux here is a choice made to help effectively deliver on that value. This does not mean that a similar product couldn't be built on top of Linux, by the way, just that we decided illumos was more fit for purpose.

This decision was made with the team, in the form of an RFD[1]. It's #26, though it is not currently public. The two choices that were seriously considered were KVM on Linux, and bhyve on illumos. It is pretty long. In the end, a path must be chosen, and we chose our path. I do not work on this part of the product, but I haven't seen any reason to believe it has been a hindrance, and it is probably the right call.

> the fact that this is fundamentally not-Linux (yes I know it runs Linux binaries unmodified) would be a reason against buying it, not for.

I am curious why, if you feel like elaborating. EDIT: oh just saw your comment down here: https://news.ycombinator.com/item?id=39180814

1: https://rfd.shared.oxide.computer/

A team should always pick the tools they are most familiar with. They will always get better results with those than with something they understand less. With this in mind, using their own stack is a perfectly adequate choice. Factors outside their team will determine if that works out in the long term.

A handful of the team are more familiar with Illumos and the next hundred people they hire after that will be more familiar with Linux.

If your hiring decisions are always based on what people are currently familiar with, you'll always be stuck in the past. You may not even be able to use present day tooling and systems because they could be too new to hire people for.

You're much better off hiring people who are capable of learning, and then giving them the opportunities to learn and advance their knowledge and skills.

Everyone is capable of learning. I can hire someone who is capable of learning Japanese. They can then try to teach the rest of the team Japanese. Does that mean it's a good idea to switch all our internal docs to Japanese? Maybe if I was building a startup in Japan. Similarly, writing internal docs in English for a startup in Japan would be of equal difficulty and value. Hooray, we're learning! And struggling more than needed to build a product.

You're better off hiring experienced people who are highly productive. If they're highly productive with one stack, it makes no sense to change their stack so they're no longer productive, or to hire people who aren't familiar with it and wait for them to become productive.

There's nothing wrong with using old, well established things. They're quite often better than new things. As long as they're still supported, just use whatever builds a working product. It's the end product that matters.

> Everyone is capable of learning. I can hire someone who is capable of learning Japanese. They can then try to teach the rest of the team Japanese. Does that mean it's a good idea to switch all our internal docs to Japanese?

The difference between Japanese and English is much, much bigger than the difference between one Unix OS and one Unix-like OS. This is a remarkably disingenuous argument. If you really don't understand the difference in scope, there's no point in discussing anything with you because you've managed to disprove your opening sentence with yourself as the counterexample.

You're welcome to see it that way if you want. But if you think you can get to know a completely new kernel, OS, etc in a short amount of time, backwards and forwards, you're equally disingenuous. You can get by editing a few lines in a pinch, but you could equally just learn a few Japanese phrases. Proper understanding requires a deep knowledge that comes from practice and experience with subtle complexity and context.

(Japanese isn't so radically different from English, it mostly just has more words for more contexts. In many ways it's simpler than English. It would be harder to go from Java to Haskell, with their many different language paradigms)

A lot of people out there claim to know Linux, yet few can prove it. OTOH, if they gain a cult following with lots of people using their stack, those people might become more familiar with their stack than most Linux people are with theirs. They could grow a captive base of prospective hires.

That's not the big concern though. The big concern is whether vendor integration and certification become a stumbling block. You can hire any monkey to write good-enough code, but that doesn't give you millions in return. Partnerships with vendors and compliance certifications can give you hundreds of millions. The harder that is, the farther the money is. A totally custom, foreign stack can make it harder, or not; it depends how they allocate their human capital and business strategy, whether they can convince vendors to partner, and clients to buy in. Anything very different is a risk that's hard to ignore.

To be clear, at the time this decision was made, we had already hired people with deep familiarity with Linux: Laura Abbott, to give one example.

It is true that the number of developers who know Linux is larger than the number who know illumos. But the number of developers who know C is also larger than the number who know Rust. Just like some folks need to be onboarded to Rust, some will need to be onboarded to illumos. That is of course part of the tradeoff.

As someone who has known UNIX since 1993, starting with Xenix: many people who are familiar with Linux are actually familiar with one specific Linux distribution, as the Linux wars took over from the UNIX wars.

That being the case, knowing yet another UNIX cousin isn't that big a deal.

I do not personally agree with this. I do think that familiarity is a factor to consider, but would not give it this degree of importance.

It also was not discussed as a factor in the RFD.

The Linux vs. Illumos decision seems to be downstream of a more fundamental decision to make VMs the narrow waist of the Oxide system. That's what I'm curious about.

Especially since Oxide has a big fancy firmware stack. I would expect this stack to be able to do an excellent job of securely allocating bare-metal (i.e. VMX root on x86 or EL2 if Oxide ever goes ARM) resources.

This would allow workloads on Oxide to run their own VMs, to safely use PCIe devices without dealing with interrupt redirection, etc.

I'm not affiliated with Oxide but I don't think you can put Crucible and VPC/OPTE in firmware. Without a DPU those components have to run in the hypervisor.

Possibly not.

But I do wonder why cloud and cloud-like systems aren’t more aggressive about splitting the infrastructure and tenant portions of each server into different pieces of hardware, e.g. a DPU. A DPU could look like a PCIe target exposing NVMe and a NIC, for example.

Obviously this would be an even more custom design than Oxide currently has, but Oxide doesn’t seem particularly shy about such things.

It would be great if that RFD became public someday, if that's possible at all, especially since it's a long read.

If you're running in one of the big 3 cloud providers, the bottom-level hypervisors are not Linux. This is equivalent. Are you anti-AWS or anti-Azure for the same reason?

This is the substrate upon which you will run any virtualized infrastructure.

Small note, that's not true for Google Cloud, which runs on top of Linux, though modified.

Disclaimer: Former Googler, Cloud Support

Another Xoogler here: any idea what they mean by it's not Linux at the bottom for other providers? Like, surely it's _some_ common OS? Either my binaries wouldn't run or AWS is reimplementing Linux so they can, which seems odd.

Or are they just saying that the VM my binary runs on might be some predictable Linux version, but the underlying thing launching the VM could be anything?

Old AWS used to be Xen; Nitro, AFAIK, uses a customised VMM, and I don't recall whether it's a custom OS or hosted on top of something.

Azure is Hyper-V underneath IIRC, a custom variant at least (remember Windows Server Nano? IIRC it was the closest you could get to running it), with sometimes weird things like network cards running Linux and integrating with Windows' built-in SDN facility.

The rest of the bigger ones are mainly Linux, with occasional Xen and such, but sometimes you can encounter non-trivial VMware deployments.

Nitro is supposed to be this super customized version of KVM.

Correct that the hypervisor isn't running Linux.

I think the only provider where that would make sense would be Microsoft, where they have their own OS.

Azure runs a version of Windows, see:

https://techcommunity.microsoft.com/t5/windows-os-platform-b...

When your programs are running on a VM, the Linux that loads and runs your binaries is not at the bottom; that Linux image runs inside a virtual machine which is constructed and supervised by a hypervisor which sits underneath it all. That hypervisor may run on the bare machine (or what passes for a bare machine what with all the sub-ring-zero crud out there), or may run on top of another OS which could be Linux or something else. And even if there is Linux in the middle and Linux at the bottom, they could be completely different versions of Linux from releases made years apart.

> Or are they just saying that the VM my binary runs on might be some predictable Linux version, but the underlying thing launching the VM could be anything?

Yup. eg with Xen the hypervisor wasn't Linux, even if the privileged management VM (dom0) was Linux (or optionally NetBSD in the early days). The very small Xen hypervisor running on the bare metal was not a general purpose OS, and didn't expose any interface itself - it was well hidden and relied on dom0 for administration.

As I understand it, there's Linux running on the Google Cloud hardware, but the virtualized networking and storage stacks in Google Cloud are Google proprietary and largely bypass Linux -- in the case of networking, see the "Snap: a Microkernel Approach to Host Networking" paper.

In contrast, it appears that Oxide is committing to open-source the equivalent pieces of their virtualization platform.

I don't know about EC2, but Lambda and Fargate are presumably Firecracker, which is Linux KVM.

AWS "Nitro" hypervisor which powers EC2 is their (very customized) KVM.

https://docs.aws.amazon.com/whitepapers/latest/security-desi...

I suspect a lot of people would (irrationally) freak out if they saw how the public cloud works because it's so different from "best practices". Oxide would probably trigger people less if they never mentioned Illumos but that's not really an option when it's open source.

Linux is a nightmare in the embedded/appliance space because one ends up just having platform engineers who spend their day fixing problems with the latest kernels, drivers, core libraries, etc, that the actual application depends on.

Or one goes the route of 99% of the IoT/etc vendors, never updates the base OS, and prays that there aren't any active exploits targeting it.

This is why a lot of medium-sized companies cried about CentOS, which allowed them to largely stick to a fairly stable platform that was getting security updates without having to actually pay for and run a full-blown RHEL/etc install. Every ten years or so they had to revisit all the dependencies, but that is a far easier problem than dealing with a one-to-two-year update cycle, which is too short when the qualification timeframe for some of these systems is 6+ months long.

So, this is almost exclusively a Linux problem; any of the *BSD/etc. alternatives give you almost all of what Linux provides without this constant breakage.

This is a really, really good point -- and is a result of the model of Linux being only a kernel (and not system libraries, commands, etc.). It means that any real use of Linux is not merely signing up for kernel maintenance (which itself can be arduous) but also must make decisions around every other aspect of the system (each with its own communities, release management, etc.). This act is the act of creating a distribution -- and it's a huge burden to take on. Both illumos and the BSD derivatives make this significantly easier by simply including much more of the system within their scope: they are not merely kernels, but also system libraries and commands.

This weighed heavily in our own calculus, so I'm glad you brought it up!

> including much more of the system within their scope: they are not merely kernels, but also system libraries and commands.

Given the dev team's limited resources, this may lead to limited support for the system outside a narrow set of officially supported/certified hardware, with that support falling behind on modern hardware, as happened with Sun, and, as a result, vendor lock-in to overpriced and low-performing hardware.

There is a reason that, back then, Solaris developers joked about embedding the Linux kernel as a universal driver for the Solaris kernel in order to get reasonable support for the hardware out there.

Well, they aren't burdened by having to make their own processors, like Sun had to do, or their own full custom chips in general. They just have to support the selection of hardware they pick, and they have complete oversight of what hardware runs on their racks. So I'm not sure the Sun comparison is relevant here, since they can still pick top-of-the-line hardware, just not any hardware.

Any issues with funding or whatever, and their customers would be locked into yesterday's "top of the line hardware" (reminiscent of how HP used lawyers to force Oracle to keep supporting Itanic). Sun was a 50K-person company, and it struggled to support even a reasonably wide set of hardware. Vendor lock-in is like one of Newton's laws in this industry.

> that support falling behind on modern hardware, as it happened with Sun, and vendor lock-in as a result into overpriced and low performing hardware.

The Oxide hardware uses off-the-shelf AMD SKUs for its CPUs.

This is less of an issue for us at Oxide, since we control the hardware (and it is all modern hardware; just a relatively small subset of what exists out there). Part of Sun's issue was that it was tied not just to a software ecosystem, but also to an all-but-proprietary hardware architecture and surrounding platform. Sun eventually tried to move beyond SPARC and SBus/MBus, but they really only succeeded in the latter, not the former.

CentOS wasn’t used in embedded systems.

Even Windows was and is used substantially in embedded systems.

I know about that. This is a special edition for embedded though. But CentOS is news to me. CentOS was targeted for servers.

Arista EOS is definitely CentOS Linux release 7.9.2009 (AltArch) based.

Sure it was. So is RHEL.

Embedded isn't limited to devices equal to or less powerful/expensive than the Raspberry Pi.

Interesting that you bring up the embedded/appliance space, as I have noticed plenty of FOSS alternatives coming up whose key features are that they are not Linux-based and not under GPL-derived licenses.

FreeRTOS, Nuttx, Zephyr, mbed, Azure RTOS,...

Aren't they also ex-Joyent? Joyent ran customer VMs in prod on Illumos for many years so there's a lot of experience there.

bcantrill used to work at Sun then became CTO at Joyent, so the reason why Joyent ran Illumos is probably the same reason as why Oxide is, because Cantrill likes it and judges that it's a good fit for what they are doing.

As I elaborated above, bcantrill did not decree that we must use illumos. Technical decisions are not handed down from above at Oxide.

I saw your comment[1] after I wrote mine, but I'm not saying that he's forcing you guys to use it (that would not be a good way of being a CTO at a start-up…), but that doesn't prevent him from advocating for solutions he believes in.

Would you say that Oxide would have chosen Illumos if he wasn't part of the company?

[1]: https://news.ycombinator.com/item?id=39180706

(I work at Oxide.)

Bryan is just one out of several illumos experts here. If none of those were around, sure, maybe we wouldn't have picked illumos -- but then we'd be unrecognizably different.

I came into Oxide with a Linux background and zero knowledge of illumos. Learning about DTrace especially has been great.

I'd like to learn DTrace (especially after the recent 20yr podcast episode), but I worry it'll never make it into mainstream Linux debugging, and hence will only be useful for more niche jobs.

Your concern is completely reasonable -- a thing I'd add though is that both Windows and macOS have DTrace support.

I was excited, but it looks like both macOS and Windows require special admin permissions on my laptop that I doubt my work would approve (completely reasonable to require this, it just makes it unusable for me).

> Would you say that Oxide would have chosen Illumos if he wasn't part of the company?

I don't know how to respond to this question, because to me it reads like "if things were completely different, what would they be like?" I have no idea if you could even argue that a company could be the same company with different founders.

What I can say is that this line of questioning still makes me feel like you're implying that this choice was made simply based on preference. It was not. I am employee #17 at Oxide, and the decision still wasn't made by the time I joined. But again, the choice was made based on a number of technical factors. The RFD wasn't even authored by Bryan, but instead by four other folks at Oxide. We all (well, everyone who wanted to, I say "we" because I in fact did) wrote out the pros and cons of both, and we weighed it like we would weigh any technical decision: that is, not as a battle of sports teams, but as a "hey we need to drive some screws: should we use a screwdriver, a hammer, or something else?" sort of nuts-and-bolts engineering decision.

> we weighed it like we would weigh any technical decision: that is, not as a battle of sports teams, but as a "hey we need to drive some screws: should we use a screwdriver, a hammer, or something else?" sort of nuts-and-bolts engineering decision.

I'm not saying otherwise.

In fact, when I wrote my original comment, I actually rewrote it multiple times to be sure it wouldn't suggest I was thinking it was some sort of irrational decision (that's why I added the “it's a good fit for what they are doing”), but given your reaction it looks like I failed. Written language is hard, especially in a foreign language, sorry about that.

It's all good! I re-wrote what I wrote multiple times as well. Communication is hard. I appreciate you taking the effort, sorry to have misunderstood.

Heck, there's a great little mistake of communication in the title: this isn't just "intended" to power the rack, it does power the rack! But they said that because we said that in the README, because that line in the README was written before it ended up happening. Oops!

Many people, including part of the founding team, are ex-Joyent, yes. Some also worked at Sun, on the operating systems that illumos is ultimately derived from.

The main drawbacks, to me, are:

1. No support for nested virtualization, so running a VM inside your VM is not available. This prevents use of projects such as KubeVirt or Firecracker on a Linux guest, and WSL2 on a Windows guest.

2. No GPU support

If the base hypervisor were Linux, it seems it would be far more capable for users. I also wonder if internally Linux is used for development of the platform itself so they can create "virtual" racks to dogfood the product without full-blown physical racks.

With all that said, I do not know the roadmap, and admittedly there are already quite a few existing platforms built on KVM, so as their hypervisor improves and becomes more capable, it could potentially become a strategic advantage.

> I also wonder if internally Linux is used for development of the platform itself

Developers at Oxide work on whatever platform they'd like, as long as they can do their work. I will say I am in the minority as a Windows user though, most are on some form of Unix.

> so they can create "virtual" racks to dogfood the product without full blown physical racks.

So one of the reasons why Rust is such an advantage for us is its strong cross-platform support: you can run a simulated version of the control plane on Mac, Linux, and Illumos, without a physical rack. The non-simulated version must run on Helios. [1]
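As a rough illustration of how a single Rust codebase can serve both a simulated environment and a real one, here is a minimal sketch using conditional compilation on `target_os = "illumos"`. It is not how Omicron is actually structured; every name in it (`InstanceBackend`, `SimulatedBackend`, `IllumosBackend`, `real_backend`, `make_backend`) is hypothetical.

```rust
/// Hypothetical interface a control plane might use to manage VM instances.
/// None of these names come from Omicron; they exist only for illustration.
trait InstanceBackend {
    fn name(&self) -> &'static str;
    fn start_instance(&mut self, id: u32) -> Result<(), String>;
    fn running(&self) -> Vec<u32>;
}

/// In-memory stand-in that compiles and runs on any host OS.
#[derive(Default)]
struct SimulatedBackend {
    instances: Vec<u32>,
}

impl InstanceBackend for SimulatedBackend {
    fn name(&self) -> &'static str {
        "simulated"
    }

    fn start_instance(&mut self, id: u32) -> Result<(), String> {
        if self.instances.contains(&id) {
            return Err(format!("instance {id} is already running"));
        }
        self.instances.push(id);
        Ok(())
    }

    fn running(&self) -> Vec<u32> {
        self.instances.clone()
    }
}

/// Placeholder for a backend that would drive a real hypervisor.
/// It is only compiled when targeting illumos.
#[cfg(target_os = "illumos")]
struct IllumosBackend;

#[cfg(target_os = "illumos")]
impl InstanceBackend for IllumosBackend {
    fn name(&self) -> &'static str {
        "illumos"
    }

    fn start_instance(&mut self, _id: u32) -> Result<(), String> {
        // A real implementation would talk to the hypervisor here.
        Err("not implemented in this sketch".to_string())
    }

    fn running(&self) -> Vec<u32> {
        Vec::new()
    }
}

/// On illumos, a real backend is available.
#[cfg(target_os = "illumos")]
fn real_backend() -> Option<Box<dyn InstanceBackend>> {
    Some(Box::new(IllumosBackend))
}

/// On macOS, Linux, Windows, etc. there is no real backend in this sketch.
#[cfg(not(target_os = "illumos"))]
fn real_backend() -> Option<Box<dyn InstanceBackend>> {
    None
}

/// Picks the real backend unless simulation is requested or none exists.
fn make_backend(simulate: bool) -> Box<dyn InstanceBackend> {
    if !simulate {
        if let Some(backend) = real_backend() {
            return backend;
        }
    }
    Box::new(SimulatedBackend::default())
}
```

The design choice being illustrated is that platform-specific code sits behind `#[cfg]` attributes, so everything else compiles on every host, and only an illumos build pulls in the non-simulated path.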

That said we do have a rack in the office (literally named dogfood) that employees can use for various things if they wish.

1: https://github.com/oxidecomputer/omicron?tab=readme-ov-file#...

Interesting thanks for the insight.

> I will say I am in the minority as a Windows user though, most are on some form of Unix.

Now I'm imagining Helios inside WSI: Windows Subsystem for illumos

You're welcome. I will give you one more fun anecdote here: when I came to Oxide, nobody in my corner of the company was using Windows. And hubris and humility almost Just Worked: we had one build system issue that was using strings instead of the path APIs, but as soon as I fixed those, it all worked. bcantrill remarked that if you had gone back in time and told him long ago that some of his code would Just Work on Windows, he would have called you a liar, and it's one of the things that validates our decisions to go with Rust over C as the default language for development inside Oxide.
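The comment doesn't say what the string-based build code actually looked like, so the following is only a hypothetical illustration of the general class of bug: treating paths as strings with a hard-coded `/` separator, versus letting `std::path` handle them. The function names are invented for this sketch.

```rust
use std::path::{Path, PathBuf};

/// Fragile: assumes '/' is the only separator. A Windows-style path such as
/// "C:\build\app.elf" comes back unchanged instead of as "app.elf".
fn file_name_fragile(path: &str) -> &str {
    // rsplit always yields at least one piece; unwrap_or is just a fallback.
    path.rsplit('/').next().unwrap_or(path)
}

/// Portable: std::path applies the host platform's separator rules.
fn file_name_portable(path: &Path) -> Option<&str> {
    path.file_name()?.to_str()
}

/// Fragile: builds a path by gluing strings together with '/'.
fn output_dir_fragile(root: &str) -> String {
    format!("{root}/target/build")
}

/// Portable: Path::join inserts the platform's separator between components.
fn output_dir_portable(root: &Path) -> PathBuf {
    root.join("target").join("build")
}
```

On Windows, `Path` also accepts `/` as a separator when parsing, which is part of why string-built paths can appear to work there until a native `\` path shows up.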

> Now I'm imagining Helios inside WSI: Windows Subsystem for illumos

That would be pretty funny, ha! IIRC something about simulated omicron doesn't work inside WSL, but since I don't work on it actively, I haven't bothered to try and patch that up. I think I tried one time, I don't remember specifically what the issue was, as I don't generally use WSL for development, so it's a bit foreign to me as well.

> that was using strings instead of the path APIs

Man, you can't let Bryan live that one down, can you?

I didn't bother to git blame the code, I myself do this from time to time :)

> Now I'm imagining Helios inside WSI: Windows Subsystem for illumos

I mean... WSL2 is just Hyper-V with some integration glue, and illumos isn't Linux, but Unix is Unix; that might well be doable.

How is Oxide for GPU-heavy workloads?

There are no GPUs in the rack, so pretty bad, haha.

We certainly understand that there's space in the market for a GPU-focused product, but that's a different one than the one we're starting the company off with. There's additional challenge with how we as a company desire openness, and GPUs are incredibly proprietary. We'll see what the future brings. Luckily for us many people still desire good old classic CPU compute.

Would pass-through to VM work?

At $work I'm running SmartOS servers with GPU passthrough to an Ubuntu bhyve guest for the occasional CUDA compute, and it works wonderfully. I wonder if something similar would be possible with Helios?

The software interface isn't the problem: the problem is that there are no physical GPUs in the product. There's nothing to pass through.

> yes I know it runs Linux binaries unmodified

Does it run Linux binaries unmodified, or does it run and manage VMs that run Linux, which is what you, as an end user, run your software in?

As far as I recall it's not a VM. They run in "LX Branded Zones" which does require a Linux userland so that the binaries can find their libraries etc but Zones are more like "better cgroups than cgroups, a decade earlier" than VMs.

No, it's a VM, running a bhyve-based hypervisor, Propolis.[0] LX branded zones were/are great -- but for absolute fidelity one really needs VMs.

[0] https://github.com/oxidecomputer/propolis

Do you have a solution for running containers (Kubernetes, etc)? Are you spinning up a Linux VM to run the containers in there, doing VM per container, or something else?

Customers can decide, I would assume. Most likely you install Kubernetes, have multiple VMs distributed across the rack, and then run multiple Pods on each node.

VM per container seems like a waste unless you need that extra isolation.

I wondered if there was any support for running containers built in - something like EKS/AKS/GKE/Cloud Run/etc - but looking at the docs it appears not.

I agree that VM per container can be wasteful - though something like Firecracker at least helps with start time.

From the podcast, it seems that they want to deliver a minimum viable product. Their primary customers already have a lot of their own higher-level stack.

They might add more higher-level software eventually, depending on what customers want.

It runs VMs -- so it doesn't just run Linux binaries unmodified, it runs Linux kernels unmodified (and, for that matter, Windows, FreeBSD, OpenBSD, etc.).

Keep in mind that Helios is really just an implementation detail of the rack; like Hubris[0], it's not something visible to the user or to applications. (The user of the rack provisions VMs.)

As for why an illumos derivative and not something else, we expanded on this a bit in our Q&A when we shipped our first rack[1] -- and we will expand on it again in the (recorded) discussion that we will have later today.[2]

[0] https://hubris.oxide.computer/

[1] https://www.youtube.com/watch?v=5P5Mk_IggE0&t=2556s

[2] https://mastodon.social/@bcantrill/111840269356297809

Perhaps you could talk a bit about the distributed storage based on Crucible with ZFS as the backing storage tonight. I would really love to hear some of the details and challenges there.

Yes! Crucible[0] is on our list of upcoming episodes. We can touch on it tonight, but it's really deserving of its own deep dive!

[0] https://github.com/oxidecomputer/crucible

The timing of your podcast is the least convenient thing ever for us poor Europeans. And then the brutal wait the next day until it's uploaded.

The only thing I miss about Twitter Spaces is that you could listen to it the morning after.

Yes (hello from Czechia), however there will always be somebody for whom this is inconvenient. Also, I have to confess that at times I was so immersed in other work that I only caught a few Oxide and Friends episodes live. I might stay up tonight.

I am looking forward to the Crucible episode. It sounds like it could be a startup on its own; it wouldn't be the first distributed file/storage system company.

Do you have the same gut reaction to ESXi?

I sure do. We've finally got to a place where we don't need weird hardware tricks to containerize workloads -- this is why a lot of shops pursue docker-like ops for production. When I buy hardware, long-term maintenance is a factor, and when my whole operations fleet relies on ESX, or in this case a Solaris fork, I'm now beholden to one company for support at that layer. Buying a rack of Supermicro gear and running RHEL or SLES with containerized orchestration on top means I can, in a pinch, hire experts anywhere to work on my systems.

I have no reason to believe Oxide would be anything but responsive and effective in supporting their systems, but introducing bespoke software this deep in the stack severely curtails my options if things get bad.

I can somewhat see your point, but in my experience you can't rely on RHEL or whatever vendor Linux to correctly bring up random OEM hardware. You will slowly discover all of the quirks, like it didn't initialize the platform EDAC the way you expected, or it didn't resolve some weird IRQ issue, etc. Nothing about my experience leads me to believe Linux will JFW on a given box, so I don't feel like Linux has an advantage in this regard, or that niche operating systems have a disadvantage. Certainly I feel like a first-party OS from the hardware vendor is going to have a lot of advantages.

I think the value proposition they're offering is a carefully integrated system where everything has been thoroughly engineered/tested to work with everything else, down to writing custom firmware to guarantee that it's all ship-shape, so that customers don't have to touch any of the innards, and will probably just treat them as a black box. It seems like it's chock-full of stuff that they custom-built and that nobody else would be familiar with, by design. If that's not what you want, this probably isn't the product for you.

This has been / will be the market education challenge; it's the same one Joyent had with SmartOS. They're correctly pointing out that the end user or operator will basically never interact with this layer, but it does cause some knee-jerk reactions. All that said, there are some pretty great technical benefits to using illumos-derived systems, not the least of which is the team's familiarity and ability to do real diagnosis on production issues. I won't put words in anyone's mouth, but I suspect that's going to be critical for them as they support customer deployments w/o direct physical access.

Seems strange to me too, but it sounds like the end users basically never interact with this - it's just firmware humming along in the background. As long as it's open-source and reasonably well documented, it's already light-years ahead of what else is out there.

It seems healthy to have options, almost like the universe is healing a bit after Oracle bought Sun. I can't imagine better hands to bring the Oxide system together than that team. As an engineer who works entirely with Linux these days, I pine for the days of another strong Unix in the mix to run high-value workloads on. Comparing Open vSwitch on Linux to, say, the Crossbow SDN facility on Solaris, I'd take Crossbow any day. Nothing "wrong" with Linux, but it is sorely lacking in "master plan" levels of cohesion, with all the tooling taking its own path, often bringing complexity that requires even more abstraction, with yet more complicated tooling on top.

As far as performance and feature set, probably not anymore (I would have answered differently 10 years ago, and if I am wrong today would love to be educated about it).

However, if we are considering code quality, which I consider important if you are actually going to be maintaining it yourself as oxide will have to do since they need customizations, then most of the proprietary Unix sources are just superior imo. That is, they have better organization, more consistency in standards, etc. The BSDs are slightly better in this regard as well, it really isn't a proprietary vs open source issue, it's more about the insane size of the Linux kernel project making strict standards enforcement difficult if not impossible the further you get from the very core system components.

Regardless of them being ex-Sun (and I am not ex-Sun), if I needed a custom OS for a product I was working on, Linux would be close to the last Unix-based OS source tree I would try to do it with, only after all other options failed for whatever reason. And that's not even taking into account the licensing, which is a whole other can of worms.

As a customer, I expect most of the technical advantages will come from being a downstream consumer of ZFS. For a developer/maintainer of an OS, DTrace and ZFS are large technical wins. Part of the overall value proposition of Oxide is "correctness". You get an OS/hardware stack that is designed to work together. You get 20 years of cruft thrown out. You get a lot of tooling, APIs, etc. written in a performant, memory-safe language (Rust). Also you get a really fantastic podcast about the whole process. And as a customer you get a company that understands their stack from driver to VM and has a ton of internal expertise debugging production problems.

Their customers run virtualised OS on top of this.

This is no different from Azure Host OS, Bottlerocket, Flatcar or whatever.

This matters to them because they know the whole stack (some of the kernel code is still theirs from the Sun days), and making it available matters to the customers who want source code access for security assessment reasons.

I think it's a good idea to have more choice, especially in OSS. A Linux monoculture isn't any better than a Chromium monoculture. They might be able to do stuff that just isn't practical if they stuck with Linux. They are also probably more familiar with illumos, or at least familiar enough to know that they can do more with it than with Linux.

In one podcast, the reason given was staff familiarity and owning the full stack, not just the kernel, I believe.

Not everything needs to be Linux. Besides, if monocultures are supposed to be harmful, why is Linux being thrown at everything nowadays? It's very dangerous to have a single point of failure in (critical) applications.

Perhaps Illumos is particularly well suited for a Hypervisor/Cloud platform due to work upstreamed by Joyent originally for SmartOS?

I'm glad this is out; I'm going to deploy this locally and learn as much about it as possible. Oxide is pretty much the company I dream of working at, both for the tech stack and for the people working there. Thank you, Oxide team!

Can you get me excited? I spent 20 seconds browsing the homepage and walked away with "so the idea is vertical integration for on-premise server purchases? On custom OS? Why? Why would people pay a premium?"

But immediately got myself to "what does a server OS do anyway, doesn't it just launch VMs? You don't need Linux, just the ability to launch Linux VMs"

Tell me more? :)

It seems like the folks on HN tend to think the world runs on AWS (I'm not trying to say they don't have a huge market share), but many huge enterprises still run their own datacenters and buy ungodly amounts of hardware.

The products that are on the market for an AWS-like experience on-prem are still fairly horrible. A lot of times the solutions are collaborations between vendors, which makes support a huge pain (finger pointing between companies).

Or, a particular vendor might only have compute and storage, but no offering for SDN, and vice versa. This sucks because then you have two bespoke things to manage and have to hope they work together correctly.

These companies want a full AWS experience in their datacenter, and so far this looks to be the most promising option that doesn't require dedicating huge amounts of resources to something like OpenStack.

OpenStack is pretty smooth sailing these days, and I bet you it would be much cheaper to just hire 3 FTEs for your OpenStack install than to buy an Oxide rack.

Where, exactly, are you getting these 3 FTEs qualified to touch production OpenStack infra, for more than a year, whose aggregate cost is less than a rack of equipment?

The rack doesn't require FTEs?

Not three of them; it ought to be about as difficult to administer as a single rack of hardware plus vSphere, if that.

If you need OpenStack, you're not running one rack but a couple dozen.

That...sounds like a market segment you've just discovered for Oxide.

Wouldn't a "full AWS experience in their datacenter" be AWS Outpost?

Is AWS Outposts truly a full AWS stack/experience? I thought it wasn't actually meant to be a "data center in a box" experience, but more a way to run some workloads locally when you are already using AWS for everything else.

Some data products will run successfully in AWS Outposts. Others will not. For example, AWS itself can't run DynamoDB in an AWS Outpost. It recommends users to run ScyllaDB in DynamoDB compatible mode.

e.g., https://www.scylladb.com/2020/09/15/scylla-cloud-on-aws-outp...

Disclosure: I worked at ScyllaDB.

"full AWS experience in their datacenter"

... Including the bill!

With DHH and others promoting a post-SaaS approach (once.com, etc.) we might see hardware refresh as cost-cutting. Astronomical compute bills and lack of granularity bring all things cloudy into sharp focus.

What they are doing is SaaS by stealth.

You buy their product once, but it only has bug and security fixes for 3 years.

Which means every business is going to need to upgrade on a cycle anyway.

People don't usually throw out their server hardware after 3 years. After 3 years is up they'll probably sell service plans. And with the code being all open source some owners may go the self-supported route, though probably most will buy service plans.

Was actually referring to DHH and the 37signals products.

Whilst I think we will see a trend back towards more on-premise hardware I don't think SaaS is going away anytime soon. And in fact it's arguably better for everyone because the software is being continually maintained.

The "(finger pointing between companies)" took me from confusion to 100% understanding; I was at Google until recently. It was astonishing to me that it was universally acceptable to finger-point at anyone outside your immediate group of ~80 people.*

Took me from "why would people go with this over Dell?" to "holy shit, I'm expecting Dell to do software and make nvidia/red hat/etc/etc etc/etc etc etc help out. lol!"

* also, how destructive it is. never, ever, ever let ppl talk shit about other ppl. There's a difference between "ugh, honestly, it seems like they're focused on release 11.0 this year" and "ughh they're useless idk what they're thinking??? stupid product anyway", and for whatever reason, B made you normal, A made you a tryhard pedant

The best elevator pitch I've heard is "AWS APIs for on-prem datacenters". They make turn-key managed racks that behave just like a commercial cloud would with all the APIs for VM, storage, and network provisioning and integration you'd expect from AWS, except made to deploy in your company's datacenter under your control.

I guess the wildcard is price.

AWS's pricing model kinda works at their OMG eye-watering scale: all the custom hardware they design is highly cost-optimized, but just doing custom hardware has a notable cost. That cost is easily covered by their scale, which makes for their famous margins. [During their low-scale times, they did use a good bit of HP/Dell, etc.]

Oxide seems to be no different (super custom hardware), the only major difference being the "in your datacenter" part. Since you own the cost of your datacenter, Oxide has to come in a lot cheaper to even compete with AWS, but how do you do that with low-volume [and, from the look of it, not cost-optimized but instead fairly tank-like] bespoke hardware? It feels like the pricing/customer fundamentals are going to be pretty rough here outside perhaps a few verticals.

Datacenter costs are weird. The first big cost is having a datacenter. However once you have the space, power, cooling and that part makes sense, then the actual hardware going into it can have a pretty decent premium and still be highly competitive with AWS. It will also depend heavily on what you are doing and producing, if the answer to that is a large amount of data, and it needs to transit out of AWS, suddenly the cost of a pretty large datacenter is really cheap in comparison. AWS egress fees have a markup that will make your accountants panic. From a hardware standpoint, once you need GPU compute or large amounts of RAM, the prices get pretty dumb as well.

Oxide seems to be a lot more efficient than a rack full of 1U servers, each with 2 PSUs, plus 2 ToR switches and 1 management switch somewhere for all the OOBMs. All those little fans and power conversions eat a lot of power, and the fans and the PSUs all cost something too. Also, have fun managing all of that in a secure manner or debugging anything at all. Once you add the VMware licensing, you might end up with more or less the same cost up front and quite likely a higher overall cost. And I am not even beginning to talk about racking/stacking the whole rack. I didn't see much support even when Dell/EMC owned VMware and together produced the VxRail lineup, and the company I used to work for was presented as the reference project in Saxony, Germany at that time. All of the boxes would have added up to about 2 standard racks, but it was representative of the other biggish customers in that area and time.

I imagine some of the customers will order 1-2 racks half full and possibly add a few sleds over a few years; these will probably demand a great GUI/manual experience, possibly competitive Oracle/SAP/MSSQL benchmarks, and I can imagine Veeam integration. Other customers, such as the DoE or some big enterprises, will order whole rows of racks and demand perfect automation options. That is just a guess.

That’s the elevator pitch for OpenStack.

You are not wrong that OpenStack is sort of similar in a sense, but the difference is that Oxide is a hardware + software product, and OpenStack is purely software.

That just sounds like a bunch of APIs on top of Linux.

Just like Dropbox is a bunch of APIs on top of FTP.

It's a mainframe. If you can't get excited for mainframes it'll be hard to be excited about this.

IllumOS is the OS/360 to Oxide's System/360. (It won't get that popular but it's a fair enough comparison for illustrative purposes)

It's a mainframe, for people who do not, actually, know what a mainframe is or does.

I bet it costs a fraction of what a similarly powerful mainframe would cost. However I don't think the customers for each overlap that much. If you need a mainframe, you need one and there is no discussion about possible alternatives because there are none.

Except that it uses the same standard CPUs as commodity machines. It doesn't have much of the extra reliability stuff. It can go from vertical to horizontal scaling. The OS is open source Unix. And yeah, it's not like a mainframe at all really.

Why would people pay a premium?

I would pay a premium just to not have to deal with HPE, Dell, etc.

Dell's been nothing but fantastic for us (compute, not storage.)

Dell is a mixed bag depending on how well the individual region you are dealing with is doing overall. Things were great for us, but something changed and now getting good support for hardware failures has been a nightmare of jumping through hoops, time zone handoffs to other teams, and forced on-site techs to replace a stick of ram.

so the idea is vertical integration for on-premise server purchases? On custom OS? Why? Why would people pay a premium?

As I understand it, re: vertical integration, the term is actually "hyperconverged". Here, that means it's designed at the level of the rack. Like -- there aren't per compute unit redundant power supplies. There is one DC bus bar conversion for the rack. There is an integrated switch designed by Oxide. There is one company to blame when anything inside the box isn't working.

In addition, the pitch is that they're using open source, Rust-based firmware for many of the core components (the baseboard management controller/service processor, and the root of trust), and the box presents a cloud-like API for provisioning.

If the problem is: I'm running lots of VMs in the cloud. I'm used to the cloud. I like the way the cloud works, but I need an on-prem cloud, this makes that much easier than other DIY ways to achieve (OMG we need a team of people to build us a cloud...).

The terminology in this space is confusing, but "hyperconverged" isn't really what we're doing. I wrote about the differences here: https://news.ycombinator.com/item?id=30688865

(That said I think other than saying "hyperconverged" your broad points are correct.)

Having a solid on-prem rack product to me is a great thing. I like IaaS services a lot, don't get me wrong, and I think they're the right pick for a bunch of cases, but on-prem servers also have their "place in the sun", so to speak :) I could present any number of justifications that I don't think I'm qualified enough to defend, but the gist is that at the bare minimum, I'm glad the option exists.

As to why I'm personally excited: I enjoy the amount of control having such an on-prem rack would afford me, and there surely could be a great amount of cost-savings and energy-savings in many scenarios. Sometimes, you just need a rack to deploy services for your local business. I like the prospect of decentralizing infrastructure, applying all the things we've learned with IaaSes.

In the last 10 years and 6 different clients/employers I worked there is pretty much no way to run production on the cloud. Only 1 of them had some stuff running in the (GCP) cloud at all.

Of all of the 6 infrastructures I've seen, only 1 of them is half decent, with 6 dedicated teams around the datacenter working closely together (by dedicated I mean, nothing is required of them concerning the core software product that the company develops). Network, Unix/Virtu, Windows, Storage, PC, and datacenter. That's 30+ people just to run a couple big datacenters and a few more server rooms. The service was actually quite good with VMs/zones delivered under an hour and most tech issues solved in half a day. The other infrastructures were either bigger or smaller, with more or less people, and were all terrible, sometimes needing weeks of email exchanges with excel attached to get a single VM.

AWS was the dream everywhere I went for everybody. Oxide may be coming out with a product that will solve a LOT of issues. SmartOS/IllumOS has all the tech to be self-sufficient (virtualization, storage, SDN...), add support for networking and storage and you get a complete product that a handful of people can run (well, you still need a windows team in most cases but fine).

One company making both HW and SW generally leads to really good, integrated experiences. See e.g. Apple.

I am really hoping the broader industry takes note. By owning the platform, the Oxide team was able to dump legacy stuff that no longer makes sense.

I'm excited to see how this compares to SmartOS. I'm pretty heavily invested in SmartOS in my personal infrastructure but its future, post-Joyent acquisition, has been worrying me.

I really wish I did work for an org big enough to use Oxide's gear. Not having to futz around with bogus IBM PC AT-type compatibility edifice, janky BMCs and iDRACs, hardware RAID controllers, etc, would be so unbelievably nice.

I had been using SmartOS for a long time but finally had to bite the bullet and give up. I ended up deciding on Proxmox on a ZFS root and am quite happy with it.

I've been running SmartOS since at least 2015, when I co-located my server. There have been times when I felt like giving up, but people like danmcd, jperkin and others always stepped in and fixed what needed to be fixed for LX to be usable and working. (Keeping Java updated and running is a hard, uphill battle. Thanks!) I always ran a mixture of OS and LX zones, and bcantrill's t-shirt with "Save the whales, kill your VM" made sense. I'd used zones in Solaris 10 even before, and they just click with me. FreeBSD's jails are nice, but far from it. And Linux's cgroups are a joke. And using KVM/VMs for security containerization is just insane. At my day job, I've implemented multiple Proxmox clusters, because we're a Linux shop and there's no way to "sell" SmartOS or TritonDC to die-hard Debian colleagues, but I've managed to sell them ZFS. With personal stuff, I like my systems to take care of themselves without constant babysitting, and SmartOS or OpenBSD provide just that. I don't dislike Windows; I love UNIX. You could really feel those extra 20 years UNIX had on Linux. I migrated all my stuff to Proxmox for like 2 months. And then I went back to SmartOS, because there was something missing... probably elegance, sanity, simplicity, or even something you'd call "hack value".

And here I am, having compared the SmartOS documentation and ease of installation to Proxmox... and with very few complaints I am using Proxmox to host a file server on bare metal/Samba container and an OPNsense VM.

I remember buying the OpenSolaris Bible in 2008, getting really excited to dig into my second Unix (after FreeBSD). And then, the Sun went down on me... and I stuck with Ubuntu 10 years.

and I stuck with Ubuntu 10 years.

For a while Nexenta had an Ubuntu running on the OpenSolaris kernel.

I feel the same. I used a SmartOS distro called Danube Cloud for a long time and am looking to move. I looked at Harvester[1] and OpenNebula, but with everything I know about Kubernetes (and Longhorn), I'm reluctant to use something so heavily based on Kubernetes.

At its peak I reached out multiple times to Joyent to fix their EFI support for virtualization. The Danube team had similar experiences with them, working on live migrations for VMs, and a few months back I did a rebase of the platform image to a more recent illumos stack.

A fundamental issue with Illumos is that the project doesn't seem to understand that it needs to fix the horrendous platform build to get the community support required to keep up with the pace of development of other OSes. The platform build is a huge, nasty mess of custom shell scripts and file-based status snapshots, and it includes the entire userspace in the kernel build. Basically, if your OpenSSL version is out of whack, the entire thing will fail. Not because it has to, but because it was never adapted to the modern needs of someone who just wants to hack on a kernel. It's fixable, but I don't see any desire to fix it, and even if that desire eventually shows up, it might just be too little, too late.

[1] https://harvesterhci.io/

The nice thing about the Proxmox + ZFS setup is that it works, and is even recommended, without hardware RAID controllers. Fewer headaches either way.

I recently wrote a guide [1] on how to use Proxmox with ZFS over iSCSI so you can use the snapshot features from a SAN.

[1] https://blog.haschek.at/2023/zfs-over-iscsi-in-proxmox.html

SmartOS has been actively developed since its acquisition from Joyent[1] in April 2022.

We've released a new version every two weeks post acquisition, and are continuing to develop and invest.

We also hold office hours events roughly every two weeks on Discord[2], and would love for you to stop by and ask any questions, or just listen along!

[1]: https://www.tritondatacenter.com/blog/a-new-chapter-begins-f... [2]: https://discord.gg/v4NwA3Hqay

IllumOS needs to attract new developers. To do that, the platform build needs to become a lot more straightforward. It's a pretty huge endeavour in my opinion. I'd be happy to help out in that regard, but in the past Joyent has not been very open to outside support.

I've noticed you make this comment repeatedly when illumos is mentioned on HN. I think you're underestimating the irreducible complexity of the build process for what is essentially a whole UNIX operating system, save for a few external dependencies. It's not just a kernel, but an extensive set of user mode libraries and executables. The build is complex in part because it's a complex body of software.

I also think you're overestimating the extent to which make(1S) is the reason we're not more popular than Linux. There are any number of more relevant factors that make someone choose one operating system or another. Also, certainly for me personally my goal is not world domination, merely the sustainable maintenance of a body of software that helps me solve the problems that I work on, and which I enjoy using and developing as a result.

I agree we need (as do all projects!) new developers, both now, and over the long term. We work as we can to make improvements to the build process, and the documentation. We are a relatively niche project, but we do attract new developers from time to time, and we're making changes at least as rapidly as we ever have in the past. There are a number of actively maintained illumos distributions (OmniOS, SmartOS, Tribblix, OpenIndiana, and now Helios) and there are a variety of commercial interests that ship more proprietary appliances on top of an illumos base. For our part at Oxide we continue to encourage our staff to get involved with illumos development as it makes sense for them, and we try to offer resources and assistance to the broader community as well.

If you would like to contribute, we have a guide to getting started: https://illumos.org/docs/contributing/

Please, though, it's "illumos", not "IllumOS"!

I do, yes, but your comment makes it clear that this is a problem that you either don't really think of as a problem, or that you don't know how to address. Building open source communities is hard work. Telling everyone how amazing your product is (even if it is) is only a small part of it. The lesson to take away from your time at Joyent should be that this way of community building didn't work, and there needs to be some change.

Even in the early 2000s, Linux had make menuconfig or make xconfig for configuring a build of Linux. And yes, this is different; it's a POSIX distribution. But Yocto was a relatively niche project as well, and it also addresses the issue of building a collection of POSIX applications into one big project, as do Gentoo's stages.

I'm sure that at the time of its creation OpenSolaris was ahead of the curve, but that's how many years ago? You know as well as I do that sprinkling LD_LIBRARY_PATHs here and there and then removing undocumented dot files here and there isn't really a sane way to handle such a build process for a curious third party. Most will probably drop it before it gets to that point.

There have been many many projects that have reworked their entire build architecture, some of which took years to flesh out fully.

What needs to happen for illumos to get a boost of development in the long term is:

1. first for you to acknowledge on a political level that there is an issue that needs to be addressed here, and

2. to then work with the community, and it doesn't have to be across the board, but you need to be willing to invest in some experts and some people interested in solving this, so they can grind out something that is more sane in this current world.

"Read our getting started guide" isn't really all that useful, when most of the complex issues happen after that and are often met with "this isn't how we do things".

sprinkling LD_LIBRARY_PATHs here and there and then removing undocumented dot files here and there

I obviously don't have any context about the issues you were facing at the time, and I can't really figure it out based on the advice you ostensibly received. I'm definitely sorry if we have led you astray in the past, but those are not workarounds I would encourage people to use today. If there's some aspect of the build process that requires workarounds like you're describing, it's definitely a bug, and we'll fix it as best we can when we're made aware of it.

As for the rest of it, I think you're putting the cart before the horse on some level. An operating system is a large and complex thing to work on, regardless of whether it's built with make or ninja or bazel or whatever other build tool.

The Rust toolchain is another similarly complex body of software, which also has a large and at times inscrutable build process. I know because I have personally contributed to it, and had to figure out how to get it to work. Rust obviously has more active contributors than illumos, but it also has vastly more active _users_ -- it is a body of software that has broad applicability to many people and the work they do.

For illumos to continue to succeed as an actively maintained project, what we need to do is continue to inspire _users_ to want to use it. Nobody wants to work on an operating system they don't personally need to use at all. We draw contributions today from a mixture of community driven distributions making fixes or adding features, and by people employed by companies like Oxide who have a vested economic interest in the deployment of the software.

None of this is to say that we're perfect, or that we're not trying to improve things. Just that we're trying to put build system improvements in the proper context amongst all the other work there is to do with our limited resources. It's probably more important that we have support for new Intel client NICs like you would find in a modern desktop system, for example, than it is that we replace make. It's important that we continue to add system calls and libc facilities that other platforms have adopted in order to ease software porting. It's important that we continue to maintain modern JDKs and Python and Go and Rust and C/C++ compilers. It's important that we keep up with security issues and the endless stream of mitigations imposed by the sieve-like nature of speculative CPUs.

There's actually quite a lot of stuff going on for us all the time, and we do still find time to improve the build system. If you have more specifics in mind, that's fantastic and we'd love to hear about them concretely! I would encourage you to channel your enthusiasm into writing an illumos project discussion (IPD) describing the issues you see and the work you'd propose to sort them out! You can see some examples of existing IPDs at https://github.com/illumos/ipd

And as ever, if you hit issues in the build as it stands, please file bugs! We can't fix things we haven't heard about.

Cheers.

> Oxide is pretty much the company I dream to work at, both for the tech stack, plus the people working there.

Thought I was the only one :P

I mean, I phrase it as "my dream job is systems integration at Sun" (but Oxide is the living equivalent)

Sun had some lovely desktop hardware. They also had SPARC.

I really miss Sun.

Not that I'm not rooting for Oxide, but their product is still so niche and early stage that I can't imagine any actual businesses buying their stuff for a long time. They only just shipped their first rack to their first customer at the end of last summer and it's Idaho National Laboratory. State research institutions are basically the only entities positioned to gamble on this right now.

I hope they sooner or later release a smaller, cheaper, homelab product for people to learn or for startups that will lead to future rack sales or workers.

This is a common request and we absolutely understand the desire, but I suspect such a thing, if ever, will be a long time off. Given that the product is designed as an entire rack, doing something like this would effectively be a different product for a different vertical, and we have to focus on our current business. Honestly it's kind of frustrating not being able to reciprocate the enthusiasm back in more than just words, but it is what it is.

For what it’s worth, there’s a somewhat common view at least in the Linux community that it’s important for hardware vendors to make their tech stack targetable from the office or home. This isn’t to be polite or to make money — it’s to foster adoption among developers, which drives sales.

Some examples:

x86 owned the desktop, workstation and laptop world for a long time. So everyone targeted x86, which made x86 the default in the datacenter. It was hard for ARM to break in and it mostly happened when AWS did it by fiat. If ARM had made some loss-leader actually useful laptops and workstations available, it might have happened sooner.

But x86 largely didn’t deploy AVX-512 in client machines, so people who wrote libraries only used it for fun or benchmarking, so it wasn’t widely used, and most users flubbed it anyway. (And might have gotten it right if they had the hardware on their desk.)

People target Nvidia datacenter GPUs. But people have targeted them for a long time, because they have them in their gaming machines too.

Xilinx used to push free academic gear quite hard, because that was a big lead into people learning how to use their gear.

So, if I were giving Oxide straightforward sales advice, absolutely don’t get distracted with small systems. But maybe, if Oxide thought of it as lead generation, Oxide should do it anyway. If I could buy something small enough to be affordable but big enough to be useful [0], I might get one. And I’d target it with my own stuff, and fix bugs, and evangelize it at little cost to Oxide.

[0] For me, maybe 100-150TB of spinning rust (or cheap NVMe or the ability to attach a JBOD), plus anywhere from 4-64 cores, in a format that works on 120V and fits in, say, 16U or less, at a credible price point, would be quite likely to net Oxide a sale. (Just one sale but still!) It could be sold as a developer thing, and there would be absolutely no expectation that it would perform like the real thing. If I found it awesome, I might buy a couple more. But I would also use it and make things work on it and talk about it, and if a whole bunch of people did this, Oxide might get a bunch of real sales.

(Also, I get the idea behind two SKUs, but can buyers at least configure storage and compute separately? Different workloads need radically different ratios.)

I certainly understand that strategy generally, see also Adobe giving Photoshop licenses to students back in the day so that they'd be familiar professionally. It's just that doing so amounts to building an entirely new product, and as a relatively young startup, focus is more important. We're going deep, not wide. Someday :)

(Also, I get the idea behind two SKUs, but can buyers at least configure storage and compute separately? Different workloads need radically different ratios.)

Right now, this early: no. Sleds have compute and storage located together, so the unit of customization is currently "number of sleds in the rack", which, according to https://oxide.computer/product/specifications, currently comes in three configurations, not two: 16, 24, or 32 sleds.

You are right that these need to be different for certain customers and workloads, we just aren't ready to support those just yet. We'll get there. Same issue, different aspect.

Until then maybe sell a few to schools to provision for student access?

A future of developers and CTOs who grew up with Oxide!

My friends and I had a small server set up in college, and those were some of the best times of those years. :D

See also Cloud Foundry, where the late arrival of something devs could use on a laptop was probably key to its failure to capture the market for PaaS.

On reading the specs, you’re using 2.91TiB U.2 drives. On the list of “oh my gosh too much engineering and too many stock keeping units,” allowing them to be swapped for the much larger U.2 devices one can buy now seems fairly easy. In case there’s a potential thermal issue, most NVMe drives I’ve checked have active power states that are a bit slower but have reduced power consumption.

But Oxide is small and shouldn’t listen to me unless a customer asks for this.

> This isn’t to be polite or to make money — it’s to foster adoption among developers, which drives sales.

I get that in theory… and it makes sense most of the time. I’m not sure it does this time though. You don’t exactly “target” Oxide as an OS or platform. Rather, you use it to run VMs on. Those VMs are whatever you want. Other than that, I’m not sure what else having a home-lab version of Oxide would look like.

A different competitor to Proxmox?

People target AWS and GCP and Azure, and they write actual code that interacts with them, do test deployments there, and do real deployments there.

Right, but you need to be at a particularly high scale when you need to write code that directly interacts with a cloud provider. I can think of no use case where you are at that level that can also scale down to a home lab setup.

Yes, you can do slimmed down cloud deployments, but you're still not running the (actual) S3 or EC2 backends at home.

At most, you’re talking a K8s based deployment (for a workload that could scale up or down). But that’s also not at the level of working with Oxide directly. And I doubt Oxide wants to get into the business of selling access to their own public cloud.

I appreciate the response, I totally understand and don’t expect it to materialize soon, but am still hopeful that someday it will be a possibility.

can't wait to find liquidated Oxide gear on eBay in 2035. all my current homelab gear is "ancient" enterprise gear like R720s, etc.

We'll have to wait for it to hit Groupon.

It is somewhat niche, but Broadcom's purchase of VMware now puts Oxide closer to Nutanix in that you can go buy a fully supported virtualization platform from a vendor who welcomes your business. I don't know the actual number, but it seems Broadcom is only interested in enterprise customers with huge annual spends.

With Broadcom’s plan for VMware, Oxide certainly seems to have had excellent timing here.

Large financial institutions surprisingly are good customers for new, still-untested computing technology.

I wouldn't be surprised if Oxide's next customers were a few giant banks and funds.

In my experience, some financial institutions have a very good understanding of risk.

They are able to identify, and most importantly, quantify risk in a way that many businesses cannot.

Consequently, they're able to take risks with new hardware/software that other companies shy away from.

I work at a recently IPO'd tech company. Oxide was a strong consideration for us when evaluating on prem. The pitch lands among folks who still think "on prem.... ew".

Looks like a cloud like experience on your own hardware.

If only it were as cheap as Dell...

As did some elements of my own company, but business risks like those are not for fledgling public companies. To be honest, right now anyone in a _public_ company advocating for it at this stage of development should have all of their decision making power removed if not outright be shown the door.

That goes double if it's your CTO...which is exactly what ended up happening with us.

I'm not saying "no, never", but clearly "no, not right now".

Just a small note, but from when we announced this back in October, two customers were mentioned: https://oxide.computer/blog/oxide-unveils-the-worlds-first-c...

Oxide customers include the Idaho National Laboratory as well as a global financial services organization. Additional installments at Fortune 1000 enterprises will be completed in the coming months.

This describes every single product in its early days in existence. If you're planning to launch any other way, you've doomed the company before you even launched. Lucky few survive, in spite of, and that's what contributes to the 9/10 startups statistic.

Laser focus on the first set of customers that will help you cross the chasm. Only then mass market.

We have historically had private institutions with impactful research labs. Are there any of those still kicking?

My company looked at them, and we were very impressed with the product. The only issue was that they are built for general compute and we really needed the option for faster processors.

What even is Oxide Computer? It makes no sense - it was publicized with all sorts of anti-blob, freedom, and posts about management engines and a sort of alternative to RaptorCS/IBM (which now has blobs again)... Yet most of that stuff is now buried/removed and Oxide Computer is just a hardware platform with unnecessary lock-in. For the bunker of the rich to be able to run their own mini-cloud? Sure. For anything else it seems like a bad design.

Yet most of that stuff is now buried/removed

Nothing has changed with regards to our anti-blob and pro-open source stances. I am not sure what you're referring to here.

with unnecessary lock-in.

What lock-in are you referring to here? The way that things run on the rack is via virtual machines, you can run virtual machines on many providers. We even have a terraform provider so that you can use familiar tools instead of the API directly, if you believe that is lock-in (and that stuff is all also fully open source).

I don't expect anyone to see my comments unless they're really looking since I've been shadow banned for many years now - so I appreciate your reply.

To be clearer regarding my questions:

- What happened to Project X (supposedly coreboot++ for latest AMD CPUs)? It seems dead, despite being more reported on than Oxide's attempts in working with AMD (to achieve the same outcomes, presumably - what's the difference?). Loads of well meaning people have approached this with virtue, innocence and skills; perhaps another approach is needed that fully respects the dynamic between the user, the chip manufacturers and the governments and banks they're in debt to.

- Does Oxide attempt to sandbox, completely remove or 'verify as benign' aspects like the PSP? For example, if someone could verify that the PSP cannot possibly be affected over the network, then peace of mind could be more affordable regarding things like supply chain attacks and bad actors with AMD/Intel/Apple management engine secrets.

Not referring to software lock-in, just hardware. And it isn't very nefarious like other hardware lock-in (serialization, see Rossmann Group). Just hardware on the rack-level: replacing oxide gear & upgrading oxide gear (not sure about repair, that could be easy). And if the offering were of a less blobby architecture, then many of us would be happy to pay a bit more for the hardware as a system. However, if the hardware platform is FOSS, then it won't be unnecessarily difficult to mix and match and integrate the Oxide gear with other DC-class gear.

So, your comment was not dead when I saw it. This reply was, but apparently now has been vouched for.

What happened to Project X (supposedly coreboot++ for latest AMD CPUs)?

I don't recall what you're referring to specifically; maybe this was a thing before I started at Oxide. I do know that we deliberately decided to not go with coreboot. I believe the equivalent component would be phbl[1]. It boots illumos directly. Bryan gave a talk about how we boot[2][3] with more reasoning and context.

Does Oxide attempt to sandbox, completely remove or 'verify as benign' aspects like the PSP?

The general attitude is still "remove or work around every binary blob possible," but the PSP is unfortunately not able to be worked around.

However, if the hardware platform is FOSS

We fully intend to do this, by the way. Just haven't yet. It'll come.

1: https://github.com/oxidecomputer/phbl

2: https://www.osfc.io/2022/talks/i-have-come-to-bury-the-bios-...

3: https://news.ycombinator.com/item?id=33145411

To see more on "Project X", see the Phoronix article on it. At the very least, it would be worthwhile for the Oxide devs to have a chat with the Project X devs, who have since given up; lessons can be learned and time can be saved. And yes, coreboot itself is now untenable, but it's also kind of shorthand for a category of deblobbed software.

If you're referring to this: https://www.phoronix.com/news/Project-X-AMD-Zen-Coreboot

This seems to make no mention of Oxide at all. Perhaps you're connecting two different unrelated organizations together as Oxide appears to have never had any relation to it. I think you're just perhaps confused about what the situation is.

Is it possible you're thinking of the more recent AMD openSIL initiative?

For what it's worth, you don't seem to be shadowbanned as far as I can tell. Your original post seems to be dead due to downvotes, but this one seems to be in a totally normal non-dead non-shadowbanned state.

He is banned; his comments here are only alive because I rescued them.

Thank you for explaining!

It's a mainframe. You use it like you use mainframes, but probably easier, as they're adopting more modern functionality. You won't be aware you're using it, just like you aren't aware when you use a zSystem.

I'm really curious: what kind of workload would companies want to run on a custom Unix that isn't Linux/Mac/BSD?

I'm rooting for more mature OS diversity, I just have no idea who the end users would be and what their needs would look like.

This is not a user-facing detail of the product. Customers run VMs on the rack, they do not build their applications for illumos. They're gonna run whatever operating system in those VMs that they need to accomplish their goals.

That being said, there's something to be said for enterprise support. Are there plans to support importing/converting/running third-party OVAs? Many vendors will support something running in KVM; I can't recall the last time I saw bhyve as a supported hypervisor.

I'd imagine as Broadcom slowly destroys VMware's market share, vendors will look to alternatives, but I doubt bhyve is even a blip on their radar at this point.

OVAs are basically tar archives containing an OVF XML descriptor plus the disk images. If you want, you can convert the disk inside an OVA to a raw image or VMDK or whatever the latest fancy format is, and bhyve can boot that for you. Raw is the better choice.
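To make the "it's just an archive" point concrete, here is a minimal Python sketch using only the standard library's `tarfile` module. It assumes a well-formed OVA (a tar archive whose `.ovf` descriptor comes first, followed by an optional manifest and the disks); the member names `appliance.ovf`, `appliance.mf`, and `disk1.vmdk` are hypothetical placeholders built in memory for the demo. Converting the extracted VMDK to raw would be a separate step with a tool such as `qemu-img convert`, not shown here.

```python
import io
import tarfile


def make_toy_ova() -> bytes:
    """Build a tiny in-memory OVA (a plain tar archive) for demonstration.
    Member names and contents are hypothetical placeholders."""
    buf = io.BytesIO()
    members = {
        "appliance.ovf": b"<?xml version='1.0'?><Envelope/>",  # OVF descriptor first
        "appliance.mf": b"placeholder manifest\n",  # optional checksum manifest
        "disk1.vmdk": b"\x00" * 512,  # stand-in for a real disk image
    }
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def inspect_ova(ova_bytes: bytes) -> dict:
    """Split an OVA's members into the OVF descriptor and the disk images."""
    result = {"descriptor": None, "disks": []}
    with tarfile.open(fileobj=io.BytesIO(ova_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.name.endswith(".ovf"):
                result["descriptor"] = member.name
            elif member.name.endswith((".vmdk", ".img", ".raw")):
                result["disks"].append(member.name)
    return result


if __name__ == "__main__":
    print(inspect_ova(make_toy_ova()))
    # prints {'descriptor': 'appliance.ovf', 'disks': ['disk1.vmdk']}
```

Against a real file you'd read the `.ova` bytes from disk (or call `tarfile.open(path)` directly); the same loop then tells you which disks need converting.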

bhyve, unlike other "famous" hypervisors, is pretty stable, has good-enough virtualized drivers (although I'm sure Oxide has made them better), and can boot a VM with 1.5 TB of RAM and 240 vCPUs[1]. That's something I was not able to do with anything other than bhyve.

I know this is Hacker News, so I have to say it: marketing != engineering. Just because the FreeBSD project's marketing sucks doesn't mean the engineering is bad. It's usually better than the mainstream ones.

1: https://antranigv.am/posts/2023/10/bhyve-cpu-allocation-256/

There's two parts to the question. There's the file itself and the underlying hardware. The wiki is pretty light on details, does it actually support emulating the same hardware as vmware? I'd assume no to the vmxnet devices but Intel E1000? Adaptec SCSI adapters? Similar USB and VGA?

A lot of the vendor-provided OVAs cut out a bunch of hardware support with the assumption that they only need to support vmware emulated hardware.

I don't know the status of supporting OVA as a file format, but we absolutely support creating and uploading your own images. Here are the current docs on how to do so: https://docs.oxide.computer/guides/creating-and-sharing-imag...

The compute you'd provision on the Oxide rack are virtual machines, they've ported bhyve from FreeBSD and added live migration. I'm pretty sure you could even boot Windows Server on it if you were being held hostage.

As for why they used Illumos, many of the people came from Sun, Joyent, etc. so there's an obvious bias. However they do have a compelling reason that this is not an IBM compatible x86 personal computer, there's no BIOS, no UEFI, no traditional BMC, as far as I can tell they've removed as much proprietary firmware and binary blobs as they could possibly remove, while still using modern x86.

Each sled has a service processor and a hardware root of trust that directly boots the CPU, loads the AMD training blob, and boots the OS. It would be difficult to upstream the changes required to do that into a Linux or BSD for a computer only you currently have. So you'd have to maintain your own downstream fork, with no one else responsible for the robustness of the OS, so it might as well be an OS that you have supported and developed for years.

ZFS is native on illumos, and the containerization equivalent (zones) is pretty great.

There's a good argument that your servers in the cloud don't need to be on the same OS, as long as you can hire enough talent to work on them.

You'd have no idea that it isn't Linux. You don't run code on this OS, you run code on VMs that it provides.

I’m unfamiliar with illumos so I went to their webpage and the very first thing it says is:

illumos is a Unix operating system

Is illumos an actual Unix (like macOS) or a Unix-like OS (like GNU/Linux)?

Legally, NetBSD isn't actually Unix. The brand doesn't mean what people seem to think it means.

Right, "unix" roughly means

- Derived from Bell Labs unix source

- Legally allowed to use the UNIX trademark (AKA certified Unix)

- A unix-shaped OS (similar but not 100% the same as POSIX compliance)

and those things are basically independent. Most GNU/Linux distributions are unix-likes but not derived from original unix code or certified, but there have been 1-2 that did get certified. The BSDs are (now quite distantly) derived from unix source but not certified (although e.g. UnixWare is, IIRC). Solaris was all 3, but OpenSolaris and now illumos are obviously unix-like and still based on the original code, but not certified UNIX™.

(Take all this with a grain of salt; I'm typing this all from memory and IANAL)

This isn't NetBSD. NetBSD broke off loooooooong after the release of BSD that Sun used to build this OS.

Actual Unix. Wikipedia is pretty good: https://en.wikipedia.org/wiki/Illumos

It is based on OpenSolaris, which was based on System V Release 4 (SVR4) and the Berkeley Software Distribution (BSD). Illumos comprises a kernel, device drivers, system libraries, and utility software for system administration. This core is now the base for many different open-sourced Illumos distributions, in a similar way in which the Linux kernel is used in different Linux distributions.

Actual Unix. I believe it is in the Solaris family.

Nobody's paid to have it pass Open Group Unix Branding certification tests

https://www.opengroup.org/openbrand/register/

so it can't use the UNIX™ trade mark.

But it's got the AT&T Unix kernel & userland sources contained in it.

PDP-11 Unix System III: https://www.tuhs.org/cgi-bin/utree.pl?file=SysIII/usr/src/ut...

IllumOS: https://github.com/illumos/illumos-gate/blob/b8169dedfa435c0...

It was an open source branch of Solaris that Ian Murdock worked on while he was at Sun under the name Project Indiana. It descends from UNIX SVR4.

Can anyone ELI5 what Oxide's offer is? I've looked at their website and still got no clue. Is it hardware + software I can purchase and use on-premise? Is it a PaaS / yet another cloud provider?

On-prem, fully-integrated compute and storage solution with cloud-like APIs to provision resources, all with a commitment to open source.

Mainframe 2.0

This is really not accurate in any way that matters, I don't think. It's a mainframe only inasmuch as you buy a rack and spec it out. It's not a mainframe in that the performance is typical server performance, rather than the mainframe profile, which is very different and requires different considerations. The compute model is also typical server compute, rather than the mainframe compute model, which (aside from compatibility layers) is a radically different environment to build software for.

Do you know if they support GPUs or whatever is needed to host LLM models?

The current product does not have any GPUs in it. https://news.ycombinator.com/item?id=39183072

I believe you're being downvoted because there is already a big thread about this here, though I think that's a bit unfair to you. I haven't posted in that thread yet because I wanted to let others say what is meaningful about the product to them, but this seems like a good place to put my reply. Regardless of all that: it is hardware + software you can purchase and use on-premise, that's correct.

The differentiator from virtually all existing on-prem cloud products is that we are a single vendor who has designed the hardware and software (which is as open source as we can possibly make it, by the way, hence announcements like this) to work well together. Most products combine various other products from various vendors, and are effectively selling you integration. We believe that that leads to all kinds of problems that our product solves.

Another factor here is that we only have two SKUs: a half rack and a full rack. You don't buy Oxide 1U at a time, you buy it a rack at a time. By designing the entire rack as a cohesive unit, we can do a lot of things that you simply cannot do in the 1U form factor. There is a running joke that we talk about our fans all the time, and it's true. Because our sleds have a larger form factor than a traditional 1U, we can use larger fans. This means we can run them at a lower RPM, which means power savings. That's the deliberate design choice. But we also have gained accidental benefits from doing things like this: lower RPM also means that our servers are way quieter than others. That's pretty neat. Some early prospective customers literally asked if the thing is on when it was demo'd to them, because it's so quiet. Is that a reason to buy a server? Not necessarily, but it's just a fun example of some of the things that end up happening when you re-think a product as a whole, rather than as an integration exercise.
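The "larger fans at lower RPM save power" point follows from the idealized fan affinity laws: for geometrically similar fans, airflow scales roughly with speed × diameter³ and power with speed³ × diameter⁵. The Python sketch below is a back-of-the-envelope illustration under those idealized assumptions (no motor, bearing, or duct losses); the 1.5× diameter is a hypothetical figure, not Oxide's actual fan sizing.

```python
def fan_power_ratio(speed_ratio: float, diameter_ratio: float = 1.0) -> float:
    """Idealized fan affinity law: power scales with speed^3 * diameter^5."""
    return speed_ratio ** 3 * diameter_ratio ** 5


def speed_for_same_airflow(diameter_ratio: float) -> float:
    """Airflow scales with speed * diameter^3, so a larger fan can spin
    slower and still move the same volume of air."""
    return 1.0 / diameter_ratio ** 3


if __name__ == "__main__":
    # Same fan at half the RPM: 1/8 of the power.
    print(fan_power_ratio(0.5))  # prints 0.125

    # Hypothetical fan 1.5x the diameter, slowed down to move the same air.
    n = speed_for_same_airflow(1.5)
    print(round(n, 3), round(fan_power_ratio(n, 1.5), 3))  # prints 0.296 0.198
```

Under these assumptions, halving a fan's speed cuts its power to about 1/8, and a fan 1.5× wider that moves the same air at roughly 30% of the speed needs about a fifth of the power. Real fans deviate from the ideal, but the cubic dependence on speed is why running bigger fans slower pays off.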

Thanks so much for elaborating!

License

MPL 2.0 is an interesting license choice, for an operating system.

EDIT: why the downvotes?

Quoting from an RFD co-authored by bcantrill and myself describing Oxide's policies around open source:

For any new Oxide-created software, the MPL 2.0 should generally be the license of choice. The exception to this should be any software that is a part of a larger ecosystem that has a prevailing license, in which case that prevailing license may be used.

EDIT: I also am confused about why you are downvoted. Are there any major operating systems distributions that are MPL licensed? I can't think of any off the top of my head. Beyond that it's a simple question.

I had to look up RFD, and I like the idea!

https://oxide.computer/blog/rfd-1-requests-for-discussion

Ah thanks! Yeah I should have mentioned this in my comment, thank you for adding the context.

By the way, you can browse public RFDs here: https://rfd.shared.oxide.computer/

I didn't include any links to any RFDs in my comments today because I have only been referencing non-public ones.

[if Oxide-created software] is a part of a larger ecosystem that has a prevailing license, in which case that prevailing license may be used

How does that work if the prevailing is BSD/MIT/ISC?

You're saying that Oxide can then be licensed under BSD/MIT/ISC?

So I decided to cut off my quote but the next line has the answer:

For example, Rust crates are generally dual-licensed as MIT/Apache 2.

We often produce components that we share with the broader open source world. For example, dropshot[1] is our in-house web framework, but we publish it as a standalone package. It is licensed under Apache-2.0 instead of MPL 2.0 because the norm in the Rust ecosystem is Apache and not MPL.

You're saying that Oxide can then be licensed under BSD/MIT/ISC?

I am saying that we do not have one single license across the company. Some components are probably BSD/MIT/ISC licensed somewhere, and I guarantee that some third party dependencies we use are licensed under those licenses. That's different from "you could choose to take it under BSD," which I didn't mean to imply, sorry about that!

1: https://crates.io/crates/dropshot

MPL 2.0 has been the preferred license for CTO Bryan Cantrill and crew for more than a decade:

“And because any conversation about open source has to address licensing at some point or another, let’s get that out of the way: we opted for the Mozilla Public License 2.0. While relatively new, there is a lot to like about this license: its file-based copyleft allows it to be proprietary-friendly while also forcing certain kinds of derived work to be contributed back; its explicit patent license discourages litigation, offering some measure of troll protection; its explicit warranting of original work obviates the need for a contributor license agreement (we’re not so into CLAs); and (best of all, in my opinion), it has been explicitly designed to co-exist with other open source licenses in larger derived works. Mozilla did terrific work on MPL 2.0, and we hope to see it adopted by other companies that share our thinking around open source!”

https://bcantrill.dtrace.org/2014/11/03/smartdatacenter-and-...

Also discussed around the 38-minute mark of https://youtu.be/Zpnncakrelk?si=DkSW6CM_MS-q1Gyd

Although not explicitly stated there, there are likely deeper roots here: “The one important exception to these generalizations is Sun Microsystems' CDDL, which was a true improvement on MPL 1.1, and which continues to cover a substantial amount of important open source software. … I encourage Oracle, the current CDDL steward, to consider relicensing its CDDL code under MPL 2.0, which is as worthy a successor to CDDL 1.0 as it is to MPL 1.1.” from Richard Fontana’s article at the time of the MPL 2.0 release, https://opensource.com/law/12/1/the-new-mpl

Given its compatibility with strong, older copyleft licenses, I’m surprised the license has not had more widespread adoption. It is a not-too-hot, not-too-cold porridge of a file-level copyleft and CYA OSS license with the strong backing of Mozilla.
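As a rough illustration of what "file-level copyleft" means in practice, here is a deliberately simplified Python model: under MPL 2.0, any file that contains MPL-covered code stays under the MPL, while new files containing none of it can sit alongside them in a larger work under other terms. The `SourceFile` type and the file paths are hypothetical, and the model ignores finer points such as the license's "Incompatible With Secondary Licenses" notice.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SourceFile:
    path: str
    contains_mpl_code: bool  # True if any part of the file came from MPL 2.0 code


def split_by_mpl_scope(files: List[SourceFile]) -> Tuple[List[str], List[str]]:
    """Simplified file-level copyleft: files containing MPL-covered code stay
    under the MPL; every other file in the larger work may use any license."""
    mpl_files = [f.path for f in files if f.contains_mpl_code]
    other_files = [f.path for f in files if not f.contains_mpl_code]
    return mpl_files, other_files


if __name__ == "__main__":
    project = [
        SourceFile("vendor/parser.c", contains_mpl_code=True),  # modified MPL file
        SourceFile("src/ui.c", contains_mpl_code=False),  # brand-new proprietary file
    ]
    print(split_by_mpl_scope(project))
    # prints (['vendor/parser.c'], ['src/ui.c'])
```

The contrast with the GPL is that the boundary is the file, not the whole program: linking `src/ui.c` against the MPL file does not pull it under the MPL.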

I would be interested in how you first heard of Oxide.

I somehow landed on their podcast because it covered <whatever the hell I thought was interesting at that moment>.

The podcast is for me amazeballs marketing - it does everything but sell their product (might be a good idea to add a pitch in for each out-tro!)

I mean, they talk about it, like “we had such a tough time getting the compiler to do something something,” and then veer off to discuss back-in-the-day stories.

Ah, never mind. Keep talking, guys; hope it works out.

Was following @jessfraz on Twitter back then, so I got word of Oxide when they first announced it there.

Between Jess, Bryan, and Adam, it was hard to miss :-)

If you listen to their original podcast, 'On The Metal', it was infamous for its overly repeated use of 2 or 3 pre-recorded self-promotions, so much so that a fan recorded their own commercial for them to air.

'Oxide and Friends' however isn't really what I would consider a podcast, but a recording of live "spaces" or group calls, beginning on Twitter and now happening in Discord. IMO it's not really best consumed as a podcast, but rather to participate in live. If you tune in live you'll pick up on the vibe of the recordings a lot better.

https://oxide.computer/podcasts/oxide-and-friends

For me, it was when Pentagram showcased their branding when Oxide was first announced.

Sweet:) And a big thanks for writing what appears to be clear and straightforward documentation; IMO that's an area that the illumos community has historically struggled with. And seeing a new source release talking about consolidations gives me the warm fuzzies, even if this does seem to depart from the traditional gate paradigm unless I'm seriously misreading the repo organization here.

Some (mostly tooling) questions:

- Why gmake? Especially since dmake is needed later anyways?

- Instructions say to run rustup with bash explicitly; is that a defect in upstream, or is the local sh not completely POSIX compatible?

- How is this developed internally? Do Oxide folks run illumos workstations, or is this all developed in virtual machines or SSHed to servers?

- Why MPL? GPL compatibility?

I can't answer all your questions, because I don't actually work on helios, but I do have an answer to some of them:

Do Oxide folks run illumos workstations or is this all developed in virtual machines or SSHed to servers?

I wrote about this topic here: https://news.ycombinator.com/item?id=39181727

That said, some folks certainly run illumos on a workstation.

Why MPL? GPL compatibility?

On MPL: https://news.ycombinator.com/item?id=39181844

That said in that comment I didn't really speak to the "why." We feel like it's a good compromise in the possibility space: more copyleft than BSD, but also less restrictive than the GPL.

FWIW, though for historical reasons we use dmake to build the core operating system, I tend to recommend people use GNU make (gmake) for new Makefiles in other consolidations. It's broadly available (including on other platforms) and has more modern features.

is that a defect in upstream, or is the local sh not completely POSIX compatible?

AFAIK it's an issue with upstream. As with most open-source projects, there are some Linuxisms/Bashisms in there.

It is great that the software is open source, but would it be useful to deploy on other hardware?

And what would happen if, for whatever reason, a company can no longer purchase Oxide racks, will it need to start over its infra, or can it just build around Oxide hardware?

It is not likely that it would be immediately useful outside of our hardware, but the main thing they're doing is deploying virtual machines. If they decided to no longer use the Oxide rack they have purchased, they would move their VMs to whatever infrastructure they choose to succeed it.

Yeah, we definitely only intend Helios to run on either the Oxide rack or in service of software engineering work surrounding the rack (that's what we use the ISO installers and virtual machine images for).

If you're interested in a distribution targeting end user use of illumos on servers, I would absolutely recommend looking at OmniOS! Helios is very closely based on OmniOS r151046 LTS, and we use that LTS of OmniOS release directly for non-Oxide-rack infrastructure systems inside Oxide as well.

I was hoping for this since they announced the server rack... nobody wants a paperweight if (God forbid) oxide were to go out of business.

To be clear about it, the "paperweight problem" is very important to us as well. It's worth remembering that the MPL doesn't care if a copy is posted openly on GitHub or not, and (I am not a lawyer!) we have obligations to our customers under it regardless if non-customers can browse the code.

oh wow ... can't wait to waste a bunch of time trying to get this running in hyper-v.

I would recommend trying SmartOS or OmniOS instead, since the Oxide rack isn't filled with IBM compatible personal computers on sleds, and they have no BIOS or UEFI.

Oh the feels!

The name is, by unfortunate coincidence, also used by another operating system's microkernel[0].

0. https://sr.ht/~sircmpwn/helios/

I really hope Oxide succeeds. I thought they were crazy when they announced what they were going to do. It's not the kind of play you see from a start up, but they were determined. Most folks that started the race at that time are out. I hope their computer is ready to deploy GPUs. Deploying multiple GPUs today is a freaking pain.

I really want one of these racks in my bedroom. Unfortunately, somehow I think I couldn't afford one ;)