When people learn about distributed systems outside of work, how do they actually get hands-on experience with it (assuming they don't go spinning up a bunch of machines on aws/gcp/azure/etc)? I find it easiest to learn by doing, writing simple proofs of concept, but that seems a bit harder to do in this area than in others. What is the hello world/MNIST of messing around with distributed systems?
Frankly, just build something.
Use a small k8s distro (kind, minikube, k3s) and build something that talks amongst itself and is resilient.
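If you'd rather start with zero infrastructure, the smallest useful toy I know of is a handful of local processes heartbeating each other. Here's a minimal sketch in Python (everything here is illustrative, not a real design): each peer serves a health endpoint and polls the others, so you can kill one and watch the rest notice.

    # heartbeat.py - toy failure detector; run one copy per port:
    #   python3 heartbeat.py 8001 8002 8003   (listen on 8001, poll the rest)
    #   python3 heartbeat.py 8002 8001 8003
    #   python3 heartbeat.py 8003 8001 8002
    # ...then kill one and watch the others report it DOWN.
    import sys, time, threading
    import http.server
    import urllib.request

    me, peers = sys.argv[1], sys.argv[2:]

    class Ping(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        def log_message(self, *_):  # silence the default request log
            pass

    srv = http.server.HTTPServer(("", int(me)), Ping)
    threading.Thread(target=srv.serve_forever, daemon=True).start()

    while True:
        for p in peers:
            try:
                urllib.request.urlopen(f"http://localhost:{p}/", timeout=1)
                state = "alive"
            except OSError:
                state = "DOWN"
            print(f"[{me}] peer {p} {state}")
        time.sleep(2)

Once that's boring, move the same peers onto separate VMs or k8s pods and start breaking the network instead of the processes.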
Sites like leetcode are great for coding practice and improvement, because you get to compare your solutions to those of others. Sadly, just building something on your own helps you learn the moving parts, but not optimal, neat, or best-practice solutions.
Sites like leetcode overindex on the rote abilities that you can stamp out. Actually building something is exploring in a creative way, which builds a deeper understanding.
If you want to be "optimal, neat, or best practice", read a book and get stuck in tutorial hell. If you actually want to learn how to do something, literally go do it. Nobody has ever built anything of value (whether financial, intellectual, or emotional) by leetcoding.
Any suggestions for books to read?
What you should read, or start with, are the designs for Cassandra, Kafka, FoundationDB, etc.
The problems they're trying to solve are related to really large distributed systems that fail a lot, and their design decisions are basically "this is how we worked around that problem."
You can also look for the LISA archives (https://www.usenix.org/publications/loginonline/thirty-five-...). System administrators were the first people that had to deal with large distributed systems at scale, and university system administrators led the charge.
You might want to hunt down the comp.sys.admin archives (I can't remember the newsgroup anymore).
Most of the ideas and issues behind distributed computing are obvious if you think about them. Much of the actual implementation, and the mitigations for those issues, is not obvious, though.
And there's also the client side of distributed computing, which I don't think is discussed as much.
As an example, exponential backoff is one of the go-to techniques for clients when the servers are under load. Unfortunately that doesn't really work IRL, because instead of spreading the load you get waves of load coming back over and over. Likewise on the server side you have problems with peak load.
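For what it's worth, the standard mitigation for those retry waves is adding jitter, i.e. randomizing the backoff so clients that failed at the same moment don't all come back at the same moment. A minimal sketch in Python (names are illustrative, not any particular library's API):

    import random, time

    def call_with_backoff(op, attempts=6, base=0.1, cap=10.0):
        """Retry op() with capped exponential backoff plus full jitter."""
        for attempt in range(attempts):
            try:
                return op()
            except Exception:
                if attempt == attempts - 1:
                    raise
                # Sleeping a *random* amount up to the exponential cap is
                # what breaks up the waves; sleeping exactly base * 2**attempt
                # reproduces them.
                time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))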
The only one I would really suggest is Designing Data Intensive Applications. But it is very DB-centric.
https://www.amazon.com/Designing-Data-Intensive-Applications...
"Just build something" is good advice but it's easier to find some kind of fun thing to build with, say, a web framework that's educational and maybe not a complete throwaway either.
Maybe some Internet of Things applications would provide a good avenue for some distributed systems exploration?
The easiest way is to fire up a bunch of VMs.
The cheapest way is to pick up an old ThinkStation (or other tower), load it up with 128 GB (or more) of RAM, and install ESXi on it. That's a perfectly good baseline, and you can run about thirty 4 GB Linux VMs on it.
Ideally you'd have a bit less than 1 core per VM, just so it's a bit slow. Lots of people assume your nodes are quick, but in real life they may not be. And really, most of the time your machines won't be doing squat.
You might want to have SSDs in there too, because ESXi doesn't have RAID capability (or at least mine didn't). I don't think you can get a cloud instance that uses spinning disks anymore, and you wouldn't use them in real life anyway.
A 2 TB drive is cheap these days, or just slap in all those old small SSDs. Everyone has a bunch of them left over, and they're perfect.
ESXi needs the RAID to be handled by another device; the simplest case is a hardware RAID card with disks locally attached to it. You can also attach remote disks/volumes from other systems, with or without RAID, over the network/SAN/etc. using an HBA, a special network card, or the software iSCSI initiator stuff in ESXi. You can even have something like a Windows server act as the iSCSI volume host and attach to it over the normal network, if you don't really care about reliability: the ESXi OS will not appreciate it if you ever turn the remote volume host off, or if the network drops out.

It's really too bad the free and cheap ESXi licenses are going away; it was always so nice to work with...
That's right, Broadcom bought them and the party's over. Download your ESXi while you can!
Most systems are actually "distributed", even your CRUD apps and CLI tools that write to disk. The better question is "how do you learn to deal with distributed-systems intricacies in places where it matters (such as finance)?", the answer to which is super simple;
Yeah, people miss this. If your app interacts with another app - bam distributed.
It's almost as if the world isn't single-threaded.
FWIW, I built hraftd[1] many years ago to make it easy to play with a simple distributed system, but one that uses a production-grade implementation of Raft[2]. You can spin up a cluster in seconds on a single machine, kill nodes, watch a new Leader get elected, and so on.
It's written in Go, so it'll help if you are familiar with Go. But the code is not difficult to understand even if you aren't.
[1] https://github.com/otoolep/hraftd
[2] https://github.com/hashicorp/raft
Oh, and more background here: https://www.philipotoole.com/building-a-distributed-key-valu...
I don't think it's a trivial thing to do outside of work. At most you can play with Kubernetes and the cloud, but in an interview the lack of experience will come out, because I think some stuff can only be learned at work. Especially scalability.
Some people are just up front about it - I've read a lot, and practiced the best I can, but am looking for some real-world experience to marry that to.
I've been using libvirt on a decently large Linux box I keep under my desk.
I have my current thing set up to create VMs from a downloaded cloud-VM image with minimal updates (add my ssh pubkey and install Python), and then use Ansible for everything further.
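If you ever want to poke at libvirt programmatically rather than through virsh or virt-manager, the Python bindings are a gentle entry point. A tiny sketch (assumes the libvirt-python package and a local qemu:///system hypervisor):

    import libvirt

    # Connect to the local system hypervisor and list every defined VM.
    conn = libvirt.open("qemu:///system")
    for dom in conn.listAllDomains():
        print(dom.name(), "running" if dom.isActive() else "stopped")
    conn.close()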
You take traditional non-distributed systems and push them to their limits in some regard.
"Singularity" systems are an abstraction afforded to us by the grace of the hardware we run them on. If you start pushing their performance hard enough, however, you inevitably get distributed behavior.
This is also a good potential career reason to try to make software that is as performant as possible - you'll get all the tasty edge cases and complexity war stories to talk about.
By designing twitter in a 45 min interview.
It's not as easy as playing with more 'normal' stuff, but I usually use VMs on a local hypervisor like ESXi, or a bunch of old desktop/server hardware if I have enough space/power/cooling at the time. Winter helps; big stuff often runs loud and hot. To get specialized hardware when needed, eBay or 'trash' from work and such can help a lot.
Gossip Glomers might be fun if you’re looking for some hands-on exercises :)
https://fly.io/dist-sys/
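To give a flavor: the first challenge is an "echo" node speaking Maelstrom's JSON-over-stdin/stdout protocol. The official challenges are in Go, but the protocol is plain JSON, so here's a rough sketch in Python:

    # echo.py - sketch of a Maelstrom "echo" node (Gossip Glomers part 1).
    # Maelstrom delivers one JSON message per stdin line; replies go to stdout.
    import json, sys

    for line in sys.stdin:
        msg = json.loads(line)
        body = msg["body"]
        reply = {"in_reply_to": body["msg_id"]}
        if body["type"] == "init":    # handshake at startup
            reply["type"] = "init_ok"
        elif body["type"] == "echo":  # the challenge: send the payload back
            reply["type"] = "echo_ok"
            reply["echo"] = body["echo"]
        else:
            continue
        print(json.dumps({"src": msg["dest"], "dest": msg["src"],
                          "body": reply}), flush=True)

The later challenges (broadcast, grow-only counter, Kafka-style log) layer real distributed-systems problems on top of that same message loop.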
1. A bunch of communicating local processes.
2. A bunch of communicating local VMs (easier with a beefier machine like my current desktop).
3. Mininet (there are other options) to simulate a network environment; you can fully control the topology very easily. Lighter weight than (2), more control for simulating different network effects than (1) alone. (Quick sketch below.)
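For (3), a minimal Mininet sketch (assumes Mininet is installed and run as root); the nice part is that hosts, switches, and links are just Python objects you can script:

    from mininet.net import Mininet
    from mininet.topo import SingleSwitchTopo

    # Three hosts hanging off a single switch; swap in a custom Topo
    # subclass to model whatever topology or partition you care about.
    net = Mininet(topo=SingleSwitchTopo(3))
    net.start()
    net.pingAll()                # sanity check: full connectivity
    h1 = net.get('h1')
    print(h1.cmd('ip addr'))     # run arbitrary commands on a "host"
    net.stop()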