Challenging projects every programmer should try (2019)

While writing a text editor, a compiler, an operating system, or a raytracer might make you a better programmer, it won't make you a better software engineer. In fact, it might make you worse at software engineering, because it embodies the disastrous "Not Invented Here" doctrine.

Hackers like to obsess about Big-O, data structures, HoTT, and other high-theory stuff, yet the following skills, essential for software engineering, are almost never discussed and even more rarely practiced:

- Deciding what to write yourself and what to take from a library

- Identifying high-quality libraries and frameworks that meet your project needs

- Deciding where optimization is worth the effort and where it is not

- Writing code that will still be readable to you (and others) a few years from now

- Thinking about the project as a large-scale, complex system with software and non-software dependencies

In that spirit, I offer the following alternative challenge: Create a web search engine. Don't bother with string matching algorithms etc., others have already done that for you. "Just" make a search engine (and crawler) that can actually work, even if it only supports a subset of the web and a single concurrent user at the beginning.

I don't understand why you think a search engine requires the use of "real engineering skills" like picking and choosing libraries and identifying which opportunities will yield fruitful optimization.

Literally all of the listed projects, text editors, compilers, operating systems, and ray tracers, can exercise the exact same activities.

I'm more inclined to think that your comment is really more revealing about what kind of knowledge is in your own personal wheelhouse. Having worked on a ray tracer that was subsequently used on many feature films, I can assure you that all of your bullet points apply to that project, too.

It's much more a matter of whether you want to do something small scale and fun, or whether you want to suck all the joy out of it by applying the same soul crushing constraints we already get paid to do in our day jobs. Bleh.

> Literally all of the listed projects, text editors, compilers, operating systems, and ray tracers, can exercise the exact same activities.

In the linked article, these projects are all explicitly described as opportunities to learn about low-level stuff like how to efficiently store editable text. The difference with a web search engine is that nobody today can build such a thing completely from scratch, therefore it forces you to give up the toxic NIH mentality, which in my experience is usually driven by elitism and ego (on full display in this comment thread), not by some lofty desire to learn something new.

> The difference with a web search engine is that nobody today can build such a thing completely from scratch

I'm sorry, but can you substantiate this claim? I've seen no indication that a search engine is not buildable from scratch at all.

Sigh ... Off to the silicon mines it is then ... again /s

You mean, the beach?

It is a joke based on needing to define "from scratch". Similar to, "how do you bake a cake from scratch? First you must create the universe." Op is gathering the raw ingredients to start fabricating his chips.

Yeah, that’s sand. :)

Assuming you get the TCP/IP stack for free, you still need to build fully-featured HTTPS and a "webscale" multi-server database for document storage from scratch. The crawler is easy and so is something like PageRank, but then building the sharded keyword text search engine itself that operates at webscale is a whole other project...

The point is that it's too much work for a single person to build all the parts themselves. It's only feasible if you rely on pre-existing HTTPS libraries and database and text search technologies.
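To illustrate why PageRank is usually counted among the easy parts, here is a minimal power-iteration sketch. The damping factor of 0.85, the fixed iteration count, and the dict-of-lists graph format are assumptions chosen for illustration, not anything a production engine is known to use.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    `links` maps each page to the list of pages it links to
    (a hypothetical in-memory format, fine only for small graphs).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
```

The hard part the comment points at is everything around this loop: at web scale the graph no longer fits in one machine's memory, so the same arithmetic has to be distributed.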

Simple methods of search like exact matching are very fast using textbook algorithms. There are well-known data structures, such as suffix trees, that can search millions of documents in milliseconds.
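As a rough sketch of that kind of textbook exact matching, the example below uses a suffix array, a simpler relative of the suffix tree, rather than a suffix tree itself. The construction shown is the naive one (sorting all suffixes), which is only reasonable for small texts; the function names are made up for this illustration.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Real implementations use faster, more memory-efficient algorithms.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return the sorted start positions of every occurrence of `pattern`."""
    m = len(pattern)

    # Binary search for the first suffix whose first m characters >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Binary search for the first suffix whose first m characters > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(find_all(text, sa, "ana"))  # prints [1, 3]
```

Once the array is built, each lookup costs a logarithmic number of string comparisons, which is where the "milliseconds" intuition comes from.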

I enjoy reading your discussion and just wanted to add that some people do write large-scale software from scratch nowadays: for instance, Marginalia for web search, and Andreas Kling and team for an operating system and web browser.

Marginalia, which I'm a big fan of, is not "written from scratch" (it would be stupid to do so). Check the project on GitHub, it has lots of third-party dependencies.

I definitely lean more toward NIH than what's conventionally considered wise, but most of the time it's not NIH for the sake of NIH.

I do pull a lot of libraries, but an enormous amount of what the search engine does is very much built from scratch. The libraries generally deal with parsing common formats, compression, serialization, and various service glue like dependency injection. I think the number of explicit dependencies is a bit inflated by the choice to not use a framework like Spring Boot, which pulls many of the same (or equivalent ones) implicitly.

What makes the search engine a search engine, the indexing software (all the way down to database primitives like btrees etc.), a large chunk of the language processing, and so forth; that's all bespoke. I think it needs to be. A lot of existing code just doesn't scale, or has too many customizations that would add unnecessary glue and complexity to my own code.

I'm going to echo SerenityOS Andreas and suggest that it's a skill like any other. If you shy away from building custom solutions to hard problems, you will never be good at it; and it will become a self-fulfilling prophecy that these NIH solutions are too hard to build.

At the same time, there's a time and a place and you should indeed be judicious as to when to roll your own solutions, but maybe that time and place is exactly in a hobby project like the ones suggested in this thread (and is how my search engine started out; a place to dick around with difficult problems).

I'd also add that being able to tackle problems yourself, rather than needing a library to do all the heavy lifting at all times, is a great enabler. Sometimes there is no adequate library, but that doesn't mean the conclusion has to be "welp, I guess we can't do that yet..."

I guess the first step in writing large scale software is to become Swedish!

Long cold winters dimly lit by a flickering CRT.

A simple search engine is certainly doable from scratch in a matter of weeks, complete with most things expected from a search engine.

Similar to compilers or tiny OSes, one can go as hardcore as necessary, or just stick to basic stuff.
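For a sense of what the "basic stuff" end of that range can look like, here is a minimal inverted-index sketch with AND-style keyword queries. The tokenizer, the in-memory dict, and the function names are all simplifying assumptions; ranking, stemming, and persistence are left out entirely.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not an ASCII letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Ray tracers are fun",
    2: "Compilers are fun too",
    3: "Text editors",
}
index = build_index(docs)
print(search(index, "are fun"))  # prints {1, 2}
```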

Of all the typical personal challenge style projects, databases are probably the hardest to build, and even that is not impossible.

making a database is easy. making a fast database is what's hard.

even implementing all of sql92 or datalog without concern for efficiency is fairly complex

implementing a toy datalog should take no more than a week.

maybe less
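To give a feel for how small a toy Datalog core can be, here is a sketch of naive bottom-up evaluation. It assumes a made-up encoding (atoms as tuples, variables as strings starting with an uppercase letter) and skips parsing, negation, and any efficiency concerns such as semi-naive evaluation.

```python
def is_var(term):
    # Convention assumed for this sketch: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def unify(atom, fact, env):
    """Try to match a rule atom against a ground fact under bindings `env`.

    Returns the extended bindings, or None if they do not match.
    """
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    env = dict(env)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            # Bind the variable, or check it against its existing binding.
            if env.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return env


def evaluate(facts, rules):
    """Apply all rules repeatedly until no new facts appear (a fixpoint)."""
    known = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            envs = [{}]
            for atom in body:
                next_envs = []
                for env in envs:
                    for fact in known:
                        extended = unify(atom, fact, env)
                        if extended is not None:
                            next_envs.append(extended)
                envs = next_envs
            for env in envs:
                args = tuple(env.get(t, t) if is_var(t) else t for t in head[1:])
                derived.add((head[0],) + args)
        if derived <= known:
            return known
        known |= derived


facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}
rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
result = evaluate(facts, rules)
print(("ancestor", "alice", "carol") in result)  # prints True
```

This says nothing about the effort needed for a full SQL-92 implementation, which is a much larger surface.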

Back when online learn-to-code courses like Codecademy and Udemy were a fad, I remember that one of them (and unfortunately I don't remember which, and Google, ironically, turns up nothing) taught how to build a search engine in Python from scratch as a first project for complete beginners. I thought it had a reasonable level of complexity for this task.

You can still find search-engine-from-scratch courses on Udemy, complete with all the necessary algorithms [1].

[1]: https://www.udemy.com/course/build-a-search-engine-with-pyth...

was it udacity's cs101?

Yes! Thank you.

Incidentally, the first Google search result for

"introduction online cs course that uses Python the build a search engine from scratch"

is Udacity CS101

Even despite my smartphone's autocorrect mistakes of 'introductory -> introduction' and 'to -> the'

> elitism and ego

You know those car guys who will rebuild their engine, just because? Or those retro computer guys that will recap an ancient board rather than buying a modern pc?

For you it is a job, for me it is a hobby. I have no interest in making something 'professional'; I want to take it apart to understand how it works. Want to understand how a text editor works? Write one.

What component of that is ego?

It seems to me you're the elitist, you're not far off saying mere users shouldn't be allowed to modify their own software, shouldn't be allowed to install software that hasn't been okayed by the people who know what they're doing.

> What component of that is ego?

Your lack of constraints on personal exploration in software is interpreted as hubris by people who came to software seeking high paying regimented recipe-following.

Please can you define "from scratch"?

I encountered some push back on this in my "code editor from the ground up" post[1]. I think the only reasonable definition of from scratch is:

Does not have domain-specific dependencies.

So a code editor based on ACE or CodeMirror would not be from scratch, obviously, but one that involves writing all of the domain-specific logic would be. Using generic libraries doesn't stop something being from scratch. (In my case Tree-sitter is arguably domain-specific, but an early version did use a hand-coded JavaScript tokeniser in its place.)

[1] https://news.ycombinator.com/item?id=34577246

> The difference with a web search engine is that nobody today can build such a thing completely from scratch

Really? I think there are a couple out there. The main issue today is scale, but you could constrain that by limiting your crawler to a fixed set of sites.
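A fixed-set crawl constraint like that can be as simple as a host allowlist checked before each fetch. The sketch below assumes hypothetical hostnames and an exact-match policy (no subdomain wildcards); it is only the scope check, not a crawler.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; a real crawl would load this from configuration.
ALLOWED_HOSTS = {"example.org", "docs.example.org"}


def in_scope(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # `hostname` is already lowercased and excludes any port or credentials.
    host = parsed.hostname or ""
    return host in allowed_hosts


print(in_scope("https://example.org/about"))   # prints True
print(in_scope("https://other.example.com/"))  # prints False
```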

If you want to build a compiler from scratch, you must first invent the universe. Peeling back abstractions to see how things could or should work is perfectly fine, even for professionals.

Case in point, I've spent a year excising bloated frameworks from my stack at work and replacing the few corners we needed from those frameworks with, e.g., 50 lines of curl calls. The C compiles instantly, is tailored for our tiny use case, delivered the one related feature we wanted much more quickly, and removed chains of dependencies.

Being reliant on far-too-abstract libraries and frameworks to do simple jobs is also a curse. But nobody at work has the experience to know that curl was sitting right there on our image available for our use. And nobody has that experience because nobody took the time to build something from low level libraries. Now we know how and can make an intelligent decision without defaulting in either direction because we were afraid to try.

While NIH ultimately depends on multiple different aspects, I would argue that, ironically, you're more likely to create an NIH-qualifying product with your approach.

Because when you write things from scratch, you actually have the space to innovate. But when building a product from preexisting puzzle pieces, there is much less space to make any useful changes that would make your product stand out from existing alternatives (which NIH is all about).

Well, if I’m going to build a web search engine from scratch don’t I first need to write a compiler from scratch? Which means I first need to write an editor from scratch…

> It's much more a matter of whether you want to do something small scale and fun, or whether you want to suck all the joy out of it by applying the same soul crushing constraints we already get paid to do in our day jobs. Bleh.

Amen. And further, what better prepares a programmer to assess the relative costs of implementing a thing vs using a library providing that thing than having attempted an implementation?

Learning by doing is a valid approach, and this can even be called fun.

> I'm more inclined to think that your comment is really more revealing about what kind of knowledge is in your own personal wheelhouse.

That's my read as well.

This is like telling a student learning an instrument never to practice scales or études because there are other skills necessary for playing in an orchestra. Those other skills are important, but you have to develop your chops at some point, and being a better programmer will certainly help with basically all those skills you listed anyway.

> This is like telling a student learning an instrument never to practice scales or études

No, it's like recognizing that an orchestra conductor doesn't need to play every instrument (or even any instrument!) in order to make music.

Or alternatively, that a violin player doesn't need to understand the physics of acoustic dispersion in order to be the best in the world at playing the violin.

Yes a software engineer understanding data structures and runtime complexity is akin to a violin player understanding the physics of acoustic dispersion.

Is it not? I can't tell if you're being sarcastic or not.

The analogy would be a software engineer having to understand resistors and capacitors and transistors.

Also, I don’t think you’ll find a single conductor on this planet who is not skilled in at least one instrument. Terrible analogy.

I would say, given how absurd the statement is, sarcastic.

So here’s the thing… I did an EE/CS dual degree. I’m going to focus on the EE side of it for a bit.

We learn the high level practical behaviour of devices: resistors, capacitors, BJT and MOSFET transistors, opamps, logic gates, etc in 2nd year. What we learn there is enough to be an effective practical circuit designer.

We also dive deep into how that stuff works under the hood. One course I remember fondly started the first lecture with “This course is called semiconductor physics. To understand semiconductor physics you need to understand solid state physics. To understand solid state physics, you need to understand quantum mechanics. We have a lot of material to cover so let’s get started.” A similar course was Electric and Magnetic Fields, where we spent a ton of time applying vector calculus to situations of varying complexity to get a better understanding of how the things we learned in 2nd year actually work and some of the limitations that you wouldn’t pick up on from the practical (generally first-order) models.

You don’t get to graduate with an EE degree without learning this stuff. And in my mind, that stuff is like a mixture of data structures, runtime complexity, assembler, and cache coherency: fundamental understanding that you won’t use directly in your day-to-day work, but that underpins everything you do.

The job market for EEs...still isn't as hot as for CS people. However, you could also focus on DSPs in EE and use it to get a head start on machine learning. This assumes you are ready to specialize, however. Anyways, none of it should go to waste.

> an orchestra conductor doesn't need to play [...] any instrument! in order to make music

Ok, this is clearly a side-topic AND at the risk of being pedantic: Is this actually true?

Like, I can see how theoretically one could learn to sight-read music well enough to be able to direct an orchestra of individual musicians, then do enough ear-training to identify enough notes to keep tabs on everyone (especially if you're conducting a high school or (god help you :) ) middle school orchestra), etc, etc.

But does anyone actually do that? Has anyone ever done that?

"I'm going to learn how to conduct an orchestra without learning any instruments" kinda feels like "I'm going to become a software engineer using an LLM instead of learning any foundations"

(To be clear - the LLM path might be viable in the future, possibly the near future, but at least today it's not quite there. My apologies in advance if my analogy doesn't work in the future :) )

No, it's not true, there are no prominent conductors that cannot play an instrument (or sing at a high level).

A conductor should have a deep understanding of music, theory, and rehearsal pedagogy. At least in current western schools of music I don't see how you would explore these topics outside of the study of an instrument. Maybe there is some esoteric path to this; I just can't imagine how you go to Berklee and begin to explore the nuances of a composition without ever engaging with it as an instrumentalist.

On top of that, the conductor isn't just engaged at performances, they're also responsible for leading rehearsals. If they've never deliberately practiced the learning of music from an instrumentalist perspective I think they would be very hard-pressed to structure strategies for getting the larger group to a high level.

I guess it depends what you mean by "play an instrument" - almost all proficient conductors have proficiency in at least one orchestral instrument.

There is at least one proficient conductor, Leopold Stokowski, who had no real proficiency with any instrument (I'm speculating he's not the only one, but it is likely to be rare). He did have some very rudimentary piano training, and then pretty much taught himself conducting. Whether that rudimentary piano ability counts as "playing an instrument" is the open question.

The skills of conducting do not require any instrument. Many conductors tell someone how to play their part despite not knowing how to play it themselves. However, it is hard to imagine anyone learning music theory outside the context of learning an instrument.

There was actually a show about this: https://en.m.wikipedia.org/wiki/Maestro_(British_TV_series)

Are you an orchestra conductor or a professional violin player?

I'd rather put the violin player with understanding of acoustic dispersion in the same bracket as a software engineer who understands the effects of magnetic interference on a memory controller.

> it's like recognizing that an orchestra conductor doesn't need to play (...) any instrument

i heard this before and it seemed dubious to me

on investigating, it turned out that there were literally zero famous orchestra conductors who don't know how to play any instruments. most of them know how to play numerous instruments and do so in their spare time, though there were one or two who had stopped. from my experience with musicians i think it's likely that they could play any conventional orchestra instrument after a short exploration period, even if they don't have documented histories of playing it

i suspect that this is true of non-famous orchestra conductors as well, but i could only find public information about the famous ones

of course that doesn't prove anything about programming, which is not the same skill as playing a musical instrument or conducting an orchestra, but it does show that you're constructing your arguments without much concern for veracity

the fact that in https://news.ycombinator.com/item?id=38769008 you claimed that web search engines are not built by people who work on string matching or distributed database transaction consistency further calls your credibility into question

This is correct, a conductor does not need this. But you are not the conductor in this analogy, you are a player of one or more instruments.

Nowadays (and increasingly, going forward) it's possible to be a very productive programmer without knowing much low-level stuff. This may seem unfair to those who spent years wrestling assembly and then C pointers but that's just today's reality.

It's not possible to be a "productive" musician on a traditional instrument (i.e. excluding iPads) without knowing how to play scales, chords, etc.

This is also the reason a computer with 4GB of RAM can't run Gmail and Spotify at the same time.

> This is also the reason a computer with 4GB of RAM can't run Gmail and Spotify at the same time.

That’s demands for faster delivery of software and ridiculous demands on user experience.

Does looking down on higher level devs make you feel superior or something like that?

> That’s demands for faster delivery of software and ridiculous demands on user experience.

Absolutely not. It's about bloat. So many people "just want to code", or eschew optimizing while repeating the mantra that "premature optimization is the root of all evil", that you often end up with bloated software.

Google and Spotify are supposed to get among the best / most productive developers though

Tell that to most guitarists. Vast majority are self taught and can’t read music; a number are excellent players.

They can't read music, but they can still make their hands move in repeatable patterns without thinking about each finger position, which I think is roughly the equivalent of implementing quicksort.

As someone who has played guitar on and off for years as a hobby, I'm shocked by how much harder it is than programming.

I can pick up almost any random codebase on GitHub, and as long as they use libraries and don't touch low level stuff, I can probably start working with it in 10 minutes to a day at most.

Then, I could leave that project for a year and be almost as productive as when I left. So much of the action is on the screen, not in the head, and there's no muscle memory required.

If any instrument was as easy as coding... I think a lot of coders might have music careers instead....

Hard disagree. Those “productive” programmers are a nightmare, their code is a barely decipherable jumble of randomly gluing function calls together until they kinda sorta do something that approximates working some of the time. They cause an unfathomable amount of damage, lost productivity due to bugs and data nightmares, lost data, and headaches.

They do make idiot managers happy though, because look at all those features and scrum points they “finished!” That’s why armies of them will always be there in the software ecosystem, blithely and ignorantly a net drain on whatever unlucky company is currently employing them, busily making a mess for more diligent programmers to clean up for the rest of eternity. It’s called “job security.”

What if we don't want to be "software engineers"?

I'm still not totally convinced that "software engineer" is even a thing, frankly.

Who do you think puts systems like a web search engine together?

It's certainly not people who are lost in details like string matching or how ACID is implemented in distributed databases.

Of course you need all these things in order for the search engine to work – but if you write them yourself, even if you think about them excessively, you will never finish.

Software engineering is knowing that you don't need to know everything, you only need to know where to find everything. "Programming" is an important part of it, but far from the only one.

> Who do you think puts systems like a web search engine together?

A really good programmer who is free from the tyranny of software “engineering”.

I believe this more and more as I go through my career. There is usually one person carrying progress hard. If you can get three or four of those people and another one to coordinate them you can do truly amazing things. Usually though you just need one unencumbered by bureaucracy.

Not sure what you guys think software engineering means, but it's definitely not the same as what I think it means.

Software engineering is whatever definition that allows me to procrastinate and sleep at night knowing I don’t need to learn more.

If you don't want to learn you chose the wrong profession.

To expand, the software industry moves fast, and you have to keep learning to stay afloat, or risk stagnating and ending your career early. Look at where software was 5, 10, or even 20 years ago. There was no React and no Rust 20 years ago, nor was there even Git! Whatever you know now is going to be out of date in a matter of years. Good luck getting a job if that's all you ever want to learn.

Conversely, 10 years ago we also had Grunt, Gulp, etc, which were poor implementations of build systems we had ~40 years ago.

It’s not all progress.

Stagnating yes, but not ending career early. There will be python/java/c/c++ jobs for decades just keeping the lights on for non-tech companies.

Say that to people in this thread.

> Who do you think puts systems like a web search engine together?

i've met a good fraction of the first 128 google employees, and this could not be farther from the truth:

> It's certainly not people who are lost in details like string matching or how ACID is implemented in distributed databases.

the people who thought those things were unimportant details were the ones who tried to compete against google and failed, like lycos, inktomi, and pets.com

like, check out the wikipedia article on udi manber, who is best known for spending 15 years 'lost' in string matching:

> In 2002, he joined Amazon.com, where he became "chief algorithms officer" and a vice president. (...) In 2004, Google promoted sponsored listings for its own recruiting whenever someone searched for his name on Google's search engine. In 2006, he was hired by Google as one of their vice presidents of engineering. (...) In October 2010, he was responsible for all the search products at Google.

the fact that in https://news.ycombinator.com/item?id=38769506 you claimed that orchestra conductors don't need to play any instrument further calls your credibility into question

BTW at Yahoo in the 2000s I learned that Manber had invented at least one new trade-secret algorithm while he was there. The one I ran across wasn't for string matching but it was a fairly basic kind of thing.

> Who do you think puts systems like a web search engine together?

Literally by stitching together people you’ve listed here and making them work together:

> It's certainly not people who are lost in details like string matching or how ACID is implemented in distributed databases.

> Of course you need all these things in order for the search engine to work – but if you write them yourself, even if you think about them excessively, you will never finish.

Being able to write something from scratch is not the same as always writing from scratch. Let me turn this around. Who would you trust with writing a search engine: a software engineer who has worked on and written a search engine from scratch, or someone who only knows about it theoretically?

> Software engineering is knowing that you don't need to know everything, you only need to know where to find everything. "Programming" is an important part of it, but far from the only one.

I hear this excuse more and more when it comes to threads like this. You don’t need to defend your life position, it’s okay to not know everything.

I find your comment quite ironic considering the engineers that built Google search did it by developing novel software systems, to great success. The Google codebase is basically the definition of NIH syndrome.

Maybe the *term* "software engineer" is not an established thing.

But these points raised by GP... are basically self-evident.

- Deciding what to write yourself and what to take from a library

- Identifying high-quality libraries and frameworks that meet your project needs

- Deciding where optimization is worth the effort and where it is not

- Writing code that will still be readable to you (and others) a few years from now

- Thinking about the project as a large-scale, complex system with software and non-software dependencies

... But since they don't have immediate effect (along the tune of "look I made this shiny thing myself in a weekend!"), it's hard to brag about it and perhaps even hard to establish causality that your skills in the areas above contributed to the software/business' success. But you can always brag about how you implemented a Fibonacci heap or something and made your heap operations 0.2ms slower.

I do that shit every day at my job. When I’m just coding for fun at home I want to implement crazy data structures, otherwise what’s the point?

The good news is that the field of computer programming is huge - so there is room for all sorts of preferences, skills, and abilities.

And sure, use of the word "engineer" will cause angst among some - because it used to have a precise meaning which has been diluted (and in the case of software, ignored.)

Indeed the list of projects runs the gamut of where a programmer might go in life. They might do user UI, or games, or work on an OS or compiler. More likely they'll work at some random company doing databases and reports. Or they'll be Web focused doing lots of JavaScript. Or they'll be deeply involved in the field of Advertising (Advertising Engineer anyone?)

Of course wherever you end up, you can still have some fun. And if you're just starting out then it can help you understand your desires, and limitations.

So don't worry too much about the "engineer" word. It's pretty irrelevant. Even the title of the thread is "programmer" not "engineer".

Not ignored! It's precisely because software engineering discards the very parts of programming that are closest to real engineering that this term came to describe it. "Don't bother learning the fundamentals, but focus on bureaucratic ritual instead" is a message that requires excellent marketing.

I honestly don't see a difference between programmer and software engineer. The one I see a difference with is "coder"

This sounds like sensible advice. But there are a few problems with making software purely from ready-made building blocks. Here are two of them:

1) Much more often than not, the ready-made building blocks are crap. Your software will reflect that crappiness, and your life will consist not of writing software, but of maintaining and massaging crap.

2) Even if your ready-made building blocks are high-quality, they limit what kind of software you can write. Ideally, you imagine what you want your software to do, and then you write software to do it. You create the building blocks in that process. If you are only using ready-made building blocks, there is a lot of great software you just cannot write. And I think we all see that, because there isn't much great software around.

These problems might not be a concern for you. If you write software that is pretty much just more of the same, 2) does not apply to you. And while 1) is still problematic, it will be just manageable because you use these building blocks mostly on their happy path.

But I have bad news for you. This kind of software will soon be written not by you, but by an AI.

> If you are only using ready-made building blocks, there is a lot of great software you just cannot write. And I think we all see that, because there isn't much great software around.

I claim that the opposite is true: The reason there are so few software projects that are really great is because high-level skills, not low-level ones, are in short supply. Every CS graduate can implement Paxos. The number of people with the experience needed to take an implementation of Paxos, and many other components, and put them together into something that works really well is much, much smaller.

> But I have bad news for you. This kind of software will soon be written not by you, but by an AI.

AI is actually much better at writing low-level code than high-level code. If you tell it to implement a piece table or a Fibonacci heap, it will do so without problems, in any language. If you tell it to create a simple CRUD application using existing libraries, it will output something that doesn't actually work.

I guess it all depends on your definition of low-level. High-level can also be low-level, just with different building blocks, if the building blocks are based on well-specified abstractions.

I am actually currently writing a novel text editor, and I can guarantee you, my requirements mean that an AI cannot help me with that directly, because the underlying mechanisms I need don't exist yet. The AI is still helpful though when researching existing mechanisms, and to pinpoint why they don't work for me.

> AI is actually much better at writing low-level code than high-level code.

> If you tell it to create a simple CRUD application using existing libraries, it will output something that doesn't actually work.

If your low-level code is just a copy or adaptation of something existing, then yes, AI is really good at that. Something novel though, AI cannot help you much here.

On the other hand, simple CRUD applications will all be done by AI very soon. Probably not using existing libraries though, because these are crap.

But there is certainly a role for humans currently in high-level design, I agree with you here. But this high-level design crucially relies on your ability for low-level design, with the ability to imagine and then implement (possibly with the help of AI) the building blocks you need, not just reuse the ones that are there.

You have to be pretty smart to even think of an idea that can't be done with existing building blocks.

It seems to show up more in hobby stuff, or in cutting edge projects.

I'm always excited whenever I see any kind of algorithm work, because it's such a novelty. Most everything I do is just a UI around a few libraries.

There is a world of difference between "can be done" and "can be done well". Existing building blocks are typically good for the former but often not the latter, or you have to exercise a good amount of judgement when piecing things together.

Existing libraries for writing CRUD apps are anything but crap.

If we can’t create special libraries that work well for common tasks then as a species we are too dumb.

Which ones are your favourite libraries, let's say for a TypeScript only code base?

If you feel like telling us about your novel approach to text editing, I'd like to hear it (or I'll just wait for the announcement post. That's cool too).

> Every CS graduate can implement Paxos.

I have lived evidence that not nearly every CS graduate even knows about the topic that Paxos addresses.

Where in the world can you get a computer science degree without hearing about consensus algorithms?

Pretty much everywhere below a country's top 3 or top 5?

I've got a master's degree and I don't think anyone even mentioned the word "Paxos" or "Raft" during my 5 years there.

Fortunately the internet is a thing and I could read up on the topic.

I know graduates from various public schools, and the people who have heard of Paxos (let alone can implement it) are a tiny percentage.

You’ve just shifted the goalpost twice here. I never learned about Paxos, but I did learn about consensus algorithms. Second, learning about something, and implementing something are completely different. Most CS graduates have an idea of how syntax parsing works. How many can implement a parser? What about a syntax highlighter? Most graduates have an idea of how an OS works. How many can build an OS?

I guess if you’re always using libraries though, you may mistakenly be thinking that these libraries are just doing trivial work. Once you dive in and try to implement the low level stuff, then you realize how big of a disconnect there is between a fuzzy idea in your head and the lines of code that constitute that idea in reality.

> Every CS graduate can implement Paxos

Is all of this satire?

Just plain old elitism.

> But I have bad news for you. This kind of software will soon be written not by you, but by an AI

Press X to doubt

I think those who deal with OS kernels, compilers, etc. on a daily basis also understand those points. They use various tools to make code review, time management, etc. easier.

Not all hackers are overly obsessed with the most efficient Big O and various low level (sorry pun intended) details.

> Not all hackers are overly obsessed with the most efficient Big O and various low level (sorry pun intended) details.

A NAND gate is low level. Big O is just theory that anyone should know if he's worth something as a programmer.

Most of the time, big O is not relevant if you're not writing a new algorithm. For the usual CRUD app? Not so much. Heck, even for more involved work, you will not touch them because your solution will be called by, or will call, some external tools.

The good news is that, while still valuable, you won't have to lose time creating a new broken wheel. The bad news is that this part of CompSci is now mostly fundamental research.

This is just wrong. Understanding the difference between O(1), O(n) etc is essential for literally everyone who writes code. Every single programmer is better with this understanding than without it. You should know the complexity of the code you write - and most of the time that doesn't even require actively thinking about it. If you know the basics of complexity analysis it's just intuitive.

> Understanding the difference between O(1), O(n) etc is essential for literally everyone who writes code.

No it isn’t.

It’s a great thing to learn and understand, and essential for designing and maintaining data-intensive systems, but your statement simply isn’t true.

Sure, you don't strictly need it to write working code. But you will very quickly run into situations where you're writing unnecessarily slow code because you don't know what you're doing.

To me, it's essential.

> Most of the time, big O is not relevant if you're not writing a new algorithm

This attitude is precisely why we have computers thousands of times more powerful than we had in the 90s, and yet they perform the same tasks slower.

It becomes very important as soon as you start getting statement timeouts in your db queries.

And then you get needlessly inefficient CRUD apps that take minutes or even hours to generate a simple report because they're doing accidentally quadratic or even cubic work somewhere.

Understanding algorithm costs can be extremely important when building a simple CRUD app because it will help you reason how things will scale with the size of the data you’ll see in the real world.

It’s easy to make things that are accidentally quadratic but are fine until your big customer has 10 times the data and they suddenly aren’t fine anymore.

Doesn’t mean you need to optimise everything, but it doesn’t mean you shouldn’t think about this stuff.
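To make "accidentally quadratic" concrete, here is a minimal Python sketch. The function names and the idea of filtering order ids against a flagged list are made up for illustration. The only point is that a membership test against a list inside a loop scans the list each time, while the same test against a set is constant time on average.

```python
def find_flagged_slow(order_ids, flagged_ids):
    """Accidentally quadratic: `in` on a list is a linear scan per lookup."""
    flagged = list(flagged_ids)
    return [oid for oid in order_ids if oid in flagged]


def find_flagged_fast(order_ids, flagged_ids):
    """Roughly linear: build a set once, then each lookup is O(1) on average."""
    flagged = set(flagged_ids)
    return [oid for oid in order_ids if oid in flagged]
```

Both return the same result. With a few hundred rows you will not notice the difference, which is exactly why this kind of thing survives until the big customer shows up.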

I don’t really see how one could have any hope of performing engineering with any sort of rigor while throwing away big-O.

Then again, big-O is useless in many cases because real computers have too many arbitrary performance thresholds.

I suspect software engineering is impossible, or at least, nobody has made the model required to do it.

Going from O(n) to O(1) on an operation where n is 10 could be a performance downgrade. That doesn't mean it's useless, it just means there's more to it than the big O.

Asymptotic complexity is about how things scale up. You should care about it when you're working with things that scale. Doesn't mean it's useless if your data is static, just means you need to understand when to go for the high scaling solution and when not to.
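A back-of-the-envelope way to see why the O(1) option can lose at n = 10 is to put constants back into the picture. The sketch below is a toy cost model, not a measurement: the per-item and fixed costs are hypothetical numbers chosen only to show the crossover.

```python
import math


def linear_cost(n, per_item):
    """Cost of an O(n) operation with a small per-item constant."""
    return per_item * n


def constant_cost(fixed):
    """Cost of an O(1) operation with a larger fixed overhead (e.g. hashing)."""
    return fixed


def break_even(per_item, fixed):
    """Smallest n at which the O(1) option is no longer slower than the O(n) one."""
    return math.ceil(fixed / per_item)


# Hypothetical costs in nanoseconds, for illustration only.
PER_ITEM_NS = 2
FIXED_NS = 50
print(linear_cost(10, PER_ITEM_NS), constant_cost(FIXED_NS))  # 20 50
print(break_even(PER_ITEM_NS, FIXED_NS))                      # 25
```

Under these assumed constants the linear scan wins below n = 25 and loses above it, which is the "there's more to it than the big O" part.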

It can be useful in general, but it is hard to provide the type of tolerances expressed in physical units you’d expect in an engineered solution when all the constants are ignored.

I'm not sure what you mean. It sounds a bit like you might be misunderstanding what we're talking about.

It is a bit rude to become confused and then assume the other person was the one with the misunderstanding.

I'm just curious about what you're talking about. Tolerances and physical units are not usually mentioned in relation to time/space complexity. I'm open to the idea that you may know something I don't, feel free to enlighten me.

if your input data doesn't scale then either

① you're writing a real-time control program such as a motor controller and you care about not just the asymptotic complexity of your algorithms but their worst-case execution time, in microseconds; or

② you should just do the computation by hand with pencil and paper instead of writing and debugging a program to do it for you

at the point that the input data becomes too big for ② to be an appealing option, n is at least 100, which means you probably care a lot about whether your program is O(n) or O(n⁴), at least if you wrote it in something slow like python

maybe the time to start worrying about it is after you get the program debugged and running for the first time
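The O(n) versus O(n⁴) gap at n = 100 is easy to put numbers on. This is just arithmetic in Python; the steps-per-second figure is an assumed round number for a slow interpreter, not a benchmark.

```python
def steps(n: int, exponent: int) -> int:
    """Number of basic steps for an algorithm that does n**exponent work."""
    return n ** exponent


n = 100
linear = steps(n, 1)    # 100
quartic = steps(n, 4)   # 100_000_000

# Assumed throughput; an illustrative guess, not a measurement.
ASSUMED_STEPS_PER_SECOND = 10_000_000

print(quartic // linear)                    # 1000000
print(quartic / ASSUMED_STEPS_PER_SECOND)   # 10.0
```

So at the smallest n where you would bother writing a program at all, the two already differ by a factor of a million.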

So you say software engineering is not real engineering? I will drop this link here:

https://youtu.be/RhdlBHHimeM?feature=shared

It is an hour long and he’s a good speaker, but what’s his point, and where does he get to it?

Notice it said programmer, not software engineer.

Speaking for myself, software engineering is something I do because I have bills to pay. Programming is something I started doing because it’s fun. The projects in this article are the types of projects I would do for fun if I had the time.

> Software engineering is something I do because I have bills to pay. Programming is something I started doing because it’s fun.

Sudden clarity.

I feel like printing it and putting it next to my workstation, so that I remember this both when doing something fun and when having to work so that my bills get paid.

Yeah, this is a very important realization to make in a career of a software engineer.

Other points to internalize:

- in business, software is made to either earn money or pay less money. All other effects are at best secondary

- code is a liability as much (or more) as it is an asset

- programming is the easy 0-20% of the job of the software engineer, the rest is making sure the right thing gets programmed

- no matter what they tell you it’s a people problem

- software is always broken but it keeps running as long as practitioners are keeping it within the operational envelope

I like watching Tsoding in my spare time and I love what he calls "recreational programming". That is what we need imho, more recreational programming.

Also, I'd add that doing HoTT or other such "high theory" stuff actually enhances my understanding of the fundamentals and gives me the machinery to handle complexities systematically that building a mere web search engine would not. I already have a job for building such commodity stuff everyday.

> Speaking for myself, software engineering is something I do because I have bills to pay. Programming is something I started doing because it’s fun.

Same for me. At work I strive to do stuff fast and in a way that is easy to maintain, modify and extend, to see the big picture, take the right kind of shortcuts, and understand the business and customers, but that is not where the fun is.

I enjoy the time when I am alone with the PC, far from user stories and scrums.

I love to program, to convince the PC to do the trick I tell it to do. What I enjoy also is CS, algorithms, data structures, abstract thinking.

If you want to learn how things scale across a team and last for years, read or contribute to open source code.

It takes years for a single person to get a project to the point where it's a good learning ground for scaling and maintenance.

Gluing a few libraries together is real software engineering but unless you're really invested in the outcome it's not that engaging and it's not that educational.

> Gluing a few libraries together is real software engineering but unless you're really invested in the outcome it's not that engaging and it's not that educational.

That's only true if complex systems don't interest you.

Personally, I have always found the experience of "putting the pieces together" and orchestrating highly diverse systems into a coherent whole to be much more educational than learning about algorithmic details. I also generally find making things work well more interesting than making things work.

> That's only true if complex systems don't interest you.

Gluing a few libraries together will certainly produce a complex system.

So if I'm understanding your argument correctly, if you enjoy it, it's what everybody else should do. The other points are just post-hoc justification.

I think your view of low-level projects from scratch is simply wrong, if you describe them as 'algorithmic details'.

Do you have any good resources for what you mentioned?

I wish. As you can plainly see in this thread, most "real programmers" consider such things beneath them, and the scarcity of relevant resources is a natural consequence.

Nobody considers them beneath them, you just found a straw man and keep doubling down on it.

Have good critical thinking skills and you’re pretty much there.

> In that spirit, I offer the following alternative challenge: Create a web search engine.

Yea, no thanks.

If I'm going to work on something in my spare time, I am going to work on something that I am interested in and find fun. I generally loathe the dirty work often involved in software engineering and debugging all of these dependencies all the time. It's exhausting. Doing things from scratch puts me in charge.

And I mean, creating a search engine is the ultimate "not invented here" project. I'd just pay for and use Google, Bing, or whatever else API to do things for me. So it doesn't even make sense in the context of your objections.

> And I mean, creating a search engine is the ultimate "not invented here" project. I'd just pay for and use Google, Bing, or whatever else API to do things for me. So it doesn't even make sense in the context of your objections.

Except that none of the established search engines actually does full-text search (anymore). There is no way to prevent those engines from breaking apart and rewriting your query as they see fit. No matter how many quotation marks you put around it, and even if you enable "Verbatim" mode or whatever.

So if you just implement the option to do that (which is actually the default if you leverage a standard FTS library), you already have something that AFAIK you can't get anywhere else. Then you can work on kicking the blogspam out of your index, and before long you will have a search engine of real value for yourself (and possibly others).

Or, you can build a crappy clone of a text editor from the 70s.
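For what it's worth, the "verbatim" behavior described above is small enough to sketch. What follows is a hand-rolled positional inverted index in plain Python, standing in for whatever FTS library you would actually pick. The class and method names are hypothetical, and there is no ranking, stemming, or persistence. It only shows that an exact-phrase query is answered from token positions, with no query rewriting.

```python
import re
from collections import defaultdict


class PhraseIndex:
    """Tiny positional inverted index: token -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def tokenize(text):
        # Lowercased word tokens only; no stemming, no synonyms, no rewriting.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, text):
        for pos, token in enumerate(self.tokenize(text)):
            self.postings[token][doc_id].append(pos)

    def search_phrase(self, phrase):
        """Return ids of documents containing the phrase's tokens consecutively."""
        tokens = self.tokenize(phrase)
        if not tokens or any(t not in self.postings for t in tokens):
            return set()
        hits = set()
        for doc_id, starts in self.postings[tokens[0]].items():
            for start in starts:
                # Every following token must sit exactly `offset` positions later.
                if all(
                    start + offset in self.postings[tok].get(doc_id, ())
                    for offset, tok in enumerate(tokens[1:], start=1)
                ):
                    hits.add(doc_id)
                    break
        return hits


index = PhraseIndex()
index.add("a", "How to write a text editor from scratch")
index.add("b", "A text about how an editor works")
print(index.search_phrase("text editor"))  # {'a'}
```

Document "b" contains both words but not next to each other, so it is not returned. A real engine would add a crawler, storage, and ranking on top, which is where the actual work is.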

> and before long you will have a search engine of real value for yourself (and possibly others)

How much do you think hosting would cost?

This sounds like an excellent project! I look forward to following your work on GitHub. Everybody needs a crappy search engine from the 90s!

I don't understand why you believe low-level "intrusive" programming and general software engineering are mutually exclusive. There is a big, open field for high quality, low-level software to be written well bearing good practices and practical decisions.

1. To be fair, these kinds of "hacking" projects aren't technically meant for learning software engineering in the context of a job, but they are really interesting and a treasure trove of new knowledge for anyone willing to put in the effort. I don't think it's a bad thing. A person can learn to be a good software developer and simultaneously have deep knowledge of the systems they are working with.

2. The goal of this project is not to research and implement some novel algorithm to replace existing works. The goal is to learn how things work. You are still going to use libraries, procedures and algorithms designed by other people and compile them into a cohesive system, it is merely the depth and complexity of the problem that makes it an interesting learning challenge. It's not about some absurd NIH philosophy, it's about curiosity.

3. Software development is not as straightforward as you make it seem. It's not just about best practices, management and finding an optimal way to combine existing libraries into your project. Sometimes you run into unforeseen issues, strange niches that your libraries may not account for. Sometimes you need to dive into the specifics of these libraries, understand at a lower level why they do not function for your use case. You might need to "hack" these libraries. Fork and customise them to truly fit in your project and produce the desired results. And the only thing that will help you then is deep knowledge and the ability to work with low level software.

> I don't understand why you believe low-level "intrusive" programming and general software engineering are mutually exclusive.

Hmm, while I don't fully agree with his comment, your comment reminded me how much I hate discussing programming languages and programming ecosystems with C people.

It feels like C/low lvl people often refer to some "standard" that they treat as a bible of programming languages even when nobody mentions C and they treat GCC/CLANG as a perfect reference for compilers.

So, while high level devs dont understand low lvl stuff, then I believe low lvl programmers aren't creative at very abstract level, very abstract system modeling cuz they're too used to modeling everything with integers in C.

And they're too used to being hurt by C ecosystem that they believe it is "normal".

High level people (web developers) spend a lot of time arguing about their buzzwords like microservices, ddd, oop vs fp, etc.

Honestly, that is fair. My perspective on this is as a high-level developer, looking into how the things I am using daily actually work. I haven't actually interacted with any proper low-level developer, kind of hard to come across them where I am tbh, so I don't have any clue about how snobbish they can be, but I have seen a fair share of high level devs do it too, as you mentioned. I kind of get it? It's like a way to get validation on the technology you are stuck with for the rest of your working lives, I'm guilty of pushing Flutter onto people despite having had a painful experience with it (state management is a nightmare).

But hey ultimately I just think people should satisfy their curiosities. My work is almost always in Python, JS or Java, but at some point I got tired of just doing regular software development stuff and lately I've happened to gain interest in low level development. (Like any definitely sane person I chose to start with Rust and decided my first ever Rust project should be an emulator, so maybe I'm just dead inside and I don't know it yet)

There is a need for both "solving complex problems alone" and "learning how to work as a team developing a complex project within a timeline". The Software Engineering part is best learnt in industry under a mentor.

The tooth extraction skill is fundamental if you are a dentist. You learn it in school. Knowing how to market and sell that skill is valuable but you can learn it while working.

CS I learned in school, engineering I learned while working.

> Writing code that will still be readable to you (and others) a few years from now

This will be a wonderful feature for copilots, to allow for the human to provide a verbal narrative of the logic/process or even the environment and of others in the room commenting on why a such-and-such is being done, and the copilot elegantly documenting the narrative.

Wait until we have "Thick Code" (where an app comes with an AI generated Mini-Documentary-Series on the creation of the piece (as an optional function of your enterprise Copilot Agent (With the [Security Assistant] [Compliance Assistant] [Legal Assistant] [Media Assistant] and whatever Alignment Agents are required and inform the narrative of the project.

EDIT: @Tao3300 - I do that all the time. You just need to be a better MeatPT and infer "like, my, opinion, man"

I suffered a fatal error because I expected a few end parentheses there.

Gentlemen, you can't hack in here; this is Hacker News.

I'm shocked, SHOCKED, to find that hacking is going on in here!

one wonders whether you’ve created having completed two of the items on the list, i can confirm that it makes you better, whichever way you’d like to measure it. as a google engineer told me once: operating systems and compilers are where all the good ideas are. study them! only thing that beats that is probably writing them yourself, even if toy versions.

I'm working on a hobby OS. It's mostly just a fun project, but there's certainly a lot of deciding what to write and what to use, and identifying quality libraries when you want to use something. You can start from nothing and build your own board and bring it up with no software you didn't write or you can use a commercially available board, bios or uefi boot, use an existing boot loader, use an existing libc, etc etc, and explore the specific thing you're interested in, while still earning a lot of systems knowledge.

Deciding when to optimize isn't too hard when nobody is ever going to use it (only optimize run time if something really takes too long, otherwise don't do anything tremendously stupid, but optimize for less development time because this is taking forever)

Readable code would be nice, because sometimes it takes a long time to get back to it, but I've also designed in a real reason to come back and retouch everything (x86 -> amd64; aarch64 as a stretch) and maybe I'll make things reasonable then... Although it's not my strongest skill.

Even most simple OSes become complex. Maybe the dependency chain isn't too long though.

Seems to tick all your boxes, IMHO.

Idk why everyone is piling on this comment; they clearly make a distinction between programming and software engineering.

There are a few things in this comment that are causing noise in my mind.

> might make you a better programmer, it won't make you a better software engineer

I think it is a short-sighted definition of SE, but in any case, there is no mention anywhere in the article that it was aimed at software engineers.

Also, although I appreciate the concerns of "Not Invented Here" doctrine, we also have the other side of the spectrum, where we don't know how to do a left-pad anymore [1], and introduce dependencies everywhere.

The third point is that I don't think that having more knowledge of how things work contributes to the "Not Invented Here" mindset. I would argue it is the opposite. When engineers are hungry to learn, they find excuses to satisfy that appetite at work, but once they know the tradeoffs and the amount of effort needed to make something work, they will think twice before starting anything from scratch.

[1] https://qz.com/646467/how-one-programmer-broke-the-internet-...
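For a sense of how small the thing behind the left-pad incident is, here is a sketch of the same idea in Python. The function name and signature are my own, not the npm package's API; it simply leans on the built-in `str.rjust`.

```python
def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Pad `text` on the left with `fill` until it is at least `width` long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    return text.rjust(width, fill)


print(left_pad("5", 3, "0"))  # 005
print(left_pad("abc", 2))     # abc
```

Whether something this size deserves to be a dependency is exactly the judgement call being argued about.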

Isn't this the main point of the editor war? Programmers built tools to enhance their programming abilities, but then argued over which tool is the best one...

There is a lot of disagreement towards this idea in the thread, but I really like this project suggestion for someone to make a multi-faceted project that'll help someone "grow up" and it resonates with something I've long felt missing from comp sci education: that it doesn't focus enough on studying and replicating big successful projects.

You mostly learn by doing, true, but you learn by copying what the masters did. Software engineering education feels more like taking classes and writing code for exercises, whereas it probably should have more projects of the form "study this complicated-implementation-of-X and use those techniques in your implementation of Y", where x = google search, postgresql, whatevers and y is a complicated project [for one person or a small team] like your own single concurrent user search engine

I've held this idea for a really long time, in fact, I've not actually looked at comp sci education recently, so maybe this isn't true anymore, or maybe it never has been

I figured that was the point here. It's nice to follow tutorials for a new language or framework or whatnot, but to really stress and test what you absorbed, you want some non-trivial project with minimal dependencies. The game section exemplifies this the best:

> The idea here is to implement a well-defined game from start to finish without getting bogged down on the other fun stuff (e.g., game design and art). Also, it is best if you use a barebones 2D graphics library (e.g., SDL, SFML, PyGame), not a big game engine that'll hide all of the interesting bits from you.

This kind of game won't go on sale, nor even be a portfolio piece for a non-junior engineer, but it's a good enough idea for someone like me who, say, really wanted to solidify their Rust skills and show what my and the language's weaknesses are.

After I can get that done, I may feel confident enough to move on to work on a toy renderer based on what I figured out in the game. And then later contributing to a proper engine that already has a lot of groundwork figured out. I'd just be getting in the way if I jumped straight into trying to throw anything but the simplest PRs into Bevy.

You're making a strawman here, based on a very much false dichotomy. The things you list here are all very important.

But something you learn from writing a larger piece of software yourself is that you get to appreciate complexity and "bulk" (for lack of a better term) that you yourself created. You get to ask yourself questions like:

Is all this code needed? (Why is it 400 lines to do X?!) Why did this get so involved/complicated?! How do I progress (eat the elephant)? Can I (teach myself to) make small iterative units of progress every day, for instance?

These kinds of katas are very very useful for a 5-7 year beginner (which I use as a label of respect - the longer we can remain beginners, the longer we learn with an open attitude!)

Honestly this just sounds like me when I first encountered coding puzzles like project euler / leetcode and interview challenge practice problems. I made up excuses like this because I simply didn't want to spend time on lower level problems and instead wanted to feel good sticking to my high level application comfort zone.

I eventually learned to like solving lower level problems and I'm much better off for it. And these days I can tell people who haven't paid their dues because the only solutions they ever come up with are slow quadratic ones no matter the occasion.

No, building your own text editor, compiler, OS, ray tracer is not going to make you a worse engineer nor keep you from learning how to google and evaluate libraries or think critically, nor are those things reserved for an engineer who isn't building those things.

> might make you a better programmer, it won't make you a better software engineer

This is the distinction between solving "local" problems and solving problems at scale.

But I think there is a transitional phase where requirements (gathering, documenting), unit testing, UI design (human or programmatical) are essential. Learning these as nuts-and-bolts in any language is essential to becoming skilled in the craft.

> Identifying high-quality libraries and frameworks that meet your project needs

This is a really important skill that I find difficult and frustrating. Does anyone have good advice or resources?

Actually reimplementing something that already exists is a great way to get an understanding of it precisely so you can deal with the questions you mention.

My rule of thumb is that you should always understand at least one level of abstraction down.

> In fact, it might make you worse at software engineering, because it embodies the disastrous "Not Invented Here" doctrine

I get your point against NIH, but I don't see how doing such a project on your own in your free time could possibly make you a worse programmer. If anything, you would likely learn exactly what it takes to go down the NIH path so strictly and be able to make better choices.

I agree with this, only working on a project with certain expectations of delivery can make you a better software engineer.

I could not disagree more.

If nothing is ever "invented here", then in the limit it is not invented anywhere, because nobody knows how to build these things anymore. We have been on that path, more or less.

Perhaps you were visualizing that guy on your job who creates his own libraries and obscure systems within the system before telling anyone and that is generally a bad peer.

But the Universe is much bigger and diverse than that, people who programmed their own things but do a Google search first and use other software because they know it’s more feature complete, more known, trusted and better future supported are the best to work with.

We are all using the products of not invented here syndrome right now, don’t forget that.

> While writing a text editor, a compiler, an operating system, or a raytracer might make you a better programmer, it won't make you a better software engineer.

To me learning the engineering part is easiest. And also not terribly valuable if you don't know CS fundamentals.

I agree that these are all valuable things for a software engineer to learn, but in my experience, it’s valuable to at least take a crack at the projects listed in the post. I wasn’t under any illusion that I was going to build anything amazing, and it gave me a deeper appreciation of why you don’t try to build everything in-house.

I wish every dev would try projects and things completely unrelated to computers.

I enjoy being outside.

These headlines “projects every dev should try” really annoy me. I try to put in enough effort at work, and that’s tough. But I have been messing with computers for over 20 years now.

Agreed. This site selects for people who have made computers their entire life. Consider this an unsolicited reminder to touch grass.

> Consider this an unsolicited reminder to touch grass.

Do you find yourself personally attacked by someone doing something that you don’t enjoy? Or is it the notion of them having fun AND improving their direct skill?

> This site selects for people who have made computers their entire life

Isn't it the other way around? The site allows anyone to post on nearly any topic, but the longest-standing audience veers tech. There are still quite a few topics on medicine, transportation, economics, and politics as well.

Can you please elaborate on what you mean by "completely unrelated to computers"?

Woodworking, blacksmithing, gardening, etc

Things that don't take computers to create

*making music

> I enjoy being outside.

I don't. It's cold outside!

Caribbean is nice this time of year.

I think it’s important that those who want a 9-5-only are able to get it, and those who want to make it their calling in life are also able to have that

A strange sentiment for someone who's wasting time heckling articles on HN.

Of course nobody said don't go outside or pursue non tech interests.

I just do both.

We're on Hacker News. Do you go to a fitness community and say "I wish every gym bro would put down a dumbbell and pick up a book?"

But if you're curious, I hike, am learning Japanese, and want to eventually clean up and pick back up my saxophone once I can get the money to get it fixed. Tech is a big but not the only part of my life.

You could always pull a Wolfram [1].

[1] https://content.wolfram.com/sites/43/2019/02/07-popcorn-rig1...

I don't really understand how this comment relates to anyone but you.

Nerding on a computer and doing outdoor things are not mutually exclusive. They may be for you, but that's... well... you.

I think a little toy raytracer is another great thing to try out. Just something that outputs bitmap graphics of spheres and does diffuse and specular reflection, couple light sources. Should be a relatively self-limited project if you don't go too crazy with it.
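A rough idea of how little code that takes: the sketch below renders one sphere with ambient, diffuse and specular (Phong-style) shading from a single point light and serializes it as a plain-text PGM bitmap. The scene layout, the shading weights, and all function names are my own choices for illustration. Multiple spheres and a second light are left out to keep it short.

```python
import math


def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def normalize(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))


def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive ray parameter t, or None on a miss.

    `direction` must be unit length, so the quadratic's leading coefficient is 1.
    Rays that start inside the sphere are treated as misses in this sketch.
    """
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-6 else None


def shade(point, normal, view_dir, light_pos, shininess=32):
    """Brightness in [0, 1]: ambient + diffuse + specular."""
    to_light = normalize(sub(light_pos, point))
    diffuse = max(0.0, dot(normal, to_light))
    # Reflect the light direction about the normal: r = 2(n.l)n - l
    reflected = sub(scale(normal, 2.0 * dot(normal, to_light)), to_light)
    specular = max(0.0, dot(reflected, view_dir)) ** shininess if diffuse > 0 else 0.0
    return min(1.0, 0.1 + 0.6 * diffuse + 0.3 * specular)


def render(width=32, height=32):
    """Return rows of 0..255 grey values for one sphere lit by one point light."""
    eye, center, radius, light = (0, 0, 0), (0, 0, -3), 1.0, (2, 2, 0)
    rows = []
    for j in range(height):
        row = []
        for i in range(width):
            # Map the pixel centre to a point on an image plane at z = -1.
            x = 2.0 * (i + 0.5) / width - 1.0
            y = 1.0 - 2.0 * (j + 0.5) / height
            d = normalize((x, y, -1.0))
            t = hit_sphere(eye, d, center, radius)
            if t is None:
                row.append(0)  # background
                continue
            p = scale(d, t)  # hit point; the eye is at the origin
            n = normalize(sub(p, center))
            row.append(int(255 * shade(p, n, scale(d, -1.0), light)))
        rows.append(row)
    return rows


def to_pgm(rows):
    """Serialize the image as a plain-text PGM (P2) bitmap."""
    header = f"P2\n{len(rows[0])} {len(rows)}\n255\n"
    return header + "\n".join(" ".join(map(str, r)) for r in rows) + "\n"
```

Writing `to_pgm(render())` to a `.pgm` file should give you something an image viewer that understands Netpbm can open. From there, more spheres, more lights, and shadows are incremental additions.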

That is in the sequel! “More challenging projects every programmer should try”

https://austinhenley.com/blog/morechallengingprojects.html

Hey, putting a ray tracer and a web browser in the same category of "more challenging projects" seems a bit weird. A ray tracer is a weekend project. A web browser is a multiple man-years project, unless you use a third-party HTML+CSS engine and a third-party JS engine.

depends on the ray tracer. You can make "a" ray tracer in a weekend. You can spend months making a mid-scoped tracer if you decide to pick up PBRT (which is online for completely free now!): https://pbr-book.org/4ed/contents

If you want to upgrade your "weekend" ray tracer to a 2-4 week project, I'd suggest:

1. have it take a scene/model file as input (FBX is the industry standard but also a pain in the butt because Autodesk. I'd suggest looking at glTF or Blender scenes). This may or may not mean supporting triangles/quads if you only focused on spheres and planes.

2. texture support. Which sounds easy and then you enter the wonderful world of sampling. you can dive as shallow or as deep as you want there.

3. acceleration structures to improve runtime.

for a small start.
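On point 3: most acceleration structures (BVHs, grids, kd-trees) come down to cheaply rejecting rays that miss a bounding box. A minimal sketch of the usual "slab" test against an axis-aligned box, in Python for readability; `ray_hits_aabb` is a name I made up, and a real tracer would precompute the reciprocal direction.

```python
def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: does the ray (t >= 0) intersect the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)  # latest entry across all slabs
        t_far = min(t_far, t1)    # earliest exit across all slabs
    return t_near <= t_far
```

A BVH then just nests these boxes and only tests the primitives inside boxes the ray actually hits.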

Try using USD instead[0]. It's an open scene description from Pixar that is well documented and is seeing adoption across the board - including apps like Blender.

[0]https://openusd.org/release/index.html

A fully featured, commercial ray tracer takes many years, much like a fully featured, commercial web browser.

My point is that a barebones ray tracer (spheres, planes, metal vs plastic) is much, much simpler than a barebones web browser (a minimal HTML parser + a minimal CSS parser and cascading + a minimal JS implementation + a minimal layout engine + a minimal renderer).

A text-only browser without scripting support, like lynx, could be a more reasonable project, but still larger than a basic ray tracer, or even a ray tracer + a basic material system.

Their definition of web browser is basically the UI and renderer for a text based browser.

I went to a university with a great computer science & video games major, and while I didn’t finish out all the credits, I took a couple of the videogame classes; holy shit, you will never learn more about how to abuse geometry and data structures than trying to render graphics.

Not just geometry on the math side but numerical computation and linear algebra as well. Was enough to make my head spin.

My friend did one of these in a weekend!

https://raytracing-iow.shuttleapp.rs

``` curl --request POST --header 'Content-Type: application/json' --data '{"height":10,"width":10}' https://raytracing-iow.shuttleapp.rs ```

Or go more esoteric: "Raycasting engine in Factorio 1.1": https://youtube.com/watch?v=0bAuP0gO5pc

Ray Tracing in One Weekend is a popular tutorial series for those interested in trying this out https://raytracing.github.io/

that's listed in the followup post, and a lot of people have done it within a day. it's a pretty amazing cost-to-benefit ratio

A text editor can use an array as data structure. You only need it to be fast during the typing, where you are only changing one line. But for entering new lines, the extra latency needed for rebuilding the array after pressing enter is not noticeable for several million lines.

The more challenging part of a text editor is making sure you only render what the user sees.
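To make the array approach concrete, here is a rough Python sketch: a plain list of lines where typing only touches one line, Enter splices the list, and rendering only slices out the visible window. `LineBuffer` and its method names are hypothetical, not taken from any particular editor.

```python
class LineBuffer:
    """Text stored as a plain list of lines (the 'array' approach)."""

    def __init__(self, text=""):
        self.lines = text.split("\n")

    def insert_char(self, row, col, ch):
        # Typing only rebuilds the one line being edited.
        line = self.lines[row]
        self.lines[row] = line[:col] + ch + line[col:]

    def split_line(self, row, col):
        # Pressing Enter shifts every later line down by one: O(number of lines).
        line = self.lines[row]
        self.lines[row:row + 1] = [line[:col], line[col:]]

    def visible(self, top, height):
        # Only the lines inside the viewport are handed to the renderer.
        return self.lines[top:top + height]
```

Usage: `buf.visible(scroll_top, screen_rows)` is all the renderer needs, no matter how long the file is.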

And the most challenging part is making a text editor that actually works well from a usability perspective.

Which has almost nothing to do with data structures and optimization. I've used dozens of text editors. I have never once thought "man, that thing is slow". But I have thought "man, that thing is a bug-ridden, unintuitive piece of garbage" many, many times.

I have never once thought "man, that thing is slow".

You’re either lying or incapable of detecting typing latency.

Try typing while CLion indexes stuff in the background, or open a couple-hundred-thousand-line file in VS Code with the Vim plugin, and let me know how it goes.

There's a difference between noticing latency and considering it a problem.

My car doesn't go at the speed of sound. That doesn't mean I think "man, this car is slow" each time I drive it.

But if the same car suddenly stops, and I have to open all the doors and close them again in order to keep going, that sure is a problem.

There's a difference between noticing latency and considering it a problem.

Just because you don't consider it a problem, doesn't mean it's not a problem.

Weren't you the one lamenting that ML has the potential to be 100-1000x faster if only there were real™ software engineers to implement it?

My car doesn't go at the speed of sound. That doesn't mean I think "man, this car is slow" each time I drive it.

This isn't even a remotely valid example. Put your car on a track against faster cars and you'll quickly arrive at `man, this car is slow`.

I'm sure then you'd be perfectly content if your otherwise perfect car were replaced with one identical in all ways except that the acceleration was 1/4 of the previous rate.

You must not have used Eclipse back in the day

or Atom

oh man, this one hurts.

the original Eclipse was so godawful slow, this was back in the days where you would install Textpad after installing Java because it would pick up your classpath during installation and configure it for you. Java 1.4 I think.

Try re-indenting a large xml file, or something else that will result in a very large number of small deletions and insertions throughout a file. Even on a modern computer the underlying data structures will make the difference between something near instantaneous and the user giving up in despair after a few minutes. A simple array does not cut it.

It is shocking to me how little known the rope data structure [0] and the order statistic tree [1] are. Especially in the context of text editors.

[0]: https://en.wikipedia.org/wiki/Rope_(data_structure) [1]: https://en.wikipedia.org/wiki/Order_statistic_tree
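For the curious, a bare-bones rope fits in a few dozen lines. This Python sketch stores text in leaf nodes and lets inner nodes cache the total length of their subtree, so indexing and insertion walk the tree instead of copying the whole document. It is unbalanced on purpose (a real implementation would rebalance and cap leaf sizes), and the names `Rope` and `concat` are just for illustration.

```python
class Rope:
    """Minimal immutable rope: leaves hold text, inner nodes hold two children."""

    def __init__(self, text="", left=None, right=None):
        self.left, self.right = left, right
        self.text = text  # only meaningful for leaves
        self.length = len(text) if left is None else left.length + right.length

    def is_leaf(self):
        return self.left is None

    def char_at(self, i):
        node = self
        while not node.is_leaf():
            if i < node.left.length:
                node = node.left
            else:
                i -= node.left.length
                node = node.right
        return node.text[i]

    def split(self, i):
        """Return two ropes holding characters [0, i) and [i, length)."""
        if self.is_leaf():
            return Rope(self.text[:i]), Rope(self.text[i:])
        if i < self.left.length:
            a, b = self.left.split(i)
            return a, concat(b, self.right)
        a, b = self.right.split(i - self.left.length)
        return concat(self.left, a), b

    def insert(self, i, text):
        a, b = self.split(i)
        return concat(concat(a, Rope(text)), b)

    def __str__(self):
        if self.is_leaf():
            return self.text
        return str(self.left) + str(self.right)


def concat(a, b):
    """Join two ropes without copying their text."""
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    return Rope(left=a, right=b)
```

Because `insert` returns a new rope that shares untouched subtrees with the old one, undo history comes almost for free.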

Total long shot. But does anyone have any good side projects that would center around simulating fluid dynamics? I’ve always been interested in aerodynamics and I’ve wanted to see if there is a way to learn more about it with my programming skills.

Sebastian Lague recently did a video on simulating fluids, which may be interesting. As always, he takes a "from scratch" approach to it.

https://youtu.be/rSKMYc1CQHE?si=pXdsHlQSCpw8nY8m

The GitHub repository also contains links to some of the research papers used to implement the simulation.

https://github.com/SebLague/Fluid-Sim

I recently went this route. I didn’t want to set up or use Unity so I wrote my own 2D fluid simulator based on some of the same papers using Metal compute shaders (though I’d love to try again using webgpu). Sebastian’s video is great and the implementation is good. But this was a great (and fun) opportunity to look for ways to improve on it.

For starters, the way he’s doing the spatial lookup has poor cache performance, each neighbor lookup is another scattered read. Instead of rearranging an array of indices when doing the sort, just rearrange the particle values themselves. That way you're doing sequential reads for each grid cell you look for neighbors in, instead of a series of scattered reads. The performance improvement I got was about 2x, which was pretty impressive for such a simple change.

The sorting algorithm used isn’t the fastest, counting sort had much better performance for me and was simpler for me to conceptualize. It involves doing a prefix sum though, which is easy to do sequentially on the CPU but more of a challenge if you want to try keeping it on the GPU. "Fast Fixed-Radius Nearest Neighbors: Interactive Million-Particle Fluids", by Hoetzlein et al [0].
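Here is a CPU-side Python sketch of the two ideas above: a counting sort over grid cells driven by a prefix sum, with the particle values themselves written out in cell order rather than an index array. The function name and the 2D row-major cell layout are my own assumptions for illustration; the GPU version needs a parallel prefix sum, which this does not attempt.

```python
def sort_particles_by_cell(positions, cell_size, grid_w, grid_h):
    """Counting sort of 2D particles by grid cell.

    Returns (sorted_positions, starts), where the particles in cell c are
    sorted_positions[starts[c]:starts[c + 1]].
    """
    def cell_of(p):
        cx = min(grid_w - 1, max(0, int(p[0] // cell_size)))
        cy = min(grid_h - 1, max(0, int(p[1] // cell_size)))
        return cy * grid_w + cx

    n_cells = grid_w * grid_h
    cells = [cell_of(p) for p in positions]

    # Pass 1: histogram of particles per cell.
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1

    # Pass 2: exclusive prefix sum turns counts into start offsets.
    starts = [0] * (n_cells + 1)
    for c in range(n_cells):
        starts[c + 1] = starts[c] + counts[c]

    # Pass 3: scatter the particle data itself into cell order.
    cursor = starts[:-1]
    sorted_positions = [None] * len(positions)
    for p, c in zip(positions, cells):
        sorted_positions[cursor[c]] = p
        cursor[c] += 1
    return sorted_positions, starts
```

A neighbor query for cell `c` is then one contiguous slice, which is where the sequential reads come from.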

Or, if you want to keep using bitonic sort, you can take advantage of threadgroup memory to act as a sort of workspace during bitonic merge steps that are working on small enough chunks of memory. The threadgroup memory is located on the GPU die, so it has better read/write performance.

I ended up converting his pure SPH implementation to use PBF ("Position Based Fluids", Macklin et al, [1]), which is still SPH-based but maintains constant density using a density constraint solver instead of a pressure force. It seems to squeeze more stability out of each “iteration” (for SPH that’s breaking up a single frame into multiple substeps, but with PBF you can also run more iterations of the constraint solver). It’s also a whole lot less “bouncy”. One note: I had to multiply the position updates by a stiffness factor (about 0.1 in my case) to get stability, the paper doesn’t talk about this so maybe I’m doing something wrong.

The PBF paper talks about doing vorticity confinement. It’s implemented exactly as stated in the paper but I struggled for a bit to realize I could still do this in 2D. You just have to recognize that while the first cross product produces the signed magnitude of a vector pointing out of the screen, the second cross product will produce a 2D vector in the same plane as the screen. So there’s no funny business in 2D like I had originally thought. Though, you can skip vorticity confinement, the changes aren't very significant.
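The two cross products in 2D can be written out explicitly. This is only a sketch of the vector algebra, assuming (as I read the description above) that the corrective force has the form N × ω with N in the screen plane and ω pointing out of it; the helper names are made up.

```python
def cross_2d(a, b):
    """z-component of (ax, ay, 0) x (bx, by, 0): a signed scalar."""
    return a[0] * b[1] - a[1] * b[0]

def cross_vec_z(n, w):
    """(nx, ny, 0) x (0, 0, w): a 2D vector lying back in the screen plane."""
    return (n[1] * w, -n[0] * w)
```

So the vorticity per particle is a single float, and the force it produces is an ordinary 2D vector perpendicular to N.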

There’s a better (maybe a bit more expensive) method of doing surface tension/avoiding particle clustering. It behaves a lot more like fluids in real life do and avoids the “tendril-y” behavior he mentions in the video. "Versatile surface tension and adhesion for SPH fluids" by Akinci et al [2].

One of the comments on Sebastian's video mentions that doing density kernel corrections using Shepard interpolation should improve the fluid surface. I searched and found this method in a bunch of papers, including "Consistent Shepard Interpolation for SPH-Based Fluid Animation" by Reinhardt et al, [3] (I never implemented the full solution that paper proposes, though). There's kernel corrections, and then there's kernel gradient corrections, which I never got working. With the kernel corrections alone, the surface of the fluid seems to "bunch up" less when it moves, and it was pretty simple to implement. Otherwise, the surface looks a bit like a slinky or crinkling paper with particles being pushed out from the surface boundary.

I found [0] and [1] on my own but I found [2] through a thesis, "Real-time Interactive Simulation of Diverse Particle-Based Fluids" by Niall Tessier-Lavigne [4]. I also use the 2nd order integration step formula from that paper. It has some other excellent ideas that are worth trying.

Many years ago I used a paper (that is in fact one referenced by Sebastian’s video) and some C sample code I found to write an SPH simulator in OpenCL. I had been wanting to write one again but this time get a real understanding of the underlying mathematics now that I have some more tools under my belt. I owe it to Sebastian that I finally started on my implementation and I understand SPH a lot more now.

[0]: https://on-demand.gputechconf.com/gtc/2014/presentations/S41...

[1]: https://mmacklin.com/pbf_sig_preprint.pdf

[2]: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...

[3]: https://www.hdm-stuttgart.de/hochschule/forschung/forschungs...

[4]: https://project-archive.inf.ed.ac.uk/ug4/20181074/ug4_proj.p...

See my post in this thread about dimples/barnacles...

But have you seen this guy's package: https://github.com/ProjectPhysX/FluidX3D

If you're interested to learn more about aerodynamics I would highly suggest learning a bit of classical aerodynamics. It will not be software oriented, since most of the theory deals with approximating very complicated behavior with simple analytical models.

It could be interesting to do a comparison with finite volume methods to see when/how those approximations break down.

Totally newbie question - 'approximating very complicated behavior' - this seems like a perfect problem for ML to me. Is this something that's used or explored?

Not sure if this fits, but I've thought about doing an app to approximate cyclists' drag coefficients from video and help improve aero positioning.

I have no idea if it's realistic - from what I've read, it looks like not really, but hell, I would love to give it a shot.

https://github.com/ProjectPhysX/FluidX3D

I DO! PICK ME PICK ME!!!

First, be aware of FluidX3D [0] [1] - which is awesome for CFD simulations, OSS, etc...

Here is the premise of the CFD question:

It has long been known that eddies [2] were studied by da Vinci - and he was the first to propose the eddy pump... and how eddies work in hydrodynamics - and aerodynamics.

The barnacles on the leading edge of a whale's fin are also thought to cause beneficial eddies that help the fin cut through water more efficiently, with less drag.

Dimples on a golf ball affect the air-flow in tiny micro eddies, but at extraordinary speeds - where (I surmise) a certain amount of 'cavitation' may occur with a very thin film around the ball - kind of like water-tension, but with tiny eddies [4]

SO:

Create a helicopter blade with leading-edge 'barnacles' similar to the shapes of the acorn barnacles on whale fins, which will create eddies as the air passes and affect the flow of the air over the foil.

Add dimples of varying shape profiles, from convex round dimples to hexagonally based dimples (much easier in aircraft, which are already based on titanium honeycomb-sandwich material).

But make the dimples morphic - being able to electrostatically "activate" the dimples (meaning they are either on or off for the simulation)

The goal is to determine whether dimples and/or barnacles have a net positive impact on the flow and conditions of air over a foil in the helicopter blade - or the fixed wing of larger craft - or the entire fuselage dimpled like a golf ball, affecting fuel efficiency or other factors of lift or flight that could be visualized easily using something like [0]

[0.0] https://github.com/ProjectPhysX/FluidX3D

[0.1] https://www.youtube.com/watch?v=mhacLfz92h0 <--- This is a fantastic video tutorial on installing FluidX3D

[1] https://www.reddit.com/r/CFD/comments/10ghc2d/fluidx3d_blows...

[2] https://theconversation.com/how-leonardo-da-vinci-master-of-...

[3] https://marinesanctuary.org/blog/whales-and-barnacles-an-unl...

[4] https://www.scientificamerican.com/article/how-do-dimples-in...

Some sort of game might be a good idea, for example about building or managing a dam, a reservoir, or canal. Maybe a simple colony sim about otters.

I would really love to try my hands on something much more physical, like a robot; or a drone with autopilot; maybe accurate simulation of flight dynamics of a spaceplane with programmable GNC parameters?

I have a copy of "Fundamentals of Astrodynamics" by Bate, Mueller et al and I would love to do something with it this holiday season.

I say simulation because all the rest of the stuff seems to cost a lot. I am really interested in the GNC aspect of robots, and would love some good pointers on the topic!

I bet a self-balancing two-wheeled robot would be a fun (and relatively safe) project. I mean a Segway-like thing that can just stand in one place without falling over or yeeting itself off the table.

You'd need a microcontroller, an IMU, a stepper motor controller, a motor, some LEGO wheels, and maybe a block of wood.

I haven't tried this, so my guesses are probably way off.

Sounds like a good project to me (we did this for a couple of labs in an undergraduate [sophomore maybe junior year] control systems class). The "inverted pendulum" problem and its subsequent derivations should be a good model for this. When the object is nearly upright (or at small angles from the vertical axis) a nice linear control loop should suffice but if you leave that region the control solution becomes more difficult (would make a good target for improvement after initially getting it going).
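As a rough idea of what that "nice linear control loop" looks like, here is a small Python simulation of the linearized inverted pendulum, theta'' = (g/L)·theta + u, stabilized by a PD controller. Everything here is an assumption for illustration: the model ignores the cart/wheel dynamics, `u` is treated as a directly commanded angular acceleration, and the gains are picked by hand rather than tuned for real hardware.

```python
def simulate(theta0, kp, kd, seconds=3.0, dt=0.001, g=9.81, length=0.5):
    """Integrate the small-angle inverted pendulum under PD control.

    Returns the tilt angle (radians) at the end of the run.
    """
    theta, omega = theta0, 0.0
    for _ in range(int(seconds / dt)):
        u = -kp * theta - kd * omega        # PD law: push back against the tilt
        alpha = (g / length) * theta + u    # linearized dynamics near upright
        omega += alpha * dt                 # semi-implicit Euler step
        theta += omega * dt
    return theta
```

With `kp` above g/L the closed loop is stable; with both gains at zero the same starting tilt grows quickly, which is the robot falling over.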

I have a copy of "Fundamentals of Astrodynamics" by Bate, Mueller et al and I would love to do something with it this holiday season.

What do you plan on doing with it? I'd love to hear more about what the book is about and a project you might do.

https://books.google.com.au/books?id=UEC9DwAAQBAJ

is the direct link to the Google Books overview with sample pages, table of contents, etc.

https://ksp-kos.github.io/KOS/ Kerbal Space Program might not simulate aerodynamics quite to the realism level you prefer, but a lot of people seem to enjoy writing GNC software for it.

I feel the same way - almost all my personal projects have a hardware component. After spending all day on software I very much appreciate working towards a physical object.

I’m surprised an emulator (especially Game Boy) is considered harder than a little operating system.

I guess that would make sense if you’re not familiar with ASM but an OS will force you to learn that too.

Having checked off most of the list, I would agree the Game Boy emulator is far easier than the OS.

I guess with the emulator it depends how much you want to do. You'd be foolish to write a console emulator and not import someone else's CPU code. When I wrote a Game Gear emulator 26 years ago I used off-the-shelf CPU code and just mapped all the in/out and did the graphics bits. Took me and a friend a single evening to get it happily playing games.

An OS is harder. Well, it was last time I wrote one, again about 26 or 27 years ago. Getting something to boot at all was tricky. I used this book as my main source of inspiration:

https://www.amazon.com/Developing-32-Bit-Operating-System-Cd...

Wait what?

CPU instructions are literally the easiest part of writing an emulator there is.

You'd be foolish to write a console emulator and not import someone else's CPU code.

Why? It's part of the learning and it's not particularly complex.

You can make a surprisingly usable OS with (relative) ease - because you can control all the features you choose to support (or not).

An emulator already has a "spec" you have to operate to.

I guess you could do something ultra-simple (no loadable programs or memory allocation for example) but that doesn’t feel like an OS to me.

You can sort of do that with an emulator too. For example on the GB you could do just the CPU/memory at first, skipping some rare/tough/quirky ones. Graphics aren’t that bad, you can avoid emulating some bugs/ultra-tight timing. Controls are quite easy.
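The "just the CPU/memory at first" stage can be tiny. Below is a Python sketch of a fetch-decode-execute loop covering five Game Boy opcodes (NOP, LD A,d8, INC A, JP a16, HALT). It is deliberately incomplete: flags, timing, and every other register are omitted, and the program is loaded at address 0 purely for simplicity. The class and method names are hypothetical.

```python
class TinyCPU:
    """A deliberately tiny subset of the Game Boy CPU."""

    def __init__(self, rom):
        self.mem = bytearray(0x10000)   # flat 64 KiB address space
        self.mem[:len(rom)] = rom       # simplification: program starts at 0
        self.pc = 0
        self.a = 0
        self.halted = False

    def fetch(self):
        byte = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return byte

    def step(self):
        op = self.fetch()
        if op == 0x00:                      # NOP
            pass
        elif op == 0x3E:                    # LD A, d8
            self.a = self.fetch()
        elif op == 0x3C:                    # INC A (flag updates omitted)
            self.a = (self.a + 1) & 0xFF
        elif op == 0xC3:                    # JP a16 (little-endian operand)
            lo = self.fetch()
            hi = self.fetch()
            self.pc = (hi << 8) | lo
        elif op == 0x76:                    # HALT
            self.halted = True
        else:
            raise NotImplementedError(f"opcode {op:#04x}")

    def run(self, max_steps=1000):
        steps = 0
        while not self.halted and steps < max_steps:
            self.step()
            steps += 1
```

Growing this means filling in the opcode table and then hanging the PPU, timers, and input off memory reads and writes.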

Honestly none of it is that tough as long as you skip sound. Sound is the worst part, and what’s next on my plan for my Swift-based emulator.

Of course if you wanted to keep going you could add on Game Boy Color support, Game Boy Advance support, cheat codes, save states, stuff like that.

It’s turned into a pretty great little project.

A GameBoy emulator has a lot of edge cases to cover. Yes, it's straightforward to get most games running in a playable state, but there are several games (Prehistorik Man) and demos that rely on precise timing of the PPU relative to the CPU and those are notoriously hard to get right.

Here's the emulator I made not so long ago: https://github.com/grishka/miscellaneous/tree/master/GBEmula...

If you think you need a "factory pattern" to write Space Invaders, there is something wrong.

I'm close to absolutely certain no such design concept was involved in the original.

At no point does the author say or even imply that you need the factory pattern to implement Space Invaders. Are we reading a different article?

In the bullet points "things to learn":

- Creating and managing a dynamic number of objects (e.g., factory pattern).

Yup where does he say or imply that it's necessary to implement the game?

It's the result of 30 years of OOP and design patterns. When your brain gets infected with them you start looking for ways to spread the infection.

Not sure if you’re joking, but in a way this feels very true to me. After a decade or so of doing OOP, I started doing a lot of 6502 assembly (for a home brew computer) and not applying OOP principles felt dirty at first. Same as not having 100% (or any for that matter) test coverage. I felt like I was cheating by doing things a simpler way. But then I started to really enjoy it and it felt liberating.

For a fun challenge, implement space invaders in Verilog (i.e. purely in hardware)

How would you suggest going about the display? Driving an LCD or generating an NTSC/PAL signal?

Or maybe a giant array of LEDs?

Is there a good tool to simulate Verilog so you can test without hardware?

You can hook it up to a monitor and stream the graphics over HDMI. And yes, there are ways to simulate Verilog

A couple of years ago I implemented a snake game in 74-series logic ICs and an 8x8 LED indicator, it was fun )

Fun fact: The original Pong had no CPU and had all the logic built into the hardware

I would add in a security or network related project.

Overflow attack, (SQL) injection attack, and maybe something like using Wireshark to see what an HTTP request actually looks like.
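An injection attack is easy to reproduce safely against an in-memory database. This Python sketch uses the standard `sqlite3` module; the table, column names, and payload are invented for the demo. The unsafe lookup splices user input into the SQL text, while the safe one passes it as a bound parameter.

```python
import sqlite3

def make_db():
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn

def lookup_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL statement itself.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]

def lookup_safe(conn, name):
    # Parameterized: the input is only ever treated as a value.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]
```

Calling `lookup_unsafe(conn, "x' OR '1'='1")` turns the WHERE clause into a condition that is always true and leaks every secret; the same input through `lookup_safe` matches nothing.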

I took two capture the flag classes which changed how I look at some of my day to day dev work.

Interesting, I will look into this for learning, thanks for the inspiration

pwn.college

"I took two capture the flag classes"

Can you share links to them?

The two I took were a part of my masters program, so they aren’t publicly available. But there are tons of Capture the Flag (CTF) resources. https://picoctf.org/ Is a good one for beginners, and you can find more from this list - https://old.reddit.com/r/securityCTF/comments/ewpdt8/top_10_...

What’s with many programmers’ obsession with writing their own text editors? In part a rhetorical question, but it still boggles me. Not that many people are obsessed with manufacturing their own hammers and nails, for example, hence why I’m asking.

It's a nice self-contained project that's relatively easy to make fast progress on, while also being challenging. Plus it's something that programmers have to use every day so we're bound to have pain points with existing options.

Well you would normally use hammers and nails for woodworking but you wouldn't be able to use woodworking to make hammers and nails... But I guess with a text editor it's a weird recursive thing where the programming tool is itself a programming project. I think people just do it for fun and because they can. You'll probably never make something you can productionize but maybe you will learn some cool things along the way.

The same reason as all the other projects mentioned, because it teaches you many useful concepts that show up elsewhere when writing code. A carpenter likely won't learn much by manufacturing their own hammer that they can apply in their own workflow, a blacksmith on the other hand might.

Discussed at the time:

Challenging projects every programmer should try - https://news.ycombinator.com/item?id=21790779 - Dec 2019 (297 comments)

Also related:

More challenging projects every programmer should try - https://news.ycombinator.com/item?id=25489879 - Dec 2020 (223 comments)

The external link for the second story 404's.

https://archive.is/3Nbu8

https://austinhenley.com/blog/morechallengingprojects.html

On the text editor one:

The biggest challenge is figuring out how to store the text document in memory. My first thought was to use an array, but that has horrible performance if the user inserts text anywhere other than the end of the document.

I guess this is only an issue in low level languages, as I just used a JavaScript string and I don't think it's been a perf issue in 2+ years using my editor full time. Plenty of other things have been, of course - rendering long horizontal lines comes to mind, as my approach to optimisation assumed rendering a single line would be cheap and expanding the logic to render partial lines would add another whole dimension of complexity.

(By "long" I mean an entire minified file, for example.)

It's one of those things that only shows up at very large file sizes, isn't it? Most programmers work with tiny text files 99% of the time.

If everything else is even less efficient than string management, then string management does not turn up as a bottleneck.

All of these are fairly easy to make in languages like Scratch but hard to make in lower-level ones. I don't know enough about coding theory and such to understand why, though.

The only one of these that Scratch might make easier is the Space Invaders game

Great, have fun experimenting and learning. Just please don't release your hobby projects as innovations or npm modules.

Why?

Yeah, I don’t have time for any of this.

Yet you have time to browse hn. That reminds me, I’ve got a side project that needs attention.

I will also add a CAS, computer algebra system to the list.

I've never seen an explanation of Gröbner bases that's accessible enough to the average non-mathematician developer to make that practical. Any pointers?

Doing some 2D game dev without an engine was the most humbling experience that showed me just how ridiculously fast computers have become. You do things that you swear are far too slow to work, and yet it works even on a Game Boy because it’ll do a million operations in a second and a million is apparently a big number.

“Wait… so I’m looking up and drawing every sprite every time? There’s no clever diffing happening to know which ones to change? There’s no second buffer to reference the previous frame and steal work from?!” And then you find the much more complex games that do start using these tricks and it’s ridiculous.

well, there ARE some very interesting tricks back in the old days on both the application and hardware side to help out with these exact problems (for one thing, this is exactly how a SpriteAtlas as a data structure is utilized. Why waste time uploading a dozen sprites that usually render in a predictable way when you can upload one big image and sample parts of the image?).
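The "sample parts of the image" step is mostly index arithmetic. A small Python sketch, assuming a hypothetical atlas where equally sized sprites are packed left to right, top to bottom:

```python
def atlas_rect(index, sprite_w, sprite_h, atlas_w):
    """Pixel rectangle (x, y, w, h) of sprite `index` inside the atlas image."""
    cols = atlas_w // sprite_w              # sprites per atlas row
    col, row = index % cols, index // cols
    return (col * sprite_w, row * sprite_h, sprite_w, sprite_h)

def atlas_uv(index, sprite_w, sprite_h, atlas_w, atlas_h):
    """Same rectangle as normalized texture coordinates (u0, v0, u1, v1)."""
    x, y, w, h = atlas_rect(index, sprite_w, sprite_h, atlas_w)
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)
```

The atlas is uploaded once; drawing a different sprite only changes four numbers.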

But in terms of today, yes. You can certainly brute force render 99% of anything from 15+ years ago on modest commercial hardware (i.e. not even gaming PCs) by drawing and re-drawing a scene.

For a more UI / Web based slant, I would recommend these (beyond a Spreadsheet - which is incredibly helpful for understanding data flow systems):

* Simple video game using Unity or Unreal (understanding perf limitations in a game where 30-60 fps is critical helps make performant interfaces elsewhere - even on Web)

* A simple JavaScript framework similar to React (again, will help understand data flow & handling events)

* An HTTP library wrapper around XMLHttpRequest (fetch exists now, but understanding how to send & read HTTP requests from scratch will help in debugging any problems down the line like CORS issues, OPTIONS requests, etc.).
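On the last point, it helps to see that an HTTP/1.1 message is just text with CRLF line endings. This Python sketch builds a raw request and parses a raw response head without touching the network. The helper names are made up, and it ignores chunked encoding, repeated headers, and other details a real client must handle.

```python
def build_request(method, host, path, headers=None, body=b""):
    """Serialize a minimal HTTP/1.1 request to bytes."""
    headers = dict(headers or {})
    if body:
        headers["Content-Length"] = str(len(body))
    headers.setdefault("Connection", "close")
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body

def parse_response(raw):
    """Split raw response bytes into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body
```

Printing what `build_request("OPTIONS", ...)` produces next to what the browser's network tab shows for a CORS preflight is a quick way to see which headers the browser adds on its own.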

maybe godot would be a better alternative to unity and unreal for this? because, quite aside from not having to wallow in proprietary licensing, you can find out what's going on

Awesome list.

A few more I’d add:

- search engine

- web crawler

- software renderer

- programming language

When I was 17 I gave myself the challenge of making a profitable and useful app. The discipline and real market skills I learned made the six-month development process worth it, even though the app ended up being scrapped.

Interesting list, I guess it depends on your personal inclinations and circumstances, but if you're looking for ideas, it's a good start.

I won't go so far as to tell what every programmer should try, just what my personal choices were.

Using the Sinclair ZX Spectrum:

- Music staff editor and tracker (very crude)

- 2D game - Space Invaders (only one invader)

That computer wasn't even mine. With my first computer (386):

- Huffman compressor

- B-Tree indexes (outperformed DBase/Clipper)

- OOP form generator

- Several DOS background programs, one of them was later used by a big company for installations

Years later:

- Email checker for dial-up

- Manual parser for syntax highlighting

- Spreadsheet (something like that, I'm on it now)

I'm sure I'm forgetting something.

There should also be: implement a netcat clone

Those are all good easy projects to do. Good intermediate-level follow-ons would be a JIT binary translator from any architecture to any other and expanding that simple OS to support SMP scheduling.

Try coding a massive multiplayer game if you really want to send shivers down a spine. From scratch of course.

He forgot to mention building a small database (DBMS). For me, the big 3 projects are: 1. a computer language (I already did), 2. a small operating system, and 3. a small database (DBMS).

It's a great list. It would be good to keep it as some kind of live list so we can help evolve it over time.

I'd add two ideas:

Build a lightweight memcached - it requires a mixture of basic algorithms as well as good system design practices - it's a "simple" problem, but requires thinking carefully from multiple angles - invalidation, concurrency, memory management...

Build your own docker - it's not as complex as it might sound - gives you the opportunity to understand the basics of OS programming, which most engineers nowadays don't really use in their day-to-day jobs
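A first cut at the memory-management and invalidation angles could look like the Python sketch below: a fixed-capacity store with least-recently-used eviction and optional per-key expiry. `TinyCache` and its parameters are hypothetical, the clock is injectable so expiry can be tested without sleeping, and there is no locking or network protocol, so concurrency is left as the next exercise.

```python
import time
from collections import OrderedDict

class TinyCache:
    """In-process key-value cache with LRU eviction and optional TTL."""

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self.items = OrderedDict()  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self.clock() + ttl
        self.items[key] = (value, expires_at)
        self.items.move_to_end(key)          # mark as most recently used
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self.items[key]              # lazy invalidation on read
            return None
        self.items.move_to_end(key)
        return value
```

Note that a stored `None` is indistinguishable from a miss here; a real server would return an explicit hit/miss flag.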

+1 for mini operating system.

We application developers rely on many OS features: memory management, filesystem, etc. I'm sure eventually we'll ask "how are such things done behind the scenes?"

That's why I tinker with xv6 (https://github.com/mit-pdos/xv6-public) during my spare time. Learning various process scheduling algorithms from a textbook is one thing. Implementing them is another. I learn a lot. And it's definitely fun, even though there's almost zero chance the knowledge gained is relevant for my job (I'm a mobile app dev).
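Before touching kernel code, the textbook version of a scheduling algorithm can be simulated in a few lines. This Python sketch runs round-robin over a set of CPU bursts and reports when each process finishes; it assumes all processes arrive at time 0 and that context switches are free. It is a toy model for building intuition, not a description of how xv6 implements its scheduler.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate round-robin scheduling.

    `bursts` maps process name -> CPU time needed.
    Returns a dict of process name -> completion time.
    """
    queue = deque(bursts.items())
    clock = 0
    finished = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))  # back of the line
        else:
            finished[name] = clock
    return finished
```

Changing `quantum` and watching how completion times shift is a cheap way to see the trade-off between responsiveness and turnaround.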

Writing a tool that can do large template conversions is particularly fun. Like converting Jinja into markdown. You can use this for future projects to build out documents without too much effort if the documentation for things you work on follows a formula.

So whenever I don't know what to build or I want to learn a new programming language or framework

While the mentioned projects have some merit if you are learning how to program, doing them once might be enough. Not sure how implementing a text editor the nth time will help you learn language X.

If I am learning a new language, I try to implement what interests me at that time, from small to big. It might be a small game, it might be an AI algorithm, it might be an authentication system, it might be an ORM. Doing things that I like or need helps.

So unless you fancy text editors, I wouldn't churn one after another.

If you would like to contribute to a Python Text Editor / IDE, I would suggest https://github.com/Akuli/porcupine

It has a great (but very small) community and the maintainer is phenomenal as well.

Thank you for this idea. I was inspired by it years ago and wrote a delay queue in Go. It currently depends on Redis, though, and recently I've wanted to remove that dependency and write the key-value store myself. Contributions are welcome: https://github.com/raymondmars/go-delayqueue
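For readers who haven't met the idea, the core of a delay queue without any external store is a min-heap keyed on the time each item becomes ready. The Python sketch below is a generic in-memory illustration with made-up names; it is not how the linked Go project works, and it has no persistence or worker loop.

```python
import heapq
import itertools

class DelayQueue:
    """Items become available only once their ready time has passed."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps insertion order

    def push(self, item, ready_at):
        heapq.heappush(self._heap, (ready_at, next(self._seq), item))

    def pop_ready(self, now):
        """Remove and return every item whose ready time is <= now."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

A worker would call `pop_ready(time.time())` on a timer, or sleep until the earliest `ready_at` at the top of the heap.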