
Ask HN: How can I learn about performance optimization?

Agentlien
13 replies
1d8h

For several years I have worked primarily with performance optimizations in the context of video games (and previously in the context of surgical simulation). This differs subtly from optimization in certain other areas, so I figured I'd add my own perspective to this already excellent comment section.

1. First and foremost: measure early, measure often. It's been said so often and it still needs repeating. In fact, the more you know about performance the easier it can be to fall into the trap of not measuring enough. Measuring will show exactly where you need to focus your efforts. It will also tell you without question whether your work has actually led to an improvement, and to what degree.

2. The easiest way to make things go faster is to do less work. Use a more efficient algorithm, refactor code to eliminate unnecessary operations, move repeated work outside of loops. There are many flavours, but very often the biggest performance boosts are gained by simply solving the same problem through fewer instructions.
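As a minimal sketch of that last flavour (hypothetical `normalize` functions, not from the thread): hoisting a loop-invariant computation out of the loop turns quadratic total work into linear, with identical results.

```python
import math

def normalize_slow(values):
    # Recomputes the norm inside the loop body: O(n^2) total work.
    return [v / math.sqrt(sum(x * x for x in values)) for v in values]

def normalize_fast(values):
    # Hoist the loop-invariant norm out of the loop: O(n) total work,
    # same answer, strictly fewer instructions executed.
    norm = math.sqrt(sum(x * x for x in values))
    return [v / norm for v in values]
```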

3. Understand the performance characteristics of your system. Is your application CPU bound, GPU compute bound, memory bound? If you don't know this you could make the code ten times as fast without gaining a single ms because the system is still stuck waiting for a memory transfer. On the flip side, if you know your system is busy waiting for memory, perhaps you can move computations to this spot to leverage this free work? This is particularly important in shader optimizations (latency hiding).

4. Solve a different problem! You can very often optimize your program by redefining your problem. Perhaps you are using the optimal algorithm for the problem as defined. But what does the end user really need? Often there are very similar but much easier problems which are equivalent for all practical purposes. Sometimes because the complexity lies in special cases which can be avoided or because there's a cheap approximation which gives sufficient accuracy. This happens especially often in graphics programming where the end goal is often to give an impression that you've calculated something.
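A toy illustration of redefining the problem (hypothetical example, not from the thread): "is this point within radius r" doesn't require the distance itself. Comparing squared distances is equivalent for the question being asked, and skips the square root entirely.

```python
def within_radius_exact(p, q, r):
    # The problem as defined: compare the Euclidean distance against r.
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 <= r

def within_radius_cheap(p, q, r):
    # An equivalent, cheaper problem: compare squared distance against
    # r*r. Same answer for every input, no sqrt.
    dx, dy = p[0] - q[0], p[1] - q[1]
    return dx * dx + dy * dy <= r * r
```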

astrange
8 replies
1d7h

Note "faster" is not the only thing to optimize for, and it (wall clock time) is actually a bit unusual as it doesn't represent an exhaustible resource.

That is, if you have a rarely used slow part of the system, that might make it seem unimportant, but not if running it uses all the disk space or drains your phone battery.

Even if that doesn't happen, there are things you can optimize in the unimportant parts - you can optimize them getting out of the way of the rest of the system, like by having smaller code size and not stomping all over caches.

Agentlien
6 replies
1d7h

Note "faster" is not the only thing to optimize for

That is very true! I simply think of it first because it is often the biggest problem at my work. One of the extreme exceptions was optimising Wavetale for the Nintendo Switch, where we had to decrease memory usage from over 20GiB to below 3GiB.

<time> is actually a bit unusual as it doesn't represent an exhaustible resource.

Not in the context of game development, however! There you typically don't care about wall time. Instead, you work towards a set frame rate meaning you have a set slice of time (usually 16.7 or 33.3 ms) to go through the entire game and render loop each time.
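Those budgets fall straight out of the target rate; a one-line sketch (illustrative only, hypothetical function name):

```python
def frame_budget_ms(rate_hz, block=1):
    # Time slice available per frame (or per audio block of `block`
    # samples) at a given rate, in milliseconds.
    return 1000.0 * block / rate_hz
```

E.g. 60 fps gives roughly 16.7 ms, 30 fps gives 33.3 ms, and a 64-sample audio block at 48 kHz gives about 1.3 ms.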

tetha
4 replies
1d7h

Not in the context of game development, however! There you typically don't care about wall time. Instead, you work towards a set frame rate meaning you have a set slice of time (usually 16.7 or 33.3 ms) to go through the entire game and render loop each time.

I was about to bring a similar example: Some of our really old daily data processing at work might need some attention and optimization in the future, because we're running out of hours in the day. And we haven't found a place we can buy more hours in the day from yet.

karamanolev
3 replies
1d6h

If you can run the daily batch processing in parallel, can't you have one batch start at T+0, the next one starts at T+24h, then the first one finishes at T+28h and so on?

antoinealb
1 replies
1d5h

That leads to an infinite backlog no ? If you need more than 24h to process 24h of data ?

Sayrus
0 replies
1d4h

That may depend on the context and data but you may end the first job at T+28 (runtime of 28 hours) and the second at T+52 (28 hours as well, started at T+24).

If jobs must be executed one after another, then you absolutely create an infinite backlog.
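Those numbers can be sanity-checked with a tiny sketch (hypothetical function, assuming fully parallel jobs): each job finishes a fixed runtime after its own start, so the lag stays bounded at runtime minus interval rather than growing.

```python
def finish_times(n_jobs, start_interval_h=24, runtime_h=28):
    # Jobs start every `start_interval_h` hours and run concurrently;
    # job i finishes at start_i + runtime. With 28h runtime on a 24h
    # cadence, every result is a constant 4 hours behind its window.
    return [i * start_interval_h + runtime_h for i in range(n_jobs)]
```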

tetha
0 replies
1d2h

Sadly, the resource constraints/setup prevent us from parallelizing this. And the customers of that system expect the data to be processed and available after 24 hours.

In part, it's a somewhat rewarding topic. Thinking about queries, joining a bit differently, adding another index based on new data patterns can cut hours of runtime without incurring further resource cost.

But on the other hand, it's yet another project someone dumped on the floor and we were forced to adopt it "because of the customer". And the second or third project of PD trying to "do it right" is teetering on failure once again. Cron running shell scripts held together with chicken wire and duct tape is too strong of a stack I guess.

spacechild1
0 replies
1d5h

In realtime audio programming, for example, the time budget can be as low as 1.3 ms (64 samples @ 48kHz). And every single missed deadline will manifest as an ugly pop which you will try to avoid at all cost.

f1shy
0 replies
1d5h

This is so true and so often ignored: there can be optimization for space (RAM and/or ROM), energy, security, robustness, etc. Often they oppose or somehow compete with speed.

Kamq
1 replies
1d2h

The easiest way to make things go faster is to do less work. Use a more efficient algorithm, refactor code to eliminate unnecessary operations, move repeated work outside of loops. There are many flavours, but very often the biggest performance boosts are gained by simply solving the same problem through fewer instructions.

I definitely agree with this one, especially on the level of "don't make network calls in your hot loop".

But it should be noted that less efficient algorithms that access memory in a more efficient way (that is to say, get a higher percentage of cache hits when iterating the data) can beat more efficient algorithms that get more cache misses.

That's all to say, iterating over an array is very fast, and iterating through a map is not. Is the map faster if you only need to access a couple of things? Well, it depends on the dataset size, the constant factor of your map access, and how much of an improvement your other algorithm is.

You should definitely measure that once you've made the change though.
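A rough benchmark sketch of the array-vs-map point (hypothetical names; the crossover point is machine- and size-dependent, so measure rather than trust any fixed rule):

```python
from timeit import timeit

def linear_lookup(items, key):
    # O(n) scan over a contiguous list of (key, value) pairs:
    # asymptotically worse than a dict, but sequential and cache-friendly.
    for k, v in items:
        if k == key:
            return v
    return None

def compare(n=256, repeats=50):
    # Illustrative micro-benchmark: time a worst-case list scan against
    # a dict lookup. For small n the scan's constant factor can win.
    items = [(i, i * i) for i in range(n)]
    table = dict(items)
    t_list = timeit(lambda: linear_lookup(items, n - 1), number=repeats)
    t_dict = timeit(lambda: table.get(n - 1), number=repeats)
    return t_list, t_dict
```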

j45
0 replies
21h0m

A great approach: find out what actually performs better, and then code in that manner.

Simplicity is often the best because necessary complexity will arrive on its own.

osigurdson
0 replies
1d3h

> The easiest way to make things go faster is to do less work

This is a great rule of thumb. I've seen junior engineers (and even senior in some cases) try to parallelize existing solutions before first optimizing the single threaded case.

kevinventullo
0 replies
18h18m

Agree with all this, and would add that concrete measurements not only tell you whether your work has actually led to an improvement, but also your manager and promo committee ;)

hliyan
11 replies
1d6h

Former HFT dev here. Know fundamentals: sources of performance issues = things that eat/waste CPU cycles, things that reach too far down the memory hierarchy. Usually the latter. E.g. L2 cache to RAM - order of magnitude slower; RAM to disk: 4+ orders of magnitude slower.

Things that eat CPU: iterations, string operations. Things that waste CPU: lock contentions in multi-threaded environments, wait states.

You can usually build a lot of the understanding from first principles starting there. Back in the day we had to, because there wasn't much readily available literature on the subject. Actual techniques will depend on, and evolve with, your choice of platform or version.

E.g. 20 years ago, we used to create object pools in C++ at load time to avoid Unix heap locks at runtime. This may no longer be necessary. 15(ish?) years ago, JNI was used when the JVM wasn't fast enough for certain stuff. This is no longer necessary. 10 years ago, immutable JS objects were thought to be faster because the JS runtimes at the time were slower to mutate existing objects than to create new ones. This too, may no longer be true (I haven't checked recently). Until very recently, re-rendering with virtual DOM diffing was considered more performant than direct, incremental DOM manipulation. This too, may no longer be true.
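The object-pool trick translates to any language with allocation cost on the hot path; here's a minimal sketch (translated to Python purely for illustration - the original technique was C++ pools built at load time to avoid heap locks):

```python
class ObjectPool:
    # Preallocate everything up front so the hot path never touches the
    # allocator; acquire/release are just list pops and appends.
    def __init__(self, factory, size):
        self._free = [factory() for _ in range(size)]

    def acquire(self):
        # Raises IndexError if the pool is exhausted; a real pool would
        # decide whether to grow, block, or fail here.
        return self._free.pop()

    def release(self, obj):
        self._free.append(obj)
```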

hliyan
4 replies
1d6h

Addendum: never forget Amdahl's Law. And never forget Knuth's full quote:

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald E. Knuth, Structured Programming With Go To Statements

patrick451
0 replies
18h24m

This quote has done more damage to software usability than any other idea of the last 60 years. It has led us to a state where software is unbearably slow because nothing was worth optimizing. Just like it's easy to go broke with thousands of minor expenses, it's easy to create sluggish software that has no obvious bottleneck because the whole damn thing is slow.

osigurdson
0 replies
1d2h

The problem is, instead of internalizing the whole quote, which is valid in my opinion, many have only internalized the "root of all evil" part. This roughly translates to "my feature is done - performance is someone else's problem".

andai
0 replies
1d5h

Nowadays, the other 97% is slow too.

Solvency
0 replies
1d3h

This ideology is exactly why we have bloated abominations like Slack electron apps.

jimkoen
2 replies
1d4h

Do you guys still buy the beefiest Intel Xeons in order to fit your main application + OS entirely within the L3 cache? There was a CppCon talk from a HFT dev about this 10 years ago.

yla92
1 replies
1d2h

Do you happen to know what the CppCon talk is called?

gautamsomani
1 replies
1d

Can you suggest/recommend some books to learn these things in depth?

nostrademons
0 replies
1d3h

Until very recently, re-rendering with virtual DOM diffing was considered more performant than direct, incremental DOM manipulation. This too, may no longer be true.

Actually wasn't strictly true even when React came out, but it was true enough with the code that most JS developers actually wrote to lead to a change in dominant JS framework.

DOM manipulation even in 2013 used a dirty-bit system. Calling element.appendChild would be a few pointer swaps and take a couple ns. However, if you then called any of a number of methods that forced a layout, it would re-render the whole page at a cost of ~20ms on mobile devices of the day. These included such common methods as getComputedStyle(), .offsetWidth, .offsetHeight, and many others - there was a list of about 2 dozen. Most JS apps of the day might have dozens to hundreds of these re-layouts triggered per frame, but the frame budget is only 16.667ms, so that's why you had slow animations & responsiveness for mobile web apps of 2013.

React didn't need a full virtual DOM layer. It just needed to ensure that all modifications to the DOM happened at once, and no user code ran in-between DOM manipulations within a certain frame. And sure enough, there are frameworks that actually do this with a much lighter virtual DOM abstraction (see: Preact) and get equal or better performance than React.

The lesson for performance tuning is to understand what's going on, don't just take benchmarks at face value. If a call is expensive, sometimes it's conditionally expensive based on other stuff you're doing, and there's a happy path that's much faster. Learn to leverage the happy paths and minimize the need to do expensive work over and over again, even if the expensive work happens in a layer you don't have access to.

pca006132
9 replies
1d10h

Performance optimization covers a lot of topics, it depends on what you are trying to optimize.

1. Latency vs throughput. Oftentimes they are the same, i.e. reduce the time it takes to do something. However, once you pass a certain threshold, techniques that optimize throughput will hurt latency, so it is important to know what you are looking for. There are also low-level details if you have rather extreme latency requirements, e.g. pinning the cores, kernel settings, etc.

2. Knowledge about the overall system and your input distribution. While this seems trivial, oftentimes you can get a large performance improvement by avoiding redundant work, either through caching or lazy evaluation. Some computations are only performed because their results might be needed later, and lazy evaluation avoids them when they aren't.

3. Better algorithms. Again, this seems trivial, but oftentimes people are using algorithms that are far from optimal. And even when the algorithm is asymptotically optimal, there may be faster algorithms for special cases, or ones that are faster in practice. Optimizing special cases can be rewarding if they occur frequently. Do you really need optimal solutions? Can you allow randomization? Can you optimize across the queries to make things faster overall without optimizing individual operations?

4. Parallelization. Can you parallelize? Are your problem instances large enough, or individual stages slow enough, to benefit from parallelization? Do you have computations that are trivially parallelizable and can benefit from offloading to the GPU? If your code is waiting on some events, can you make them async? Can you avoid locks or atomic operations in your parallel code?

5. Data structure optimization. Can you reduce the number of allocations needed? Can you make the data structures more linear and predictable so the CPU has better cache utilization? Can you compress certain data if it is sparse?

6. Low-level CPU/GPU optimizations. There are a lot of great resources out there, but only go there when you are very sure it will be worth it, i.e. when the code in question is the bottleneck in your system.
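Point 2 can be sketched in a few lines (hypothetical `Report`/`expensive` names, illustrative only): lazy evaluation skips work nobody asks for, and caching makes repeated queries free.

```python
from functools import cache, cached_property

class Report:
    # Lazy evaluation: the summary is only computed if a caller actually
    # asks for it, and then reused on every later access.
    def __init__(self, rows):
        self.rows = rows

    @cached_property
    def summary(self):
        return sum(self.rows)

@cache
def expensive(n):
    # Caching/memoization: repeated calls with the same input reuse the
    # stored result instead of recomputing.
    return n * n
```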

bdjsiqoocwk
4 replies
1d10h

Latency vs throughput. Oftentimes they are the same

Latency and throughput are never the same; they don't even have the same units, so they can't be the same.

Advice to OP: learn how to make measurements, and you'll never make mistakes like these.

m00x
1 replies
1d10h

You can expand on a question without being douchey.

OP is right that you can often improve both at the same time, it's just worded poorly.

sroussey
0 replies
1d2h

Agreed. When dealing with low hanging fruit, you often improve both.

Later though… once your system is somewhat optimized, you will tend to make latency vs throughput trade-offs. For most people, slight increases in latency are the price of large increases in throughput, but that may just be my experience.

pca006132
0 replies
1d10h

Well, I didn't mean they are literally the same, but optimization to improve latency can often improve throughput as well. Whatever...

isbvhodnvemrwvn
0 replies
1d7h

Please do read a bit about the history of TCP and how latency impacts overall throughput. It's a classic, and it applies to any processing where you need the results of some steps to proceed.

remcob
2 replies
1d9h

Adding to this great list: batch processing inputs can allow you to get more throughput at expense of latency.

ndriscoll
0 replies
1d2h

If you make your batches small, you can get pretty much all of the benefit without adding (appreciable) latency. e.g. batch incoming web requests in 2-5 ms windows. Depending on what work is involved in a request, you might 10x your throughput and actually reduce latency if you were close to the limit of what your database could handle without batching.
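A minimal sketch of that small-batch idea (hypothetical generator, illustrative only): group incoming items into bounded batches so downstream work is amortized without any item waiting long.

```python
def micro_batch(stream, max_batch=32):
    # Yield items in batches of at most `max_batch`: most of the
    # throughput win of batching with little added latency per item.
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) >= max_batch:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch
```

A real service would also flush on a timer (e.g. the 2-5 ms window mentioned above) so a quiet stream doesn't stall the last batch.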

joshspankit
0 replies
1d2h

This and other approaches can be found in other industries like warehousing, logging, and retail

RabidDartGunman
0 replies
1d8h

Sometimes you can change the problem slightly to cater to one or more of the points on that list.

joshxyz
6 replies
1d13h

work with steve jobs, or someone like him.

One of the best, if possibly exaggerated, examples of the reality distortion field comes from Jobs's biographer Isaacson. During development of the Macintosh computer in 1984, Jobs asked Larry Kenyon, an engineer, to reduce the Mac boot time by 10 seconds. When Kenyon replied that it was not possible to reduce the time, Jobs asked him, "If it would save a person's life, could you find a way to shave 10 seconds off the boot time?" Kenyon said that he could. Jobs went to a white board and pointed out that if 5 million people wasted an additional 10 seconds booting the computer, the sum time of all users would be equivalent to 100 human lifetimes every year. A few weeks later Kenyon returned with rewritten code that booted 28 seconds faster than before.

https://en.m.wikipedia.org/wiki/Reality_distortion_field

bawolff
2 replies
1d9h

This feels like a bad anecdote.

A) kind of sounds like a toxic work environment

B) This is a bad way to reason about performance without knowing more. Is this actually a bottleneck? Would the heroic effort be better spent saving minutes elsewhere, where it's easy to save time, instead of saving seconds during boot, where it's hard to save time and which users encounter relatively rarely? Optimizing boot might be the right call, but it also might not be.

Anything can be optimized. The real trick is to optimize your optimization: focus on the right thing to get the most improvement possible, because you are almost always limited by the time you can spend optimizing and can't do it all. Picking a component at random is a terrible way to go about it.

taneq
0 replies
1d2h

I don't think anyone's ever disputed that Jobs was toxic af to work with, but some engineers respond to that kind of treatment by producing best-in-class work. It's one of those trolley-problem-style questions: would you want this software to be twice as good at the expense of knowing ten devs were bullied like that?

hyperpape
0 replies
1d6h

I didn't use an original Macintosh, but I did use a Mac Plus. It was pretty typical to turn your computer off between uses, so this optimization was very valuable.

linehedonist
1 replies
1d11h

Kinda sounds like gpt prompting

ramchip
0 replies
1d11h

"I apologize for any confusion or misinformation my response may have caused. You're correct that improving the boot time is possible and will save considerable time for the customers. Subroutine X can be optimized by..."

mongol
0 replies
1d12h

Clever. Imagine how many lives are wasted by online ads with this reasoning...

austin-cheney
5 replies
1d7h

Measure everything and be extremely critical. Be ready to challenge common and popular held assumptions.

Here is something I wrote about extreme performance in JavaScript that is dismissed by most programmers, because most people who program JavaScript professionally cannot really program.

https://github.com/prettydiff/wisdom/blob/master/performance...

geraldwhen
1 replies
1d7h

These are good performance improvements, but they clearly come at a cost. Race cars are faster than sedans, but the tires fall off and they explode without a team of engineers supporting them.

The best people I can hire can mostly use a computer. They can’t build something like this, or even build upon it. They need recipes to reuse to accomplish tasks, and the more something is custom, the harder it is to teach.

And product only cares about performance when a customer mentions it, which is almost never.

I long to spend time tuning race cars, but mostly I assemble sedans.

austin-cheney
0 replies
1d5h

As someone who wrote JavaScript professionally for 15 years, but no longer, the greatest problem with that line of work is poor preparation. Hoping a collection of frameworks fills that gap without human intervention isn’t working. Most people doing this professionally have absolutely no idea how any of these technologies work. Being able to read and turn on a monitor presents an exceptionally low baseline considering the level of compensation.

Most people capable of writing original software can easily apply the things I suggest, but most people writing JavaScript are not capable of writing original software. That doesn’t mean guidance for superior performance is out of alignment. It means there are fundamental problems with hiring and training.

Using your example: I once saw a Puerto Rican racing team make a lot of money with their custom car. They took an old '80s Mazda small sedan, dropped in a large Ferrari engine, and customized the suspension. That was something innovative they did to make money by winning competitions, because that pays better than just changing tires. Shops that only change tires or oil have that down to a science to maximize human productivity, akin to copy/paste.

billyoyo
1 replies
1d6h

You dismiss a lot of modern technologies as "unnecessarily complicated" and advocate for reinventing the wheel, coming across with a very "I know better" attitude.

For example you create your own bundler, when modern bundlers are very mature and good.

For example you dismiss SSR as unnecessary and then basically roll your own. You dismiss modern frameworks out of hand then list performance improvements they can make for you (e.g. keeping state in html).

Your last two performance improvements are about not taking drugs???

I'm really pro people doing things themselves for fun and all that, but this article and your comment come across as so arrogant and condescending, while also seeming to show ignorance (or at least willful dismissal) of exactly where modern JavaScript development is at, and presenting it as state-of-the-art performance improvements.

austin-cheney
0 replies
1d5h

Yes, I do advocate for reinventing the wheel and I do so with a high level of arrogance. That is one of the benefits of measuring everything extensively… you get to be arrogant because you know what is superior according to a bunch of objective evidence.

I rolled my own bundler only because it’s tiny and without dependencies. I am not rewriting ESLint even though I absolutely dread its large number of dependencies.

saagarjha
0 replies
1d6h

most programmers because most people that program…professionally cannot really program

FTFY

Arech
3 replies
1d7h

Definitely still useful, but some info, especially on some C++ things, is already outdated or even wrong. It's still a good resource if approached with a "trust but verify" mindset.

fsloth
2 replies
1d4h

Excellent point.

To be honest I've not approached the material in a few years. Can you pinpoint which areas are wrong?

Arech
1 replies
1d1h

My biggest concern is his "Optimizing software in C++" (Copyright © 2004 - 2023, last updated 2023-07-01, so it's claimed to be quite "fresh") [ https://agner.org/optimize/optimizing_cpp.pdf ]. Some things could be attributed to just bad wording, but some, in my opinion, either don't tell the whole truth or are outright wrong.

For example, on page 36 he asserts: "Accessing a variable or object through a pointer or reference may be just as fast as accessing it directly." While this might be true for a large/compound object (neglecting non-cached vs. cached memory access, such as the stack), this is certainly not true in the general case for simple POD types, such as ints or floats, if a compiler can't prove that between accesses the variable hasn't been modified (which happens actually very frequently). I've seen x10 speedups of computations in tight loops when I explicitly cached a value used in the loop into a variable from under a pointer/reference.

On p. 65 he asserts: "Assume that a function opens a file in exclusive mode, and an error condition terminates the program before the file is closed. The file will remain locked after the program is terminated and the user will be unable to access the file until the computer is rebooted." This was already hilarious two decades ago, let alone at the last-modified date of 2023-07-01. It could have been true in the days of DOS and maybe Windows 3.11. I might not remember exactly, but I think even Windows 95 had already dealt with it by tracking resources acquired by a process and releasing everything after process termination. The entire Windows NT family definitely didn't/doesn't have this issue (unless your program is a kernel-mode driver, though I'm not even sure of that, since my knowledge of kernel mode is circa ~2007 at most).

And I could keep going; these aren't all the issues, unfortunately... So there's some historical interest in the document, but one shouldn't trust it blindly for today's systems.

spacechild1
0 replies
23h40m

if a compiler can't prove that between accesses the variable hasn't been modified (which happens actually very frequently). I've seen x10 speedups of computations in tight loops when I explicitly cached a value used in the loop into a variable from under a pointer/reference.

This is indeed an important gotcha! Let's say you have a filter and want to process an array of floats. If the coefficient is a struct member and has the same type as the audio samples, you must cache it in a local variable, otherwise the compiler might reload it from memory on every loop iteration. ('restrict' can somewhat help with these kinds of aliasing issues, but you need to be careful.)
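The same discipline sketched in Python (illustrative one-pole smoothing filter, hypothetical names): here the per-iteration reload the C++ compiler can't avoid becomes a per-iteration attribute lookup, so the fix looks identical even though the reason differs.

```python
class Filter:
    # `coeff` stands in for the struct member discussed above.
    def __init__(self, coeff):
        self.coeff = coeff

    def process(self, samples):
        # Cache the member (and the running output) in locals before the
        # hot loop. In C++ this defeats aliasing-driven reloads; in
        # Python it skips repeated self.coeff lookups.
        a = self.coeff
        y = 0.0
        out = []
        for x in samples:
            y = a * x + (1.0 - a) * y  # one-pole smoothing step
            out.append(y)
        return out
```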

phtrivier
0 replies
1d6h

I think the downvotes are linked to the fact that the course is far from complete yet. But it's definitely good info!

jamil7
0 replies
1d5h

Wow! This is great, thank you for sharing. I ended up buying it.

GuestHNUser
0 replies
1d8h

This course is excellent and I cannot recommend it to OP highly enough. A great book to go with it is Computer Systems: A Programmer's Perspective (CSAPP).

Aliyekta
0 replies
1d8h

this is the best resource for getting started, with tangible examples on every topic.

moggi
3 replies
1d12h

If you want to learn how to understand the performance of the whole system I can recommend Brendan Gregg's Systems Performance: Enterprise and the Cloud (https://www.brendangregg.com/blog/2020-07-15/systems-perform...). It is a good book that teaches a lot of basics and techniques and gives a good understanding of the impact different system components can have on performance.

Agentlien
1 replies
1d7h

I am so fascinated by how differently people interpreted this thread, really shows the diversity of computing and performance work. Here's a book about performance in the context of "enterprise and the cloud".

I've worked with performance optimizations for years, but never touched a network connection. Because for me it's all in the context of optimizing single player video games, which primarily leads to a focus on graphics programming and GPU performance.

lukan
0 replies
1d5h

"I am so fascinated by how differently people interpreted this thread, really shows the diversity of computing and performance work."

Well yeah, my first reaction to the question was: Optimize for what?

The question probably would have benefited from a bit more detail about their job: platform, domain, etc.

I am also in the same boat as you, where I have 16 ms to do everything. So some of the general things we optimize for, also apply elsewhere, but many others not so much.

My main generic advice would be: things that happen only sometimes can usually afford to be slow, but things you need to do often ("hot spots") need attention.

But of course this does not apply to a program that checks, for example, whether the airbag of the car needs to fire because some very rare condition was met. This code runs very rarely - but when it does, there should be no garbage collector kicking in at that moment, no slow DB lookup, no waiting for a UI process to finish or the like (which shouldn't be connected to the critical parts anyway).

namaria
0 replies
1d10h

Seconded. Great material, super well explained. Very detailed, no-nonsense.

hsaliak
3 replies
1d

What you really want to learn about is observability, benchmarking and instrumentation. Once you are an expert in these topics for your domain, optimization will be about making obvious choices within localized constraints.

michaelmior
2 replies
1d

Yes and no. I would agree that observability is the place to start. But knowing what needs to be optimized doesn't necessarily mean you know how to optimize it.

Also, while premature optimization is obviously a problem, knowing more about how to actually write optimized code can help you make more informed decisions earlier on in the development process.

tonyarkles
1 replies
21h6m

But knowing what needs to be optimized doesn't necessarily mean you know how to optimize it.

True, although not knowing what needs to be optimized guarantees that you don't know how to optimize it :).

knowing more about how to actually write optimized code can help you make more informed decisions earlier on in the development process.

Totally agree though, despite poking a bit of fun.

This is something that I've been decently good at for many years. If I had to give someone new to it advice, it'd be:

- learn how to repeatably measure your system without introducing too much overhead. If you get this wrong, you're going to end up tricking yourself into believing you've made an improvement but instead got lucky/unlucky.

- once you've found the repeatably measurable hotspots, the optimization approach is going to depend dramatically on the problem domain. Optimizing for database disk throughput is different from optimizing for HTTP server response latency, which is different from squeezing more polygons into a frame in a game.

The one very important thing to keep in mind though is that you're going to need to peel back the abstractions and understand what's happening under the hood. This applies to both parts. Maybe the way to measure what's happening in your network service is to use tcpdump to capture the raw packets and see that you're sending a bunch of small writes into a socket instead of a single big write (why is that a problem? :D). Or something like NVidia NSight can provide a ton of insight into what's happening on your CPUs and GPU frame-to-frame.
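The first bullet above can be sketched in a few lines (hypothetical harness, illustrative only): warm up first so caches, JITs, and lazy initialization don't pollute the numbers, then take the median of several runs so one lucky or unlucky run can't mislead you.

```python
from time import perf_counter
from statistics import median

def measure(fn, warmup=3, runs=15):
    # Repeatable-measurement sketch: discard warmup runs, then report
    # the median of timed runs (medians resist outliers better than
    # means when the OS schedules something else mid-run).
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(runs):
        t0 = perf_counter()
        fn()
        times.append(perf_counter() - t0)
    return median(times)
```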

michaelmior
0 replies
8h36m

The one very important thing to keep in mind though is that you're going to need to peel back the abstractions and understand what's happening under the hood.

I agree with this although I think it can sometimes be a red herring for those new to optimization. It's possible to spend a lot of time digging deeper when that's not really what's needed. For example, you might find a particular database query that's really slow and start looking at what configuration tweaks you can make to your database server. But maybe if you structure the application a bit differently, that query isn't necessary at all.

It's a great skill to be able to peel back layers of abstraction and continue optimizing all the way down. But it's an equally important skill to know which layer really needs optimizing.

whiterknight
2 replies
1d11h

You're unlikely to find a good answer because it's a very specialized skill that mostly comes from experience doing it, and HN tends to self-select out of that pursuit.

whiterknight
0 replies
1d2h

Noticed a few experts were mentioned but their advice was already dismissed.

saagarjha
0 replies
1d6h

Anyone who cares about performance would surely see that this site does not do anything useful and eliminate it :)

sakras
1 replies
1d12h

+1 for Denis! The book does a great job explaining both “what does it mean for a program to be optimal?” and “what do I type into my terminal to check the performance?”

Of course, it doesn’t cover every possible performance trick. For that, I’d also recommend the Intel Optimization Manual and Johnny’s Software Lab.

Arech
0 replies
1d7h

Johnny’s Software Lab.

Funny, I've just discovered them via a totally different route and read a few articles. Definitely a good resource to learn more about low-level CPU code optimization.

Though OP doesn't say what they are interested in. GPU code optimization, for example, is a totally different bonkers Universe :D

csours
2 replies
1d7h

1. Don't do remote calls in loops.

That's it.
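A minimal sketch of the rule, with hypothetical fetch_user/fetch_users stand-ins for a real RPC client (here they only count round trips):

```python
round_trips = 0

def fetch_user(uid):
    """Hypothetical RPC: one remote round trip per call."""
    global round_trips
    round_trips += 1
    return {"id": uid}

def fetch_users(uids):
    """Hypothetical batched RPC: one round trip for the whole list."""
    global round_trips
    round_trips += 1
    return [{"id": u} for u in uids]

ids = list(range(100))

# Anti-pattern: a remote call inside the loop -> 100 round trips.
round_trips = 0
users = [fetch_user(u) for u in ids]
assert round_trips == 100

# Batched: same result, one round trip.
round_trips = 0
users_batched = fetch_users(ids)
assert round_trips == 1
assert users == users_batched
```

At 40 ms per round trip, that's the difference between 4 seconds and 40 ms for the same data.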

anonzzzies
1 replies
1d7h

Really depends what you are doing. But something something loops is a good start; be wary of loops, especially nested ones, loops that call functions containing other loops, loops that create variables/structures and, indeed, loops that do RPC/any type of networking.
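One of those pitfalls - creating structures inside the loop - in a minimal sketch. Same answer either way; the first version just rebuilds the lookup structure on every iteration:

```python
data = list(range(2000))
special_list = [3, 7, 500, 1999]

# Pitfall: set(special_list) is reconstructed for every element of data.
hits_slow = sum(1 for x in data if x in set(special_list))

# Hoisted: build the lookup structure once, outside the loop.
special = set(special_list)
hits_fast = sum(1 for x in data if x in special)

assert hits_slow == hits_fast == 4
```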

csours
0 replies
1d7h

Yes, it's very reductive; anything beyond that requires more specific domain knowledge.

vram22
1 replies
1d6h

The book "Writing Efficient Programs", by Jon Bentley, is still a valuable resource. See the short subthread starting with a comment I posted here some years ago:

https://news.ycombinator.com/item?id=13407192

A few people had replied, agreeing with my opinion, and giving some more details. One of them called the book "gold".

I have also posted about the book a few other times on HN, over the years.

Those comments can be found by searching hn.algia.com for comments (not stories) matching the pattern "writing efficient programs vram22".

vram22
0 replies
9h22m

hn.algia.com

Sorry, only noticed the typo now:

Should be hn.algolia.com

newprint
1 replies
3d

There is an MIT course on YouTube, and there is also a pretty famous former M$ performance engineer who worked on Xbox and a bunch of other large projects; he has a webpage about how he tracks down bugs and performance issues, but I don't have it handy unfortunately. Another thing to look at: low-level optimization. There is a cool book, two volumes, written by a German guy - I don't have a link for it either. Maybe someone who has those links can post them here. EDIT: https://www.agner.org/optimize/

mtzet
1 replies
1d10h

Most software in the industry is slow because it's doing a lot of stuff that it shouldn't. Oftentimes additional "optimization" layers add caching, but make getting to the root of the issue harder. The biggest win is primarily getting rid of things you don't need, and secondarily operating on things in batch.

My playbook for optimizing in the real world is something like this:

1. Understand what you're actually trying to compute end-to-end. The bigger the chunk you're trying to optimize, the greater the potential for performance gains.

2. Sketch out what an optimal process would look like. What data do you need to fetch, what computation do you need to do on this, how often does this need to happen. Don't try to be clever and micro-optimize or cache computations. Just focus on only doing the things you need to do in a simple way. Use arrays a lot.

3. Understand what the current code is actually doing. How close to the sketch above are you? Are you doing a lot of I/O in the middle of the computation? Do you keep coming back to the same data?

If you want to understand the limits of how fast computers are, and what optimal performance looks like I'd recommend two talks that come with a very different perspective from what you usually hear:

1. Mike Acton's talk at cppcon 2014 https://www.youtube.com/watch?v=rX0ItVEVjHc

2. Casey Muratori's talk about optimizing a grass planting algorithm https://www.youtube.com/watch?v=Ge3aKEmZcqY

slavik81
0 replies
1d9h

Strongly agree. That's perhaps less true for the software I work on these days (lapack), but I've seen that so many times over my career. I'm also a big fan of "Efficiency with Algorithms, Performance with Data Structures" by Chandler Carruth at CppCon 2014. https://youtu.be/fHNmRkzxHWs

ltadeut
0 replies
22h53m

Can't praise Casey's course enough!

corysama
0 replies
1d12h

To be specific: Abrash’s writing is great for getting into the mindset of optimization. But, the specific optimizations he talks about implementing in his books have been outdated for decades.

keskadale
1 replies
1d8h

https://en.algorithmica.org/hpc/

This is a good book. It covers most common concepts and techniques in a fairly accessible way. At the end it also builds up highly optimized versions of some algorithms and data structures and explains every optimization.

farresito
0 replies
1d7h

This is the answer. I have only read bits from this book, but it seems very good for what OP is looking for.

globular-toast
1 replies
1d7h

One thing to keep in mind is there's three layers of optimisation:

1. The problem, 2. The algorithms, 3. Micro-optimisation.

The potential gains shrink rapidly as you descend this list. A lot of people start thinking at level 3 straight away, but this is pointless if you've left performance on the table at the higher levels. For example, no amount of clever bit twiddling will compensate for the wrong algorithm, and even the best algorithm is pointless if you're solving the wrong problem.
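A small illustration of why layer 2 dominates layer 3: swapping the data structure changes the asymptotics, which no micro-tweak to the list scan can match. (A sketch; exact timings will vary by machine.)

```python
import time

# Same question ("is x in the collection?") at two algorithmic levels:
# a list scan is O(n) per query, a set lookup is O(1) on average.
n = 20_000
haystack_list = list(range(n))
haystack_set = set(haystack_list)
queries = range(n - 100, n)   # near the end: worst-ish case for the scan

t0 = time.perf_counter()
hits_list = sum(1 for q in queries if q in haystack_list)
t_list = time.perf_counter() - t0

t0 = time.perf_counter()
hits_set = sum(1 for q in queries if q in haystack_set)
t_set = time.perf_counter() - t0

assert hits_list == hits_set == 100   # identical results
print(f"list: {t_list:.5f}s  set: {t_set:.5f}s")
```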

owlbite
0 replies
1d1h

I'd add an addendum - look for stuff you don't need to do (e.g. at a low level, can you avoid that memcpy, zeroing things, or multiple allocations when one would do; even better, at a high level, can you avoid doing the whole thing altogether). It's better to just not do it than to spend time trying to optimize it.

firecall
1 replies
1d11h

Out of curiosity, what are you optimising exactly?

PhilipRoman
0 replies
1d9h

Not sure why this was downvoted. Optimization can involve every layer of the stack, from high-level software architecture and cloud systems down to worrying about the false dependency of the lzcnt instruction on Haswell CPUs. Most jobs will only involve a small part of this, so the question, as it is written, lacks context.

I guess there are some common parts like profiling and statistics, but that's about it.

edderly
1 replies
1d10h

If you're new to this area, I would first start by understanding which profiling tools you can use depending on the OS, languages and systems involved.

Even if your system is not C++, I've always enjoyed this talk and the subsequent discussion which tackles some of the problems associated with some programming practices and the impact on performance.

CppCon 2014: Mike Acton 'Data-Oriented Design and C++' https://youtu.be/rX0ItVEVjHc

spacechild1
0 replies
23h47m

I was just going to suggest that talk! Data locality is arguably the most important thing to consider when writing CPU intensive applications.

benreesman
1 replies
1d8h

It really depends on where you sit in the stack.

The generally useful rule is “measure before acting”.

There are some rules of thumb at every layer:

If you’re getting bad scrolling in a web application on a mobile phone, something is probably getting called over and over

If you’ve got an x86_64 server maxing out but the cores aren’t printing work? Zen4’s northbridge has some edge cases.

If you’re trying to melt aluminum so that exquisite optics can do extreme ultra-violet litho: weak hyper charge is very well determined empirically but there are some weird readings on muon spin.

I’m sort of kidding because this is an Endless Internet Feud, but really it’s measure and whack the hot spots.

I’ve done a bunch of this shit: if you’re not sure where to start feel free to email.

rnts08
0 replies
1d6h

This, measure and understand before you start messing with anything in code or infrastructure.

Lots of great resources in this thread, but as the saying goes; knowing is half the battle.

anymouse123456
1 replies
1d1h

Please try to remember that one of the most abused quotes in Software Engineering is the old Knuth chestnut about "Premature optimization is the root of all evil."

The full quote is as follows, "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

He specifically does not recommend writing code that is obviously inefficient. He is clearly referring to engineers optimizing routines by introducing new, additional complexity. He is not recommending that anyone write obviously, ruinously slow, bloated code.

Writing software is an art and a science.

Optimization is no different. One frequently missing part of our process is to keep a watchful eye on features that are obviously toxic to performance during development (i.e., multiply-nested for loops, many large external dependencies, introducing and frequently iterating over huge, bloated structs, etc.).

As everyone says, measure early and often, but also please don't just shout "LEEEEEEEROY JENKINS!" as you throw fireballs of slow, bloated code into the world.

txutxu
0 replies
1d8h

As you mention a "new job", and everyone is talking to you about computers... I will mention the other side:

1) Try to understand well the architecture of your company (who is who, who decides what changes are made to computers, how they decide that, what metrics do they use, what tests and benchmarks are passed before changes, etc)

2) Try to understand your place in that architecture. Am I responsible for the overall performance, or only the performance of certain components? Am I responsible for the latency of the network, the latency of the database, or both? Define a clear scope. This will help you focus on which metrics you need to follow.

3) Try to understand the company procedures. Can I refuse a change that comes from the product or marketing team? Can I refuse a change that comes from developers? Can I refuse a change from the platform team? How much time do I have to analyze such changes before they reach production? How can I request the rollback of an unsupervised change? Where can I check the performance impact of each change made in production in the past? etc.

4) Try to understand what the CEO and CTO, your team, and the rest of the teams should expect from you. Are there any SLAs or SLOs for your position related to the overall performance?

5) Make clear how you are informed of ongoing changes and roadmaps. Should I spend all week on the performance of a component that is going to be deprecated in the next sprint? etc.

6) Ask doubts and questions to your team mates, or department head. They may teach you about the company workflows, past issues, past solutions, corner cases, blockers, resources, plans and guidelines.

In short... look for reading/watching material, but don't forget to look at your company too; you will find things to learn there as well. Maybe there are things that need to change if they are important enough, or things that need to be explicitly deemed not important enough to change, so you can defend your work when future performance issues relate to the things that weren't changed.

torial
0 replies
21h25m

If you are looking at the .Net ecosystem, I can't recommend this book enough. The chapter on Garbage Collection itself was worth the price of the book to me: https://www.writinghighperf.net/

tillulen
0 replies
10h22m

I admire Daniel Lemire’s work on SIMD implementations. [Lemire]

[Lemire] https://lemire.me/en/#publications

I learn a lot by reading my compiler’s and profiler’s documentation.

For Rust, the Rust Performance Book by Nicholas Nethercote et al. [Nethercote] seems like a nice place to start after reading the Cargo and rustc books.

[Nethercote] https://nnethercote.github.io/perf-book/

Algorithms for Modern Hardware by Sergey Slotin [Slotin] is a dense and approachable overview.

[Slotin] https://en.algorithmica.org/hpc/

Quantitative understanding of the underlying implementations and computer architecture has been invaluable for me. Computer Architecture: A Quantitative Approach by John L. Hennessy and David A. Patterson [H&P] and Computer Organization and Design: The Hardware/Software Interface by Patterson and Hennessy [P&H ARM, P&H RISC] are the two introductory books I like best. There are three editions of the second book: the ARM, MIPS and RISC-V editions.

[H&P] https://www.google.com/books/edition/_/cM8mDwAAQBAJ

[P&H ARM] https://www.google.com/books/edition/_/jxHajgEACAAJ

[P&H RISC] https://www.google.com/books/edition/_/e8DvDwAAQBAJ

Compiler Explorer by Matt Godbolt [Godbolt] can help better understand what code a compiler generates under different circumstances.

[Godbolt] https://godbolt.org

The official CPU architecture manuals from CPU vendors are surprisingly readable and information-rich. I only read the fragments that I need or that I am interested in and move on. Here is Intel's [Intel]. I use the Combined Volume Set, which is a huge PDF comprising all ten volumes. It is easier to search when it's all in one file, and I can open several copies on different pages to make navigation easier.

Intel also has a whole optimization reference manual [Intel] (scroll down, it’s all on the same page). The manual helps understand what exactly the CPU is doing.

[Intel] https://www.intel.com/content/www/us/en/developer/articles/t...

Personally, I believe in automated benchmarks that measure end-to-end what is actually important and notify you when a change impacts performance for the worse.

tanelpoder
0 replies
1d12h

Understand first, then fix. And you understand by measuring the right thing at the right time (scope). Systemwide resource utilization averages are not gonna tell you where your critical thread or database connection is spending its time - you need to measure (profile) precisely where your task of interest is spending its time.

I've learned a lot from Cary Millsap over the last 2 decades and he recently published a general performance optimization book "How to Make Things Faster" that I can recommend [1]. It's less about tools, more about the method and systematic approach for performance optimization:

[1] https://method-r.com/books/faster/

slashroot
0 replies
1d1h

Google publishes some of its data center optimization lessons and tips at http://abseil.io/fast. This includes topics like higher-level methodology and goal setting, which are often less covered by other resources.

Full disclosure: I'm the editor in chief for the series.

saagarjha
0 replies
1d12h

What kind of work are you doing? There are some shared ideas (measure, do less work, etc.) but the best advice would probably be tailored to what you’re working on.

reacharavindh
0 replies
1d5h

Not a comprehensive set of resources as you asked, but I want to share one line of thought that had a profound impact in my way of working.

Think of a system as a chain of bottlenecks, visualized as a set of pipes. If you can measure the metric you care about (throughput, latency, etc.) at a component level, and put together the system’s control flow, you can spot where the bottleneck is. Optimise that component, and you will reveal the next bottleneck; now optimise that… and it goes on. To limit the fun of this exercise, it helps to do a back-of-the-envelope calculation of a realistic estimate of the thing you measure in the system. Example - I want this service to do 100 emails/sec. Now, piece by piece, remove bottlenecks to get close to that value.
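The chain-of-pipes picture can be sketched in a few lines; the stage names and throughputs below are made up for illustration:

```python
# Made-up stage throughputs (emails/sec) for an email-sending service.
stages = {"render": 500, "sign": 120, "smtp_send": 60, "log": 900}

def bottleneck(stages):
    """End-to-end throughput is capped by the slowest stage."""
    name = min(stages, key=stages.get)
    return name, stages[name]

name, rate = bottleneck(stages)
assert (name, rate) == ("smtp_send", 60)   # the pipe to widen first

stages["smtp_send"] = 300                  # after optimising that stage...
name, rate = bottleneck(stages)
assert (name, rate) == ("sign", 120)       # ...the next bottleneck appears
```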

ohyes
0 replies
1d4h

Performance optimization is very simple. Make computer do less to get same or similar result. The most performant application does very little and still gets you the result you need.

To do this you must find ways to “cheat”. This can be of various forms. Better algorithms, better data structures, precomputation, caching. At some point you will exhaust low hanging fruit and need to dig into lower level aspects of the code or its compilation.

Anyway, best way to learn is to do it, go depth first and always check your work thoroughly. (It is easy to optimize yourself into a solution that is not working properly).

mikhael28
0 replies
1d1h

Write a piece of software in a week - a full app, with discrete functionality that would challenge you to deliver on time. Do it, and burn through it.

Then optimize it - measure front end render performance/compilation times/code perf, and then do the same on the backend. Write a blog post about it.

No substitute for experience

midzer
0 replies
1d9h

Run Lighthouse in the developer tools of a Chrom*-based browser to get first hints about potential optimizations for any website.

marcosdumay
0 replies
1d1h

Well, start with literature on the specific domain of your new job. That way, you can learn what "performance" even means on your area, what are the common problems, what to measure, and what kind of knowledge you need.

maniatico
0 replies
1d6h

I think the Optimization course by Prof. Jacco is a nice start https://web.archive.org/web/20230924064410/https://www.cs.uu... (sadly the website seems to be currently down). Basically you need a mental framework to approach optimizing software (else you might just be spinning around wasting time). I recommend reading at least lecture 1 and looking at the references on the website.

As for specific optimizations, it requires context about what software you are trying to optimize and under what circumstances. A lot of the time, the answer to whether certain optimizations are worth the effort is going to be "it depends".

lallysingh
0 replies
1d12h

It really depends on what level you're working on improving. It's effectively queues all the way down, but programmers hate reading statistics. E.g. a server process is a series of queues between your TCP socket to your process to your disk, CPU's reorder buffer, and scheduler.

You have three areas to study:

1. Measurement - makes you define the performance you're looking for and measure it. Until you do this it's mostly a bullshit "make people stop complaining about performance" errand that's too wishy washy to do with more than a few stabs in the dark. With containers and decent capture of samples of your load, a benchmark is pretty straightforward to set up.

2. Modeling - these models are usually little more than measured rates and latencies applied to Little's Law. Pocket-calculator math is often good enough. At worst, an M/M/1 queue.

3. Instrumentation - Figuring out how to attribute your computer's resources (memory, CPU time, iops, etc) to different parts of your code. Tracing libraries, Linux perf, and ebpf can be useful here.
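The Little's Law mentioned in (2) really is pocket-calculator math: L = λW, i.e. items in the system equal arrival rate times time in the system. A sketch with assumed numbers:

```python
# Little's Law: L = lambda * W  (items in system = arrival rate x time
# in system). All numbers below are assumed for illustration.

arrival_rate = 200        # requests per second
avg_latency_ms = 50       # average time in system, milliseconds

in_flight = arrival_rate * avg_latency_ms / 1000
assert in_flight == 10.0  # ~10 requests concurrently in the system

# Inverted: with at most 16 requests in flight (say, a pool of 16
# connections) at 50 ms each, throughput tops out at:
max_throughput = 16 * 1000 / avg_latency_ms
assert max_throughput == 320.0  # requests per second
```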

There are a decent number of computer performance books. I like the ones by Jain (great, but AFAICT out of print) and Harchol-Balter. For work, you shouldn't read them straight through, but iterate through parts as you better understand the problem you're trying to solve and start choosing strategies. For the tactical side, Brendan Gregg has some decent measurement-tool books. Figure out what you want to improve and how to measure that. Then start attributing the existing performance to implementation choices that you can control. Then control those choices (e.g. change algorithm, load balance better, make design trade-offs) to improve performance.

keeperofdakeys
0 replies
1d8h

I'd recommend learning how to instrument and measure the performance of your code. I find most performance issues are (mostly) situations you didn't and couldn't anticipate. So instead of preventing them, learning to investigate and fix them is key. (Shout out to Brendan and his Linux Performance page https://www.brendangregg.com/linuxperf.html).

Second, there is an important engineering lesson to learn. Often there are many performance issues, with only a few acting as serious bottlenecks. Additionally, sometimes the solutions to performance issues add complexity, but as an engineer you want to avoid complexity. Engineering effort is usually limited, so there is always a question of whether a performance issue needs to be fixed now or left till later.

Here is a quick example to illustrate my point. pgAdmin is a web UI program to interact with PostgreSQL databases, allowing you to remotely run queries. Part of its operation fetches information about columns in a result set; in one version this code ran one query per column sequentially. So c columns, each a synchronous query to the server - almost instant on a local database with a small number of columns. However, with 400 columns and a 40 ms internet link, it ended up taking at least 400×40 ms = 16 seconds to complete. In 99% of cases this code works just fine, but in a few less obvious scenarios its runtime balloons.
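The arithmetic behind that example, spelled out: the per-column round trip is invisible locally but dominates over a WAN (column count and WAN latency come from the comment; the localhost figure is an assumed 0.1 ms):

```python
columns = 400
round_trip_ms = {"localhost": 0.1, "wan": 40}

for name, ms in round_trip_ms.items():
    sequential_s = columns * ms / 1000   # one query per column, in series
    batched_s = 1 * ms / 1000            # one query describing all columns
    print(f"{name}: {sequential_s:.3f}s sequential, {batched_s:.3f}s batched")

# localhost: 0.040s vs 0.000s -- both feel instant.
# wan:       16.000s vs 0.040s -- only one of these is usable.
assert columns * round_trip_ms["wan"] / 1000 == 16.0
```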

Another example; what happens if all the daily scheduled jobs run at the same time? https://github.com/go-acme/lego/issues/1656

kamikaz1k
0 replies
1d4h

1. Computer Enhance by Casey Muratori (check out some of his YouTube videos if you want a preview)

2. Read the blog posts people wrote about the 1BR challenge where they tried to figure out the fastest way to process 1 billion rows of data

3. Brendan Gregg's blog

4. Google/YouTube how to profile code in your desired language

jsenn
0 replies
1d5h

As you can tell from the diversity of responses here it really depends on what you're doing. In my work I use C++, and "optimization" typically involves making a heavy computation run faster (measured in wall clock time) or making a particular subsystem use less memory.

The number one most important thing you can do is dive in and start profiling real-world code. Find a part of your software that is too slow or uses too many resources, and use whatever the standard profiler is for your development environment to figure out why. Performance optimization is a very empirical discipline. Yes there are general principles, but if you don't measure your baseline or your changes you won't know how good your optimization was. In my experience, the first attempt at a fix is often flat-out wrong! Doing this first will also help motivate your reading.
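As a concrete starting point, Python's built-in cProfile is enough to get a first measurement; the workload below is a toy stand-in for whatever subsystem you suspect:

```python
import cProfile
import io
import pstats

def slow_total(n):
    """Toy workload standing in for the suspect subsystem."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_total(200_000)
profiler.disable()

# Dump the top entries by cumulative time into a string.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()

assert "function calls" in report                      # the profiler saw the run
assert result == sum(i * i for i in range(200_000))    # behavior unchanged
print(report[:120])
```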

Once you know how to measure the performance of your software, I recommend learning the basics of modern computer architecture. At a minimum, learn about CPU caches, how they work, and how to design your code to use them effectively. I find Algorithms for Modern Hardware to be a good resource for this [1], but there are many others. Relatedly, you should have a rough idea of how long it takes for your computer to do various basic things (fetch something from memory, fetch something from cache, etc.). There's a table at [2] that gives a good idea. Don't worry too much about the absolute values--the order of magnitude is what's important.

You should also study fundamental data structures, but understand that for low-level programming 95% of the time the correct answer will be to shove everything into a simple flat array (e.g. std::vector in C++), maybe with some sort of index on top. Fancy data structures are more important in higher-level languages that are structurally unable to make effective use of modern hardware.
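A Python-flavoured sketch of the flat-array idea. The real cache benefits only show up in a language like C++, but the shape of the code is the same:

```python
from array import array

# Array-of-dicts: each point is a separate heap object (pointer chasing).
points = [{"x": float(i), "y": float(2 * i)} for i in range(1000)]
sum_y_objects = sum(p["y"] for p in points)

# Flat struct-of-arrays: one contiguous unboxed buffer per field,
# closer in spirit to std::vector<double>.
xs = array("d", (float(i) for i in range(1000)))
ys = array("d", (float(2 * i) for i in range(1000)))
sum_y_flat = sum(ys)

assert sum_y_objects == sum_y_flat == 999000.0  # same answer, flatter layout
```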

[1] https://en.algorithmica.org/hpc/

[2] https://gist.github.com/jboner/2841832

joshspankit
0 replies
1d2h

Suggestion: Program some (slow) microcontrollers as a hobby.

Go multi-core because async is an important optimization skillset, but other than that just build some things.

I live and breathe optimizations (it feels almost as satisfying to me as driving fast) and as an example recently I created an 11-board (one for each channel) wifi-presence-detection system in a busy wifi area and there was literally no way it was going to work without optimization. From communication protocol to having to be strict about every byte of memory, it’s working with the first principles that built the entire industry.

jkoudys
0 replies
1d5h

Many have covered specific learning material, but my best advice is to find a mentor. It really is a skill that's best learned by apprenticing. I knew a lot of concepts and could muddle my way through them, but it wasn't until I had experienced people to work under directly that my skills really took off.

Just like the best advice on learning to write code is to write code, the best way to learn how to optimize performance is to optimize performance.

hesdeadjim
0 replies
1d1h

Huge topic, what are you trying to optimize? What language(s), hardware, etc.

Optimizing games sends you deep down a fun rabbit hole, but that will be very different than trying to optimize a Go backend server.

easyas124
0 replies
1d12h

Understand how the software works, and how computers work in general. You have to understand the system before you can a) understand why it's slow, and b) figure out how to make it faster. If you can tell us what, specifically, you need to optimize, we can recommend more specific techniques.

Or get a job you can handle idk.

dboreham
0 replies
1d4h

The Nike doctrine works: just do it. Besides that I recommend always ask yourself the question: we told the computer to do X, and it took too long, so what was it doing for that time? The rough answer can come from surprisingly simple sources such as "top". Fancy, intrusive tools such as traditional profilers are often not the best first place to look for answers. If X is some short one-off thing that makes it hard to see what's happening: make it do 1M of X so bulk data can be observed.

darksim905
0 replies
10h11m

performance optimization of -what- ?

chainingsolid
0 replies
1d4h

Here are 2 more links I didn't see already posted that are worth watching/reading. They should give a good intro within 3 hours combined (for CPU performance, anyway).

A good talk, doesn't go deep and instead goes a bit wide. https://www.youtube.com/watch?v=6RlloT_6WxA

This one explains the black magic CPU makers have been doing. If you're going to be optimizing code, you should know your hardware. https://www.lighterra.com/papers/modernmicroprocessors/

Additional note: I've noticed C++ conferences have a habit of hosting performance-related talks; YT is your friend.

boberoni
0 replies
1d13h

This is an area that is new to me, but a big part of my new job.

Can you tell me more about what your new job is, without releasing anything sensitive?

If you are running applications on Linux containers in the cloud, then I would recommend Brendan Gregg's blog and books (https://www.brendangregg.com/overview.html). He does a lot of knowledge sharing from his experiences at Netflix.

atoav
0 replies
1d10h

To be honest, I think the best way to learn about it is to develop for resource-constrained environments. E.g. when you use 99% of your embedded MCU's code memory and another static string for a label shown on screen stops your code from compiling, you will optimize code.

andai
0 replies
1d5h

The most interesting thing I've learned in this regard (from Casey Muratori) is non-pessimization. Non-pessimization means don't make the computer do unnecessary work. Just write the simplest code that does the thing. Unfortunately, almost no software is written like that.

amadio
0 replies
1d9h

For me, one of the biggest leaps in how I think about performance was when I learned about the Top-Down Micro-Architecture Analysis Method, by Ahmad Yasin from Intel. You can learn the main ideas from himself in the video below:

https://youtu.be/kjufVhyuV_A

The idea of classifying cycles into front-end bound, back-end bound, bad speculation or retiring is brilliant. Once you know which one your program suffers from, it's easy to know what can be done to improve things.

SleepyMyroslav
0 replies
1d21h

If you need to organize your thoughts on what measurements are and how at least some profiling tools work you can pick up a book or two. I would recommend for example [1]. It is a bit heavy on C++ side but you can complement it with something relevant to your job's language.

If you want one bit of advice on optimization, I can try one: follow your app architecture closely. This is where the data structures that hold all of the important data live, and this is what limits what is possible to achieve on performance. A lot of learning is narrowly focused on specific micro-optimization techniques, leaving the big picture as an exercise.

1 Fedor Pikus, The Art of Writing Efficient Programs

Qwertious
0 replies
1d

What are some good resources for learning about performance optimization?

I swear, nobody actually read OP's post. The top five comments are "here's some personal advice about general rules of thumb, without any links to actual resources!"

Kudos to levodelellis, whose post is the #6 root comment and contains some links and drops some names.

DougN7
0 replies
1d12h

Look into the write ups of the One Billion Row Challenge - you’ll see lots of techniques.

AtNightWeCode
0 replies
1d5h

The difficult thing is to benchmark the software correctly and evaluate the impact of a change. Most of the examples on the Internet are useless micro-optimizations. I evaluated a program some time ago that used several speed tricks, but the reason it was slow was that it reread a file on each iteration of a loop.
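That bug in miniature, with a hypothetical workload - re-reading the file inside the loop vs. reading it once:

```python
import os
import tempfile

# Build a small file to stand in for the one being re-read.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

words = ["beta", "delta", "alpha", "beta"]

# Slow: one full file read per loop iteration.
hits_slow = 0
for w in words:
    with open(path) as f:              # re-read on every iteration
        if w in f.read().split():
            hits_slow += 1

# Fast: read once, then loop over the in-memory copy.
with open(path) as f:
    known = set(f.read().split())
hits_fast = sum(1 for w in words if w in known)

assert hits_slow == hits_fast == 3     # same answer, one read instead of four
os.remove(path)
```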

101008
0 replies
1d2h

I think this thread is old enough to ask something like this: instead of learning performance optimization, is there a way to be hired to work on this? I am fascinated, and I always loved it when I had to optimize something (mostly code in my experience, algorithms, etc.), but that's only a very small percentage of the work I do.