For several years I have worked primarily with performance optimizations in the context of video games (and previously in the context of surgical simulation). This differs subtly from optimization in certain other areas, so I figured I'd add my own perspective to this already excellent comment section.
1. First and foremost: measure early, measure often. It's been said so often and it still needs repeating. In fact, the more you know about performance, the easier it is to fall into the trap of not measuring enough. Measuring will show exactly where you need to focus your efforts. It will also tell you without question whether your work has actually led to an improvement, and to what degree. (First sketch at the end of this comment.)
2. The easiest way to make things go faster is to do less work. Use a more efficient algorithm, refactor code to eliminate unnecessary operations, move repeated work outside of loops. There are many flavours, but very often the biggest performance boosts are gained by simply solving the same problem in fewer instructions. (Second sketch below.)
3. Understand the performance characteristics of your system. Is your application CPU bound, GPU compute bound, or memory bound? If you don't know this, you could make the code ten times as fast without gaining a single ms, because the system is still stuck waiting for a memory transfer. On the flip side, if you know your system is busy waiting for memory, perhaps you can move computation into those stalls and get it essentially for free. This is particularly important in shader optimization (latency hiding). (Third sketch below.)
4. Solve a different problem! You can very often optimize your program by redefining your problem. Perhaps you are using the optimal algorithm for the problem as defined, but what does the end user really need? Often there is a very similar but much easier problem that is equivalent for all practical purposes, either because the complexity lies in special cases that can be avoided, or because a cheap approximation gives sufficient accuracy. This happens especially often in graphics programming, where the end goal is usually to give the impression that you've calculated something. (Fourth sketch below.)
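To make point 1 concrete, here's a minimal timing sketch in C++; update_particles is just a hypothetical stand-in for whatever code you actually care about:

    #include <chrono>
    #include <cstdio>
    #include <vector>

    // Stand-in for whatever you actually want to measure.
    void update_particles(std::vector<float>& xs) {
        for (float& x : xs) x += 0.016f;
    }

    int main() {
        std::vector<float> xs(1'000'000, 0.0f);

        auto t0 = std::chrono::steady_clock::now();
        update_particles(xs);
        auto t1 = std::chrono::steady_clock::now();

        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("update_particles: %.3f ms\n", ms);
        // Run this many times and look at the distribution, not one sample;
        // for hot paths, reach for a real profiler instead.
    }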
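For point 2, the classic micro-level example is hoisting loop-invariant work. This toy case is one the optimizer may well catch on its own; the real-world versions hide behind virtual calls, allocations, or other work the compiler can't see through:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Before: exp() and sqrt() don't depend on i, yet run every iteration.
    void damp_slow(std::vector<float>& v, float dt) {
        for (std::size_t i = 0; i < v.size(); ++i)
            v[i] *= std::exp(-dt) / std::sqrt(2.0f);
    }

    // After: identical results, the invariant factor is computed once.
    void damp_fast(std::vector<float>& v, float dt) {
        const float k = std::exp(-dt) / std::sqrt(2.0f);
        for (std::size_t i = 0; i < v.size(); ++i)
            v[i] *= k;
    }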
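For point 3, a crude way to feel the difference on your own machine: both loops below do O(n) work, but the first is limited by memory bandwidth while the second is limited by arithmetic. Making the math faster in the first one buys you nothing; measure on your own hardware:

    #include <chrono>
    #include <cstdio>
    #include <vector>

    // Memory bound: one add per float streamed in from RAM; the bus,
    // not the ALU, sets the pace.
    float stream_sum(const std::vector<float>& v) {
        float s = 0.0f;
        for (float x : v) s += x;
        return s;
    }

    // Compute bound: 64 dependent ops per float already in registers.
    float heavy_sum(const std::vector<float>& v) {
        float s = 0.0f;
        for (float x : v) {
            for (int i = 0; i < 64; ++i) x = x * 1.0001f + 0.0001f;
            s += x;
        }
        return s;
    }

    int main() {
        std::vector<float> v(10'000'000, 1.0f);  // ~40 MB, well past cache
        using clock = std::chrono::steady_clock;

        auto t0 = clock::now();
        float a = stream_sum(v);
        auto t1 = clock::now();
        float b = heavy_sum(v);
        auto t2 = clock::now();

        std::printf("stream: %lld ms, heavy: %lld ms (%f %f)\n",
            (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count(),
            (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count(),
            a, b);
    }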
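And for point 4, the textbook example of redefining the problem: a radius check never needs the actual distance, only a comparison, so the sqrt can be dropped entirely:

    struct Vec3 { float x, y, z; };

    // "Is the target within radius r?" is equivalent to comparing
    // squared lengths -- same answer, no square root.
    bool within_radius(Vec3 a, Vec3 b, float r) {
        const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return dx * dx + dy * dy + dz * dz <= r * r;
    }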
Note "faster" is not the only thing to optimize for, and it (wall clock time) is actually a bit unusual as it doesn't represent an exhaustible resource.
That is, if you have a rarely used slow part of the system, that might make it seem unimportant, but not if running it uses all the disk space or drains your phone battery.
Even if that doesn't happen, there are things you can optimize in the unimportant parts - you can optimize them getting out of the way of the rest of the system, like by having smaller code size and not stomping all over caches.
That is very true! I simply think of it first because it is often the biggest problem at my work. One of the extreme exceptions was optimising Wavetale for the Nintendo Switch, where we had to decrease memory usage from over 20 GiB to below 3 GiB.
Not in the context of game development, however! There you typically don't care about total wall time. Instead, you work toward a set frame rate, meaning you have a fixed slice of time (usually 16.7 or 33.3 ms) to get through the entire game and render loop, every single frame.
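Roughly what that looks like in code; a minimal sketch, with simulate() and render() as placeholders for the real loop:

    #include <chrono>
    #include <cstdio>

    int main() {
        using clock = std::chrono::steady_clock;
        const double budget_ms = 1000.0 / 60.0;  // ~16.7 ms at 60 fps

        for (int frame = 0; frame < 600; ++frame) {
            auto t0 = clock::now();

            // simulate(); render();  // the entire game and render loop

            double ms = std::chrono::duration<double, std::milli>(clock::now() - t0).count();
            if (ms > budget_ms)
                std::printf("frame %d blew the budget: %.2f ms\n", frame, ms);
        }
    }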
I was about to bring up a similar example: some of our really old daily data processing at work might need some attention and optimization in the future, because we're running out of hours in the day. And we haven't found a place we can buy more hours in the day from yet.
If you can run the daily batch processing in parallel, can't you have one batch start at T+0, the next start at T+24h, with the first finishing at T+28h, and so on?
That leads to an infinite backlog, no? If you need more than 24h to process 24h of data?
That may depend on the context and data, but you could end the first job at T+28h (a runtime of 28 hours) and the second at T+52h (also 28 hours, having started at T+24h). As long as the runs can overlap, the lag stays constant instead of piling up.
If jobs must be executed one after another, then you absolutely create an infinite backlog.
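A tiny sketch of both schedules, assuming the 24h cadence and 28h runtime from above:

    #include <cstdio>

    int main() {
        const double period = 24.0, runtime = 28.0;  // hours

        // Overlapped: job n starts every 24h and always ends 28h later.
        // Latency is a constant 28h; nothing accumulates.
        for (int n = 0; n < 4; ++n)
            std::printf("overlapped job %d: T+%g .. T+%g\n",
                        n, n * period, n * period + runtime);

        // Strictly serial: each job waits for the previous one, so every
        // day adds 4h of backlog that never goes away.
        double start = 0.0;
        for (int n = 0; n < 4; ++n) {
            std::printf("serial job %d:     T+%g .. T+%g (%gh behind)\n",
                        n, start, start + runtime, start - n * period);
            start += runtime;
        }
    }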
Sadly, the resource constraints/setup prevent us from parallelizing this. And the customers of that system expect the data to be processed and available after 24 hours.
In part, it's a somewhat rewarding topic: rethinking queries, joining a bit differently, or adding another index based on new data patterns can cut hours off the runtime without incurring further resource cost.
But on the other hand, it's yet another project someone dumped on the floor and we were forced to adopt "because of the customer". And the second or third attempt by PD to "do it right" is teetering on failure once again. Cron running shell scripts held together with chicken wire and duct tape is too strong a stack, I guess.
In realtime audio programming, for example, the time budget can be as low as 1.3 ms (64 samples @ 48 kHz). And every single missed deadline will manifest as an ugly pop, which you will try to avoid at all costs.
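The budget falls straight out of the block size; a back-of-the-envelope check:

    #include <cstdio>

    int main() {
        const int block = 64;     // samples per callback
        const int rate = 48000;   // sample rate in Hz

        const double budget_ms = 1000.0 * block / rate;  // = 1.333... ms
        std::printf("audio callback budget: %.3f ms\n", budget_ms);
        // Any callback that takes longer than this underruns the
        // buffer and produces an audible pop.
    }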
This is so true and so often ignored: you can also optimize for space (RAM and/or ROM), energy, security, robustness, etc. Often these oppose or compete with speed in some way.
I definitely agree with this one, especially on the level of "don't make network calls in your hot loop".
But it should be noted that less efficient algorithms that access memory in a more cache-friendly way (that is to say, get a higher percentage of cache hits when iterating over the data) can beat more efficient algorithms that incur more cache misses.
That's all to say: iterating over an array is very fast, and iterating through a map is not. Is the map still faster if you only need to access a couple of things? Well, that depends on the dataset size, the constant factor of your map access, and how big an improvement your other algorithm actually is.
You should definitely measure that once you've made the change though.
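One crude way to see the effect, assuming int keys and a pre-populated std::unordered_map; both loops touch the same million values, but the access patterns differ wildly (numbers will vary by machine):

    #include <chrono>
    #include <cstdio>
    #include <unordered_map>
    #include <vector>

    int main() {
        const int n = 1'000'000;
        std::vector<int> vec(n);
        std::unordered_map<int, int> map;
        for (int i = 0; i < n; ++i) { vec[i] = i; map[i] = i; }

        using clock = std::chrono::steady_clock;
        long long s1 = 0, s2 = 0;

        // Contiguous scan: the hardware prefetcher keeps the cache fed.
        auto t0 = clock::now();
        for (int x : vec) s1 += x;
        auto t1 = clock::now();

        // Same O(n) work, but every lookup chases pointers through
        // scattered buckets -- mostly cache misses.
        for (int i = 0; i < n; ++i) s2 += map[i];
        auto t2 = clock::now();

        std::printf("vector: %lld ms, map: %lld ms (sums %lld/%lld)\n",
            (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count(),
            (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count(),
            s1, s2);
    }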
It's a great thing to develop a feel for what performs better, and then code in that manner from the start.
Simplicity is often best, because the necessary complexity will arrive on its own.
This is a great rule of thumb. I've seen junior engineers (and in some cases even senior ones) try to parallelize existing solutions before first optimizing the single-threaded case.
Agree with all this, and would add that concrete measurements not only tell you whether your work has actually led to an improvement, but also your manager and promo committee ;)