return to table of content

Dynamic programming is not black magic

ulucs
42 replies
1d6h

The name "Dynamic Programming" might seem out of place because it doesn't come from programming as a discipline. In this case, it actually refers to something like optimization, similar to linear programming. Dynamic programming is basically a method you can use to solve decision problems with discrete time, i.e picking the optimal sequence {a_t} in order to maximize \sum_t u_t(a_t) (plus constraints). The "dynamic programming" is defining a value function V* where V*(t) = max_{a_t}{ u_t(a_t) + V*(t-1) } which greatly reduces the dimensionality of the optimization problem.

roenxi
22 replies
1d6h

In fact, the official line [0] on where the name comes from is quite funny. I shall quote my favourite part:

[Dynamic] also has a very interesting property as an adjective, and that is it's impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to.

[0] https://en.wikipedia.org/wiki/Dynamic_programming#History

mtlmtlmtlmtl
15 replies
1d5h

My dad told me a joke/story once about one of his old psychologist colleagues (call him Dave) arguing with a psychoanalyst* about some psychology thing.

After a while of the discussion going nowhere, the psychoanalyst said something like this. "Not to worry, Dave, I know you're dynamically minded enough that you'll eventually learn to agree with me."

Dave was not pleased.

I guess that was more condescending than pejorative, but oh well.

(Most modern clinical psychologists consider psychoanalysis to be a pseudoscience. A small minority still practice it even clinically. They don't like each other very much.)

savolai
12 replies
1d4h

Care to offer evidence of pseudoscience status? All I could find were debunkings of this claim as a myth and outdated, and to my understanding research in the field has gained momentum in past decades. I’d love to learn more about the debate, so any pointers are welcome.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6020924/

https://cosmonautmag.com/2021/10/on-the-scientific-status-of...

“Depending on where your philosophical allegiances lie, Karl Popper might be a friend or an enemy to whatever version of philosophy of science you accept. Falsificationist approaches to science imply that psychoanalysis makes untestable and unfalsifiable claims! Popper made the very same claims about Marxist social theory as well. There are two issues with this approach. One, if falsificationism is true, then all social sciences, and even some natural sciences, are pseudo-scientific. This is an unacceptable conclusion since much of social science is clearly science. Lots of cutting-edge physics whose models do not lend themselves to controlled experimental testing would also lose their status as science. That is also an absurd conclusion. Newtonian mechanics, for example, would never have been accepted as a theory if we use Popper’s standards of theory corroboration and falsifiability. The theory offered better ways of explaining phenomenon that previous theories were already decent at predicting reliably. The process of legitimizing Newtonian physics had nothing to do with testability. A good theory of science does not rule out obvious cases of scientific rigor.”

User23
2 replies
1d3h

The burden of proof lies on those who claim it’s a “real” science.

savolai
0 replies
1d1h

It appears to me the burden of proof lies on whoever wants to draw the line on what ”real” science is. Only natural sciences, then?

patrick451
1 replies
1d

Rejecting an assertion because you don't like its conclusion is pretty specious. I fully endorse the notion that any practice which makes unfalsifiable claims is not science. But that doesn't necessarily make such a practice pseudoscience -- this is a false dichotomy. There are disciplines which are neither scientific nor pseudo-scientific.

mp05
0 replies
1d

Well hang on, let me go find my logic book from college to decipher...

dahart
1 replies
1d1h

https://en.wikipedia.org/wiki/Psychoanalysis#Debate_over_sta...

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5459228/

https://www.bbvaopenmind.com/en/science/research/psychoanaly...

https://link.springer.com/referenceworkentry/10.1007/978-94-...

https://www.skeptic.org.uk/2021/07/psychoanalysis-science-or...

https://philarchive.org/archive/FERIPA-6

That quote you included seems a little funny. It’s not making any direct or compelling argument in favor of psychoanalysis being science, it’s just saying well if psychoanalysis isn’t science then other disciplines aren’t science either, and since that’s sometimes not true, then Popper is wrong. It’s a slippery argument with some assumptions and big holes, and perhaps most glaringly it is intentionally ignoring the magnitude or degree of scientific experimentation, and attempting to frame the issue as only binary: science or not science.

I’m sure there is some modern psychoanalysis that is scientific, but OTOH it seems like the foundations of psychoanalysis, especially Freud, are certainly problematic from the perspective of science, right? Psychoanalysis has a valid, earned reputation that may take a very long time to fix, even if today’s practitioners are being careful and scientific. This could be compared to chiropractic medicine - some of it is valid medicine today, but it definitely came from non-medical, non-scientific origins.

There appear to be at least 2 separate arguments going. One is whether psychoanalysis can be a science, and the other is whether it actually was a science historically. This is why both sides are somewhat right: proponents of psychoanalysis argue that it can be scientific, which is true, and opponents argue that it wasn’t scientific in origin and has a troubled past, which is also true.

savolai
0 replies
22h9m

The issue for me is that the popular claim against psychoanalysis seems to be exactly that, binary.

Thanks for bringing nuance and providing quotes!

I’ve personally benefited tremendously from work with the ego structure and from realizing I can strengthen my will/ego capacities and learn to more and more discern and deny excessive power from my superego - from past learnt protective and restrictive impulses that no longer serve a purpose. This understanding alone seems like a treasure trove that keeps on giving year after year while doing introspective work and self-inquiry with others interested in the work.

However I have little understanding how this fits in a modern understanding of psychoanalysis or psychodynamic psychotherapy.

What I do know is that the CBT I did when I was younger didn’t seem to have nearly sufficient explanatory power to help me personally. Most of the skills I was offered seemed more or less trivial or perhaps were taught ineffectively.

Another thing that I would have needed was tapping into the resources of my body in the context of therapy and how that links to having capacity to work with myself. CBT seemed obsessed with, umh, overly just cognition.

bakuninsbart
1 replies
1d1h

Interesting quote, since my mind immediately went to Critical theory, one of the schools of philosophy probably targeted by Popper.

Over the last 200 years, the definition of science has been deliberately narrowed down a lot, mainly to combat pseudo-science and misinformation, but I think partially also to create an in-circle of academics. There are few good definitions of science, and usually they are a bit self-contradictory or insufficient to capture the underlying goal of science: the building of a corpus of verified knowledge. Falsification is a very high standard that cannot always be applied, but it is still a good first test to filter out possible bullshit.

In deductive studies (formal sciences but also often in other fields) we need to have acceptable axioms, and rigorous deductions from these axioms. In inductive studies, you need good data and valid methods to derive meaning from that data. A lot of research actually falls somewhere in-between, and we wouldn't want to outright dismiss it, as many subjects would be too elusive to study at all, but we still have a vested interest in understanding them.

A lot of research, and I would count psycho-analysis in there for the most part, kind of walks the line between pseudo-science and proper deductive or inductive studies. These fields definitely build a corpus of knowledge, which often turns out to be true, but there are fewer good measures for filtering the bullshit from the nuggets of truth, and that substantially devalues the field. The risk, not exclusive but especially pronounced, is that acceptance of research becomes more a measure of eloquence than depth of enquiry.

And fields can be collectively wrong, even in the more rigorous fields that lend themselves to proper testing or deductive reasoning: You mention Newtonian Physics, but there are too many examples to enumerate: Quantum superposition was extremely controversial when first proposed, and poor Lobachevsky was ruthlessly mocked for his perfectly valid development of hyperbolic geometry.

LudwigNagasena
0 replies
21h12m

IMO, modern psychology “walks the line between pseudo-science and proper deductive or inductive studies”. Psychoanalysis at best sometimes stumbles in the right direction.

raincom
0 replies
23h51m

The demarcation problem is unique to Popper's philosophy of science. That problem doesn't exist in other philosophies of science. The best attack on psychoanalysis comes from the late philosopher of science Adolf Grunbaum's "The Foundations of Psychoanalysis: A Philosophical Critique".

mtlmtlmtlmtl
0 replies
1d3h

I'm not a psychologist myself, and I know nothing about modern academic psychoanalysis. I've tried to read some but it was indecipherable to me.

Modern clinical psychoanalysis is a strange field. It went out of vogue with the advent of behavioral and later cognitive psychology in the latter half of the 20th century, as well as psychiatry. Psychoanalysts tend to believe their way is the only way, while more modern psychologists are much more eclectic. They're often extremely critical of the use of medication even in cases where the evidence supporting its use is overwhelming, like stimulants for ADHD.

Psychoanalysts have their own strange nomenclature that's often incompatible with modern developments. So it's hard for other psychologists to even talk to them at all.

Those are some of the reasons psychoanalysis is viewed as pseudoscientific by modern clinical psychologists.

CamperBob2
0 replies
1d

As I understand it, the notion of falsifiability is really more like, "Theory X is acceptable if it can be shown to be false by the use of tools whose principles meet the same standard of falsifiability, even if those tools aren't currently available."

So a theory that can be tested only with later scientific refinements -- say, by increasing measurement precision beyond what's currently available -- is indeed eligible to be classified as science. That criterion would allow for Newtonian mechanics to be accepted in its time, and for things like string theory to be accepted provisionally in ours.

Basically, the ultimate failure of the Newtonian model can't be used as an argument that it should never have been considered valid science.

ordu
1 replies
21h16m

> After a while of the discussion going nowhere, the psychoanalyst said something like this. "Not to worry, Dave, I know you're dynamically minded enough that you'll eventually learn to agree with me."

It looks like the psychoanalyst suffered from professional deformation. Not enough evidence to be sure, but it is the main premise of psychoanalysis that the psychoanalyst knows better than his "patient". A psychoanalyst knows what his patient feels, thinks, wishes, and so on. If the patient doesn't agree, then it is "resistance", and overcoming it is the first-priority task for the psychoanalyst.

> I guess that was more condescending than pejorative, but oh well.

The whole statement is condescending, but not because of "dynamically minded". It is "you'll learn to agree with me" that does the trick.

slumberdisrupt
0 replies
16h43m

This is a caricature of clinical psychoanalysis and is more applicable to other therapies. For example, the initial moments of CBT are "psychoeducation", where the therapist sketches out with magic markers a bunch of categories (feelings, thoughts, behaviors, emotions) and their subcategories, and the patient learns to see themselves in these ontological terms. Suggestibility is the cornerstone of these interventions.

Every psychoanalyst of course has their theoretical meta-psychology, but every clinical psychoanalyst knows that they must fight to abandon it at the door as they confront every analysand anew. Psychoanalysis is not a matter of instructing the patient of the death drive, of resistance, of the Oedipus complex, and so on. (Freud was wrong about why his intervention with Little Hans "succeeded".) Freud himself searched for something beyond hypnosis because the hypnotic technique required a susceptibility of the patient to suggestion, and furthermore required a constant relationship with the patient, otherwise the patient would invariably suffer a symptom relapse. Relate this to the dismal long-term rates of regression of CBT patients. (And this is the forbidden evidence of this "evidence based therapy".)

The analysand enters the analytical relationship supposing (as you have supposed) that the analyst knows something crucial that the analysand does not. So what happens as the analysand comes to realize that the analyst does not, in fact, possess this knowledge? The analyst indeed knows they don't know, but this is presumably nothing like what you assume the analyst supposes themselves to know.

The analyst is, in their role as an analyst (and not a therapist), merely the secretary of the analysand. They "take notes" on what the analysand says, in order to help show the analysand what they've said.

smokel
3 replies
1d5h

Dynamic typing comes to mind ;)

kevindamm
2 replies
1d5h

Dynamic scoping ;)

tempodox
0 replies
1d4h

You cannot get any more pejorative. Mission accomplished!

peterfirefly
0 replies
16h54m

Dynamic ethics.

qsantos
0 replies
1d4h

It _is_ a funny quote.

But I would still point out that this is the exact reason “dynamic” does not help if you do not know about the history. Since it can be applied to anything, it does not help you trim down what it can refer to.

karmakaze
0 replies
1d

Scripting languages are called 'dynamic' sometimes pejoratively. Dynamic microphones are also inferior in studio recording situations. I could see a poor DP implementation wasting memory being described negatively.

glimshe
4 replies
1d6h

Most of the time I hear the term being used by other people (not you :) ), I feel it's for showing off and looking smarter than everybody else - "Look, I used dynamic programming to solve that problem", when in fact they were just employing what I'd consider the natural and intuitive approach for solving the problem. There was nothing they actually "used", besides happening to identify that the problem could be broken into progressively smaller subproblems.

bcrosby95
1 replies
1d2h

Do you feel similarly if someone says they used an iterative, recursive, or greedy algorithm?

Dynamic programming is a whole chapter in most algorithms books. It's not about showing off, it's the name of the technique.

throwaway2037
0 replies
15h3m

As I interpret the GP, the person is peacocking or gate-keeping by (humble)bragging about using dynamic programming to solve a problem. For all we know, they Googled for an efficient algorithm and copied the result. I have done it before, and I have no shame about it. If a teammate asks me how I knew about that dynamic programming algorithm, I would reply: "Are you joking? I could never program that myself. Thank you Google." Except for a few algorithms, most are solved using iterative or recursive approaches.

valval
0 replies
1d6h

Well aren’t you cynical.

mk89
0 replies
1d5h

> happening to identify that the problem could be broken into progressively smaller subproblems.

Which is what dynamic programming is about. And no, not everyone is capable of doing that, especially since not all problems solved at work are Leetcode.

Sometimes people really have to spend hours or days to understand how to split a problem. Only then can optimizations be applied or become "obvious".

As usual, solutions are "so obvious" when someone has done a lot of heavy lifting to simplify the problem, improve the context, etc.

layer8
3 replies
1d1h

“Optimization” is similarly misleading (as someone who once took a CS class on “optimization” expecting something very different ;)).

LudwigNagasena
2 replies
21h7m

Optimisation is just function minimisation/maximisation. What did you expect and how was it different?

hcs
1 replies
21h0m

Not speaking for the GP but I'd imagine "optimizing" for performance. I had a professor who would get very irritated when anyone mentioned "optimizing" a program for just this reason, since you weren't finding the best possible runtime.

LudwigNagasena
0 replies
20h40m

Oh, makes sense, performance optimisation didn’t cross my mind for some reason.

eru
3 replies
1d4h

Compare also 'Linear Programming'. Or the usage of a 'TV programme' or a 'musical programme'.

bigger_cheese
1 replies
14h23m

Yeah the "programming" part has always thrown me off when I started my first Engineering job one of the guys was always talking about an "LP" model I thought it was some deep piece of black magic until I realized it was essentially brute searching for a solution to a series of simultaneous equations.

I think "Linear Optimization" might be a better term, or maybe "Linear Solving".

eru
0 replies
8h57m

Modern Linear Programming solvers are a lot more advanced than 'brute searching'. But otherwise, you are right.

tnecniv
0 replies
18h9m

That’s the programming part. As others have shared, the “Dynamic” part was made up by Bellman so it sounded important and politicians wouldn’t try to cut his funding. It’s often applied to dynamical systems, but that’s a coincidence.

cubefox
2 replies
1d3h

You forgot to define u and t.

tnecniv
1 replies
12h34m

Here, u is presumably utility and t is time

cubefox
0 replies
3h54m

I think utility (of what, why?) doesn't make sense in this context.

uoaei
0 replies
1d6h

If you go further back, "programming" describes it precisely, and what we call "programming" today is "writing code" which can be subdivided into different kinds of "programming" such as functional, declarative, procedural, etc. But there's a lot more that fits under that umbrella.

closeparen
0 replies
22h7m

It's interesting how computing things - like optimization problems - used to be much more dominant in terms of what people thought about and did with computers. It feels like most of the time we are just doing data storage and retrieval combined with networking... some computations are embedded within those things, but usually well encapsulated.

asimpletune
0 replies
1d6h

This is the correct definition of dp.

akoboldfrying
31 replies
1d5h

It's good that the article points out that DP algorithms are "just" clever ways to cache a recursion. Looking for a recursive solution is certainly the best way to start out looking for a DP solution, in my experience -- if you can find one, memoising it is trivial and may give a big speedup (and may even be faster than a "bottom-up" DP, since it only ever computes solutions that we definitely need).

With enough practice, it's usually possible to come up with a (correct but slow) recursive solution. When turning this into a DP, it doesn't matter if there are large numbers of subproblems in the call tree -- what's important is that there are a relatively small number of distinct subproblems. (Since there's no point caching a result that's only ever needed one time.) And that's where the difficulty tends to lie: Figuring out how to partition the original problem into few enough distinct subproblems.
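As a small illustration of how mechanical that memoisation step can be (the lattice-path count below is just a stand-in example of mine):

    from functools import lru_cache

    # Plain recursion: number of right/down lattice paths from (0, 0) to (m, n).
    def paths_slow(m, n):
        if m == 0 or n == 0:
            return 1
        return paths_slow(m - 1, n) + paths_slow(m, n - 1)  # exponential call tree

    # Same recursion, memoised: only the O(m*n) distinct subproblems are computed.
    @lru_cache(maxsize=None)
    def paths(m, n):
        if m == 0 or n == 0:
            return 1
        return paths(m - 1, n) + paths(m, n - 1)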

dataflow
11 replies
12h15m

The best analogy I have here is that saying "DP is just caching/memoization" is like saying "investment is just buying stuff and selling it later." Regardless of how technically accurate it might be, it misses such massive complexities and difficulties in the topic that it just makes you look silly rather than insightful.

natpalmer1776
10 replies
12h2m

Would you prefer someone start learning the principles of investing from an initial premise of “investment is just spending money to make money” or “investment is just buying something for less than it will be worth in the future”

Both are technically correct, and both statements are vast oversimplifications of a complex subject; however, the second statement leads into more advanced topics naturally, creating a smoother progression of knowledge that builds on itself.

Every complex topic generally starts with a few foundational bits of knowledge that support the whole house of cards. Pick the wrong foundational bits and you won’t get as far.

dataflow
9 replies
11h33m

I certainly don't disagree that there should be a solid foundation here; I just think this is the wrong foundation, for what I believe to be good reasons (some of which I'll get to below) - mainly, it confuses the concept with the implementation mechanics.

Instead... I find it much more productive to start teaching this with the fundamentals of problem-solving techniques, as follows:

- One method for solving complex problems is to divide them into separate subproblems that are simpler to solve. We call this "divide-and-conquer", as you've seen in [blah, e.g. mergesort]. Observe that this technique decomposes your subproblem structure into a tree. (Draw a picture.)

- Another (more advanced) method for solving complex problems is to find subproblems that might potentially overlap, and then find a clever way to combine their solutions into the final answer. An example here is finding the shortest N-hop path by finding the shortest (N-1)-hop paths, and picking out the shortest of those to find the shortest N-hop path. Observe that this technique decomposes your subproblem structure into a DAG form. (Draw a picture.)

Take a moment to read the above... notice the lack of any mention of caching/memoization/tables, etc.? That's not an accident. It's because those are optimizations to DP, not the core idea! Before you can even start talking about "how do I solve this quickly", the question you need to answer is "what systematic method can I use to solve this at all?" The central idea there is the decomposition of the problem into a tree vs. DAG structure. Once that sinks in, the question of how to make it faster (or whether to use recursion vs. iteration) is secondary, and also far more obvious to figure out on your own.

Moreover, this also lets the student realize that you can (sometimes) speed up divide-and-conquer with memoization, too. Which also hammers home the point that such an optimization is very much distinct from the technique itself!

qsantos
6 replies
11h26m

I think the point of GP is that saying it's basically “smart caching”, or “filling the cache in the correct order”, helps connect with a concept many people are already very familiar with, which grounds the idea immediately. In contrast, going through each step you have outlined introduces several concepts that are generally new and counter-intuitive to most people, so many will just give up.

dataflow
5 replies
11h18m

> In contrast, going through each step you have outlined introduces several concepts that are generally new and counter-intuitive to most people

What's new there? The concept of a DAG? I dare say it's important to learn DAGs before learning DP. Trying to teach them in the opposite order is like trying to teach multiplication before addition.

(I also have no idea what you find counter-intuitive here. How is solving complex problems by solving simpler problems counter-intuitive? Do you normally solve hard problems by solving harder problems?!)

qsantos
4 replies
10h44m

Please, take the perspective of an average computer science student, who might have had some interest in computers but maybe did not look too deeply into the theory. In the past few years, they just had to learn graphs, automata and Turing machines, complexity classes, computer architecture, compilation theory, and possibly a programming language they never used before, maybe two. And they might prefer having a social life to doing all-nighters on computer-assisted proof assignments :-) .

In short, I am not saying you cannot learn dynamic programming straight away from the theory, just that you are going to lose many people with this approach.

In fact, it makes me think of “New Math” [1], an effort to teach math from the ground-up starting in junior high. I am not familiar with how it worked in the US, but at least in France, it was definitely not a success. I would definitely have had a lot of fun, but many more did not, and failed to get a basic mathematical education at all as a result.

[1] https://en.wikipedia.org/wiki/New_Math

jltsiren
1 replies
4h42m

The traditional way of teaching dynamic programming worked well in the traditional CS curriculum that had a lot more mathematics than today. It leveraged what the students had already learned at that point.

Take the definition of the problem. Transform it into an equivalent definition that looks like the inductive step in a proof by induction. Look at the partial order defined by the dependencies between the subproblems in the inductive step. Solve the subproblems in an order that is consistent with the partial order. (For bonus points, find an order that is asymptotically faster in easy situations, such as when the edit distance is small. You may have to rule out paths that cannot be part of an optimal solution.)
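For instance, the edit-distance recurrence written as such an inductive step (the standard textbook formulation, added here only as an illustration): with D(i, j) the distance between the first i characters of a and the first j characters of b, the base cases are D(i, 0) = i and D(0, j) = j, and the inductive step is D(i, j) = min{ D(i-1, j) + 1, D(i, j-1) + 1, D(i-1, j-1) + [a_i != b_j] }. The dependencies of D(i, j) on D(i-1, j), D(i, j-1) and D(i-1, j-1) define the partial order, and row-by-row, column-by-column, or anti-diagonal evaluation orders are all consistent with it.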

That's first or second year mathematics in the traditional curriculum.

Today, and actually since the 90s or maybe even earlier, many CS students have no particular interest in mathematics. Classes that rely on mathematics have to spend less time on the actual content, as they have to cover the prerequisites and find alternate ways of explaining things.

qsantos
0 replies
1h16m

I think my own education followed the “traditional” way of learning computer science, albeit after years of tinkering with computers and programming as a kid. But I often felt that some concepts were much simpler when you made the connection with something concrete.

Of course, you would not use analogy for formal proofs, but it helped build a mental model incrementally instead of having to conjure various concepts out of thin air and draw connections between them.

dataflow
1 replies
9h56m

Would you mind pinpointing where exactly you feel I would lose students in the above approach? Would I lose people at the mention of "DAG"? in which case... isn't it easy enough to jog their memory in a few seconds ("it's like a tree, except a node can have multiple parents" <drawing>) if they've forgotten what that is? Or is it with the N-hop example? In which case, aren't there plenty of simpler ones (like Fibonacci) you can use instead? Or do you see somewhere else where people would get overwhelmed, and if so, where?

Like, I would understand where you're coming from if I'd used the pumping lemma or something like you're imagining, but that's not what's happening here? All I did was merely mention the words "subproblems", "trees", and "DAGs"... that's it. No theorems, no proofs, not even any need to remember what caching is... just descriptions with vivid examples and diagrams. All of which you can re-explain in a few minutes if they've forgotten them. I have a hard time seeing why that should scare someone away who's otherwise in a position to learn DP.

Also, isn't the audience kinda important here? It's not like this was intended for 10-year-olds like New Math. It was intended for college-level CS students and above. They can and should be expected to have a greater understanding (and attention span) than elementary school students.

qsantos
0 replies
1h23m

The point is that dynamic programming builds on various abstract concepts that many college students only understand to a level deep enough to pass exams. Making the connection with caching, which is a mostly unrelated and much more concrete concept, gives them a basis on which to build a mental framework. Sure, they might do without, but why make it harder?

bluecalm
1 replies
11h19m

And how do you find "subproblems that might potentially overlap"? You go through the space and cache results of your calculations.

I don't think there was a DP problem in my university days that couldn't be solved this way. Are there problems that can't? Maybe, I don't know. From what I've seen you sometimes need a more clever caching but that's about it. The question is why do we teach a very simple, intuitive concept in a way that makes a lot of people who grasp all the pre-requisites struggle with it.

dataflow
0 replies
11h12m

> And how do you find "subproblems that might potentially overlap"? You go through the space and cache results of your calculations.

I addressed this directly in my comment? At the risk of repeating it: you don't have to even think about caching anything in order to find "subproblems that might potentially overlap". The example I gave directly in my comment illustrated this vividly: the shortest N-hop path from A to B is the shortest of all (N-1)-hop paths after all the first hops you can possibly traverse from A. Is it clear what the subproblems are? Yes, they're the (N-1)-hop subproblems. Is it clear why they might overlap? Yes, because after a couple hops you might end up in the same location. Was caching relevant to understanding the aforementioned? No, it didn't even need to cross your mind - the point was the problem decomposition. Caching is just a generic optimization you can apply later to DP, just like you can to divide-and-conquer (and to other stuff).
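A rough sketch of that decomposition, under my own assumptions (adjacency-list graph, non-negative weights, "at most n hops" reading); note that the memo is an optional add-on, and the recursion is well-defined without it:

    import math

    # shortest(graph, target, n, v): length of the shortest path from v to
    # target using at most n hops. graph maps each vertex to a list of
    # (neighbor, edge_weight) pairs.
    def shortest(graph, target, n, v, memo=None):
        if memo is None:
            memo = {}
        if v == target:
            return 0
        if n == 0:
            return math.inf
        if (n, v) not in memo:
            memo[(n, v)] = min(
                (w + shortest(graph, target, n - 1, u, memo) for u, w in graph[v]),
                default=math.inf,
            )
        return memo[(n, v)]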

da39a3ee
7 replies
15h54m

This is a widespread misconception: thinking of dynamic programming as just a form of memoized recursion is not the way to learn DP because it makes it extremely difficult to understand how to do the style of DP problems that involve filling out a 2D array.

For example, look at the "best time to buy and sell stock" series on leetcode: https://leetcode.com/problems/best-time-to-buy-and-sell-stoc....

These are much more naturally done by filling out an array, right? I've never done them with a recursion; I can't say I've thought hard about it -- is there a natural recursive solution?

(I linked to iii above but for anyone who hasn't tried them they are a great intro to DP problems; start with the first one: https://leetcode.com/problems/best-time-to-buy-and-sell-stoc...)

lifthrasiir
3 replies
15h39m

The point is that you don't have to exactly fill the 2D array to solve the problem in that way. The 2D array is an optimization in this view, and can be safely replaced with a cache without breaking correctness. Of course there are also some learned techniques specific to dynamic programming, and that makes it worth learning at some point because otherwise you will never think of them, but at its core dynamic programming is just a specific way of doing recursion.

da39a3ee
2 replies
15h23m

OK, but, those are solved in just a few lines of code with 2D arrays. I'm not convinced it's helpful to approach them as recursions.

Also, anyone who thinks they understand how to solve DP problems on leetcode because they understand how to memoize a Fibonacci recursion is in for a rather large disappointment.

lifthrasiir
1 replies
14h44m

Fibonacci recursion is a bad example for DP because it is obvious how to do that. You need to teach generative recursion, as pointed out by Shriram Krishnamurthi [1]. Once you've got the hang of generative recursion, DP is a space optimization on top of that.

[1] https://parentheticallyspeaking.org/articles/how-not-to-teac...

qsantos
0 replies
1h31m

I agree with that article that recursion is better introduced with relevant data structures than with Fibonacci. And I agree that Fibonacci is also too simple to explain dynamic programming, which is why I showed how it works with edit distance, and then with AoC 2023-12-18.

tylerhou
0 replies
14h46m

Yes, there is a natural way to solve these recursively. Here is an example: https://leetcode.com/problems/best-time-to-buy-and-sell-stoc...

Not the best runtime (I could use a multidimensional array instead of a hash map for the cache, and recursion is "slow" in Python), but it's a clear solution.

Given a recursive solution, it is also "trivial" to convert it into filling an array by visiting the recursive states in reverse topological order. So here is a submission that fills in the array: https://leetcode.com/problems/best-time-to-buy-and-sell-stoc...

thdespou
0 replies
8h17m

DP algorithms are "just" clever ways to cache a recursion

It doesn't work all the time like that, since caching increases the space complexity.

There are DP problems that can be solved without storing all possible values in an array.

qsantos
0 replies
11h21m

The issue I have with not connecting dynamic programming with caching is that it becomes an exercise in cleverness, and many people just give up. It was pretty fun in school, but if I need colleagues to ramp up on it, I need a more effective approach. Sure, they might not grok the full theory right away, but they won't use it at all, and definitely won't get to the theory, if they think it's too abstract for them.

nsonha
6 replies
15h38m

DP algorithms are "just" clever ways to cache a recursion

This is what made it click for me. When I was in university, perhaps because of the prevalence of procedural programming at the time (as opposed to FP), DP looked like magic to me with the bottom-up tabularization examples in the textbook.

Practically that is the way to do it, because tail-call elimination isn't always applicable, but I wish they'd taught us the more intuitive way to look at it first (top-down, cache a recursion).

OskarS
5 replies
4h20m

The bottom-up tabularization isn't always necessary, but it's a key technique that is not just "avoiding overhead" or whatever. Doing the top-down memoization, you don't know which subproblems you need to store at any given point, but going bottom-up, you do, and you can usually be much more clever about memory usage and overhead.

The obvious example is Fibonacci numbers: the "top down" memoized version needs to store every Fibonacci number in memory, because you don't know what number the recursion is asking for. But if you do it "bottom up", you know that to calculate f(N), you only need f(N-2) and f(N-1). Therefore, you only need to store the last two numbers, reducing your memory usage from O(n) to O(1).
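For instance, a sketch of that bottom-up version (using the F(0) = 0, F(1) = 1 convention):

    def fib(n):
        a, b = 0, 1        # the only state kept: f(i) and f(i+1)
        for _ in range(n):
            a, b = b, a + b
        return a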

This principle almost always applies in DP problems: for the "edit distance" problem, the number of possible values for the function is N^2 (where N is the size of the two words you need to compare, assuming they're similar sizes), so the memoized version takes O(N^2) memory. But if you do the "bottom up" version, you realize that to calculate all values in a row, you just need the previous row of values. So it takes memory from O(N^2) to O(N).
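A sketch of that rolling-row version of edit distance (Levenshtein), keeping only the previous row:

    def edit_distance(a, b):
        prev = list(range(len(b) + 1))      # row for the empty prefix of a
        for i, ca in enumerate(a, start=1):
            curr = [i]                      # distance from a[:i] to ""
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                # delete ca
                                curr[j - 1] + 1,            # insert cb
                                prev[j - 1] + (ca != cb)))  # substitute if needed
            prev = curr
        return prev[-1]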

Point being: turning the recursion "upside down" is not just a thing you do to be clever (or to avoid "function calling overhead", something like that), it has very real algorithmic benefits, usually because it takes MUCH less memory to do it like that.

hansvm
4 replies
1h43m

I think their point is that naive caching attempts still result in duplicate work, and if you start by viewing the problem recursively then DP is a clever insight allowing you to avoid that duplicate work. You're right; it's not _just_ function calling overhead or anything minor; it often represents major Big-O improvements. I think that's in alignment with what they're saying.

I like your insight about Fibonacci needing less state than a naive DP would suggest. Diving into that a bit, DP implementations represent a DAG. More importantly, for all the problems that easily fit into DP, that DAG has a small number of "local" connections -- you can partition it into at least O(some_meaningful_dimension) many topologically sorted levels where each level only depends on a small, known, finite number of previous levels (often 1-2), not deeper DAG nodes. A clever choice of order of operations then looks for a min-cut on that graph, proceeding through those levels. Fib has at most O(n) levels and at least O(n) state for naive DP, so a clever implementation has O(n/n)=O(1) state. Edit distance has at most O(n) levels and at least O(n^2) state, so a clever implementation has O(n^2/n)=O(n) state. I did something similar for some DP probability calculation [0], and it reduces state from quadratic to linear.

Fibonacci is, slightly pedantically, one of the worst algorithms to analyze this way. Only the smallest inputs actually fit in machine integers, and the exponential increase in the size of any bigints involved really puts a damper on a naive Big-O analysis. Fib mod 2^64 (or some biggish prime) fits into the spirit of things better.

[0] https://github.com/hmusgrave/zdps

qsantos
3 replies
1h10m

Regarding Fibonacci, the recursive formula is much more efficient anyway! Oh, not _that_ recursive formula. That one: [1], from [2]. Basically, it allows you to compute F(n) in O(log n) arithmetic operations.

[1] https://wikimedia.org/api/rest_v1/media/math/render/svg/64b9...

[2] https://en.wikipedia.org/wiki/Fibonacci_sequence#Matrix_form
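For the curious, a sketch of that approach via exponentiation by squaring of the 2x2 matrix from [2]:

    # [[1, 1], [1, 0]] ** n = [[F(n+1), F(n)], [F(n), F(n-1)]]
    def mat_mul(x, y):
        return [[x[0][0] * y[0][0] + x[0][1] * y[1][0],
                 x[0][0] * y[0][1] + x[0][1] * y[1][1]],
                [x[1][0] * y[0][0] + x[1][1] * y[1][0],
                 x[1][0] * y[0][1] + x[1][1] * y[1][1]]]

    def fib(n):
        result = [[1, 0], [0, 1]]    # identity matrix
        base = [[1, 1], [1, 0]]
        while n > 0:
            if n & 1:
                result = mat_mul(result, base)
            base = mat_mul(base, base)
            n >>= 1
        return result[0][1]          # F(n), after O(log n) matrix multiplications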

hansvm
2 replies
1h0m

Only under the word-RAM model of computation though. The phrase "arithmetic operation" is doing a lot of heavy lifting. On a real computer, the output size of Fib is O(2^n), requiring O(n) bits of space, and O(n) assembly instructions (at a minimum, and for other reasons that minimum can be attained). The exponential nature of the problem additionally means you hit that Big-O curve for extraordinarily small inputs (as opposed to, e.g., a hash table where technically access isn't O(1) but nobody cares).

Separately though, that's a fun trick :)

qsantos
1 replies
56m

That's why I phrased it that way. But since you mentioned working modulo 2^64, I thought that would count :-).

It's useful to make it explicit anyway for people who might miss that, so thanks!

hansvm
0 replies
55m

oic. I missed that nuance. Thanks for the clarification.

Again, neat formula, happy to see your comments :)

Shorel
1 replies
8h21m

Agree. When I first learned the subject, my first thought was:

As fancy as that feature is, it should be called array memoization, or call stack memoization.

The term "dynamic programming" should be reserved for something better. IMO.

vanderZwan
0 replies
7h53m

I'm sure you know this (since in my experience the history of the name is usually taught together with the technique), but for those reading along: the name "dynamic programming" originates from the late forties, early fifties. While the "office politics"-related motivation Richard Bellman gives for the name cannot be completely true due to anachronisms[0], it still fits with terms that made sense at the time, before the modern usage of "programming" existed:

The word dynamic was chosen by Bellman to capture the time-varying aspect of the problems (...). The word programming referred to the use of the method to find an optimal program, in the sense of a military schedule for training or logistics. This usage is the same as that in the phrases linear programming and mathematical programming, a synonym for mathematical optimization.

I fully agree that would be nice if we relegated "dynamic programming" to a confusing historical umbrella term, and switched to less confusing names, like the ones you suggested. In that sense it is similar to "trie" - it's a funny pun, but incredibly impractical for verbal communication especially. It would be so much better if we all decided to just use "prefix tree" or "prefix search tree" and avoid the ambiguity.

[0] https://en.wikipedia.org/wiki/Dynamic_programming#History

vishnugupta
0 replies
7h12m

> what's important is that there are a relatively small number of distinct subproblems.

This is the crucial part IMO. Whether the larger algorithm is recursive or iterative etc is secondary. But yes DP usually tends to show up in recursive algorithms most often.

dsego
0 replies
10h23m

I remember learning about the knapsack problem and trying out recursive ways to solve it as well.

https://github.com/dsego/codility/blob/main/knapsack.js

sethammons
30 replies
1d6h

Would it be wrong to just think "memoization" when you hear "dynamic programming"? The missing part may be intelligently breaking up the problem to use memoization?

kevindamm
8 replies
1d5h

There are dynamic programming solutions not based on memoization. See for example finding the longest common substring between two strings. Memoization will not help as much because you only need the table's left and up cells once.

In general, if the problem's sub-problems have a lot of overlap and the optimal subproblem must be part of the optimal overall solution, you've got an opportunity for dynamic programming. Saying memoization is the only kind of dynamic programming is like saying hash tables are the only abstract data type.

recursive
7 replies
1d1h

The left of the up is the up of the left.

kevindamm
6 replies
1d

Yeah, I had a momentary misgiving when I hit send and thought (_well, the cell is used twice_) but in practice it is still very different than what would be considered memoization. You can make it work with just two rows at a time, forgetting all the calculated values of the earlier rows. You can also do it in one row if you work backwards in each row. If anything, it's a cache with a very specific eviction policy.

I had a whole other paragraph where I nearly convinced myself it's the same here for LCS so I'm just going to stop myself here. If you look up any texts that describe LCS they don't refer to it as using memoization. Let's not confuse things.

n2d4
5 replies
20h1m

> If you look up any texts that describe LCS they don't refer to it as using memoization.

I challenge that actually, several resources online consider it memoization (I would too): https://www.geeksforgeeks.org/longest-common-subsequence-dp-...

kevindamm
4 replies
19h33m

I had in mind a Data Structures and Algorithms text like what is used to teach comp sci courses with. But I'm not here to denigrate the credibility of Geeks for Geeks. If we're going to cite arbitrary web sites as sources, let's refer to Wikipedia's definition:

""" An optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again. """

In LCS you are not saving the result of calling a substring method for (str1, str2, i, j) or any similar recursive method. You're keeping a constant-sized association of index pairs and their accumulated score.

If we're getting technical, I would say that what LCS uses is referred to as tabulation.

n2d4
3 replies
19h11m

It is though, it's just a matter of perspective. The recursive method would look like this:

    str1 = ...
    str2 = ...
    def lcs(i, j):
      if i <= 0 or j <= 0:
        return 0
      res = max(lcs(i-1, j), lcs(i, j-1))
      if str1[i-1] == str2[j-1]:
        res = max(res, lcs(i-1, j-1) + 1)
      return res
Now if you memoize this and make it iterative, you end up with your DP solution.
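For completeness, a sketch of what that step might look like, filling the same lcs(i, j) values as a bottom-up table:

    def lcs_table(str1, str2):
        n, m = len(str1), len(str2)
        table = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
                if str1[i - 1] == str2[j - 1]:
                    table[i][j] = max(table[i][j], table[i - 1][j - 1] + 1)
        return table[n][m]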

kevindamm
2 replies
18h56m

Except you've reformulated LCS as a top-down function in order to be convenient for your argument. The implementation is bottom-up not top-down.

n2d4
1 replies
18h37m

That's exactly what DP is; every DP formulation of a recursive function is bottom up vs. top down (e.g. think of the steps you need to do to get from recursive Fibonacci to DP, or knapsack). LCS is not special in that regard.

kevindamm
0 replies
16h42m

My point is that the naive approach to the problem doesn't even need to use recursion, nor does the final solution, even if the path towards reasoning about the complexity calls on recursion. I prefer a narrower scope in definition for memoization, but I accept that there are many who would prefer the broader definition. I think perhaps there is even a grey area around the term, and cache, and working memory, and more. Is it no longer memoising if you're using it to build an index table? Is a cache implementation of an @memoize function decorator still memoising? IDK, it's not my intention to split hairs here.

For what it's worth, I reread the 1975 paper about the linear-space version of LCS; they do not mention memoization, even though recursion is used in both the algorithm pseudocode and in the complexity analysis. The term memoisation was coined in 1968.

https://dl.acm.org/doi/10.1145/360825.360861

jltsiren
8 replies
1d5h

Memoization is a more general technique. It's often little more than caching the results you happened to compute, in case you need them again in the future.

Dynamic programming is systematic memoization. You solve subproblems of increasing size, until you reach a solution to the full problem. "Inductive algorithm" would be kind of an appropriate term, as the typical dynamic programming algorithm is effectively a proof by induction. Unfortunately that term already has other meanings.

deaddodo
6 replies
1d4h

And "dynamic programming" is an insanely general term that leetcode enthusiasts have pigeonholed into meaning memoized reduction. It can describe dynamic dispatch, code morphing/rewriting, dynamic loading, etc.

I would argue "memoization", while still a bad term, is far more specific and descriptive of what's happening than "dynamic programming".

greyw
5 replies
1d2h

The term "Dynamic programming" comes from mathematics of the 1940s before modern programming languages even existed. It's about a certain way to do optimization. This is also how it is used by "leetcode enthusiasts" so imo they are using it correctly.

Nowadays, you could naturally confuse dynamic programming with the things you have listed.

deaddodo
3 replies
1d

> This is also how it is used by "leetcode enthusiasts" so imo they are using it correctly.

I never said they used it incorrectly. In fact, inversely, I tacitly acknowledged the fact that it was used correctly. I said they pigeonholed a generalized phrase into a specific use case. E.g. a word/phrase that can be used to describe many similar but distinct concepts in the same field, as you yourself acknowledge.

If you're going to take umbrage with someone's words, it's better not to misunderstand them or misphrase them.

greyw
2 replies
1d

Sorry I was not clear enough here. I wanted to just add some historical context.

My point is that dynamic programming was not "pigeonholed" because of "leetcode enthusiasts"; rather, they are just using the original meaning. Modern programming languages referring to "dynamic" things in various circumstances made the term confusing, so you would have to blame modern programming languages rather than "leetcode enthusiasts".

deaddodo
1 replies
23h39m

Sorry, I wasn't clear enough. The term is perfectly fine, it's simply too generalized. So if others want to use other terms to better describe it, I support that, no matter who was first. Otherwise we end up in another "systems programming" situation.

My offhanded remark about "leetcode enthusiasts" is about people trying to strongarm/gatekeep a, frankly, way too generalized phrase to mean a very specific thing. They can call it that all they want. It's a correct phrase for that. Just don't get mad when someone else refers to "code morphing" as "dynamic programming".

> Modern programming languages referring to "dynamic" things in various circumstances made the term confusing, so you would have to blame modern programming languages rather than "leetcode enthusiasts".

Modern programmers and engineers used the word dictated for the functionality by the language they communicate in. You're just further reinforcing my point that the phrase is too generalized.

Or, to give you an analogy that might finally make it click. It would be like if I called all "string metric" problems "string programming".

FabHK
0 replies
12h49m

You're just wrong when you say:

> "dynamic programming" is an insanely general term

It is not. It is a specific technique to solve a number of problems having a certain structure. In the continuous case it leads to the Hamilton-Jacobi-Bellman partial differential equation, in the discrete case it leads to the Bellman equation discussed above. The latter, in turn, gives rise to a specific and efficient algorithm.

Code morphing has nothing to do with dynamic programming (even though it might well be programming, and might well be dynamic).

nihzm
0 replies
1d1h

Exactly! I have already linked it in another thread but see this link below for the mathematical definition of dynamic programming:

https://web.mit.edu/dimitrib/www/DP_Slides.pdf

owlstuffing
0 replies
1d

> Dynamic programming is systematic memoization.

Exactly! This should have been clarified in the OP.

GuB-42
5 replies
1d3h

In my approach to dynamic programming, memoization is step 2 of 3.

1- find a recursive algorithm

2- memoization

3- make it iterative / bottom-up

3b- if possible, optimize for space

Step 3 is the most characteristic part of dynamic programming, but if you stopped at step 2, it would still qualify, I think; it just would not be as efficient as it could be.

Another way to think of step 3 would be: memoization is about caching, is there a way to pre-fill the cache?
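As a hedged illustration of those three steps on a small made-up example (minimum number of coins summing to a target; the problem and all names here are mine):

    import math
    from functools import lru_cache

    COINS = (1, 3, 4)

    # Step 1: recursive formulation (correct but exponential as written).
    def min_coins_rec(amount):
        if amount == 0:
            return 0
        return 1 + min(min_coins_rec(amount - c) for c in COINS if c <= amount)

    # Step 2: the same recursion, memoized.
    @lru_cache(maxsize=None)
    def min_coins_memo(amount):
        if amount == 0:
            return 0
        return 1 + min(min_coins_memo(amount - c) for c in COINS if c <= amount)

    # Step 3: iterative / bottom-up -- the "cache" is pre-filled in order.
    def min_coins_dp(amount):
        best = [0] + [math.inf] * amount
        for a in range(1, amount + 1):
            best[a] = 1 + min(best[a - c] for c in COINS if c <= a)
        return best[amount]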

cubefox
3 replies
1d3h

This makes it clear why dynamic programming is not easy. The problem occurs already in step 1 -- we naturally tend to think in loops, not recursions. So before you find a recursive algorithm, you probably come up with a loop algorithm first. Then you have to try to convert the loop into recursion, then go back to loops that use caching. And hope that the result is more efficient than the loop algorithm you started with.

WJW
1 replies
1d1h

Speak for yourself, there are dozens (dozens!) of us for which recursion is the more natural way to think about most algorithms. Joking aside, DP is very straightforward in more functional languages like Haskell where recursion is the default way to do iteration anyway.

cubefox
0 replies
23h27m

And I think people don't like functional programming languages because they don't like recursion. Even cookbooks use loops!

GuB-42
0 replies
18h4m

Indeed, to me, step 1 is the hardest. Not only do you need a recursive algorithm, but you also need an algorithm where memoization is possible.

Steps 2 and 3 are more systematic. That is, if you understand the Fibonacci sequence example in the article, you can probably do all of the problems. It still requires training, especially if you are doing competitive programming, but it is always essentially the same thing.

Finding the algorithm is the tricky part as there is no set formula.

That's why the Fibonacci sequence is a good introduction. You know where you are starting from: part 1 is trivial as the formula is given to you. And you know where you are going, as you probably already have a good idea on how you are going to implement it iteratively. So you won't get lost in the problem solving part, which may be hard and requires a lot of attention, instead focusing on the basic technique. Once well understood, it is time to solve the actual problems, where step 1 is not trivial and where intuition is not enough for finding the final solution.

naet
0 replies
1d3h

Where I have the most trouble is usually going from recursive to iterative. It's pretty easy for me to understand the recursive + memoization, but when that ends up being taken off the table due to various inefficiencies of recursion I often get a little stuck.

ecshafer
2 replies
1d6h

Yes, it would be wrong in my book. First, I think an obvious counterexample is that memoization can be used outside of dynamic programming. Otherwise, you can do most dynamic programming algorithms by storing the results in a table and searching the table afterwards for the best answer. Memoization is basically a strategy to speed up algorithms.

sk11001
1 replies
1d5h

> First, I think an obvious counterexample is that memoization can be used outside of dynamic programming

This isn't relevant; the question is whether there are dynamic programming solutions which do not involve memoization.

eru
0 replies
1d4h

Well, dynamic programming also tells you when you can drop your 'memoization' caches.

JohnKemeny
2 replies
1d5h

That's exactly how I teach dynamic programming. First solve it recursively, then add memoization (I call that top-down).

Then notice that recursion and memoization have some overhead, and construct the table bottom-up, remove the recursive call, and voila, dynamic programming.

throwaway2037
1 replies
15h0m

This sounds like a promising teaching technique. Do you have a slide deck to share? I am sure HN crowd would be interested to read.

globular-toast
0 replies
3h4m

It's in TFA...

marcodiego
9 replies
1d1h

I had a very good algorithms professor. He studied at UCLA. His classes about dynamic programming were superb. He started with a problem for which the naive solution was simple and had exponential complexity. Then he broke the problem into smaller problems and reduced the complexity to something polynomial. Then he applied memoisation and the complexity dropped to linear.

I really would like to remember the problems he used.

colordrops
3 replies
1d

Did he study under Kang at UCLA?

marcodiego
1 replies
22h18m

I don't know, but I know he studied there in late 90's or early 2000's.

colordrops
0 replies
18h18m

I did as well in the 90s - I studied algorithms, which included dynamic programming, with Kang, who was recognized as an amazing algorithms teacher.

LargeTomato
0 replies
17h28m

Maybe Eggert? He's also the chief maintainer of emacs and Linux tzdata.

Horffupolde
2 replies
1d1h

1. *Fibonacci Sequence*: The classic example where the naive recursive solution has exponential complexity. By storing previously computed values (memoization), the complexity can be reduced to linear.

2. *Coin Change Problem*: Given different denominations of coins and a total amount, finding the number of ways to make the change. The naive approach is exponential, but dynamic programming reduces it to polynomial complexity.

3. *Knapsack Problem*: Particularly the 0/1 Knapsack problem, where items with given weights and values must be placed in a knapsack of a fixed capacity to maximize total value. The naive exponential solution can be optimized using dynamic programming.

4. *Matrix Chain Multiplication*: Determining the most efficient way to multiply a chain of matrices. The problem can be solved in exponential time using a naive approach but becomes much more efficient with dynamic programming.

5. *Longest Common Subsequence*: Finding the longest subsequence common to two sequences. A classic dynamic programming problem that can be solved in polynomial time.

6. *Longest Increasing Subsequence*: Finding the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.

7. *Shortest Path Problems*: Like the Floyd-Warshall algorithm for finding the shortest paths in a weighted graph with positive or negative edge weights.

8. *Edit Distance (Levenshtein Distance)*: Finding the minimum number of edits (insertions, deletions, substitutions) needed to change one word into another.

manvillej
0 replies
23h11m

I am a big fan of the Knight Dialer. I wrote an article on it and how you can actually use graph theory to reduce it to an incredibly efficient 4x4 matrix. It was super fun.

JohnKemeny
0 replies
1d

I just want to add

9. Longest Path in DAGs: Find a longest path in a directed acyclic graph.

10. Weighted Independent Set on a Path: Given an array of integers, compute the maximum sum of numbers provided that you may not take two consecutive cells.

qsantos
0 replies
1d1h

I have listed a few in the article. It's pretty common to see them in lectures and practical exercises.

- longest common subsequence

- longest common substring

- line wrap

- subset sum

- partition

- knapsack

You can also have a look at [1] for more ideas.

[1] https://en.wikipedia.org/wiki/Dynamic_programming#Algorithms...

jakeinspace
0 replies
1d

In addition to the suggested problems others posted, perhaps it was a scheduling problem? Like, for example, scheduling N overlapping events in time, like course schedules or processes for a CPU. Generally this would be done to optimize something like throughput - I believe that when you start adding special requirements, like "These 2 courses must be taken together", then things get much more complicated and intractable compared to plain-old dynamic programming.

rav
8 replies
1d2h

And many common algorithms are actually just the application of dynamic programming to specific problems, including omnipresent path-finding algorithms such as Dijkstra’s algorithm.

Dijkstra's algorithm is an application of dynamic programming? I disagree. In dynamic programming, you tabulate the subproblems to solve, with static interdependencies between them leading to straightforward orders in which to solve the subproblems. In Dijkstra's algorithm, you need to compute the shortest path from s to each vertex, but the order in which you have to visit the vertices is only discovered along the way using a priority queue, so the subproblem interdependencies are not known ahead of time until you have actually solved the problem.

vladimirralev
3 replies
1d1h

Dijkstra is definitely dynamic programming, no doubt about it: you are still computing a new state from neighbouring state nodes while ignoring a substantial number of non-neighbouring states.

You just have to accept the more abstract definition of "state" where the state encodes subsets of the graph. The states in Dijkstra are represented by subgraphs and distances that incrementally include more nodes leaving us with a path of states to follow in some order.

It's not much different from the traveling salesman problem or Viterbi in that sense. You come up with some topological order of states in an abstract state space and then follow that topological order to compute each new state from the adjacent previously computed states, never looking back to update already-finalised states.

With this more abstract point of view, it's clear Dijkstra is dynamic programming.

There is a whole field of graph dynamic programming problems. And the closely related Markov chain problems.

rav
2 replies
20h29m

You come up with some topological order of states in abstract state-space

If the state space is "subsets of V", then it's exponentially larger than the set of states actually visited in Dijkstra's algorithm. Dijkstra's algorithm has an invariant that the vertices visited have a smaller distance than the vertices not visited. For vertex sets that adhere to this invariant, there's clearly a topological order, but for arbitrary subsets of V I don't see how this topological order would be defined.

I guess my gripe is that in my view, the framework of dynamic programming is not a useful way to analyze algorithms that explore a small set of states in an exponentially larger state space.

vladimirralev
0 replies
4h34m

The state space here is subsets of V, but not all possible subsets and not exponential. The entire state is represented by the priority queue at each step which is bounded by the number of nodes and edges. And each state represented (encoded) by the priority queue transitions to exactly one new state (by popping the top node from the queue and adding the neighbours). And the number of steps is also bounded by the number of edges and nodes in V.

Most importantly, at every step of the algorithm, we only need to preserve the previous node in the state space (the previous state of the priority queue) to compute the new state (pop the top and add the dirty neighbours) and we can forget all previous states that existed before. We don't even need to keep track of the visited nodes, although if we did, that wouldn't change anything.

nihzm
0 replies
6h2m

I think what you're missing is that DP is not an algorithm in itself, but rather a theorem. I also didn't know about this for a long time. Quoting from [1, §4]

  > It should be stressed, here and elsewhere, that the DP functional equation
  > does not constitute an algorithm. It merely stipulates a certain property
  > that function f defined in (3) must satisfy. Indeed, in the context of Theorem
  > 2 the functional equation constitutes a necessary optimality condition. This
  > point is sometime not appreciated by students in their first encounter with DP
So what is actually happening in Dijkstra's algorithm is that the DP optimality condition is reached through a sequence of clever approximations. Another common way to reach the DP optimality condition is to recursively explore everything with a cache; this is how compsci people usually learn about DP, and of course it can get exponential and does not apply well to big problems. Though, both are dynamic programming at their core.

  > in my view, the framework of dynamic programming is not a useful way to analyze algorithms that explore a small set of states in an exponentially larger state space
Well, since DP is a theorem I kinda agree, it is not how one should think about it when solving day-to-day problems. But formally DP is the reason why these algorithms work in the first place.

[1]: http://matwbn.icm.edu.pl/ksiazki/cc/cc35/cc3536.pdf

qsantos
2 replies
1d2h

I totally agree in that I use the same mental model. But, if you look at it as “the shortest path must be the shortest path through one of its neighbors”, it can actually be classified as a dynamic programming algorithm!

https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Dynamic...

rav
1 replies
1d2h

In dynamic programming, the problem can be solved by solving subproblems, and those subproblems are solved by solving subsubproblems, and there is overlap between these subproblems. This allows us to solve DP problems in two ways, either by recursion with memoization or by iterative table filling.

Although the shortest path problem has some kind of "optimal substructure", the recursive memoized approach doesn't work because there's no set order in which the subproblems can be solved. Instead, you need to compute the shortest paths in order of shortest path length, and the shortest path lengths aren't given ahead of time - those are exactly what Dijkstra's algorithm computes!

It's not enough to call it dynamic programming that "the shortest path must be the shortest path through one of its neighbors", because this fact doesn't immediately lead to an acyclic subproblem dependency graph.

Shortest path on an acyclic graph, and longest path on an acyclic graph, are two problems that can be solved with dynamic programming - but Dijkstra's algorithms solves shortest paths on a different class of graphs that doesn't lend itself to DP.
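
For instance, a minimal sketch of the acyclic case, where a topological order fixes the subproblem order before any distance is computed (an illustration; `edges` maps each vertex to its outgoing `(target, weight)` pairs):

    from graphlib import TopologicalSorter  # standard library since Python 3.9
    from math import inf

    def dag_shortest_paths(edges, source):
        # For each vertex, collect its predecessors so that the topological
        # sorter yields a vertex only after all edges into it can be relaxed.
        preds = {u: set() for u in edges}
        for u, outs in edges.items():
            for v, _ in outs:
                preds.setdefault(v, set()).add(u)
        dist = {u: inf for u in preds}
        dist[source] = 0
        for u in TopologicalSorter(preds).static_order():
            for v, w in edges.get(u, []):
                dist[v] = min(dist[v], dist[u] + w)
        return dist

    # dag_shortest_paths({"s": [("a", 2), ("b", 5)], "a": [("b", 1)]}, "s")
    # -> {"s": 0, "a": 2, "b": 3}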

kj4211cash
0 replies
1d1h

I'm a bit lost in this terminology. But coming from the Operations Research perspective that gave Dynamic Programming its name, Dijkstra's Algorithm is very clearly Dynamic Programming. It's Forward Dynamic Programming as opposed to the much more common Backward Dynamic Programming, if that helps any.

nihzm
0 replies
1d2h

Not quite; actually, Dijkstra is a special case of what are called label correcting methods / algorithms (LCA) [1], which come directly from applying dynamic programming to the shortest path problem. LCAs apply to all finite graphs with non-negative cycles, and, to be specific, Dijkstra's algorithm is an LCA where in step one the node to be removed is chosen according to min_{i \in OPEN} d_i; see [1] for more.

[1]: https://web.mit.edu/dimitrib/www/DP_Slides.pdf (see lecture 3)

charlieyu1
7 replies
1d2h

I don't find DP difficult enough to be called black magic. Tree algorithms, on the other hand, are much harder.

qsantos
6 replies
1d1h

Do you mean tree rebalancing algorithms? I have to agree with this. AVL tree insertion is fine enough, but it gets hairy when you get to deletion. And Red-Black trees…

charlieyu1
3 replies
1d1h

There is more to it than that; even finding the diameter of a tree is pretty nasty.

qsantos
1 replies
1d1h

Ah yes, but I use Rust, I cannot go back up a tree (-:

KRAKRISMOTT
0 replies
1d

No graphs either :(

n2d4
0 replies
19h44m

Only nasty to find diameter of general graphs, right? If you know you have a tree, just root it at a random vertex and recursively compute `max(largest diameter of children, sum of the two biggest heights of children)`
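
A minimal sketch of that recursion (an illustration; it measures the diameter in edges and takes the rooted tree as a `children` mapping):

    def diameter(children, root):
        def visit(node):  # returns (height, best diameter) of the subtree
            heights, best = [], 0
            for child in children.get(node, []):
                h, d = visit(child)
                heights.append(h + 1)
                best = max(best, d)
            top_two = sorted(heights, reverse=True)[:2]
            return max(heights, default=0), max(best, sum(top_two))
        return visit(root)[1]

    # diameter({0: [1, 2], 1: [3, 4]}, 0) == 3  (path 3-1-0-2)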

peterfirefly
1 replies
16h39m

Once you relax the invariants a bit, it becomes much easier to delete from your reddish-blackish trees :)

If you don't have to implement deletion, things are already a lot easier. And if you decide to implement a persistent red-black tree then they can be downright easy, even without relaxed invariants.

Relaxed invariants:

https://en.wikipedia.org/wiki/AA_tree

https://en.wikipedia.org/wiki/Left-leaning_red%E2%80%93black...

---

Years ago, I played around with red-black trees and I figured that I could relax the invariants and make the code a lot simpler -- and maybe get slightly worse theoretical performance and quite likely slightly better practical performance for small trees. I looked around for other people's ideas along the same lines and found AA trees, which didn't quite please me. A few years later, Sedgewick's left-leaning red-black trees came out.

I would probably have found them myself (+ some other related ideas) if I had continued to play around + systematically tried different relaxations. But I didn't, so I didn't.

qsantos
0 replies
58m

I never heard of these, thanks for the links!

thefaux
6 replies
19h5m

I don't understand why you would ever use memoization for Fibonacci, and hence its relevance to dynamic programming. It can be solved with a tail-recursive function with three input parameters. Solution left to the reader as an exercise.

qsantos
2 replies
10h54m

But how do you systematically deduce the tail-recursive version?

I feel like, in this case, the recursion in the definition of Fibonacci and the recursion in the tail-recursive version coincide only by accident, the latter just being a contrived way to write the loop you get after applying dynamic programming and trimming the table to its last two elements.
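
To make the connection concrete, a minimal sketch of both forms (an illustration; the three parameters of the tail recursion are exactly the loop counter plus the two cells left after trimming the table; note that Python does not eliminate tail calls, the shape is the point):

    def fib_tail(n, prev=0, curr=1):
        return prev if n == 0 else fib_tail(n - 1, curr, prev + curr)

    def fib_loop(n):
        prev, curr = 0, 1
        for _ in range(n):
            prev, curr = curr, prev + curr
        return prev

    # fib_tail(10) == fib_loop(10) == 55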

tmoertel
1 replies
4h2m

But how do you systematically deduce the tail-recursive version?

Here's one way: https://gist.github.com/tmoertel/5798134

qsantos
0 replies
1h20m

Thanks! If I understand correctly, you would use that method, while substituting tail recursion in place of the loop. However, I feel there is a lot of magic in https://gist.github.com/tmoertel/5798134#file-gistfile1-py-L....

peterfirefly
0 replies
18h56m

Memoization generalizes much better than accumulator parameters.

aetimmes
0 replies
5h17m

I don't understand why you would ever use tail-recursive functions for Fibonacci, and hence their relevance to dynamic programming. It can be solved with a simple equation with one input parameter. Solution left to OP as an exercise.

TotoHorner
0 replies
19h3m

Because it's a good example to teach dynamic programming?

dps
4 replies
13h32m

This year’s Advent of Code has been brutal (compare the stats of 2023 with that of 2022, especially day 1 part 1 vs. day 1 part 2).

I enjoyed completing AoC this year. While it was very clear that day 1 (esp. part 2) was significantly harder than in previous years (I wrote about this among other things [0]), OP's claim was not obviously self-evident when comparing _current_ 2022 stats to _current_ 2023 stats, as folks have had an additional full year to complete the 2022 puzzles.

I grabbed the 2022 stats from Jan 14th 2023 [1] and, indeed, the difference is quite stark. Graphing the part two completion stats[2] for both years, there was a relatively similar starting cohort size on day 1, but 2023 looks clearly harder than 2022 up until day 15. As OP observes, the ratio[3] of folks completing pt1 but not going on to complete pt 2 is way higher for a lot of days in 2023 and suggests the day 5, 10, 12 and especially day 22 part 2s were particularly difficult.

[0] https://blog.singleton.io/posts/2024-01-02-advent-of-code-20...

[1] https://web.archive.org/web/20230114172513/https://adventofc...

[2] https://blog.singleton.io/static/imgs-aoc23/completion.png

[3] https://blog.singleton.io/static/imgs-aoc23/ratios.png

qsantos
0 replies
11h14m

Thanks for the details. To add to this discussion, I have a script to see the progression over the days.

Looking at the last two columns, you can see how brutal 2023 was compared to 2022, especially in the beginning. In 2022, most people kept playing through the first few days, with a day-to-day retention higher than 80% most days, and virtually everyone solved both parts. In 2023, in contrast, only 76% of people solved part 2 of day 1 after solving part 1, and many people gave up on days 3 and 5.

Interestingly, the last few days are not that much lower. That can be explained by the fact that AoC 2023 is more recent than AoC 2022, like you said. My interpretation is that the people who make it that far will get through all the challenges regardless of the difficulty (to an extent, of course), while many others give up when they realize it will take too much of their time.

    Stats for year 2022 of Advent of Code
    -------------------------------------
    
    Day   Both puzzles   One puzzle       Total   Rel. puzzle 1/2   Rel. day before
      1        280,838       15,047     295,885              95 %             100 %
      2        232,752       12,403     245,155              95 %              83 %
      3        200,016       11,392     211,408              95 %              86 %
      4        184,435        3,734     188,169              98 %              92 %
      5        157,392        3,116     160,508              98 %              85 %
      6        155,921        1,602     157,523              99 %              99 %
      7        113,241        2,592     115,833              98 %              73 %
      8        107,224        7,659     114,883              93 %              95 %
      9         82,414       11,449      93,863              88 %              77 %
     10         85,075        5,511      90,586              94 %             103 %
     11         68,838        9,258      78,096              88 %              81 %
     12         59,253        1,061      60,314              98 %              86 %
     13         51,512        1,220      52,732              98 %              87 %
     14         49,051          991      50,042              98 %              95 %
     15         39,677        5,773      45,450              87 %              81 %
     16         23,298        5,650      28,948              80 %              59 %
     17         21,525        6,237      27,762              78 %              92 %
     18         25,420        4,927      30,347              84 %             118 %
     19         17,516          928      18,444              95 %              69 %
     20         22,141        1,003      23,144              96 %             126 %
     21         23,022        3,060      26,082              88 %             104 %
     22         15,393        5,083      20,476              75 %              67 %
     23         18,531          254      18,785              99 %             120 %
     24         16,419          252      16,671              98 %              89 %
     25         13,192        7,473      20,665              64 %              80 %
    ~/src/advent-of-code% ./stats.py 2023
    Stats for year 2023 of Advent of Code
    -------------------------------------
    
    Day   Both puzzles   One puzzle       Total   Rel. puzzle 1/2   Rel. day before
      1        230,737       73,941     304,678              76 %             100 %
      2        196,352        9,256     205,608              95 %              85 %
      3        130,406       19,913     150,319              87 %              66 %
      4        130,271       17,691     147,962              88 %             100 %
      5         80,255       31,029     111,284              72 %              62 %
      6        103,358        1,918     105,276              98 %             129 %
      7         81,905        7,308      89,213              92 %              79 %
      8         74,034       14,707      88,741              83 %              90 %
      9         76,438        1,229      77,667              98 %             103 %
     10         48,313       17,054      65,367              74 %              63 %
     11         57,339        2,386      59,725              96 %             119 %
     12         30,985       14,440      45,425              68 %              54 %
     13         38,217        5,223      43,440              88 %             123 %
     14         36,500        7,457      43,957              83 %              96 %
     15         40,881        4,156      45,037              91 %             112 %
     16         35,347        1,023      36,370              97 %              86 %
     17         24,014        1,097      25,111              96 %              68 %
     18         24,799        4,937      29,736              83 %             103 %
     19         22,525        7,197      29,722              76 %              91 %
     20         18,287        4,398      22,685              81 %              81 %
     21         14,311       10,149      24,460              59 %              78 %
     22         15,830          988      16,818              94 %             111 %
     23         14,562        2,964      17,526              83 %              92 %
     24         11,864        4,918      16,782              71 %              81 %
     25         10,522        3,048      13,570              78 %              89 %

binglebob
0 replies
4h4m

This was just my personal experience (which certainly came from trying out a different language than I typically use day to day), but I'd argue that day 1 part 2 wasn't _hard_, just under-specified by the prompt. The examples given are:

  two1nine
  eightwothree
  abcone2threexyz
  xtwone3four
  4nineeightseven2
  zoneight234
  7pqrstsixteen
There is one critical example missing from this set and you can't exactly just figure out how you're meant to substitute the values without an example like:

  oneight
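
For illustration, one way to handle overlaps like this is to check every position for either a digit or a spelled-out digit, so both readings of "oneight" are found (a sketch, not the intended solution):

    WORDS = ["one", "two", "three", "four", "five",
             "six", "seven", "eight", "nine"]

    def digits(line):
        out = []
        for i, c in enumerate(line):
            if c.isdigit():
                out.append(int(c))
            else:
                for value, word in enumerate(WORDS, start=1):
                    if line.startswith(word, i):
                        out.append(value)
        return out

    # digits("oneight") == [1, 8], so the calibration value is 18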

benrow
0 replies
9h54m

I didn't get very far into AoC this year as I ran out of time. Maybe I'll pick it up again later.

But my point is, I was surprised at how hard day 5, part 2 was. I didn't give up and solved it, but went away wondering why I'd missed something obvious and overcomplicated it. So it brings some relief to know it was 'supposed' to be a bit challenging!

Kwpolska
0 replies
11h46m

Early AoC was fun, you could get away without anything fancy until late in the game. Then it got harder, not fun, so I gave up and stopped touching it.

Der_Einzige
3 replies
1d1h

All problems solved using DP can also be solved using global optimization techniques. Simple genetic algorithms are much easier to implement in an interview than most DP solutions. Yes, they really do solve any silly interview problem and even have several advantages to boot (e.g. partial solutions if stopped before they're finished).

Anyone interviewing you who won't give you a pass for solving the leetcode problem the "wrong way" without very strong justification is a fool who themselves shouldn't be working in tech.

anon291
1 replies
1d

Genetic algorithms are not deterministic, whereas a DP solution is going to be deterministic with very easily calculable bounds on execution time. Comparing the two as you do, in my opinion, shows a fundamental gap in algorithms understanding.

nottorp
0 replies
6h16m

But they didn't say 'use LLMs... in Rust' :) Points for that.

qsantos
0 replies
1d1h

Did you encounter practical problems where genetic algorithms work well? So far, the only serious use I have made of them was for CodinGame's Mars Lander optimization problem (and it works pretty well there!).

wiseowise
2 replies
1d3h

Awaiting “but you’ll never need it at your job” crowd.

jmull
1 replies
22h54m

Do people use it much in their jobs?

I can't remember ever actually using it other than for puzzle-solving (like the excellent Advent of Code problems). I guess it's pretty common to use leet-code type problems for interviews, which maybe counts, but even there hopefully most employers aren't clueless enough to give people DP problems.

I've done some stuff around generating puzzles/levels/challenges for games that maybe could have used it, but it seemed like problems it might be useful for never came up for me... maybe because I found that you get better, funner puzzles and can control the difficulty better if the generator sort of "thinks like" a human solver/player would.

No doubt there are some edge cases, but how much are people really using DP at a job?

nottorp
0 replies
6h15m

You don't use all them algorithms explicitly. When have you last done a sort?

However if you know them, you have an intuitive understanding of what your libraries do and are less likely to end up on accidentally quadratic.

gligorot
2 replies
1d6h
wmil
0 replies
1d5h

Effective web caching is black magic.

qsantos
0 replies
1d4h

Thanks! I should have planned better for that before posting.

tromp
1 replies
8h46m

Dynamic programming is what allowed me to compute the number of legal Go positions [1,2], a 171 digit number.

While the naive method would take time 3^(n^2) to examine all possible positions on an nxn board, dynamic programming effectively strips away one dimension, reducing time to O(n^5 * 5.4^n), while using space O(n * 5.4^n).

[1] https://tromp.github.io/go/legal.html

[2] https://tromp.github.io/go/gostate.pdf

qsantos
0 replies
1h22m

Interesting, I have never really looked into this kind of calculation. Thanks for the links!

ris58h
1 replies
1d5h

Dynamic Programming is not Black Magic

But who said otherwise?

mikhailfranco
0 replies
1d4h

The No True Scots Straw Man?

ram_rar
1 replies
43m

I wonder how much mechanical sympathy dynamic programming (DP) has as opposed to brute force or other simpler techniques. On paper it seems like a smart technique; in reality, I have rarely seen it used in critical production systems. In fact, anything recursion-esque is eschewed. Can someone comment on their experience implementing DP in critical production systems, beyond the scope of programming contests and academia?

venil
0 replies
13m

In general, it has great mechanical sympathy. You need to split your larger problem into smaller sub-problems, and then figure out which sub-problems depend on each other using the recursive formulation. Once you have done that, there should be no more need to have a stack, so you can just use an array to store the results of the depended on problems, and a loop to change which problem is being solved.

This article doesn't make the final step from recursion to loop but you would probably want to if you were doing DP in production.

Edit: Actually the article does make this jump. See the last code snippet before the conclusions section. Note also the second note after the snippet, which explains that the snippet itself uses a larger array than is necessary.
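
For illustration, here is a minimal sketch of that trimming applied to the article's example problem, keeping a single row of the Levenshtein table at a time instead of the full matrix (not the article's snippet):

    def levenshtein(a, b):
        prev = list(range(len(b) + 1))  # row for the empty prefix of a
        for i, ca in enumerate(a, start=1):
            curr = [i]  # first column: distance from a[:i] to ""
            for j, cb in enumerate(b, start=1):
                substitution = prev[j - 1] + (ca != cb)
                deletion = prev[j] + 1
                insertion = curr[j - 1] + 1
                curr.append(min(substitution, deletion, insertion))
            prev = curr
        return prev[-1]

    # levenshtein("kitten", "sitting") == 3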

mbwgh
1 replies
9h31m

I find it disappointing that the "coming up with a recurrence formula" part is always glossed over, which to me is very much the hard part.

qsantos
0 replies
1h6m

I understand! I do not think there is a magic formula for this. This only comes with a lot of practice.

flobosg
1 replies
21h23m
gww
0 replies
16h50m

I would argue that these are some of the most important algorithms in bioinformatics/biology. They have a wide range of applications.

deely3
1 replies
9h51m

What bothers me a bit: in the example where we are trying to find the Levenshtein distance between two strings, are we 100% sure that the distance calculated from the end of the strings to the start will be the same value as when we calculate it from start to end?

qsantos
0 replies
1h14m

That's a very good question.

I think the simplest approach to proving it would be to show that both computations yield the global minimum of the same quantity (the cost over all edit sequences). Since the global minimum value is unique, they must be equal.
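
Not a proof, but a quick empirical check is easy with a naive memoized implementation (a sketch, not the article's code):

    from functools import lru_cache
    import random

    @lru_cache(maxsize=None)
    def lev(a, b):
        if not a: return len(b)
        if not b: return len(a)
        return min(lev(a[1:], b) + 1,                   # deletion
                   lev(a, b[1:]) + 1,                   # insertion
                   lev(a[1:], b[1:]) + (a[0] != b[0]))  # substitution or match

    for _ in range(1000):
        a = "".join(random.choices("ab", k=random.randint(0, 6)))
        b = "".join(random.choices("ab", k=random.randint(0, 6)))
        assert lev(a, b) == lev(a[::-1], b[::-1])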

dahart
1 replies
1d1h

“Bootstrap” is an imaged expression to point to the absurdity and impossibility of a task

This is an archaic definition that I didn’t know, and it’s fairly interesting. At the bottom of the opening section on the history of “bootstrap” is this comment:

“Critics have observed that the phrase is used to portray unfair situations as far more meritocratic than they really are.[8][9][7] A 2009 study found that 77% of Americans believe that wealth is often the result of hard work.[10] Various studies have found that the main predictor of future wealth is not IQ or hard work, but initial wealth.[7][11]”

Interesting (to me, anyway) because it used “meritocratic” which itself is a word that was coined with a somewhat different meaning than it has today. Meritocracy was originally used to point out that merit itself is an outcome of social advantages, not inherent skill.

https://reagle.org/joseph/pelican/social/the-surprising-soci...

wibblewobble125
0 replies
23h38m

Amusingly, merit’s other meaning is the verb “to be worthy.”

ctk_25
1 replies
1d6h

Would recommend the DPV Algorithms book and Georgia Tech's lectures on Udacity for graduate algorithms. The way to master dynamic programming is practice, practice, practice solving problems…

62951413
0 replies
22h17m

Consider https://www.amazon.com/Dynamic-Programming-Coding-Interviews... for the most pedagogical approach I have seen.

bombela
1 replies
1d6h

I really like how this article exposes the problem recursively first then progressively adds caching, and finally reduces the size of the cache to only what is necessary.

I have often made the mistake of trying to get to the dynamic programming solution directly, and either got stuck or had to go through heroic efforts to get it working. I think from now on, I will force myself to go through the steps in order.

qsantos
0 replies
1d4h

From my own experience, being taught dynamic programming straight away makes it more of a puzzle. By going through the steps, explaining _why_ we are using a table, and connecting the concept to caching, I feel like it makes much more sense.

toomanyrichies
0 replies
14h28m

Origin of the name "dynamic programming", from its inventor, Richard Bellman:

"I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes. An interesting question is, ‘Where did the name, dynamic programming, come from?’

The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence.

You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose?

In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word, ‘programming.’ I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense.

Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities”.

Source:

https://alliance.seas.upenn.edu/~cis520/dynamic/2021/wiki/in...

tnecniv
0 replies
18h59m

I think that CS classes also teach it really poorly. I did not understand it at all until I took an optimal control class, and then it instantly made sense.

thayne
0 replies
17h52m

As a primarily self-taught programmer, when I was first looking for a job after college, if I had been asked in an interview to use dynamic programming, I would have had no idea what that was. Thankfully that never happened to me. But I was familiar with the technique, and used it in multiple interviews.

stevefan1999
0 replies
18h31m

Dynamic programming is not black magic; _proving that a problem can be solved by dynamic programming, and proving the solution's correctness_, is. You need mathematical induction to formally prove it, just like with greedy algorithms, which I found very counterintuitive (take, for example, greedy coloring) when I learned them both at uni.

siddbudd
0 replies
5h21m

https://archive.vn/dynNQ

(qsantos does not load for me)

neveroddoreven
0 replies
37m

Reminds me of how the mathematician behind DP, Richard Bellman, purposely chose the name "Dynamic Programming" because it sounded impressive and non-mathematical so that he'd be more likely to get government funding.

mgaunard
0 replies
13h8m

memoize is not "terrible academic vernacular"

hxypqr
0 replies
1d1h

Most of dynamic programming is just a method of reducing computational complexity by changing the noun objects in first-order logic (or second-order logic, in the advanced version) so as to walk through the answers to unfinished tasks using completed tasks. Only in very few cases is it necessary to extract and match the completed parts from the unfinished objects in the above process, which often involves optimizing a function f(A,B). However, most of the time, this process is futile.

hit8run
0 replies
1d5h

Cannot establish database connection… It probably is…

coolThingsFirst
0 replies
1d

It's black magic bcs ppl don't want to admit they don't understand recursion lol. Fibonacci has always been a bad example to teach recursion. That and towers of hanoi are the greatest failure in teaching CS

bob1029
0 replies
1d4h

If you are seeking black magic, dynamic optimality might be more your speed.

RespectYourself
0 replies
19h41m

Good rant about BS terminology. He should've included "responsive" Web pages.

Futurebot
0 replies
1d

I like the short formulation to explain it: "Dynamic Programming is approximately recursion + memoization + guessing"

DonHopkins
0 replies
16h8m

Emacs's infamous "Ultra-hot screen management package" with its "Skull and Crossbones" warning was definitely black magic:

https://news.ycombinator.com/item?id=33450034

James Gosling's Emacs screen redisplay algorithm also used similar "dynamic programming techniques" to compute the minimal-cost path through a cost matrix of string edit operations (the costs depended, e.g., on the number of characters to draw, the length of the escape codes to insert/delete lines/characters, padding for slow terminals, etc).

https://en.wikipedia.org/wiki/Gosling_Emacs

A redisplay algorithm, by James Gosling (ACM SIGPLAN Notices, April 1981):

https://donhopkins.com/home/documents/EmacsRedisplayAlgorith...

https://donhopkins.com/home/archive/emacs/mw/display.c

https://donhopkins.com/home/archive/emacs/skull-and-crossbon...

                         /-------------\ 
                        /               \ 
                       /                 \ 
                      /                   \ 
                      |   XXXX     XXXX   | 
                      |   XXXX     XXXX   | 
                      |   XXX       XXX   | 
                      \         X         / 
                       --\     XXX     /-- 
                        | |    XXX    | | 
                        | |           | | 
                        | I I I I I I I | 
                        |  I I I I I I  | 
                         \             / 
                          --         -- 
                            \-------/ 
                    XXX                    XXX 
                   XXXXX                  XXXXX 
                   XXXXXXXXX         XXXXXXXXXX 
                          XXXXX   XXXXX 
                             XXXXXXX 
                          XXXXX   XXXXX 
                   XXXXXXXXX         XXXXXXXXXX 
                   XXXXX                  XXXXX 
                    XXX                    XXX 

                          ************** 
                          *  BEWARE!!  * 
                          ************** 

                        All ye who enter here: 
                    Most of the code in this module 
                       is twisted beyond belief! 

                           Tread carefully. 

                    If you think you understand it, 
                              You Don't, 
                            So Look Again.

AtNightWeCode
0 replies
1h43m

Any benchmarks for implementing caching in fib in a lang with a good compiler?

A long time ago I worked on a mathlib in C/C++, and when reviewing the assembler output, the compiler often did some really clever things.