"There was just one catch: You can't do recursion."
You could do tail recursion, because only the return address of the first call would need to be stored. `branch_with_link` would be used for the initial call, but the recursive calls would have to be regular branches.
Most recursion is about as bad as goto coding methodologies.
Personally, I think it should be avoided in modern compiled languages too.
Haskell is fun to play around on, but it is horrific to consider using in a stable production environment. =)
Outside of a university course, if I see recursion in a non-FP language I consider it a code smell.
This opinion is totally wild to me.
Do you never work with tree data structures?
I can't think of a non-trivial program I've written in the past two decades that didn't have some recursive tree traversal in it.
Recursion in production code is bad news, because you can't control the depth of your call tree.
At some point you will crash because the runtime won't be able to allocate any more stack. And you can't preflight it because if you could preflight it you wouldn't be doing recursion.
Recursion is a nice toy, but in real life it's a ticking time bomb.
This is just bananas. I work in programming languages.
I currently have open in my editor a code formatter that I maintain that uses at least half a dozen recursive algorithms to traverse syntax trees and other data structures. This program is used by almost every user of our language, invoked on every save, and probably executed billions of times a day.
Recursion is fine.
"Recursion is fine [for your use-case]."
In general it is naive, often dangerous, and an inefficient space/time trade-off.
I have been writing software for several decades... does that make one less insightful or more biased?
https://youtu.be/pmu5sRIizdw?feature=shared&t=31
=)
For a fun time, make a nested json blob about 1000 layers deep[1] and in python:
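Something like this minimal sketch will do it:

```python
import json

depth = 2000  # comfortably past CPython's default recursion limit
blob = "[" * depth + "]" * depth
json.loads(blob)  # raises RecursionError: maximum recursion depth exceeded
```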
If you parse arbitrary json bytes from the Internet (for example, if you have some public Python API with no auth) then you have given the world a fun little stacktrace generator, and a way to lessen your server room's heating bill.

EDIT: btw you can do the same thing in rust's serde_json if the type you're deserializing into somehow supports the nesting, but you'd have to work for it.
EDIT2: the only (decent) way I can think to mitigate this in the python app's case is to enforce a sufficiently restrictive Content-Length at the edge, which obviously isn't possible if you expect blobs O(100Kb) uncompressed. Or you could pre-parse to detect nesting.
[1] EDIT3: on my machine in ipython right now it's 1485:
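Found with something along these lines (a sketch, not the original snippet):

```python
import json

# Walk the nesting depth upward until json.loads falls over.
depth = 1
while True:
    try:
        json.loads("[" * depth + "]" * depth)
    except RecursionError:
        break
    depth += 1
print(depth)  # 1485 here
```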
Parsing untrusted input on a machine where failure could affect someone other than the person who provided the input is definitely a known case to be mindful of.
You'll want to cap the maximum allowed nesting depth. Even if not using recursion, you probably don't want untrusted input to be able to make your stack data structure allocate arbitrary amounts of memory.
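For instance, a rough pre-scan in Python (ignoring brackets inside strings, which a real check would have to handle):

```python
MAX_NESTING = 64

def check_nesting(text):
    # Reject input whose bracket nesting exceeds the cap, so that
    # neither a recursive parser nor an explicit stack can blow up.
    depth = 0
    for ch in text:
        if ch in "[{":
            depth += 1
            if depth > MAX_NESTING:
                raise ValueError("too deeply nested")
        elif ch in "]}":
            depth -= 1
```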
If you do put a nesting limit in, you can do that equally well using an actual stack or recursion. (For example, I believe v8 still uses recursive descent for parsing JSON and JavaScript. It detects and handles stack overflows to ensure that untrusted input can't blow the stack and crash. I'm not sure if it's using a hardcoded nesting limit or actually trapping the stack/heap collision somehow.)
Yes, unfortunately in the python case the standard library's json parser has no way to configure a reasonable limit. It'll just consume stack frames until it hits the interpreter's recursion limit. And, as has been mentioned elsewhere, python stack frames are expensive for a variety of reasons.
I've run into similar issues with recursive code in java. The story gets better in C or rust, but there are still sharp edges. I guess there are sharp edges everywhere though...
You mean like the reply depth limit in forums like YC?
Shouldn't you stick to a depth-first branch reply limit until moving to the next fork?
I want to respect your beliefs. =)
In general, most json/bson parsers like libbson have a rather shallow depth restriction (xerces also constrains this if I recall.)
And yeah, recursion is avoided in some shops for good reasons.
There’s an easy way to figure out if recursion is fine for your use case:
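Something like this (`subproblems` is a stand-in for however your problem decomposes):

```python
def recursion_is_fine(problem):
    # Recursion is fine iff it is also fine for every subproblem.
    return all(recursion_is_fine(sub) for sub in subproblems(problem))
```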
This code terminates only if every child problem is eventually found to have no subproblems of its own. So either the problem has infinitely many subproblems or the function returns true. In other words, recursion is good for a use case unless the problem decomposes into infinitely many subproblems, but in that case this test function never returns false.
Makes more sense to do it as breadth-first with only one return value:
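Roughly (again with a hypothetical `subproblems`):

```python
from collections import deque

def recursion_is_fine(problem):
    # Same joke, breadth-first: the queue drains only if every
    # branch bottoms out, and True is the only possible answer.
    queue = deque([problem])
    while queue:
        queue.extend(subproblems(queue.popleft()))
    return True
```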
"kernel: Out of memory: Kill process 9163 (Insulin-pump-task) score 511 or sacrifice child"
There are safer hobbies available like snake juggling =)
If you're going to make a generalization, I think recursion is fine, efficient, and rarely dangerous.
There may be use cases where it's insecure or has performance concerns, but those are the exception. Most of the time, it's fine.
We don't generally tell programmers that loops are naive, often dangerous, and risk locking up the program. They certainly can, but most just... don't.
Just like loops, recursion can make code much simpler to write, read, and maintain. Every modern language under the sun supports it for very good reasons. Use it.
For many reasons, people often add nesting limits in recursive structures to at least constrain how bad they behave when something eventually does go wrong.
Parsers are far from immune to these issues.
I think we will have to "agree to disagree" here... =)
Of course you can, if you wanted to, just like you can control the iteration count of a loop. It's not even hard. This is simply a non-issue.
Some algorithms are much more naturally expressed recursively and writing the imperative equivalent with manual stack handling is just annoying. Stack growth is just something you don't have to worry about for almost all scenarios you're likely to encounter.
Well, it's about the tradeoffs, right? If I have a recursive algorithm that's growing the stack (assuming no TCO, because few languages people actually use in production support it), I'm trading execution time, space, and reliability for economy of expression. In reverse order:
- reliability: if, as you suggest, I implement some hard depth limit (which is necessary because all the processes which are running concurrently need to not exceed my max stack depth), and assuming generally that things grow over time (more users, more concurrent processes, more recursive calls needed to get the job done), we face two issues. (1) There's a complicated relationship between the maximum number of concurrent processes and the maximum allowable recursion depth. Getting it wrong could crash the entire program! If this is a web server, that means we just killed a whole bunch of connections all at once. (2) Eventually, over time, we'll need to raise the recursion limit, which entails rebalancing the concurrency limit. Hard walls like this are bad news in systems. If instead this was implemented iteratively, the system would degrade softly as the iteration count grows--each process would take longer to complete, but they'd still all complete. Assuming I'm monitoring process execution time, I can predict and respond to this proactively instead of being faced with an emergency where the system is just completely broken.
- space: this is obvious I guess, more stack frames == more memory. The problem is worse in some languages than others.
- time: this may be less obvious, and there may be optimizations which render it false, but generally in my experience iterative code gets pipelined better and runs quicker.
I guess that depends on the situation. I've encountered hard walls and performance issues from recursion enough times in my career thus far that I make the extra effort to avoid it. I could totally see the value, though, in areas where you know ahead of time how the recursion depth will scale over time. More often than not, though, that's unknowable at implementation time so better err on the side of caution.
EDIT: upon re-reading this I think it might have been clearer if I instead wrote "task" every time I wrote "process"--I'm not talking specifically about any OS feature.
I think you're overthinking it. Here are probably the people who need to worry about recursion depth: embedded developers working with limited memory, and developers working on data systems that process huge data sets (and even this is debatable as stack growth is log n with these tree data structures).
My process for deciding when to use iteration or recursion is simple: if each step needs to keep context then use recursion, otherwise use iteration (unless recursion with TCO is the native idiom of course).
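A sketch of the distinction (`left`, `right`, and `value` are assumed node attributes):

```python
# Each step keeps context: after recursing into the left subtree,
# this frame still has work to do (yield the node, then go right).
def in_order(node):
    if node is not None:
        yield from in_order(node.left)
        yield node.value
        yield from in_order(node.right)

# No context carried between steps: plain iteration reads best.
def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc
```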
I've never had an issue that wasn't an obvious bug that iteration would have somehow solved. If any bug that terminated a thread was fatal to the whole process then I suggest the thread termination handler should handle this more gracefully.
Increased space usage is a non-issue IMO. Translating a stack frame to a data structure you manage as an explicit stack requires the same space within a small constant factor.
And an oft-ignored factor is the cleanup advantages of allocating on the stack, which offsets any space and time disadvantages you might see with recursion.
Quite likely :)
This is a very good point.
Alright, you've convinced me to start playing with recursion again. Just not in java or python.
They were referring to algorithms that need to use that much space either way and where the alternative is maintaining your own stack.
Even in that case there's a tradeoff, if that stack needs to be large you can maintain it on the heap. Now you can do that to some extent with either a recursive implementation or not of course, but it's still a matter of deciding which is the better approach. It's not simply the case that "you can always just use recursion and it'll be totally fine mostly".
That sounds awfully complicated, modifying a recursive algorithm to control the recursion depth. By that, I mean, if sometimes the data happens to be a very deep unbalanced tree that would cause a stack overflow with a naive recursive algorithm, you detect that situation and make it work. Isn't that much harder than just using your own stack (from a library/framework)?
> That sounds awfully complicated, modifying a recursive algorithm to control the recursion depth.
An easy way to do it is to just pass a separate "depth" integer parameter into the recursive function. It then passes that +1 when it makes a recursive call. At the top of the function, if the depth is above your chosen limit, you bail out in some way instead of continuing to recurse.
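In Python it might look like this (a sketch; `children` is an assumed attribute):

```python
MAX_DEPTH = 1000  # whatever bound suits your inputs

def walk(node, depth=0):
    if depth > MAX_DEPTH:
        raise ValueError("input nested too deeply")  # bail out instead of recursing
    for child in node.children:
        walk(child, depth + 1)
```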
> you detect that situation and make it work.
If you really do need to "make it work" for arbitrarily-deeply nested inputs, then you are actually better off using a heap-allocated stack instead of recursion.
But in most cases, a recursive implementation will only exceed the stack if the input is malicious or pathological and you can just fail in that case.
From my perspective, the main issue it causes (besides flood risks) is an overlapping state dependency that can process-bind large areas of fragmented memory.
So, if you try to unroll naive recursive code, then one wins n-many issues instead of one predictably separable issue. Sure, one could pass in a mutex etc., but then it is back to using a global again and a single-core bounded context.
Best of luck, =)
If you're using some sort of balanced tree (red-black, AVL or a b-tree of some sort), the depth of the tree is guaranteed to be log(n) where n is the number of items. If you recursively descend down the tree, the number of stack frames involved will never exceed the height of the tree.
If you have a binary tree with 1 billion elements, the depth will be about 30. In a b-tree with a reasonable node width (eg 16), assuming 50% occupancy the depth will be about 10.
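A quick sanity check:

```python
import math

math.log2(1_000_000_000)    # ~29.9 -> about 30 levels in a binary tree
math.log(1_000_000_000, 8)  # ~10.0 -> about 10 levels with 8-way fanout
```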
This is, in practice, totally fine. You won't exhaust your stack traversing a balanced tree.
I think there are a lot of web devs here who never do anything more complicated than process data in a loop, and complexity analysis can be accomplished by counting indentation levels. When this viewpoint is applied with a wide brush to all programmers or the whole practice of programming, it results in some pretty spicy takes.
On the other hand, recursion is extremely expensive in some languages, and Python has a notorious limit that forces library writers to roll their own stack. So despite the above, I'm actually in the anti-recursion camp.
There are developers in every field who only work on trivial problems. I bet there are scientific programmers and quants and ML developers who never need more than loops in their work. Loops are powerful tools.
But it does you no credit to dismiss ‘web devs’ as more likely to be doing that kind of work. You know the web is full of hierarchies, right? Domain names, the DOM, JSON structures, file directories? And ‘web devs’ build complicated sites with navigation hierarchies and taxonomies and comment threads and team based permissions and complex data visualizations - because web devs build applications that model real world businesses and data all the time, and those things are complex and hierarchical and fractal.
Web devs have plenty of uses for recursion. Don’t be dismissive.
I spent several years in the web dev trenches; my opinion is not an out of hand dismissal.
ML research can be super boring and hard to actually generalize/replicate, but recently became a lucrative field.
All commercial projects eventually become "work" when they are no longer fun, and you wouldn't show up unless paid.
You can also realize your NDA does not expire on some IP you wrote 15 years ago. lol =)
Are you arguing against the notion that some branches of programming are going to have to handle complex data structures more often than others?
It seems like you're just trying to smooth over a relatively inoffensive point. I don't think most web developers would disagree with GP. You're the one being rude and dismissive of the person you're replying to, frankly.
Tree traversal uses stacks / queues unless you're dealing with a small tree such that you're sure recursion won't blow your stack, or your algorithm can work with tail calls and your language guarantees TCO.
Any balanced tree will be shallow enough that this isn't a problem in practice. You'll run out of RAM / hard disk space to store your tree before you run out of stack space.
There are lots of trees that can't be balanced, like tries.
Tree traversal almost never makes use of tail call recursion. Also, most of the time, you can just use a stack object instead of the program stack, which saves you from insane stacktraces, arbitrary stack limits, and a bunch of wasted memory in call frames.
Yes, you can use std::vector or array or similar as a stack, without using recursion (which has some easily reachable depth limit that is much smaller than what your full RAM allows, especially in e.g. JS).
Of course ideally programming languages would figure out a way to not have this reachable limit and allow as much recursion as your RAM allows... We're still in the ancient world when it comes to this, it seems
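In Python, the equivalent of that pattern might look like this (with hypothetical `process` and `children`):

```python
def visit_all(root):
    stack = [root]  # lives on the heap, so it can grow as far as memory allows
    while stack:
        node = stack.pop()
        process(node)
        stack.extend(node.children)
```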
Recursion is like an inductive proof: you can show it is correct, and it normally fits on half of a small screen.
There is an argument that all recursive proofs can be made iterative due to isomorphism.
You are not lazy enough to be a good programmer yet. ;-)
Yeah, but that is error prone and more complex. A compiler can make those same transformations. I'd argue that the properly lazy programmer is the one using recursion. To get even lazier, one should move into relational algebra.
While I am not a fan of recursion, the call stack that enables it sounds infinitely better than statically allocating space for parameters and return values. Besides, it makes some algorithms clearer. That is certainly useful in learning environments, including early academic research.
Depends on the CPU; some architectures had hardware stacks with very little capacity before overflowing.
The only time I ever do it is for tree walking, and then it's always a private internal function, so like:
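Roughly this shape (`do_work` and `children` are placeholders):

```python
def process_tree(root):
    # Public entry point; the recursion is hidden inside.
    def _walk(node):
        do_work(node)
        for child in node.children:
            _walk(child)
    _walk(root)
```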
The processing can even be brought outside sometimes, or the walker can just become a generator, yielding nodes out to a regular loop doing the actual work. Either way, the recursive part is kept very minimal and easy to reason about.

Obviously for the filesystem these kinds of abstractions exist already in shutil/pathlib/glob, but it can still have a place for dealing with other kinds of hierarchies, like a package dependency tree or the like.
I agree. People love making things clever instead of making things done.
Recursion for iteration is just more complicated iteration. I've never seen a good argument for it in modern programming, it just ends up being the classic backwards rationalization of something people want to believe.
Recursion for traversing a tree is just using the call stack as a stack data structure.
Any balanced tree is never going to exceed 64 levels deep (really 48 on a 48 bit memory addressing cpu) and that could easily be put in a static array that gets used as a stack without recursing. This is easier to limit and debug.
The same argument suggests that a recursive stack traversal will never consume more than 64 stack frames, so consuming stack frames is no reason not to use a recursive function.
It's just as easy to limit recursion depth, if you need to, just pass along a counter and check it. I haven't found that hand-rolling a second stack, instead of using the program stack, is easier to debug, the opposite if anything, but your mileage may vary on that one.
> so consuming stack frames is no reason not to use a recursive function.
Saving stack memory isn't the point (even though a static array definitely does save some, because instead of multiple pointers and variables on the stack it only would have to store the node index of a tree).
The point is that you can see the whole stack and all the data that you're using at one time in a debugger, instead of trying to switch through call stacks to find data.
Well, it's still recursion whether you're using the call stack or are using an explicit stack structure. You're still breaking the problem down into smaller subproblems inductively. I feel that people focus on the wrong things when talking about recursion, focusing on the aspect of having a function calling itself instead of the idea of having the problem solved by way of breaking it down into smaller problems.
> it's still recursion whether you're using the call stack or are using an explicit stack structure
Recursion means defining something in terms of itself, so no, using a stack isn't recursion. The call stack of lots of different function calls in a normal program isn't called recursion either.
> the idea of having the problem solved by way of breaking it down into smaller problems.
That's not recursion, that's organization, modularity and all sorts of other descriptions. Where did you get these ideas?
https://en.wikipedia.org/wiki/Recursion
> Recursion occurs when the definition of a concept or process depends on a simpler or previous version of itself
Except that on most modern large CPUs, each recursive call pushes a return address plus whatever registers the calling convention requires onto the stack.
Seems very inefficient for the risks it brings.
I think we all agree this is still very interesting to consider. =)
To my poor eyes it's creating self sustaining computing blocks.
You could recurse using an explicit stack or use tiny functions that will thread themselves as they see fit.
In a way it's more encapsulating than languages preaching encapsulation.
Are you saying recursion creates "self sustaining computing blocks"? What does that mean and how does it do it?
> that will thread themselves as they see fit

What does this mean?
> In a way it's more encapsulating than languages preaching encapsulation.
Recursion does this? Are you talking about not depending on a stack structure in this specific instance or something else?
I'm trying to make my ideas clearer but I'm not sure it will be :)
- using recursion involves functions as basic blocks of logic that represent sub-parts of a domain
- often creates a finite set of self-dependent functions
- all you do is call and pass other functions, which will call each other
If you consider `map` or `fold`, these create opaque functional processes that operate very automatically. It's all inductive logic at work to me. On the other hand, in OO you needed (until very recently) to create iterators and empty structures to modify step by step.
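For example, in Python a fold via functools.reduce drives the whole iteration for you:

```python
from functools import reduce

nums = [1, 2, 3, 4]
squares = list(map(lambda x: x * x, nums))       # map: no loop state to manage
total = reduce(lambda acc, x: acc + x, nums, 0)  # fold: the combinator owns the iteration
```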
Another example is PEG parsing, where the monadic flavor replaces an externally managed stack.
ps: maybe i'm still too high on lambda calc
It's interesting because I love using Haskell in a stable production environment. In fact I'd hate to use anything else!
On the other hand, I almost never use naked recursion in Haskell. I typically do recursion indirectly through more familiar combinators such as for loops.
I like Haskell in many ways as it shifts ones perspectives, but it is just way too easy to leave unpredictable behaviors unconstrained. =)
I'd recommend you read Guy Steele's blog post Why Object-Oriented Languages Need Tail Calls [1] and perhaps the essay by William Cook [2] and the discussion over on Lambda the Ultimate [3]
[1] https://web.archive.org/web/20091206042608/http://projectfor...
[2] https://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf
[3] http://lambda-the-ultimate.org/node/3702
Sure, or just add a polymorphic functor to walk a tree autonomously.
There are shorter ways to admit you already sold your soul. lol =)
This limitation doesn’t just proscribe traditional recursive algorithms: Any reentrancy is impossible.
Your threads, co-routines, interrupt handlers, error handlers, all have to be careful not to stomp on someone else’s use of the same function, even if they’re not directly using a function recursively.
Yes, but you do need to handle local storage.