Subroutine calls in the ancient world, before computers had stacks or heaps

weinzierl
57 replies
1d2h

"There was just one catch: You can't do recursion."

You could do tail recursion, because only the return address of the first call would need to be stored. `branch_with_link` would be used for the initial call, but the recursive calls would have to be regular branches.
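
A quick illustration in Python (gcd is just a stand-in example, and CPython doesn't actually do tail-call elimination, so this only shows the shape of the transformation):

    # Tail-recursive: the recursive call is the last action taken, so
    # nothing from the current frame is needed afterwards and the call
    # can compile to a plain branch.
    def gcd(a, b):
        if b == 0:
            return a
        return gcd(b, a % b)  # tail call

    # The loop that a plain-branch "call" amounts to:
    def gcd_iter(a, b):
        while b != 0:
            a, b = b, a % b
        return a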

Joel_Mckay
55 replies
1d2h

Most recursion is about as bad as goto coding methodologies.

Personally, I think it should be avoided in modern compiled languages too.

Haskell is fun to play around on, but it is horrific to consider using in a stable production environment. =)

autoexecbat
40 replies
1d2h

Outside of a university course, if I see recursion in a non-FP language I consider it a code-smell

munificent
33 replies
1d1h

This opinion is totally wild to me.

Do you never work with tree data structures?

I can't think of a non-trivial program I've written in the past two decades that didn't have some recursive tree traversal in it.

mannyv
22 replies
23h21m

Recursion in production code is bad news, because you can't control the depth of your call tree.

At some point you will crash because the runtime won't be able to allocate any more stack. And you can't preflight it because if you could preflight it you wouldn't be doing recursion.

Recursion is a nice toy, but in real life it's a ticking time bomb.

munificent
11 replies
21h50m

This is just bananas. I work in programming languages.

I currently have open in my editor a code formatter that I maintain that uses at least half a dozen recursive algorithms to traverse syntax trees and other data structures. This program is used by almost every user of our language, invoked on every save, and probably executed billions of times a day.

Recursion is fine.

Joel_Mckay
10 replies
21h5m

"Recursion is fine [for your use-case]."

In general it is naive, often dangerous, and an inefficient space/time trade-off.

I have been writing software for several decades... does that make one less insightful or more biased?

https://youtu.be/pmu5sRIizdw?feature=shared&t=31

=)

jcgrillo
4 replies
19h40m

For a fun time, make a nested json blob about 1000 layers deep[1] and in python:

  import json

  with open('lulz.json') as ayy_lmao:
      oops = json.load(ayy_lmao)

If you parse arbitrary json bytes from the Internet (for example, if you have some public Python API with no auth) then you have given the world a fun little stacktrace generator, and a way to lessen your server room's heating bill.

EDIT: btw you can do the same thing in rust's serde_json if the type you're deserializing into somehow supports the nesting, but you'd have to work for it.

EDIT2: the only (decent) way I can think to mitigate this in the python app's case is to enforce a sufficiently restrictive Content-Length at the edge, which obviously isn't possible if you expect blobs O(100Kb) uncompressed. Or you could pre-parse to detect nesting.
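
A sketch of that pre-parse in Python (a hand-rolled scanner, so treat it as illustrative; `limit` is whatever depth you're willing to tolerate):

    def nesting_ok(raw: bytes, limit: int) -> bool:
        # Scan the raw bytes before json.loads ever sees them, tracking
        # string state so brackets inside string literals don't count.
        depth = 0
        in_string = False
        escaped = False
        for b in raw:
            if in_string:
                if escaped:
                    escaped = False
                elif b == ord('\\'):
                    escaped = True
                elif b == ord('"'):
                    in_string = False
            elif b == ord('"'):
                in_string = True
            elif b in b'{[':
                depth += 1
                if depth > limit:
                    return False
            elif b in b'}]':
                depth -= 1
        return True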

[1] EDIT3: on my machine in ipython right now it's 1485:

  import json

  def nest(d):
      return '{"k":' + str(d) + '}'

  def make_mess(n):
      v = 1
      for _ in range(n):
          v = nest(v)
      return v

  json.loads(make_mess(1485))  # boom

munificent
2 replies
19h4m

Parsing untrusted input on a machine where failure could affect someone other than the person who provided the input is definitely a known case to be mindful of.

You'll want to cap the maximum allowed nesting depth. Even if not using recursion, you probably don't want untrusted input to be able to make your stack data structure allocate arbitrary amounts of memory.

If you do put a nesting limit in, you can do that equally well using an actual stack or recursion. (For example, I believe v8 still uses recursive descent for parsing JSON and JavaScript. It detects and handles stack overflows to ensure that untrusted input can't blow the stack and crash. I'm not sure if it's using a hardcoded nesting limit or actually trapping the stack/heap collision somehow.)

jcgrillo
0 replies
18h27m

Yes, unfortunately in the python case the standard library's json parser has no way to configure a reasonable limit. It'll just consume stack frames until it hits the interpreter's recursion limit. And, as has been mentioned elsewhere, python stack frames are expensive for a variety of reasons.

I've run into similar issues with recursive code in java. The story gets better in C or rust, but there are still sharp edges. I guess there are sharp edges everywhere though...

Joel_Mckay
0 replies
18h32m

You mean like the reply depth limit in forums like YC?

Shouldn't you stick to a depth-first branch reply limit until moving to the next fork?

I want to respect your beliefs. =)

Joel_Mckay
0 replies
19h22m

In general, most json/bson parsers like libbson have a rather shallow depth restriction (xerces also constrains this if I recall.)

And yeah, recursion is avoided in some shops for good reasons.

jameshart
2 replies
5h49m

There’s an easy way to figure out if recursion is fine for your usecase

   def isRecursionOk(problem):
       for subProblem in breakDownIntoSubProblems(problem):
           if not isRecursionOk(subProblem):
               return False
       return True

hughdbrown
0 replies
2h33m

This code terminates only if every chain of subproblems eventually bottoms out. So either the problem has infinitely many subproblems (and the function never returns), or it returns true. In other words, recursion is good for a use case unless the problem decomposes into infinite subproblems, but in that case this test function never returns false.

Makes more sense to do it as breadth-first with only one return value:

   from collections import deque

   def isRecursionOk(problem):
       q = deque([problem])
       while q:
           problem = q.popleft()
           q.extend(breakDownIntoSubProblems(problem))
       return True

Joel_Mckay
0 replies
2h32m

"kernel: Out of memory: Kill process 9163 (Insulin-pump-task) score 511 or sacrifice child"

There are safer hobbies available like snake juggling =)

munificent
1 replies
20h10m

If you're going to make a generalization, I think recursion is fine, efficient, and rarely dangerous.

There may be use cases where it's insecure or has performance concerns, but those are the exception. Most of the time, it's fine.

We don't generally tell programmers that loops are naive, often dangerous, and risk locking up the program. They certainly can, but most just... don't.

Just like loops, recursion can make code much simpler to write, read, and maintain. Every modern language under the sun supports it for very good reasons. Use it.

Joel_Mckay
0 replies
19h47m

For many reasons, people often add nesting limits in recursive structures to at least constrain how bad they behave when something eventually does go wrong.

Parsers are far from immune to these issues.

I think we will have to "agree to disagree" here... =)

naasking
8 replies
19h58m

> Recursion in production code is bad news, because you can't control the depth of your call tree.

Of course you can, if you wanted to, just like you can control the iteration count of a loop. It's not even hard. This is simply a non-issue.

Some algorithms are much more naturally expressed recursively and writing the imperative equivalent with manual stack handling is just annoying. Stack growth is just something you don't have to worry about for almost all scenarios you're likely to encounter.

jcgrillo
4 replies
18h49m

> This is simply a non-issue.

Well, it's about the tradeoffs right? If I have a recursive algorithm that's growing the stack (assuming no TCO, because few languages people actually use in production support it) I'm trading execution time, space, and reliability for economy of expression. In reverse order:

- reliability: if, as you suggest, I implement some hard depth limit (which is necessary because all the processes which are running concurrently need to not exceed my max stack depth), and assuming generally that things grow over time (more users, more concurrent processes, more recursive calls needed to get the job done) we face two issues. (1) there's a complicated relationship between the maximum number of concurrent processes and the maximum allowable recursion depth. Getting it wrong could crash the entire program! If this is a web server, that means we just killed a whole bunch of connections all at once. (2) eventually, over time, we'll need to raise the recursion limit, which entails rebalancing the concurrency limit. Hard walls like this are bad news in systems. If instead this was implemented iteratively the system would degrade softly as the iteration count grows--each process would take longer to complete, but they'd still all complete. Assuming I'm monitoring process execution time I can predict and respond to this proactively instead of being faced with an emergency where the system is just completely broken.

- space: this is obvious I guess, more stack frames == more memory. The problem is worse in some languages than others.

- time: this may be less obvious, and there may be optimizations which render it false, but generally in my experience iterative code gets pipelined better and runs quicker.

> Stack growth is just something you don't have to worry about for almost all scenarios you're likely to encounter.

I guess that depends on the situation. I've encountered hard walls and performance issues from recursion enough times in my career thus far that I make the extra effort to avoid it. I could totally see the value, though, in areas where you know ahead of time how the recursion depth will scale over time. More often than not, though, that's unknowable at implementation time so better err on the side of caution.

EDIT: upon re-reading this I think it might have been clearer if I instead wrote "task" every time I wrote "process"--I'm not talking specifically about any OS feature.

naasking
1 replies
16h7m

I think you're overthinking it. Here are probably the people who need to worry about recursion depth: embedded developers working with limited memory, and developers working on data systems that process huge data sets (and even this is debatable as stack growth is log n with these tree data structures).

My process for deciding when to use iteration or recursion is simple: if each step needs to keep context then use recursion, otherwise use iteration (unless recursion with TCO is the native idiom of course).

I've never had an issue that wasn't an obvious bug that iteration would have somehow solved. If any bug that terminated a thread was fatal to the whole process then I suggest the thread termination handler should handle this more gracefully.

Increased space usage is a non-issue IMO. Translating a stack frame to a data structure you manage as an explicit stack requires the same space within a small constant factor.

And an oft-ignored factor is the cleanup advantages of allocating on the stack, which offsets any space and time disadvantages you might see with recursion.

jcgrillo
0 replies
13h22m

> I think you're overthinking it

Quite likely :)

> And an oft-ignored factor is the cleanup advantages of allocating on the stack, which offsets any space and time disadvantages you might see with recursion.

This is a very good point.

Alright, you've convinced me to start playing with recursion again. Just not in java or python.

furyofantares
1 replies
17h32m

They were referring to algorithms that need to use that much space either way and where the alternative is maintaining your own stack.

jcgrillo
0 replies
17h12m

Even in that case there's a tradeoff, if that stack needs to be large you can maintain it on the heap. Now you can do that to some extent with either a recursive implementation or not of course, but it's still a matter of deciding which is the better approach. It's not simply the case that "you can always just use recursion and it'll be totally fine mostly".

EnigmaFlare
1 replies
16h58m

That sounds awfully complicated modifying a recursive algorithm to control the recursion depth. By that, I mean, if sometimes the data happens to be a very deep unbalanced tree that would cause a stack overflow with a naive recursive algorithm, you detect that situation and make it work. Isn't that much harder than just using your own stack (from a library/framework)?

munificent
0 replies
3h56m

> That sounds awfully complicated modifying a recursive algorithm to control the recursion depth.

An easy way to do it is to just pass a separate "depth" integer parameter into the recursive function. It then passes that +1 when it makes a recursive call. At the top of the function, if the depth is above your chosen limit, you bail out in some way instead of continuing to recurse.
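
A minimal sketch of that pattern in Python (walk, node.children, and MAX_DEPTH are all stand-ins for whatever your code actually has):

    MAX_DEPTH = 1000  # whatever bound fits your stack budget

    def walk(node, depth=0):
        if depth > MAX_DEPTH:
            raise RecursionError("input nested too deeply")
        for child in node.children:
            walk(child, depth + 1)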

> you detect that situation and make it work.

If you really do need to "make it work" for arbitrarily-deeply nested inputs, then you are actually better off using a heap-allocated stack instead of recursion.

But in most cases, a recursive implementation will only exceed the stack if the input is malicious or pathological and you can just fail in that case.

Joel_Mckay
0 replies
19h7m

From my perspective, the main issue it causes (besides flood risks) is an overlapping state dependency that can process-bind large areas of fragmented memory.

So, if you try to unroll naive recursive code, then one wins n-many issues instead of 1 predictably separable one. Sure one could pass in a mutex etc., but it is back to using a global again and a single-core bounded context.

Best of luck, =)

josephg
0 replies
22h29m

If you're using some sort of balanced tree (red-black, AVL or a b-tree of some sort), the depth of the tree is guaranteed to be log(n) where n is the number of items. If you recursively descend down the tree, the number of stack frames involved will never exceed the height of the tree.

If you have a binary tree with 1 billion elements, the depth will be about 30. In a b-tree with a reasonable node width (eg 16), assuming 50% occupancy the depth will be about 10.
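
Back-of-the-envelope in Python:

    import math

    math.ceil(math.log2(1_000_000_000))    # 30: balanced binary tree
    math.ceil(math.log(1_000_000_000, 8))  # 10: b-tree, ~8 live children per node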

This is, in practice, totally fine. You won't exhaust your stack traversing a balanced tree.

klyrs
4 replies
22h37m

I think there are a lot of web devs here who never do anything more complicated than process data in a loop, and complexity analysis can be accomplished by counting indentation levels. When this viewpoint is applied with a wide brush to all programmers or the whole practice of programming, it results in some pretty spicy takes.

On the other hand, recursion is extremely expensive in some languages, and Python has a notorious limit that forces library writers to roll their own stack. So despite the above, I'm actually in the anti-recursion camp.

jameshart
3 replies
5h54m

There are developers in every field who only work on trivial problems. I bet there are scientific programmers and quants and ML developers who never need more than loops in their work. Loops are powerful tools.

But it does you no credit to dismiss ‘web devs’ as more likely to be doing that kind of work. You know the web is full of hierarchies, right? Domain names, the DOM, JSON structures, file directories? And ‘web devs’ build complicated sites with navigation hierarchies and taxonomies and comment threads and team based permissions and complex data visualizations - because web devs build applications that model real world businesses and data all the time, and those things are complex and hierarchical and fractal.

Web devs have plenty of uses for recursion. Don’t be dismissive.

klyrs
0 replies
2h31m

I spent several years in the web dev trenches; my opinion is not an out of hand dismissal.

Joel_Mckay
0 replies
2h39m

ML research can be super boring and hard to actually generalize/replicate, but recently became a lucrative field.

All commercial projects eventually become "work" when they are no longer fun, and you wouldn't show up unless paid.

You also can realize your NDA does not expire on some IP you wrote 15 years ago. lol =)

DiggyJohnson
0 replies
2h25m

Are you arguing against the notion that some branches of programming are going to have to handle complex data structures more often than others?

It seems like you're just trying to smooth over a relatively inoffensive point. I don't think most web developers would disagree with GP. You're the one being rude and dismissive of the person you're replying to, frankly.

Arnavion
2 replies
1d1h

Tree traversal uses stacks / queues unless you're dealing with a small tree such that you're sure recursion won't blow your stack, or your algorithm can work with tail calls and your language guarantees TCO.

josephg
1 replies
22h26m

Any balanced tree will be shallow enough that this isn't a problem in practice. You'll run out of RAM / hard disk space to store your tree before you run out of stack space.

Arnavion
0 replies
22h0m

There are lots of trees that can't be balanced, like tries.

hehhehaha
0 replies
21h51m

Tree traversal almost never makes use of tail call recursion. Also, most of the time, you can just use a stack object instead of the program stack, which saves you from insane stacktraces, arbitrary stack limits, and a bunch of wasted memory in call frames

Aardwolf
0 replies
10h7m

> Do you never work with tree data structures?

Yes, you can use std::vector or array or similar as a stack, without using recursion (which has some easily reachable depth limit that is much smaller than what your full RAM memory allows, especially in e.g. JS)

Of course ideally programming languages would figure out a way to not have this reachable limit and allow as much recursion as your RAM allows... We're still in the ancient world when it comes to this, it seems

nerpderp82
2 replies
1d1h

Recursion is like an inductive proof: you can show it is correct, and it normally fits on half of a small screen.

Joel_Mckay
1 replies
1d

There is an argument that all recursive proofs can be made iterative due to isomorphism.

You are not lazy enough to be a good programmer yet. ;-)

nerpderp82
0 replies
40m

Yeah, but that is error prone and more complex. A compiler can make those same transformations. I'd argue that the properly lazy programmer is the one using recursion. To get even lazier, one should move into relational algebra.

II2II
1 replies
1d1h

While I am not a fan of recursion, the call stack that enables it sounds infinitely better than statically allocating space for parameters and return values. Besides, it makes some algorithms clearer. That is certainly useful in learning environments, including early academic research.

Joel_Mckay
0 replies
1d

Depends on the CPU, in some architectures the stack pointers had low finite capacity before overflowing.

mikepurvis
0 replies
1d1h

The only time I ever do it is for tree walking, and then it's always a private internal function, so like:

    def process_that_tree(my_tree):
      def _handle(node):
        # do the things
        for child in node.children:
          _handle(child)
      _handle(my_tree.root)

The processing can even be brought outside sometimes, or the walker can just become a generator, yielding nodes out to a regular loop doing the actual work. Either way, the recursive part is kept very minimal and easy to reason about.
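
For instance, the generator variant of the walker above (same assumed node shape):

    def walk_tree(my_tree):
        def _walk(node):
            yield node
            for child in node.children:
                yield from _walk(child)
        return _walk(my_tree.root)

    for node in walk_tree(my_tree):
        ...  # do the things, in a plain loop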

Obviously for the filesystem these kinds of abstractions exist already in shutil/pathlib/glob, but it still can have a place for dealing with other kinds of hierarchies, like a package dependency tree or the like.

CyberDildonics
8 replies
1d

I agree. People love making things clever instead of making things done.

Recursion for iteration is just more complicated iteration. I've never seen a good argument for it in modern programming, it just ends up being the classic backwards rationalization of something people want to believe.

Recursion for traversing a tree is just using the call stack as a stack data structure.

Any balanced tree is never going to exceed 64 levels deep (really 48 on a CPU with 48-bit memory addressing) and that could easily be put in a static array that gets used as a stack without recursing. This is easier to limit and debug.
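
Sketched in Python, with process and node.children as stand-ins (in C the list would be that fixed-size static array of node indices):

    def visit(root):
        stack = [root]  # explicit stack instead of the call stack
        while stack:
            node = stack.pop()
            process(node)
            # reversed so children come off the stack left-to-right,
            # matching the recursive pre-order
            stack.extend(reversed(node.children))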

samatman
4 replies
23h17m

The same argument suggests that a recursive tree traversal will never consume more than 64 stack frames, so consuming stack frames is no reason not to use a recursive function.

It's just as easy to limit recursion depth, if you need to: just pass along a counter and check it. I haven't found that hand-rolling a second stack, instead of using the program stack, is easier to debug; the opposite if anything, but your mileage may vary on that one.

CyberDildonics
3 replies
21h3m

> so consuming stack frames is no reason not to use a recursive function.

Saving stack memory isn't the point (even though a static array definitely does save it, because instead of multiple pointers and variables on the stack it only has to store the node index of a tree).

The point is that you can see the whole stack and all the data that you're using at one time in a debugger, instead of trying to switch through call stacks to find data.

sham1
2 replies
10h44m

Well, it's still recursion whether you're using the call stack or are using an explicit stack structure. You're still breaking the problem down into smaller subproblems inductively. I feel that people focus on the wrong things when talking about recursion, focusing on the aspect of having a function calling itself instead of the idea of having the problem solved by way of breaking it down into smaller problems.

CyberDildonics
1 replies
6h35m

> it's still recursion whether you're using the call stack or are using an explicit stack structure

Recursion means defining something in terms of itself, so no, using a stack isn't recursion. The call stack of lots of different function calls in a normal program isn't called recursion either.

> the idea of having the problem solved by way of breaking it down into smaller problems.

That's not recursion, that's organization, modularity and all sorts of other descriptions. Where did you get these ideas?

https://en.wikipedia.org/wiki/Recursion

> Recursion occurs when the definition of a concept or process depends on a simpler or previous version of itself

Joel_Mckay
0 replies
2h18m

Except that on most modern large CPUs, on each recursive call the entire processor register state, including the program counter offset etc., is pushed onto the stack.

Seems very inefficient for the risks it brings.

I think we all agree this is still very interesting to consider. =)

agumonkey
2 replies
21h55m

To my poor eyes it's creating self sustaining computing blocks.

You could recurse using an explicit stack or use tiny functions that will thread themselves as they see fit.

In a way it's more encapsulating than languages preaching encapsulation.

CyberDildonics
1 replies
20h23m

Are you saying recursion creates "self sustaining computing blocks"? What does that mean and how does it do it?

> that will thread themselves as they see fit.

What does this mean?

> In a way it's more encapsulating than languages preaching encapsulation.

Recursion does this? Are you talking about not depending on a stack structure in this specific instance or something else?

agumonkey
0 replies
46m

I'm trying to make my ideas clearer but I'm not sure it will be :)

> Are you talking about not depending on a stack structure in this specific instance or something else?

- using recursion involves functions as basic blocks of logic, representing sub-parts of a domain

- often creates a finite set of self-dependent functions

- all you do is call and pass other functions, that will call each other

> In a way it's more encapsulating than languages preaching encapsulation.

If you consider `map` or `fold`, these create opaque functional processes that operate very automatically. It's all inductive logic at work to me. On the other hand, in OO you needed (until very recently) to create iterators and empty structures to modify step by step.

> Are you talking about not depending on a stack structure in this specific instance or something else?

PEG parsing where the monadic flavor replaces an externally managed stack

ps: maybe i'm still too high on lambda calc

tome
1 replies
1d

It's interesting because I love using Haskell in a stable production environment. In fact I'd hate to use anything else!

On the other hand, I almost never use naked recursion in Haskell. I typically do recursion indirectly through more familiar combinators such as for loops.

Joel_Mckay
0 replies
1d

I like Haskell in many ways as it shifts ones perspectives, but it is just way too easy to leave unpredictable behaviors unconstrained. =)

Joel_Mckay
0 replies
1d

Sure, or just add a polymorphic functor to walk a tree autonomously.

There are shorter ways to admit you already sold your soul. lol =)

twoodfin
0 replies
21h50m

This limitation doesn’t just proscribe traditional recursive algorithms: Any reentrancy is impossible.

Your threads, co-routines, interrupt handlers, error handlers, all have to be careful not to stomp on someone else’s use of the same function, even if they’re not directly using a function recursively.

wglb
0 replies
17h46m

Yes, but you do need to handle local storage.

dragontamer
21 replies
1d3h

I really liked the Art of Computer Programming with regards to this subject.

While seemingly obsolete, there are a ton of pre-heap / pre-stack algorithms for dynamically changing arrays or other data structures.

The book also builds up to garbage collection and how to implement Lisp lists. The kind of encyclopedic knowledge you'd expect from Knuth.

-------

One of my favorites is how to have two Arrays dynamically take up one space.

Have one array grow normally from location#0, and the second array grow backwards from location#End.

Now both arrays take up the statically allocated space efficiently sharing.
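
A toy version of the trick in Python (fixed capacity; the two "arrays" are full exactly when they meet):

    class DoubleStack:
        def __init__(self, capacity):
            self.buf = [None] * capacity
            self.lo = 0              # next free slot, growing up from 0
            self.hi = capacity - 1   # next free slot, growing down from the end

        def push_front(self, x):     # first array
            if self.lo > self.hi:
                raise MemoryError("shared space exhausted")
            self.buf[self.lo] = x
            self.lo += 1

        def push_back(self, x):      # second array
            if self.lo > self.hi:
                raise MemoryError("shared space exhausted")
            self.buf[self.hi] = x
            self.hi -= 1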

This can be extended to an arbitrary number of arrays, but at that point you might as well use Malloc and Realloc. Or at least, the techniques therein are really close to a malloc-like routine IMO.

Retr0id
7 replies
1d2h

SQLite's on-disk format uses a similar array technique for storing the contents of table btree leaf node pages. Within a fixed-size page, there's an array of offsets growing forwards, and an array of (variable length) row values growing backwards from the end - the latter of which may end up with holes when rows are deleted (iiuc).

It wouldn't surprise me if it was a direct inspiration, since their docs cite TAOCP for the btree structure itself.

hiAndrewQuinn
3 replies
1d2h

We know from the Corecursive podcast on the hidden story of SQLite that Dr. Hipp did indeed reference and pull at least one alg straight from Knuth's books, which he had behind his desk. What a different world that must have been!

andrewshadura
2 replies
20h0m

FYI, he's D. R. Hipp, not Dr. Hipp.

jcgrillo
1 replies
19h48m

More like Dr. D. R. Hipp, actually.

tomcam
0 replies
15h58m

Hilarious and true

tanelpoder
0 replies
22h41m

Oracle does this as well and I suspect many other row-major RDBMS storage engines do too.

suncherta
0 replies
20h49m

As others have already mentioned, this is used in a lot (most?) of databases. It is called a "slotted page". Searching for it turns up good discussions/explanations of it.

felixge
0 replies
1d2h

This sounds like it was copied from PostgreSQL, which SQLite cites as a strong source of inspiration.

masfuerte
5 replies
21h53m

Some word processors on 8-bit computers worked like this. Your document took all of the available RAM. Text before the cursor was at the start of RAM and text after the cursor was at the end of RAM. Insertions and pastes didn't need to shuffle data around but navigation did. It worked well.

shadowgovt
4 replies
21h46m

emacs still does this IIUC. The beating heart of the emacs edit model is a "gap buffer" at the cursor.

There's a neat compare-and-contrast someone did a while back on the gap buffer approach vs. the other approach often taken for code IDEs ("ropes"). The tl;dr is that gap buffers are actually really performant for a lot of cases, except for having to edit at a lot of randomly-chosen cursor points far apart from each other (but how often is that your use case?).

https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-...
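
A toy gap buffer in Python, just to fix the idea (not emacs's actual implementation):

    class GapBuffer:
        def __init__(self, size=16):
            self.buf = [''] * size
            self.lo = 0      # start of the gap (the cursor)
            self.hi = size   # one past the end of the gap

        def insert(self, ch):        # O(1) at the cursor
            if self.lo == self.hi:   # gap exhausted: open a new one
                grow = len(self.buf)
                self.buf[self.hi:self.hi] = [''] * grow
                self.hi += grow
            self.buf[self.lo] = ch
            self.lo += 1

        def left(self):              # moving the cursor shuffles one char
            if self.lo > 0:
                self.lo -= 1
                self.hi -= 1
                self.buf[self.hi] = self.buf[self.lo]

        def right(self):
            if self.hi < len(self.buf):
                self.buf[self.lo] = self.buf[self.hi]
                self.lo += 1
                self.hi += 1

        def text(self):
            return ''.join(self.buf[:self.lo] + self.buf[self.hi:])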

derefr
2 replies
16h45m

> except for having to edit at a lot of randomly-chosen cursor points far apart from each other (but how often is that your use case?)

I use Sublime, and pretty often I do a "Find All" for some term; then Multiselect (Cmd+L); then select from cursor (Shift+RightArrow); and then type something. Which is essentially "editing at a lot of randomly [or at least arbitrarily]-chosen cursor points."

That, and IIRC Sublime actually does its "Find and Replace All" operation, as essentially this same multicursor-select-and-type operation internally. (If you have a large-enough buffer open, you can see it gradually doing it!)

shadowgovt
0 replies
13h27m

I've never been able to make that work, but I respect that you have. Generally, that kind of select almost always grabs a bunch of substrings that aren't appropriate and I end up mangling my file.

User23
0 replies
59m

I’ve never noticed any perceptible latency using iedit[1] in Emacs, which does what it sounds to me like you are describing.

It does suggest a possible multigap buffer structure though for efficiently doing simultaneous edits of distant locations in very large files. In that case though I’d probably be iediting a small wgrep[2] buffer anyhow so it might not really matter.

[1] https://github.com/victorhge/iedit

[2] https://github.com/mhayashi1120/Emacs-wgrep

cerved
0 replies
20h7m

seems like Vim does too

shadowgovt
2 replies
1d1h

This succinctly describes how old MacOS allocated per-application resources. Each app had a minimum and preferred RAM requirement tagged to it. When launched, it would take up the whole preferred slot, unless that amount wasn't available in which case it'd take up less than preferred (and fail to launch if it couldn't get minimum).

The system allocated the heap (and libraries, I think) at the bottom and the stack at the top of that physical-RAM slice, and off you went.

I believe around system 8 they added a virtualization layer that started to obviate the need for that approach (and of course by the time of MacOSX they were using paged memory like everyone else was and so such fancy tap-dancing was no longer needed). But it's fun to think about the era where this "one weird trick" from Art of Computer Programming was how we allocated RAM for multiple concurrent applications.

sgerenser
1 replies
7h29m

Pretty sure even with virtual memory (which was actually added sometime in the System 7 days), you still could manually set the minimum and desired memory sizes for each app. Maybe it made less of a difference than it used to, but I still remember tweaking those values for certain apps (like trying to open a large file in Photoshop) even after MacOS 8 and 9 were out. MacOS X was a breath of fresh air by comparison.

shadowgovt
0 replies
2h42m

You definitely could. I can't remember right now what effect that had, because the whole thing was running atop a virtual memory layer at that point... I can't recall if it was a lie (i.e. feature existed but had no effect) or if it had some effect on how long it'd be before an app started resorting to paging out RAM pages.

yongjik
0 replies
22h15m

> Have one array grow normally from location#0, and the second array grow backwards from location#End.

Fun fact: Itanium had two stacks, one for manual push/pop and another for cycling through the register file. One stack grew upward, another grew downward. A fascinating architecture, though it never delivered the promised performance.

fuzztester
0 replies
10h42m

> but at that point you might as well use Malloc and Realloc.

IIRC, the first edition of the classic K&R C book, The C Programming Language, had an example of creating a memory allocator and using it.

dspillett
0 replies
7h56m

> While seemingly obsolete, there are a ton of pre-heap / pre-stack algorithms for dynamically changing arrays or other data structures.

These are not as obsolete as they might seem to many. In some environments you might still be very restricted and want to avoid all dynamic allocation which could force you to use these types of algorithm so you can work in-place in a static memory block.

bewaretheirs
0 replies
1d1h

> Have one array grow normally from location#0, and the second array grow backwards from location#End.

The stack in most ISAs and ABIs grows down from a high address, allowing this trick to be used to divide memory flexibly between heap and stack in single-threaded small-memory systems.

082349872349872
15 replies
1d6h

Note that before we had arbitrarily extensible heaps, programmers always did at least a little engineering, in that they had to consider the probable distribution of inputs and size* all their intermediate storage appropriately.

* giving rise to "BUGS AND LIMITATIONS"

kstrauser
6 replies
1d2h

Historically, that was also a big goal of GNU. It aimed to get rid of artificial limitations in core utilities. That was a big improvement over (made up example) sed having a finite and short maximum command length.

deckard1
5 replies
1d1h

I can understand why people wanted that, and the benefit of doing that.

With that said, I also see benefit in having limitations. There is a certain comfort in knowing what a tool can do and cannot do. A hammer cannot become a screwdriver. And that's fine because you can then decide to use a screwdriver. You're capable of selection.

Take PostgreSQL. How many devs today know when it's the right solution? When should they use Redis instead? Or a queue solution? Cloud services add even more confusion. What are the limitations and weaknesses of AWS RDS? Or any AWS service? Ask your typical dev this today and they will give you a blank stare. It's really hard to even know what the right tool is today, when everything is abstracted away and put into fee tiers, ingress/egress charges, etc. etc.

tl;dr: limitations and knowledge of those limitations are an important part of being able to select the right tool for the job

mywittyname
1 replies
21h9m

Imagine using a program that can only allocate 4GB of RAM because it has a 32-bit address space. There's no benefit to that limitation; it's an arbitrary limit imposed by the trade-offs made in the 80s. It just means that someone will need to build another layer to their program to chunk their input data then recombine the output. It's a needless waste of resources.

The benefit of not having a limitation is that the real limits scale with compute power. If you need more than 4GB of memory to process something, add more memory to the computer.

swatcoder
0 replies
20h6m

> Imagine using a program that can only allocate 4GB of RAM because it has a 32-bit address space. There's no benefit to that limitation

You're looking at isolated parts of a system. In a system, an artificial "limit" in one component becomes a known constraint that other components can leverage as part of their own engineering.

In the example of memory addresses, it might be "artificial" to say that a normal application can only use 32-bit or 48-bit addresses when the hardware running the application operates in 64-bits, but this explicit constraint might enable (say) a runtime or operating system to do clever things with those extra bits -- security, validation, auditing, optimization, etc.

And in many cases, the benefits of being able to engineer a system of constrained components are far more common and far more constructive than the odd occasion that a use case is entirely inhibited by a constraint.

That's not to say that we should blindly accept and perpetuate every constraint ever introduced, or introduce new ones without thoughtful consideration, but it's wrong to believe they have "no benefit" just because they seem "artificial" or "arbitrary".

kstrauser
1 replies
1d

I see zero benefit in having artificial functionality limitations. In my hypothetical example, imagine that `sed 's/foo/bar/'` works but `sed 's/foo/bark/'` does not because it's 1 character too long. There's not a plausible scenario where that helps me. You wouldn't want to expand sed to add a fullscreen text editor because that's outside its scope. Within its scope, limitations only prevent you from using it where you need it. It would be more like a hammer that cannot be made to hammer 3 inch nails because it has a hard limit of 2.5 inches.

Those are the kinds of limits GNU wanted to remove. Why use a fixed-length buffer when you can alloc() at runtime? It doesn't mean that `ls` should send email.

kryptiskt
0 replies
19h49m

There's a major benefit: you can test that a program with an artificial limit works up to the limit, and fails in a well-defined manner above the limit. A program without any hardcoded limit will also fail at some point, but you don't know where and how.

Joker_vD
0 replies
1d1h

> A hammer cannot become a screwdriver.

Don't tell me you've never hammered a screw into a wooden plank? Vice versa, a screwdriver can also be used as a hammer, although a quite pathetic one.

lenerdenator
5 replies
1d1h

Really, the mistake was letting humans provide input to computer programs.

layer8
4 replies
23h18m

In particular input to assemblers, compilers, and interpreters.

082349872349872
3 replies
22h54m

I once wrote a 16-bit assembler as a recursive function, which emitted code and fixed data on the way down and patched in relative offsets on the way back up.

In principle, one might worry about blowing the stack, but as the sole purpose of this function was to assemble at most a half-k boot sector, in practice they fit.

blueflow
1 replies
8h37m

I'm curious about that. At which point did you do the recursion? "Assemble everything after this line and tell me the offset of symbol X when you are done"?

082349872349872
0 replies
2h8m

Yes. (well: tell me the entire symtab when you are done, but you probably could've guessed that)

kragen
0 replies
1h22m

i seem to recall you wrote a sort of 16-bit assembler in machine code consisting entirely of printable ascii, except for a couple of places where it had to patch itself for instructions like int 21 that couldn't be done that way. but that was a different one, wasn't it?

eschneider
1 replies
1d3h

Those before times are still today, depending on what you're doing. For hard real-time, dynamic memory is (almost) never used, mostly because the time needed to alloc/free memory isn't deterministic. So everything is statically allocated at compile time and yeah, you've got to know how much memory your inputs are going to consume.

But knowing the bounds of memory consumption used to be normal for application programmers, too. I mean, you don't want to run out of memory. Ever. What do people do now, just YOLO memory usage?

shadowgovt
0 replies
1d1h

> What do people do now, just YOLO memory usage?

Yes, literally. Careful allocation has given way to "Use what's there and fail hard if it ain't enough" in the era where machines are cheap and the dominant paradigm is how parallel you can make your algorithm so you can solve your problem with scale.

In the modern era, for most online service software (which is, I'd argue, most software people interface with these days), you write it as fast as you can, prototype it small, start to care about RAM allocation (but build it to just die screaming if it runs out of RAM so you know you need to make a change), and keep scaling up.

You don't care a lot about RAM allocation until you're scaled to a lot of nodes because only at that scale is whether the app takes up a kilobyte less going to start to matter on machines that are swinging gigabytes of storage around.

There are plenty of application spaces where this isn't the status quo (embedded architectures, console gaming), but that tends to be considered specialist engineering in this day and age.

gigel82
13 replies
1d3h

Am I the only one who read it as the ancient world before computers existed had stacks or heaps? English is so weird sometimes...

grraaaaahhh
1 replies
1d2h

Yeah, I was all ready to learn about how the ancient Egyptians used stacks to build the pyramids or something.

nerpderp82
0 replies
1d1h

Technically they did.

brudgers
1 replies
1d1h

Ordinarily, “The Ancient World” means an era ending with the fall of the Western Roman Empire circa 476 CE.

All that world was well before computers were a thing (even the human type were called “scribes,” “clerks” etc.)

So of course I read it the same way even though “before computers” was redundant once your comment made me think about it.

anthk
0 replies
1d1h

They had abaci with stacks.

trealira
0 replies
1d2h

I was expecting some ancient Greek algorithms that used stacks or priority queues, haha.

samatman
0 replies
1d

It's a garden-path sentence: https://en.wikipedia.org/wiki/Garden-path_sentence

One is led to believe the subject is "the ancient world before computers", as it would be in "the ancient world before computers had gladiators and triremes", but the subject is "computers", and the ancient world is the topic.

perihelions
0 replies
1d1h

"Novel analysis of the Antikythera mechanism suggests all of its wheels were statically allocated at construction time"

mr_toad
0 replies
17h21m

Well of course they had stacks and queues - computing adopted those terms from the real world. Stacks were even used for calculating in the form of an abacus.

marcellus23
0 replies
1d1h

This has nothing to do with English, it's just syntactic ambiguity. Every language has it.

ivanjermakov
0 replies
1d2h

I too clicked the link expecting stories of heap and stack uses before computers.

glhaynes
0 replies
1d2h

I’m no grammarian but I think it should have a comma after “world”.

dfxm12
0 replies
1d2h

I don't find the ambiguity weird, but rather poetic. Perhaps it is intentional. After all, computer technology has come such a long way in just our lifetime. A modern computer would probably be just as alien to someone in the 1920s as to someone in the 20s. In the same way, an average person today would be just as baffled by the ENIAC as they would by an Antikythera mechanism...

I sincerely doubt that English is the only language that supports such word play.

Toorkit
0 replies
23h17m

If that had been the way it was intended, I think it would have been phrased "stacks and heaps."

I.e. "the ancient world had stacks and heaps" as opposed to "the ancient world had stacks or heaps," which sounds weird.

HeyLaughingBoy
10 replies
23h16m

A long time ago (1991 maybe?) one of my first freelance projects during my first job out of college was to design an RS232 serial multiplexer -- take an incoming serial datastream, parse it, and redirect it to one of n outputs based on its header.

I remember doing something similar to what he describes. My hardware design was basically a Z80, a 2716 EPROM (I may have also had a Parallax EPROM emulator to speed debugging) and a couple of Z80-SIO serial devices. Notice that there is no SRAM chip :-)

The Z80 has enough registers that I could hold all the data I needed internally so I decided against the extra cost of RAM. This was the early 90's, remember. The one thing I was missing was a stack to make function calls. This was done (memory is fuzzy here) by preloading the return address onto a register pair and then calling the function. When the function was done, it would do an indirect jump to the location held in that register pair, which would return to the next instruction after the call.

I don't remember how I passed and returned values, but I probably dedicated a register to that. I have a couple of old hard drives around here somewhere. I should see if I saved the code anywhere; I was quite proud of it.

smcin
4 replies
21h17m

'2716 EPROM' was a 16 Kbit (2K x 8) EPROM.

Which customer/sector was your project intended for?

mytailorisrich
2 replies
21h7m

Yes, the xx(x) number in the 27xx(x) EPROM series indicates the size in kilobits (1024 bits). So indeed this is 2 KiB. This sounds OK for the application described, especially as it was no doubt written in assembly and obviously bare metal. When it does not fit but is not far off, that's when the fun of code optimisation kicks in ;)

Edit: Very interesting how easily available they still seem to be according to Google.

smcin
0 replies
19h19m

Ok I was trying to help you communicate "8-bit CPU which had 14 registers, no RAM, tiny EPROM, no persistent storage/disk/SSD". And you were doing function calls.

mark-r
0 replies
14h0m

My first personal computer was a Z80 with a 2716. Into that 2K I fit basic monitor commands (dump/store memory), a terminal emulator with VT52 escape sequences, and a floppy boot loader. Got very good at squeezing out every last byte.

HeyLaughingBoy
0 replies
14h41m

Don't remember. I do remember that I didn't get paid.

I was so excited to have a freelance job (don't remember how I got the customer either) that I jumped right into it and worked heads down for a couple weeks. Then when I went to tell the customer that I was done and ready to ship, I couldn't reach him. Finally heard back from him about 6-12 months later and he was surprised that I had been working on it without hearing back from him and that he no longer needed the product.

And that, kids, is how I learned to get at least partial payment upfront.

billforsternz
1 replies
21h21m

That is very cool and I say that as someone whose first project in my first job (1982) was to write the code for a serial multiplexer with a Z80, a 2716 EPROM, Zilog SIOs (or maybe DUARTs actually).... and 2K of static RAM. I basically learned my craft on the job and there's no way I could have done it without the RAM. A few years later I would have relished the challenge. Although you are really limiting the complexity of the protocol you can deal with ... (perhaps not such a terrible thing:).

ChrisMarshallNY
0 replies
16h53m

Writing code for the old Mac serial chip was an adventure.

I wrote a MIDI driver once, and that required dividing a 1MHz external clock, to get the 62.5KHz (I think) serial clock (you couldn’t divide the internal clock).

Their chip had 8 control lines, with each line controlling some aspect of the chip operation. They set it up as the lower 8 bits of the 16-bit address bus, with the upper 8 bits fixed (so you were working with 256 addresses, to control the chip).

So just referencing an address would change the state of the chip.

A lot of hardware limitations influenced software structure, and I’ll bet that a lot of software proclivities have their genesis in weird hardware compromises (like bytes).

RiverCrochet
1 replies
19h59m

> This was done (memory is fuzzy here) by preloading the return address onto a register pair and then calling the function. When the function was done, it would do an indirect jump to the location held in that register pair, which would return to the next instruction after the call.

This is pretty much what the MIPS `jal` and ARM `BL` do. TMS9900 also did something similar (edit: it had a `BL` instruction too.)

I was fascinated by the fact that the Signetics 2650 had an 8-byte internal stack.

HeyLaughingBoy
0 replies
6h37m

MC68xx series also had a 16-bit indexed jump. It's a very useful instruction.

Sponge5
0 replies
22h24m

You should be proud of it, I can't even begin to think about programming with these constraints, and I work in embedded.

howerj
8 replies
1d2h

I wrote a Forth interpreter for a SUBLEQ machine (https://github.com/howerj/subleq), and for a bit-serial machine (https://github.com/howerj/bit-serial), neither of which has a function call stack, which is a requirement of Forth. SUBLEQ also does not allow indirect loads and stores, and requires self-modifying code to do anything non-trivial. The approach I took for both machines was to build a virtual machine that could do those things, along with cooperative multithreading. The heap, if required, is written in Forth, along with a floating point word-set (various MCUs still commonly lack instructions for floating point numbers; they can be implemented as calls to software functions instead).

I would imagine that other compilers took a similar approach which wasn't mentioned.

EDIT: There were some BASIC interpreters which did this as well, implementing a VM and then targeting that instead. P-Code is a similar thing.

anthk
3 replies
1d1h

I was about to talk about subleq, but it's damn difficult even to write a "Hello world".

howerj
2 replies
1d

Yeah, it really is a pain to do anything in it. I got around that by writing in another language instead as soon as possible. You can paper over many issues with a VM, although it will slow things down on such limited computers.

anthk
1 replies
23h6m

How does muxleq work? It looks like a 'concurrent' subleq...

howerj
0 replies
22h42m

I am not sure what you mean by concurrent in this case, muxleq is just subleq with one extra instruction. The extra instruction is based off of multiplexing (see https://en.wikipedia.org/wiki/Multiplexer). Subleq as you know is incredibly inefficient, muxleq is an experiment to add a single instruction to Subleq in order to greatly improve its efficiency.

The `mux` instruction added computes:

   m[operand2] = (m[operand1] & ~selector) | (m[operand2] & selector)

As multiplexing is a universal gate (so long as you have access to true and false as constants) you can use this to calculate AND/OR/XOR, which are very expensive to do in a pure Subleq machine. It also means you can do a MOV in a single instruction instead of four (set the `selector` to zero in this case). This in turn speeds up indirect loads and stores.
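
A quick Python model of that instruction, assuming 16-bit cells:

   MASK = 0xFFFF

   def mux(src, dst, sel):
       # bits of src where sel is 0, bits of dst where sel is 1
       return (src & ~sel | dst & sel) & MASK

   x, y = 0b1100, 0b1010
   assert mux(x, y, 0x0000) == x             # selector 0 acts as MOV
   assert mux(x, 0x0000, ~y & MASK) == x & y
   assert mux(x, 0xFFFF, y) == x | y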

parlortricks
2 replies
21h36m

I came across your works learning and consuming all things Forth and Subleq. It was great to read over how you approach things. I wanted to purchase your book but Amazon says no...will there be another run?

howerj
1 replies
20h54m

Thanks! You should still be able to buy it; you might have to either click on "Paperback" or "Hardcover". If you want to get the kindle edition you have to go to the Amazon webpage that serves your country (e.g. if you are in the UK and you are on amazon.com you get "This title is not currently available for purchase", but if you go to amazon.co.uk, you can buy it). That's an odd user interface issue...

parlortricks
0 replies
19h32m

Ahhhh Amazon weirdness...all good, i managed to sort it out, paperback on its way!

bitwize
0 replies
12h32m

> EDIT: There were some BASIC interpreters which did this as well, implementing a VM and then targeting that instead.

The TI-99/4A had 256 bytes (128 words) of main, CPU-accessible RAM. Most of the base system's memory was video RAM, accessible by a relatively cumbersome process of poking and peeking registers on the system's video chip. The video chip maintained an autoincrementing current memory pointer, so successive reads (or writes) would bump the pointer by one, allowing straightforward transfers of data, but the very fact that most of the system's memory was only accessible in this way made significant programs difficult to write. So, TI's solution was to create an abstract machine called GPL in which memory accesses to this video RAM were more natural. It was interpreted on the TMS9900 and therefore slower than native code, though -- especially given that the CPU can only access the video chip's RAM while the chip itself is not doing scanout to the display, i.e., only during horizontal and vertical retrace.

And since all the BASIC code and variables lived in this video memory, guess what the TI-99/4A's BASIC interpreter was written in! Yeah, it wasn't very fast, like at all.

The neat part, apropos of the article's topic, is that there were no actual general-purpose registers on the TMS9900: workspace registers WR0 through WR15 were instead located somewhere in memory, pointed to by the WP (workspace pointer) register. The CPU only had three physical registers: PC (program counter), WP, and a status register. What this amounted to was you could do a very primitive form of register windowing: by using the BLWP (Branch and Load WP) instruction, you can branch to a subroutine in which a new set of "registers" will be active elsewhere in memory -- and the return address will be saved in the new workspace.

If I'm going on about the TI-99/4A a lot recently, it's because I'm writing an assembler for it as a personal project.

taeric
5 replies
19h58m

If it blows your mind to consider the time before call stacks were somewhat the norm in computers, you should look into the time from before register renaming.

I don't think many people realize just how much of a modern processor is made around supporting preemptive multitasking. Folks know that we have a class of security bugs from it, but a lot of what happens in a CPU is around juggling multiple workloads.

mr_toad
4 replies
17h25m

In ten years or so all the people who worked on machines without preemptive multitasking will be retired.

froh
1 replies
12h25m

wait what? you're right! 386 came 1985, 45 years later is 2030

now I feel old

Narishma
0 replies
2h15m

Why the 386? You could do preemptive multitasking on a 286 or even an 8088.

Tabular-Iceberg
1 replies
8h19m

Cooperative multitasking is in fashion now, so we might get a whole new generation to take over.

Or they’ll just cargo-cult everything and have no clue what’s happening under the hood.

taeric
0 replies
3h13m

To be fair, I doubt I fully understand what is going on under the hood. I usually bring this line up when folks lament that we don't have abstractions that fit the instruction set of the computer, ignoring that many of the abstractions of the computer are orthogonal to isolated instructions at this point. https://stackoverflow.com/questions/72423227/is-a-schedulabl... was a fun read as I was looking for something that would explore some of that for me.

kazinator
4 replies
1d2h

In the @let feature in Enhanced GNU Awk, for those @let blocks that are outside of a function, like in BEGIN or END blocks, I have the compiler allocate secret global variables. They are reused as much as possible between blocks.

  $ ./gawk --dump-variables 'BEGIN { @let (a, b, c = 1) { } }'
  $ cat awkvars.out
  $let0001: untyped variable
  $let0002: untyped variable
  $let0003: 1
  ARGC: 1
  ARGIND: 0
  ARGV: array, 1 elements
  BINMODE: 0
  [ .. snip many ]
https://www.kylheku.com/cgit/egawk/about/

1letterunixname
3 replies
19h9m

That website doesn't work from my ISP. Can't even ping it or nc -z 104.37.63.7 443.

Edit update: Your security infrastructure is broken because I don't know what that is and don't use Twitter. If you check the AS, it's Google Fiber. And I'd appreciate it if you wouldn't dox me.

wizzwizz4
0 replies
15h35m

That isn't your dox: if you can't even ping it, you wouldn't have been able to request a specific repository, and won't appear in that access log.

kazinator
0 replies
17h32m

Not sure why. Out of the 26 IP addresses that accessed the egawk repository in the last day or so, only one was banned. The client identified as Twitterbot, coming from 136.49.X.X.

kazinator
0 replies
9h35m

I made it clear it's something reporting itself as "Twitterbot". Unlikely to be your web browser, unless you went out of your way to impersonate Twitterbot. Still, I only quoted two octets out of the IPv4 Class B address, even though I feel that it would be fine to reveal the full address of a bot.

weinzierl
3 replies
1d2h

The article does not distinguish between recursive and nested subroutine calls.

I understand why the provided example does not allow for recursion, but doesn't it also prevent a nested call to another subroutine?

If I remember correctly, we already had nested subroutine calls before FORTRAN 90. How did that work?

EDIT: I think I get it. The hidden global variables are prefixed with the sub's name. This is pretty wasteful, but as long as a function does not call itself (even indirectly) we are good.
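
A sketch of what that amounts to, modelled in Python (names invented; the data lives in fixed name-prefixed slots, even though Python's own call stack still handles control flow here):

    STATIC = {}

    def square(x):
        STATIC['square.x'] = x
        STATIC['square.result'] = STATIC['square.x'] * STATIC['square.x']

    def sum_sq(a, b):
        STATIC['sum_sq.a'] = a
        STATIC['sum_sq.b'] = b
        square(STATIC['sum_sq.a'])   # nested call to a *different* sub: fine
        STATIC['sum_sq.acc'] = STATIC['square.result']
        square(STATIC['sum_sq.b'])
        STATIC['sum_sq.result'] = STATIC['sum_sq.acc'] + STATIC['square.result']
        # a sub calling itself (even indirectly) would clobber its own slots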

kps
0 replies
1d

On some systems, arguments were stored at the call site, typically immediately after the call instruction, where the callee could use the return address to find and skip over them. This made calls using constant arguments easy.

   CALL MUL
   2
   3
   BNE 6, FAIL

dbcurtis
0 replies
1d1h

Nor reentrant. In machines where the JSR stores the return address at the jump target, interrupts are usually locked for one following instruction so that you have a chance to load the return address into the accumulator. Back in the day I worked with an RTOS on one of those beasties.

comex
0 replies
1d2h

It does allow nested calls, because each function has its own storage for the return address and local variables.

PaulHoule
3 replies
1d1h

For the kind of programs I write for AVR-8 it seems insane to use the calling conventions of C, so if I write assembly I can sometimes keep the inner loop variables in registers (big register file) and otherwise use the methods he describes. I like "coloring" functions in an app like that: if I know a red and a green function will never be active at once, I can reuse locals/parameters for them.
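
A sketch of that coloring idea in C (names invented): if red_task and green_task are never live at once, their statically allocated locals can share one block of RAM.

  static union {
      struct { int i, acc; }   red;    /* red_task's "locals"   */
      struct { char buf[16]; } green;  /* green_task's "locals" */
  } shared;                            /* one block, two colors */

  void red_task(void) {
      shared.red.acc = 0;
      for (shared.red.i = 0; shared.red.i < 8; shared.red.i++)
          shared.red.acc += shared.red.i;
  }

  void green_task(void) {
      shared.green.buf[0] = '\0';      /* reuses red's bytes */
  }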

thadt
2 replies
1d1h

Yes, C's stack usage can be unintuitive when working in a constrained environment, especially when accustomed to the affordances of a desktop operating system. I once joined a project where several developers had spent a couple of weeks trying to pin down some really hard-to-track bugs in several subsystems they were developing in a microcontroller codebase. They would move things around and the bugs would move. After tracing things a bit and setting a trap, I found the places in the codebase where the call stack got too deep and was scribbling over other data structures.

PaulHoule
0 replies
23h40m

Recursion is a bad thing in embedded systems and generally overrated in CS pedagogy.

It can be useful in analyzing algorithms, but often in cases like search it is a "second best" answer compared to, say, using nondeterminism.

JonChesterfield
0 replies
1d

Putting an `out-of-stack?` branch check in every function prologue made an embedded chip programmed in C much nicer to work with. It cost a few instructions, which was contentious, but it was definitely a development aid.
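
Hand-rolled, that check can look something like this (a sketch assuming a descending stack and a linker-provided bound; `__stack_limit` and the handler are invented names):

  extern char __stack_limit[];              /* lowest safe stack address */
  extern void stack_overflow_handler(void);

  /* Prologue check: the address of a local is a cheap proxy
     for the stack pointer; compare it against the limit. */
  #define STACK_CHECK()                                  \
      do {                                               \
          char probe;                                    \
          if (&probe < __stack_limit + 64) /* margin */  \
              stack_overflow_handler();                  \
      } while (0)

  int depth_first(int n) {
      STACK_CHECK();            /* first thing in every function */
      return n <= 0 ? 0 : depth_first(n - 1) + 1;
  }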

monocasa
2 replies
22h21m

As I recall, some processors stored the return address at the word before the first instruction of the subroutine.

Yep, that's what the PDP-8 did. The evolution of the PDP-8 is arguably a journey in hardware support for recursion.

Initially, the JMS instruction stuck the return address in the first word of the function (as an aside, a lot of the time the caller would put its arguments right after the JMS instruction, and the callee would read them through the stored return address, incrementing it with each argument read until it pointed back at code).

Then it became relatively common to use one of the autoincrement locations (the PDP-8 had 8 memory locations that would increment any time you used them as a pointer) to create a simple stack, and function prologues/epilogues manually managed this stack to allow full recursion.

Then later on hardware stacks were added in the microprocessor implementations like the Harris 6120 to make this more performant.
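
The state-saving half of that convention, sketched in C (the return-address half is hidden by C itself here; on the PDP-8 the same manually managed stack held the word JMS deposited):

  /* A routine whose "locals" are fixed globals, as on a stackless
     machine, made recursive with an explicit prologue/epilogue. */
  static int stack[32], sp;         /* the hand-rolled stack */
  static int fact_n, fact_r;        /* FACT's fixed storage  */

  void fact(void) {                 /* computes fact_r = fact_n! */
      if (fact_n <= 1) { fact_r = 1; return; }
      stack[sp++] = fact_n;         /* prologue: push our state */
      fact_n -= 1;
      fact();                       /* recursion is now safe    */
      fact_n = stack[--sp];         /* epilogue: pop it back    */
      fact_r *= fact_n;
  }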

wglb
0 replies
17h54m

As did the IBM 1800 and IBM 1130 and many machines of that era. Machines such as the Xerox Sigma series had enough registers to avoid this practice.

drfuchs
0 replies
16h25m

Way back in 1956, Librascope's LGP-30 had an "R" instruction ("store Return address") that stored the (already-incremented) PC+1 into the address portion of the instruction at the destination, which was by convention an unconditional branch just in front of the beginning of a subroutine. You'd follow the "R" instruction with a "U" (unconditional branch) to that subroutine. The subroutine would return by branching to the address just in front of it, which conveniently was an unconditional branch back to just after the call location. So, recursion was out of the question unless you used some more advanced calling convention. (And all opcodes in assembly language were a single letter.)

le-mark
2 replies
23h21m

Cobol up to and including the 85 Standard was stackless and heapless, which is quite a cognitive divide for those going to and coming from that language.

I didn’t read Raymond say anything about security in the present article, but the advantages are clear: no stack or buffer overflow vulnerabilities, for example.

tedunangst
0 replies
22h9m

Just fixed-size static buffers guaranteed to overflow.

froh
0 replies
12h32m

I remember having to preallocate an array and do manual stack management there. The old hands told me to 'simply' use a disk if I needed the stack (or heap) to be larger. With the mainframe VM these could then be virtual in the 1980s/1990s, 'for performance'.

Stackless Cobol really did fun things to brains on long-term exposure.

robocat
0 replies
1d2h

Definitely "a tale of intrigue, betrayal, and advanced programming-language semantics"

dang
0 replies
20h44m

Related:

How recursion got into programming: intrigue, betrayal, and advanced semantics - https://news.ycombinator.com/item?id=33123916 - Oct 2022 (8 comments)

How Recursion Got into Programming (2014) - https://news.ycombinator.com/item?id=23061881 - May 2020 (47 comments)

How recursion got into Algol 60: a comedy of errors - https://news.ycombinator.com/item?id=10131664 - Aug 2015 (124 comments)

How recursion got into programming: a comedy of errors - https://news.ycombinator.com/item?id=8073361 - July 2014 (108 comments)

JoeAltmaier
2 replies
1d

The old IBM 360 used a 'display' where you just chose a register by convention, pointed it at some memory, then did your own linked list on call and return.

In fact, call was really jump-and-link(?), where you got the return address in a register, anticipating that if your subroutine took more than a little code, you'd just store it temporarily in your allocated 'display'.

No push/pop at all! Unless you wanted to write it that way. You were just on your own.

wglb
1 replies
17h41m

In fact call was really jump-and-link(?)

Yes, technically "BALR" for Branch and Link Register. (I knew a guy who had been a 360 assembly language programmer who called his consulting firm BALR consulting, referring to that instruction.)

Interestingly, Gene Amdahl was asked why the 360 architecture didn't have a stack. "Too expensive," he said. I found this amusing at a time when you could buy an 8085 for $5 retail, quantity one. Perhaps he meant culturally expensive?

JoeAltmaier
0 replies
5h13m

The ol' 360 used core memory, if I recall. A stack would likely have meant another whole cabinet on your computer room floor!

Ericson2314
2 replies
1d

Whenever Raymond Chen retires, I am going to be really sad.

abhinavk
1 replies
1d

Why? I feel he will be able to write more.

Ericson2314
0 replies
1d

I mean when he stops writing. Fair enough if he does it for fun afterwards.

tombert
1 replies
20h43m

I've been doing functional programming so long that I genuinely have a hard time thinking about how I would write things without recursion.

Like, I technically know how to convert a recursive algorithm to an iterative one, I've done it before on more resource-constrained stuff, but I don't like it. I think the recursive stuff is generally prettier and for 99% of things it's fast enough (and 100% if your compiler supports tail recursion, though you'd still be stuck maintaining a stack for most of the more interesting stuff).

Occasionally I'll do things to force myself to learn how things were done before I was born. I've been on/off hacking on a Commodore 64 game, but man I feel pretty grateful to be as spoiled as I am with fast, cheap, easy-to-use hardware.

wglb
0 replies
17h51m

Yes, instruction sets these days are significantly more useful.

To do recursion on those old machines, you would need to build your own stack mechanism, but there would still be issues to take care of, as there was no native way to use anything but global storage.

Having lived through those times, I don't wish them on anyone.

pmontra
1 replies
10h24m

If my memory is correct, "no stack" was the way I was writing BASIC programs on my ZX81.

   1 GOTO 30
  10 LET C = A + B
  20 RETURN
  30 LET A = 1
  40 LET B = 2
  50 GOSUB 10
  60 LET A = C
  70 LET B = 3
  80 GOSUB 10
  90 PRINT C

  RUN
  6
I was doing the job of the compiler in the article. Line numbers are memory addresses, and the hidden variables are not hidden to me, because I'm the compiler. The only thing the interpreter did for me was store the return address of GOSUB.

Disclaimer 1: the code could be syntactically wrong and hallucinated (40 years is a long time), but it gives the general idea.

Disclaimer 2: The Z80 processor inside the machine had stack management; the BASIC interpreter was really very basic, but it could be excused: it had 1 kB RAM and an 8 kB ROM with the OS, the interpreter and everything.

dspillett
0 replies
7h25m

That is using a stack, at least a call stack: GOSUB will store a line number (or other reference) for RETURN to refer to, and if you nest GOSUB calls you have to remember multiple return points, which requires some form of stack.

Though some BASICs didn't have a generic stack, instead using a fixed array of return pointers and an index to the current one, so there was a fixed depth limit of, say, 7 calls; but from your PoV as the programmer this behaves as a call stack. Obviously this isn't a "proper" stack with local variables/parameters and such, which you might expect when someone refers to a stack.

An interesting demo of what happens with nested (including recursive) calls that you could do with BBC BASIC in its native environment: set the location of the stack to the top of display memory (making sure you didn't do anything that would cause things to be drawn there, of course), then watch the stack grow as work happened. The display was low-res enough that the pair of bytes for a return address could be seen as eight chunky pixels in screen mode 1 or 5 (four in mode 2, though those would include flashing colours, which is less ideal; sixteen in mode 0, 3, 4, or 6, though seeing individual bits doesn't work as well, because a sequence of eight colours repeating amongst others is slightly easier to pick out than a sequence of 16 black/white ones).
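
That fixed-depth scheme is tiny to model; here's a sketch of the interpreter-side bookkeeping (invented names, depth 7 as above):

  #define MAX_GOSUB 7
  static int return_lines[MAX_GOSUB];  /* fixed array of return points */
  static int gosub_depth;

  /* Both return the line to jump to, or -1 for the error the
     interpreter would report ("Out of GOSUBs", etc.). */
  int do_gosub(int next_line, int target_line) {
      if (gosub_depth == MAX_GOSUB) return -1;
      return_lines[gosub_depth++] = next_line;
      return target_line;
  }

  int do_return(void) {                /* "RETURN without GOSUB" -> -1 */
      return gosub_depth ? return_lines[--gosub_depth] : -1;
  }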

gumby
1 replies
19h47m

This post happens to describe the world of the famous “Goto considered harmful” paper.

Most people just know the title and simply fetishistically refuse to use a goto (most of the time they are terrible, but not always). But really the paper argues for giving subroutines a single entry point.

Sometimes I write assembly code that falls through into another subroutine, but I don’t want all my assembly to be spaghetti like that.

wglb
0 replies
17h57m

While those are of the same era, the goto issue was its own whole issue, generally referring to goto statements within a given block of code, not to subroutines.

While this, as you note, often led to mass refusal to use a goto, the paper also led to much discussion and presaged much better control-flow constructs in languages.

guenthert
1 replies
9h31m

"First, the compiler defined a secret global variable for each inbound function parameter, plus another secret global variable for each function to hold the return address"

I believe 'hidden' is the word he's looking for.

And one doesn't have to go all that far back into history to find architectures without hardware support for a return stack; see, e.g., the Parallax Propeller.

And the problem with static variables (hidden or not) is not only that they prevent recursion, but also that they prevent reentrancy.

I think that article could have benefited from a peer review from one of his colleagues at Microsoft.

mauricioc
0 replies
8h59m

Raymond Chen has been writing his blog for more than 20 years now, almost 7000 posts. It is a treasure trove of information and first-hand accounts. I can see an argument for peer-reviewing every post, but I think it would have limited (a lot) the amount of "from the trenches" knowledge we get from him.

Your reference to the Parallax Propeller is domain-specific and a bit anachronistic; you're talking about embedded computing, while he is talking about general-purpose computing a few decades prior. The point of the post is that general-purpose computing was different in the past (in a sense, that's the theme for the whole blog! that and compatibility hacks), so going this far back is necessary.

greesil
1 replies
23h13m

It's not ancient if you still do embedded development

dnedic
0 replies
21h35m

It really depends on what you classify under embedded. Everything more expensive than a Padauk or newer than PIC will have hardware support for the stack, and also most likely have the heap provided by newlib.

dhosek
1 replies
1d1h

My first assembly was 6502 which kept a call stack in page 3 (128 levels deep being the maximum call depth as a consequence). When I learned 370 assembly, I remember being shocked to discover that an application was responsible for maintaining its own call stack.

dhosek
0 replies
21h45m

(D’oh, not page 3, page one. Page 3 on the Apple ][ was reserved for & handlers and, IIRC, page 2 was the input buffer.)

derefr
1 replies
16h41m

Funny enough, I was forced to program in exactly this way, when I was first learning to program. But not in the 1970s... in the year 2001!

Why? Because my first exposure to programming was the semi-graphical scripting "language" exposed by the game-development tool "RPG Maker 2000."

For those who haven't seen RM2K scripting before: picture a cross between Scratch and Emacs Paredit mode. (E.g. https://forums.rpgmakerweb.com/data/attachments/21/21958-f89...) It's presented as textual, but you can't edit it like text — only as blocks with associated properties dialogs.

And, of course, that scripting language in RPG Maker doesn't have anything so fancy as a stack.

Want some reusable subroutines? Well, you better believe you're allocating secret global variables for their parameters — no re-entrancy for you!

---

Mind you, thinking back on it, it's probably possible to implement both registers and a runtime stack in RPG Maker 2000, given sufficient stubbornness.

Both features seem easy enough at first: you can do pseudo-"registers" like the zero page on a 6502; and you can do a stack through indirect variable access (https://rpgmaker.net/tutorials/523/).

The problem with both of these, though, is that RM2K actually has concurrency in the form of "parallel process" scripts — so any use of either of these abstractions by these parallel processes will have different "threads" stomping all over one another's state.

So you'd actually need multiple "zero pages" and "stacks" — one for each "virtual core" — and then you'd need to somehow assign/bind/schedule the "virtual cores" to parallel scripts (i.e. somehow get each script its own privately-known "stack pointer.") Which, to be stable in the face of race conditions, would normally require something like mutexes...
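
The single-script version of that stack is simple enough; it's the sharing that breaks it. A sketch of the indirect-access trick with one shared stack pointer (layout invented):

  /* Everything lives in the game's numbered-variable pool: var[SP]
     is the stack pointer, var[BASE..] the storage. Two "parallel
     process" scripts calling push() at once corrupt var[SP]. */
  enum { SP = 1, BASE = 100 };
  static int var[1000];

  void push(int value) { var[BASE + var[SP]++] = value; }
  int  pop(void)       { return var[BASE + --var[SP]]; }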

Knowing the bloody-mindedness of RPG Maker gamedevs, I'm sure someone did come up with a way to trick some runtime feature into acting like a mutex. But I'm genuinely scared to know what it was they did.

63stack
0 replies
8h27m

I started with rpgmaker as well, and your comment made me so nostalgic. I remember downloading a game from rpgmaker.net that had a "custom battle system" implemented in it. They replaced the entire built in battle system with a custom implementation that used techniques like you described. I was absolutely blown away when I opened it in the editor to see how it works. It was hundreds and hundreds of "variables" (which if I remember correctly only allowed i64), and hundreds and hundreds of "switches" (which was booleans). I had no concept of stacks and heaps or function calls at that point.

I can't imagine the amount of energy it took to pull that off and maintain/debug it.

not2b
0 replies
1d

When I was first starting out a very long time ago, I wrote a lot of Fortran (mainly for digital signal processing applications), and a couple of times I had to work around the lack of recursion by building my own stacks for intermediate results, with arrays and a position to keep track of where I was. It was ugly.

mirkodrummer
0 replies
21h37m

Very unfortunate that the mobile layout of the devblog has so many fixed elements on the side; they are distracting, and I can’t navigate the page without feeling some discomfort.

mikewarot
0 replies
14h0m

I was looking through the ROM listing on a really old diskette controller for an S-100 computer, and I saw all those jumps... and didn't understand what was going on. Then a friend told me there wasn't any guarantee of RAM at any given address, so they used the BX register (if I recall correctly) for the return address.

matja
0 replies
6h56m

This would be transformed by the compiler into something like this:

Even modern C compilers for "modern" 8051-derived MCUs work like this :)

mark-r
0 replies
13h49m

My first exposure to assembly language was one of those computers that used self-modifying code for subroutine calls, the Control Data Cyber 6400. It had a special instruction (I think it was called RJ) that replaced the first word of the subroutine with a jump instruction back to the caller. Every function had to start with a special no-op so that it could be replaced, and you returned from a function by jumping back to the beginning.

justinlloyd
0 replies
16h18m

A couple of personal experiences from the ancient world:

I wrote code for a microprocessor that did not have a stack as an actual concept. There are still embedded processors out there today like this. To work around this you stored a value at a zero-page location. When you wanted to jump to a subroutine, you would first load up this value from zero page, advance it, store the program counter into the memory address it now pointed to, then execute a simple goto. At the end of your function, to return, you would load up that value in zero page, then load up your return address, decrement your pointer, store it back to zero page, then go back to the calling function. Every value you wanted to pass to the subroutine would be either in one of your three registers, or you would pass those values by storing them in memory pointed to by your zero-page value. The 6502 offers nice instructions to do these very operations, by turning the assembly/machine code into microcode that does the exact same thing, but more succinctly.
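
A toy model of that convention in C (all names and addresses invented; on the real chip each step was a load or store through the zero-page cell):

  static unsigned short mem[65536];
  enum { ZP_SP = 0x10 };              /* zero-page cell holding the pointer */
                                      /* at startup: mem[ZP_SP] = 0x0200;   */
  void call_save(unsigned short return_addr) {
      unsigned short p = mem[ZP_SP];  /* load the value from zero page  */
      mem[p] = return_addr;           /* store the return address there */
      mem[ZP_SP] = p + 1;             /* advance it, write it back      */
  }                                   /* ...then a plain goto to the sub */

  unsigned short return_restore(void) {
      unsigned short p = mem[ZP_SP] - 1;
      mem[ZP_SP] = p;                 /* decrement, store back           */
      return mem[p];                  /* goto this address in the caller */
  }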

Another trick I used was bank weaving. You only had a limited amount of addressable memory, but you could bank switch ROMs, so you'd write your code to have it reside at a fixed memory location, and another chunk of code at another fixed memory location in a different ROM bank. Your first code would execute down to a point where it would switch banks based on a condition, and when the other ROM bank switched in, your alternative code path would be waiting for you; you'd execute that, then switch the ROM bank back to the first one, and the PC (program counter) would have advanced, possibly by several hundred bytes, so you'd return to a point in your code, back in the first ROM bank, that was a completely different point in memory, but still the same calling function.

A few years later I used a similar technique on a large text adventure game, a compiler and a word processor, where the core of the program was always resident, but chunks of code could be loaded in at fixed memory locations from disk. So if you ran a spell check, the spell check functions would load in at a known memory location, various bits were set in RAM to indicate which functions were resident, and the application could perform the spell check, or run the logic for a part of the text adventure game. And functions would automatically unload to make room for new functions based on the last time they were invoked.

I wrote code for a Timex processor that had an "execute the instruction at the following address" opcode, so effectively your instruction here would execute that instruction over there, but only for one instruction. It made writing the equivalent of switch/case in assembly interesting, and also good for self-modifying code.

Zero-page on some old CPUs was a faster RAM, a few dozen bytes, or even 256 bytes, so if you had a tight loop, you'd copy the code into zero page, perhaps having to move other stuff out of the way first, and execute it there.

I wrote a word processor that stored only a tiny fraction of its data in under 3KB of RAM, the rest was stored on big floppy disks. As you scrolled through the text, it would page through the pages (memory pages, not pages of text) on the floppy disc, but the memory pages were not stored continuously, either in RAM or on disk. The RAM acted more like a cache, and a couple of memory pages were reserved for edit operations and even copy & paste. To copy and paste entire pages was fast, you only had to adjust a few pointers in RAM and on disk, plus a few hundred bytes for head and tail page operations so moving large blocks of text around was very fast, and inserting large blocks of text, or even just typing, was very fast, because the buffers were quite small. It was the only word processor I know of that came with a disk defrag operation built in, but the company called it "housekeeping" in the manual with no explanation to the end user of what it was doing. I learned a lot about "don't mark that as deleted until you are damn sure the operation is completed and you've verified it worked."

I did a 6507 and Z80 emulator on the MIPS R3000 that ran the code in a very pedestrian way, but as the 6507 or Z80 ran through the code, each memory address was added to a list, and the 6507/Z80 opcode found there was translated into an intermediate assembly language, and then I post-processed that intermediate assembly into R3000, which gave me a huge performance boost; post-process dynamic recompilation effectively. I had to do other sneaky things with it too because the original hardware raced the electron beam whereas the target platform just had a big VRAM buffer. Used the same trick to port 6507 code to an early ARM processor for an old handheld too.

There are a lot of other tricks we used in the before-times, such as LFSRs and LCGs to permute game objects in seemingly random patterns, cheating on distance checks by counting the clock cycles between two objects drawn on screen, low-rez bitmaps of the screen to speed up collision detection, compiled graphics/sprites, even sprite blitting routines that were hard-coded to specific pixel offsets.

ginko
0 replies
1d2h

If you write a GPU shader in most graphics APIs today you will also not have a stack or heap.

elvis70
0 replies
1d1h

And to think that's where we come from. The Intel 8008, the ancestor of the x86 processor family, just had a 7-level hardware call stack, used only by the unconditional and conditional call and return instructions. So there were no push and pop instructions. As the only way to access a byte in memory was to store the address in the HL register pair (no absolute addressing mode) before the load or store instruction, and as there was no way to disable hardware interrupts, it was almost impossible to write useful interrupt routines without some external hardware trick.

deanCommie
0 replies
22h1m

Another commenter said: "I really liked the Art of Computer Programming with regards to this subject. While seemingly obsolete, there are a ton of pre-heap / pre-stack algorithms for dynamically changing arrays or other data structures."

I wonder what programming techniques exist today that will be obsolete in 30 years? I imagine Transformers made a bunch of early ML algorithms completely obsolete?

What else?

caycep
0 replies
1d

The headline reminded me of "ancient computers" in sci-fi games, which were essentially a Tower of Hanoi problem implemented in-game.

bsder
0 replies
22h16m

One of the things that people miss is that a "stack" was not just for recursion.

A "stack" was also a small memory, single thread optimization because it allows you to multiplex memory for function calls. If two functions do not call each other, they can share the memory used for their activation records.

Keeping track of this memory multiplexing in the compilers of the day would have been very hard and highly memory intensive--a stack solves that problem.

Without a stack, your memory usage (static) goes as O(n) with the number of functions. With a stack, your memory usage (dynamic) goes as O(n) with the depth of function calls. The memory usage of these two scenarios is wildly different.
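
To put made-up numbers on it: 200 functions with 20 bytes of locals each is 4,000 bytes committed forever under static allocation, while a stack and a maximum call depth of 10 bounds the same program at roughly 200 bytes, reused as calls come and go.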

benreesman
0 replies
1d

Raymond Chen has to be one of the few OG legends where the title of the post alone is just like: “Today is Old New Thing day, and that’s just going to be the coolest thing I learn today.” I started salivating before my eyes scanned so far as the domain name.

And as usual, worth reading at least one or two more times.

Fucking. Legend.

01HNNWZ0MV43FF
0 replies
1d3h

This was posted yesterday