Why the CORDIC algorithm lives rent-free in my head

blowski
14 replies
1d2h

“Lives rent-free in my head” is a horrible cliche. By now, it has the same effect as saying “it’s nice” but uses lots more letters.

wodenokoto
6 replies
1d1h

I never liked that phrase either. So what exactly lives in your head and pays rent?

pynappo
3 replies
22h2m

the "rent" in the saying is the effort to learn and upkeep a certain piece of knowledge in your head, to the point where you can recall it as needed

living rent free generally means that you are able to easily recall a piece of knowledge (and often do, even if you don't want to).

something that pays rent might be like an important concept for an upcoming school test, for a class you're not interested in.

strken
2 replies
19h44m

I always thought rent was utility, and a piece of knowledge or an opinion that didn't pay rent was one that provided little or no practical benefit to you.

The term in its original use described an obsession with a particular group or thing and called it out as pointless. I don't think it would make sense for rent to mean effort in this original sense, nor in the broader way it's used here.

junon
1 replies
6h45m

I have no idea what pseudo-intellectual discourse this is trying to be but the phrase "X lives rent free in my head" means only that "I think of X a lot because I enjoy thinking about it". Nothing more.

strken
0 replies
5h16m

Go read https://knowyourmeme.com/memes/rent-free and learn the origins and predominant usage of the phrase.

In addition, it literally says "rent free". There's nothing intellectual or pseudo-intellectual about reading the phrase "rent free" and concluding that the information does not pay anything to live in your head. This is basic reading comprehension.

sublinear
0 replies
1d1h

Knowledge work

__MatrixMan__
0 replies
22h12m

Kubernetes, which I would evict if it stopped paying rent.

alexjplant
1 replies
1d

“Lives rent-free in my head” is a horrible cliche.

Yes it is, much like "dumpster fire" or "toxic" or "gaslighting" or any other contemporary Reddit-isms that are associated with hyper-reductionist cultural hot takes. My personal distaste for them, however, has no bearing on the fact that they have very real meaning and people enjoy using them.

By now, it has the same effect as saying “it’s nice” but uses lots more letters.

The commonly-held meaning of this is that it describes a pervasive, intrusive thought, not a nice one. This particular turn of phrase is often used to describe how politicians devote energy to talking about other politicians that they don't like, for instance.

kevindamm
0 replies
13h45m

"Lives rent free in my head" lives rent free in my head.

Hey it's a quine, too!

magicalhippo
0 replies
23h24m

Having never heard the expression before, I thought it meant it wouldn't be forgotten, like learning to ride a bicycle kinda thing.

rnmmrnm
0 replies
1d1h

I haven't finished it yet, but I was thinking he's talking about memorizing the algorithm to calculate sines in your head.

junon
0 replies
1d1h

And what does this pedantry solve, exactly? It clearly is something the author thinks often enough about and enjoys talking about.

Let people enjoy things.

beefnugs
0 replies
1d

I thought it was more often "negative", like you can't stop thinking about some dumbass having a disproportionate effect on the world.

nico
13 replies
1d5h

Cool algorithm. Might come in handy soon for running neural networks on less powerful hardware :)

KeplerBoy
10 replies
1d

Neural networks usually don't use trigonometry or complex numbers.

nico
9 replies
23h0m

From the neural network Wikipedia page:

The signal each neuron outputs is calculated from this number, according to its activation function

The activation function is usually a sigmoid, which is also usually defined in terms of trigonometric functions

Which neural networks don’t use trigonometric functions or equivalent?

PeterisP
4 replies
20h7m

In current neural networks the activation function usually is not sigmoid but something like ReLU ( y = 0 if x<0 else x ), and in any case the computation of activations is not a meaningful part of total compute - for non-tiny networks, almost all the effort is spent on the large matrix multiplications of the large layers before the activation function, and as the network size grows, they become even less relevant, as the amount of activation calculations grows linearly with layer size but the core computation grows superlinearly (n^2.8 perhaps?).

mcguire
3 replies
17h29m

Really?

It's literally been decades, but the last time I learned about neural networks, a non-linear activation function was important for Turing completeness.

kragen
0 replies
16h27m

neural networks aren't turing complete (they're circuits, not state machines) and relu is not just nonlinear but in fact not even differentiable at zero

jszymborski
0 replies
14h48m

ReLUs are indeed non-linear, despite the confusing name.

The nonlinearity (plus at least two layers) is required to solve nonlinear problems (like the famous XOR example).
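
For illustration, here's a toy two-layer ReLU network that solves XOR with hand-picked weights (the weights and layout are my own example, not from any particular source):

  import numpy as np

  # Two-layer network with ReLU: XOR becomes a difference of two ramps.
  relu = lambda v: np.maximum(v, 0.0)

  W1 = np.array([[1.0, 1.0],    # h1 = relu(x1 + x2)
                 [1.0, 1.0]])   # h2 = relu(x1 + x2 - 1)
  b1 = np.array([0.0, -1.0])
  w2 = np.array([1.0, -2.0])    # y = h1 - 2*h2

  for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
      h = relu(np.array(x, dtype=float) @ W1 + b1)
      print(x, h @ w2)          # prints 0, 1, 1, 0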

PeterisP
0 replies
10h31m

Yes, some non-linearity is important - not for Turing completeness, but because without it the consecutive layers effectively implement a single linear transformation of the same size and you're just doing useless computation.

However, the "decision point" of the ReLU (and its relatives like leaky ReLU or ELU) provides a sufficient non-linearity - in essence, just as a sigmoid effectively results in a yes/no chooser with some stuff in the middle for training purposes, so does the ReLU "elbow point".

Sigmoid has a problem of 'vanishing gradients' in deep networks: sigmoid derivatives of 0 to 0.25 in standard backpropagation mean that a 'far away' layer will have tiny, useless gradients if there are a hundred sigmoid layers in between.

KeplerBoy
2 replies
22h34m

Most activation functions are based on exponential functions.

I don't immediately see how cordic would help you calculate e^(x) unless x is complex valued (which it is usually not).

adgjlsfhk1
1 replies
22h17m

see section B of https://eprints.soton.ac.uk/267873/1/tcas1_cordic_review.pdf. Generalized cordic can compute hyperbolic trig functions which gives you exponentials. That said, it's still not useful for ML because it's pretty hard to beat polynomials and tables.
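
For anyone curious what that looks like, here's a rough Python sketch of the hyperbolic rotation mode (the repetition schedule follows the usual 4, 13, 40, ... rule; plain floats instead of fixed point, and no range reduction, so treat it as illustrative only):

  import math

  def cordic_exp(t, iters=24):
      # Hyperbolic CORDIC, rotation mode. Converges only for roughly
      # |t| < 1.118; real code would range-reduce first.
      # Iterations 4, 13, 40, ... must be repeated for convergence.
      schedule, i, repeat = [], 1, 4
      while len(schedule) < iters:
          schedule.append(i)
          if i == repeat:
              schedule.append(i)       # repeat this index once
              repeat = 3 * repeat + 1  # next repeated index
          i += 1
      schedule = schedule[:iters]
      # Net gain of the micro-rotations; pre-dividing it out makes the
      # final (x, y) land on (cosh t, sinh t) directly.
      K = 1.0
      for j in schedule:
          K *= math.sqrt(1.0 - 2.0 ** (-2 * j))
      x, y, z = 1.0 / K, 0.0, t
      for j in schedule:
          d = 1.0 if z >= 0 else -1.0
          x, y = x + d * y * 2.0 ** -j, y + d * x * 2.0 ** -j
          z -= d * math.atanh(2.0 ** -j)
      return x + y  # cosh t + sinh t = e**t

  print(cordic_exp(0.5), math.exp(0.5))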

KeplerBoy
0 replies
22h1m

Interesting, thanks.

Dylan16807
0 replies
10h9m

You just need a vague approximation. Even CORDIC is overkill.

clamp(x/2, -1, 1) is basically a sigmoid.
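
A quick sketch of how close that gets, compared against tanh (the zero-centered sigmoid shape it mimics):

  import math

  clamp = lambda v, lo, hi: max(lo, min(hi, v))

  # Piecewise-linear stand-in vs. a true sigmoid-shaped curve.
  for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
      print(x, clamp(x / 2, -1, 1), math.tanh(x))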

SJC_Hacker
1 replies
1d

The primary advantage of CORDIC is that it saves space in the design.

Lookup tables with interpolation are much faster.
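
For reference, a minimal sketch of the table-plus-interpolation approach (table size and float arithmetic are arbitrary choices here):

  import math

  N = 256
  TABLE = [math.sin(2 * math.pi * i / N) for i in range(N)]

  def table_sin(theta):
      pos = (theta / (2 * math.pi)) * N
      i = int(math.floor(pos)) % N       # wrap to one full turn
      frac = pos - math.floor(pos)
      a, b = TABLE[i], TABLE[(i + 1) % N]
      return a + frac * (b - a)          # linear interpolation

  print(table_sin(1.0), math.sin(1.0))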

apitman
11 replies
22h52m

While the author mentions this is mostly applicable to things like FPGAs, there's also an application in gamedev (or any distributed physics simulation). Floating point calculations are tricky to get deterministic across platforms[0]. One solution is to skip floats entirely and implement a fixed-point physics engine. You'd need something like CORDIC to implement all the trig functionality.

I started working on such a thing as a fun side project a few years ago but never finished it. One of these days I hope to get back to it.

[0]: https://randomascii.wordpress.com/2013/07/16/floating-point-...
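
For anyone wanting a starting point, here's the rough shape of an all-integer CORDIC sin/cos in Python (the 16.16 format and iteration count are arbitrary, and angles outside ±pi/2 would need range reduction first):

  import math

  FRAC = 16                      # 16.16 fixed point
  ONE = 1 << FRAC
  ITERS = 16
  ANGLES = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERS)]
  _gain = 1.0
  for _i in range(ITERS):
      _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
  K = round(ONE / _gain)         # pre-scaled start x cancels the gain

  def cordic_sincos(theta):
      # theta: angle in radians as 16.16 fixed point, |theta| <= pi/2.
      # Only integer adds, shifts, and compares: fully deterministic.
      x, y, z = K, 0, theta
      for i in range(ITERS):
          if z >= 0:
              x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
          else:
              x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
      return y, x                # (sin, cos) in 16.16 fixed point

  s, c = cordic_sincos(round(0.5 * ONE))
  print(s / ONE, c / ONE, math.sin(0.5), math.cos(0.5))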

teleforce
6 replies
21h29m

The author did mention that fixed point was very popular for gamedev before floating point took over due to the increase in hardware capability, and most likely CORDIC was used together with fixed point as well.

In fact, before IEEE 754 became the popular standard that it is today, fixed point was used all the time (go and ask any gamedev who worked on stuff between 1980 and 2000ish and they'll tell you all about it).

aappleby
4 replies
20h59m

Was gamedev between 1980 and 2000ish, can confirm. PS1 had no floating point unit.

TuringTourist
3 replies
17h28m

This was the cause of the signature jiggly textures that were pervasive in PS1 games.

fayalalebrun
2 replies
13h0m

This is a common misconception, but it is not the case. For example, look at the Voodoo 1, 2, and 3, which also used fixed-point numbers internally but did not suffer from this problem.

The real issue is that the PS1 has no subpixel precision. In other words, it rounds a triangle's coordinates to the nearest integers.

Likely they did this because it lets you avoid division and multiplication hardware entirely: with integer start and end coordinates, line rasterization can be done purely with additions and comparisons.
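
A sketch of what integer endpoints buy you: Bresenham-style line stepping needs little beyond additions and comparisons (this is the textbook algorithm, not the PS1's actual hardware):

  def raster_line(x0, y0, x1, y1):
      points = []
      dx, dy = abs(x1 - x0), -abs(y1 - y0)
      sx = 1 if x0 < x1 else -1
      sy = 1 if y0 < y1 else -1
      err = dx + dy                  # running error term
      while True:
          points.append((x0, y0))
          if x0 == x1 and y0 == y1:
              break
          e2 = 2 * err               # a shift in hardware
          if e2 >= dy:
              err += dy
              x0 += sx
          if e2 <= dx:
              err += dx
              y0 += sy
      return points

  print(raster_line(0, 0, 7, 3))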

Sharlin
1 replies
1h41m

Didn’t PS1 also lack perspective-corrected texture mapping? That would definitely make textures wobbly. AFAIK they compensated for it simply by subdividing the geometry as finely as possible (which wasn’t very fine, really).

sgtnoodle
0 replies
58m

The folk that made Crash Bandicoot were pretty clever. They figured out that the PlayStation could render untextured, shaded triangles a lot faster than textured triangles, so they "textured" the main character with pixel-scale geometry. This in turn saved them enough memory to use a higher resolution frame buffer mode.

https://all-things-andy-gavin.com/2011/02/04/making-crash-ba...

apitman
0 replies
18h26m

I believe that was mostly for performance reasons, not determinism, right?

boulos
3 replies
21h30m

That blog post is now a decade old, but includes an important quote:

The IEEE standard does guarantee some things. It guarantees more than the floating-point-math-is-mystical crowd realizes, but less than some programmers might think.

Summarizing the blog post, it highlights a few things, though some less clearly than I would like:

  * x87 was wonky

  * You need to ensure rounding modes, flush-to-zero, etc. are consistently set

  * Some older processors don't have FMA

  * Approximate instructions (mmsqrtps et al.) don't have a consistent spec

  * Compilers may reassociate expressions

For small routines and self-written libraries, it's straightforward, if painful, to ensure you avoid all of that.

Briefly mentioned in the blog post is that IEEE-754 (2008) made the spec more explicit, and effectively assumed the death of x87. It's 2024 now, so you can definitely avoid x87. Similarly, FMA is part of the IEEE-754 2008 spec, and has been built into all modern processors since (Haswell and later on Intel).

There are still cross-architecture differences like 8-wide AVX2 vs 4-wide NEON that can trip you up, but if you are writing assembly or intrinsics or just C that you inspect with Compiler Explorer or objdump, you can look at the output and say "Yep, that'll be consistent".

Someone
2 replies
20h59m

but if you are writing assembly or intrinsics or just C that you inspect with Compiler Explorer or objdump, you can look at the output and say "Yep, that'll be consistent".

Surely people have written tooling for those checks for various CPUs?

Also, is it that ‘simple’? Reading https://github.com/llvm/llvm-project/issues/62479, calculations that the compiler does and that only end up in the executable as constants can make results different between architectures or compilers (possibly even compiler runs, if it runs multi-threaded and constant folding order depends on timing, but I find it hard to imagine how exactly that would happen).

So, you’d want to check constants in the code, too, but then, there’s no guarantee that compilers do the same constant folding. You can try to get more consistent by being really diligent in using constexpr, but that doesn’t guarantee that, either.

mjcohen
0 replies
20h5m

Years ago, I was programming in Ada and ran across a case where the value of a constant in a program differed from the same constant being converted at runtime. Took a while to track that one down.

boulos
0 replies
20h21m

The same reasoning applies though. The compiler is just another program. Outside of doing constant folding on things that are unspec'ed or not required (like mmsqrtps and most transcendentals), you should get consistent results even between architectures.

Of course the specific line linked to in that GH issue is showing that LLVM will attempt constant folding of various trig functions:

https://github.com/llvm/llvm-project/blob/faa43a80f70c639f40...

but the IEEE-754 spec does also recommend correctly rounded results for those (https://en.wikipedia.org/wiki/IEEE_754#Recommended_operation...).

The majority of code I'm talking about though uses constants that are some long, explicit number, and doesn't do any math on them that would then be amenable to constant folding itself.

That said, lines like:

https://github.com/llvm/llvm-project/blob/faa43a80f70c639f40...

are more worrying, since that may differ from what people expect dynamically (though the underlying stuff supports different denormal rules).

Either way, thanks for highlighting this! Clearly the answer is to just use LLVM/clang regardless of backend :).

ykonstant
1 replies
1d5h

It is not, because the subdivisions in CORDIC do not possess the best approximation properties of Farey fractions. For that, you would have to partition the circle into major/minor arcs instead, in the sense of Hardy-Littlewood's circle method. But that would be computationally very expensive, whereas this binary subdivision is fast, and is made faster using the shift-add tricks.

1oooqooq
0 replies
1d3h

i see. thanks. assumed it was just a bit-shift tricks implementation. but indeed, no mediant.

kevindamm
1 replies
1d5h

More like a binary search, but instead of adjusting an array index you're iterating an orthonormal matrix which represents rotation about the origin.
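
In matrix form, each micro-rotation step of that "binary search" is (standard CORDIC notation; the scale-factor remark below is my addition for clarity):

  \begin{pmatrix} x_{i+1} \\ y_{i+1} \end{pmatrix}
  =
  \begin{pmatrix} 1 & -d_i\,2^{-i} \\ d_i\,2^{-i} & 1 \end{pmatrix}
  \begin{pmatrix} x_i \\ y_i \end{pmatrix},
  \qquad d_i = \pm 1

Each matrix is a rotation by \arctan(2^{-i}) scaled by \sqrt{1 + 2^{-2i}}; picking the sign d_i at every step is the search, and the accumulated scale is folded into one constant at the end.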

tapatio
0 replies
21h15m

Or a second order step response.

nico
0 replies
1d5h

That’s super cool

Thank you for that link, very interesting diagrams and structures

They look very similar to what a neural network might be in certain cases, and it seems like the structures are trying to map some sort of duality

the__alchemist
4 replies
21h20m

Side note for 2023: Some modern MCUs are low cost, but have FPUs. A good example is the STM32G4. So, unlike on an M0 MCU or w/e, you can freely use f32, if you don't want fixed point. And you can get these for ~$1-2 / MCU.

That said... The G4 also has a hardware CORDIC peripheral that implements this algo for fixed-point uses. Is this mainly to avoid floating point precision losses? You program it using registers, but aren't implementing CORDIC yourself on the CPU; dedicated hardware inside the IC does it.

kragen
2 replies
16h28m

the cheapest cortex-m4fs in stock on digi-key, trying to omit duplicates, are the 3-dollar nuvoton m481le8ae https://www.digikey.com/en/products/detail/nuvoton-technolog..., the 3-dollar maxim max32660 https://www.digikey.com/en/products/detail/analog-devices-in..., and the 5-dollar atmel atsamd51 https://www.digikey.com/en/products/detail/microchip-technol.... their cheapest stm32g4 is the stm32g441kbt6 which rounds to 4 dollars https://www.digikey.com/en/products/detail/microchip-technol...

where do you get them for under 2 dollars?

(digi-key does list the nuvoton chip for barely under 2 dollars in quantity 500)

kragen
0 replies
14h52m

fantastic, thanks!

ddingus
0 replies
15h30m

The second Parallax Propeller chip has a CORDIC engine in silicon. It's fast, and handles 64-bit intermediate products, which gives the divide and trig operations more than adequate precision for most things. One can always increase precision in software, too.

I was late to the game learning about CORDIC. I had used fixed point a lot in 8 and 16 bit assembly land for performance, and determinism.

When I found out about it, I was stunned! It was fast, and only required basic math capability to be useful.

rmorey
3 replies
21h59m

I remember in high school precalc, we learned about the Taylor series, and my teacher told us that it was how trig functions were actually implemented on calculators. Well, I looked it up and found that it was actually CORDIC, and went and had some fun implementing it in TI Basic.

rossant
2 replies
10h56m

Well, are there _any_ calculators out there that use Taylor expansions?

toolslive
0 replies
27m

well, from memory (this was > 20y ago):

  - Taylor gives you a polynomial optimized for evaluation around one point.
  - naive Newton gives you a polynomial with 0 error at the interpolation points [but the linear system is badly conditioned]
  - Chebyshev gives you a polynomial with minimal error over the interval of choice.

So there is no real reason to ever use the Taylor series to approximate a function over an interval. The high school math teachers are lying.

snovv_crash
0 replies
7h55m

I've done some work with fast approximate math libraries, and went down this path. Taylor was a dead end, but Chebyshev polynomials work great. They have the nice property of having (close to) the minimum maximum (minimax) error over the entire range you are approximating.
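
To make the Taylor-vs-Chebyshev gap concrete, a quick numpy sketch (degree and interval picked arbitrarily; Chebyshev.fit is a least-squares fit, close to but not exactly the minimax polynomial):

  import numpy as np

  xs = np.linspace(-np.pi / 2, np.pi / 2, 2001)
  cheb = np.polynomial.Chebyshev.fit(xs, np.sin(xs), deg=7)
  taylor = xs - xs**3 / 6 + xs**5 / 120 - xs**7 / 5040  # same degree

  print("chebyshev fit max err:", np.max(np.abs(cheb(xs) - np.sin(xs))))
  print("taylor fit    max err:", np.max(np.abs(taylor - np.sin(xs))))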

aftbit
2 replies
1d2h

Now, it's fairly obvious that rotating by say 22.75˚ is the same as rotating by 45˚ and then -22.5˚ - i.e. we can break up a rotation into smaller parts, with both positive and negative components.

Wouldn't that be rotating by 22.5°? Is this an error in the article or my understanding?

AlexSW
1 replies
1d2h

An error in the article.

teleforce
1 replies
20h26m

Fun fact: in addition to sine and cosine calculation and generation, CORDIC can also be used for many operations including log, exponent, square root, vector magnitude, polar-Cartesian conversion and vector rotation. Spoiler alert: the author provides a teaser about these prospects in the conclusion.

I've got the feeling that by using quaternions instead of conventional orthonormal matrices, CORDIC-based operations can be executed more efficiently (fewer compute cycles and less memory) and effectively (fewer errors) [1].

[1] CORDIC Framework for Quaternion-based Joint Angle Computation to Classify Arm Movements:

https://core.ac.uk/works/8439118

nimish
0 replies
20h23m

It can be extended to arbitrary Lie groups IIRC

mchinen
1 replies
2h55m

Nice to read this.

Curious how CORDIC performs against cubic or other polynomial interpolation with a small table. I was taught that resource-constrained synthesizers sometimes used cubic interpolation, and this was presumably when CORDIC was relatively new.

At a glance, the single bit of precision you get with each iteration of CORDIC means it would be more expensive in compute but cheaper in space than a polynomial.

But on the space note, it's worth highlighting that tables can be cheaper than the article's 4096-entry sin(x) lookup might suggest: only a quarter of the full circle is needed, due to symmetry.
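
A sketch of the quarter-wave trick (table size is arbitrary, and this does nearest-entry lookup where a real implementation would likely interpolate):

  import math

  QUARTER = 1024
  TABLE = [math.sin((math.pi / 2) * i / QUARTER) for i in range(QUARTER + 1)]

  def quarter_sin(theta):
      # Store sin only on [0, pi/2]; recover the rest from symmetry.
      t = theta % (2 * math.pi)
      if t < math.pi / 2:
          return TABLE[round(t / (math.pi / 2) * QUARTER)]
      if t < math.pi:
          return TABLE[round((math.pi - t) / (math.pi / 2) * QUARTER)]
      if t < 3 * math.pi / 2:
          return -TABLE[round((t - math.pi) / (math.pi / 2) * QUARTER)]
      return -TABLE[round((2 * math.pi - t) / (math.pi / 2) * QUARTER)]

  print(quarter_sin(2.5), math.sin(2.5))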

Sharlin
0 replies
1h38m

Back in the day gamedevs (and demosceners) used a mere 256-entry LUT for sin and cos – it was convenient to have byte-sized angles that wrap around automatically, and for rotation in 2D games, 2^8 steps was quite enough. Wouldn’t get you very far in 3D though if you want smooth motion.

teleforce
0 replies
8h54m

Thanks for the links.

It is very strange that, for such a pervasive and hugely popular computing technique, CORDIC does not get a proper detailed treatment and coverage in books. Since IoT and machine-to-machine communication are taking off, and given the efficiency of CORDIC implementation and operation, its usage will most probably keep increasing, hence good references are needed for correct and optimized implementation.

The two exceptions are books by Prof. Omondi and Prof. Deschamps [1][2].

[1]Computer-Hardware Evaluation of Mathematical Functions:

https://www.worldscientific.com/worldscibooks/10.1142/p1054

[2]Guide to FPGA Implementation of Arithmetic Functions:

http://www.arithmetic-circuits.org/guide2fpga/vhdl_codes.htm

brcmthrowaway
1 replies
22h57m

This used to be all the rage when folks were into DSP

tapatio
0 replies
21h15m

cos + jsin baby.

sunpazed
0 replies
1d4h

I have a few vintage programmable HP calculators that implement CORDIC on their 4-bit CPUs. You can also program them to calculate the Taylor expansion of sin() as another approximate method.

If you liked this, it’s worth reading Donald Knuth’s seminal “The Art of Computer Programming”, which explains a number of mathematical algorithms by example.

srean
0 replies
6h59m

I got reminded of a rather cute snippet of code I had a part in.

One needed to find the coordinates of the bisector of an angle subtended by an arc of a unit circle. The (x,y) coordinates of the 'arms' were available. The existing implementation was a mass of trigonometry - going from the x,y coordinates to polar (r,θ), then making sure that the θ computed was in the correct quadrant, halving the θ and converting back to x,y coordinates. All in all, many calls to trigonometric functions and inverse functions.

This was in Python and thus we had first-class access to complex numbers. So all that was needed was to define two complex numbers, z1 from (x1,y1) and z2 from (x2,y2), and then take their geometric mean: √(z1*z2). Done.

No explicit trigonometry and no explicit conversion back and forth in the new code.
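
Roughly, the whole fix boils down to this (made-up points; note the principal square root can land on the opposite bisector when the product's angle wraps past π, so real code may need a sign check):

  import cmath

  def bisector(p1, p2):
      # sqrt of the product of two unit complex numbers bisects their angle
      z1 = complex(*p1)
      z2 = complex(*p2)
      return cmath.sqrt(z1 * z2)

  print(bisector((1.0, 0.0), (0.0, 1.0)))  # ~0.707+0.707j, the 45° point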

pintor-viejo
0 replies
3h56m

The first place I personally saw CORDIC mentioned was:

"How Many Ways Can You Draw a Circle?", the first *Jim Blinn's Corner* article:

https://ieeexplore.ieee.org/document/4057251

He found 15 ways, which generalized to methods like CORDIC, etc.

A couple years ago I was wondering if SDF shape representation could use something like a taxi cab distance metric to move floating point operations to integer, although I haven't spent too much time investigating it.

pintor-viejo
0 replies
19h40m

Meagher's octree system notably used only integer arithmetic with no integer multiplication/division:

Efficient (linear time) algorithms have been developed for the Boolean operations (union, intersection and difference), geometric operations (translation, scaling and rotation), N-dimensional interference detection, and display from any point in space with hidden surfaces removed. The algorithms require neither floating-point operations, integer multiplications, nor integer divisions.

https://doi.org/10.1016/0146-664X(82)90104-6

This facilitated creation of fast, custom VLSI graphics acceleration hardware for octree representation.

kazinator
0 replies
12h54m

If, having it in your head, you could use it to calculate trig functions in your head, that would be a way of it paying rent.

So, yeah.

femto
0 replies
7h1m

sin and cos are often used to rotate vectors. In these cases, a trick with a CORDIC is to avoid the traditional sin/cos/multiply calculation by using the vector to be rotated as the input to the CORDIC. The CORDIC will directly produce the rotated vector, without having to compute sin/cos or do a complex multiplication.

CORDIC really shines when latency doesn't matter so much. You can pipeline every stage of the computation and get a huge throughput. It's well suited for digital mixing in radio systems.
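
A float sketch of that vector-in, vector-out usage (plain Python; hardware would use fixed point and fold the constant gain into an earlier stage):

  import math

  ITERS = 24
  ANGLES = [math.atan(2.0 ** -i) for i in range(ITERS)]
  GAIN = 1.0
  for _i in range(ITERS):
      GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * _i))

  def cordic_rotate(x, y, theta):
      # Rotate (x, y) by theta directly; no sin/cos, no complex multiply.
      # Converges for |theta| <= ~1.74 rad without range reduction.
      z = theta
      for i, a in enumerate(ANGLES):
          d = 1.0 if z >= 0 else -1.0
          x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
          z -= d * a
      return x / GAIN, y / GAIN  # one constant scale instead of sin/cos

  print(cordic_rotate(1.0, 0.0, math.pi / 4))  # ~ (0.707, 0.707)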

azubinski
0 replies
7h42m

The most important keywords are:

BLDC + motor control + sensorless

CORDIC is a de facto standard algorithm in the relevant application area.

Infineon explains algorithms compactly and clearly:

https://www.infineon.com/dgdl/AP1610510_CORDIC_Algorithm.pdf

Microchip gives an idea of where and what the algorithm is used for:

https://ww1.microchip.com/downloads/aemDocuments/documents/O...

CORDIC hardware accelerators are found in many microcontrollers designed to control electric motors.

Waterluvian
0 replies
1h11m

I think the love for this algorithm feels similar to my love for Raycasting. You get a 3D game on barely any hardware with no trigonometry. Like… how?! It makes complete sense once I implemented it, but it still feels like wizardry.