Why the CORDIC algorithm lives rent-free in my head

blowski
14 replies
1d2h

“Lives rent-free in my head” is a horrible cliche. By now, it has the same effect as saying “it’s nice” but uses lots more letters.

wodenokoto
6 replies
1d1h

I never liked that phrase either. So what exactly lives in your head and pays rent?

pynappo
3 replies
22h2m

the "rent" in the saying is the effort to learn and upkeep a certain piece of knowledge in your head, to the point where you can recall it as needed

living rent free generally means that you are able to easily recall a piece of knowledge (and often do, even if you don't want to).

something that pays rent might be like an important concept for an upcoming school test, for a class you're not interested in.

strken
2 replies
19h44m

I always thought rent was utility, and a piece of knowledge or an opinion that didn't pay rent was one that provided little or no practical benefit to you.

The term in its original use described an obsession with a particular group or thing and called it out as pointless. I don't think it would make sense for rent to mean effort in this original sense, nor in the broader way it's used here.

junon
1 replies
6h45m

I have no idea what pseudo-intellectual discourse this is trying to be but the phrase "X lives rent free in my head" means only that "I think of X a lot because I enjoy thinking about it". Nothing more.

strken
0 replies
5h16m

Go read https://knowyourmeme.com/memes/rent-free and learn the origins and predominant usage of the phrase.

In addition, it literally says "rent free". There's nothing intellectual or pseudo-intellectual about reading the phrase "rent free" and concluding that the information does not pay anything to live in your head. This is basic reading comprehension.

sublinear
0 replies
1d1h

Knowledge work

__MatrixMan__
0 replies
22h12m

Kubernetes, which I would evict if it stopped paying rent.

alexjplant
1 replies
1d

“Lives rent-free in my head” is a horrible cliche.

Yes it is, much like "dumpster fire" or "toxic" or "gaslighting" or any other contemporary Reddit-isms that are associated with hyper-reductionist cultural hot takes. My personal distaste for them, however, has no bearing on the fact that they have very real meaning and people enjoy using them.

By now, it has the same effect as saying “it’s nice” but uses lots more letters.

The commonly-held meaning of this is that it describes a pervasive, intrusive thought, not a nice one. This particular turn of phrase is often used to describe how politicians devote energy to talking about other politicians that they don't like, for instance.

kevindamm
0 replies
13h45m

"Lives rent free in my head" lives rent free in my head.

Hey it's a quine, too!

magicalhippo
0 replies
23h24m

Having never heard the expression before, I thought it meant it wouldn't be forgotten, like learning to ride a bicycle kinda thing.

rnmmrnm
0 replies
1d1h

I haven't finished it yet, but I was thinking he's talking about memorizing the algorithm to calculate sines in your head.

junon
0 replies
1d1h

And what does this pedantry solve, exactly? It clearly is something the author thinks often enough about and enjoys talking about.

Let people enjoy things.

beefnugs
0 replies
1d

I thought it was more often "negative", like you can't stop thinking about some dumbass having a disproportionate effect on the world.

nico
13 replies
1d5h

Cool algorithm. Might come in handy soon for running neural networks on less powerful hardware :)

KeplerBoy
10 replies
1d

Neural networks usually don't use trigonometry or complex numbers.

nico
9 replies
23h0m

From the neural network Wikipedia page:

The signal each neuron outputs is calculated from this number, according to its activation function

The activation function is usually a sigmoid, which is also usually defined in terms of trigonometric functions

Which neural networks don’t use trigonometric functions or equivalent?

PeterisP
4 replies
20h7m

In current neural networks the activation function usually is not sigmoid but something like ReLU ( y = 0 if x<0 else x ), and in any case the computation of activations is not a meaningful part of total compute - for non-tiny networks, almost all the effort is spent on the large matrix multiplications of the large layers before the activation function, and as the network size grows, they become even less relevant, as the amount of activation calculations grows linearly with layer size but the core computation grows superlinearly (n^2.8 perhaps?).

mcguire
3 replies
17h29m

Really?

It's literally been decades, but the last time I learned about neural networks, a non-linear activation function was important for Turing completeness.

kragen
0 replies
16h27m

neural networks aren't turing complete (they're circuits, not state machines) and relu is not just nonlinear but in fact not even differentiable at zero

jszymborski
0 replies
14h48m

ReLUs are indeed non-linear, despite the confusing name.

The nonlinearity (plus at least two layers) is required to solve nonlinear problems (like the famous XOR example).
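
For illustration, here's a toy two-layer ReLU network that solves XOR with hand-picked weights (the weights and layout are my own example, not from any particular source):

  import numpy as np

  # Two-layer network with ReLU: XOR becomes a difference of two ramps.
  relu = lambda v: np.maximum(v, 0.0)

  W1 = np.array([[1.0, 1.0],    # h1 = relu(x1 + x2)
                 [1.0, 1.0]])   # h2 = relu(x1 + x2 - 1)
  b1 = np.array([0.0, -1.0])
  w2 = np.array([1.0, -2.0])    # y = h1 - 2*h2

  for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
      h = relu(np.array(x, dtype=float) @ W1 + b1)
      print(x, h @ w2)          # prints 0, 1, 1, 0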

PeterisP
0 replies
10h31m

Yes, some non-linearity is important - not for Turing completeness, but because without it the consecutive layers effectively implement a single linear transformation of the same size and you're just doing useless computation.

However, the "decision point" of the ReLU (and its relatives like leaky ReLU or ELU) provides a sufficient non-linearity - in essence, just as a sigmoid effectively results in a yes/no chooser with some stuff in the middle for training purposes, so does the ReLU "elbow point".

Sigmoid has a problem of 'vanishing gradients' in deep networks: sigmoid derivatives of 0 to 0.25 in standard backpropagation mean that a 'far away' layer will have tiny, useless gradients if there are a hundred sigmoid layers in between.

KeplerBoy
2 replies
22h34m

Most activation functions are based on exponential functions.

I don't immediately see how cordic would help you calculate e^(x) unless x is complex valued (which it is usually not).

adgjlsfhk1
1 replies
22h17m

see section B of https://eprints.soton.ac.uk/267873/1/tcas1_cordic_review.pdf. Generalized cordic can compute hyperbolic trig functions which gives you exponentials. That said, it's still not useful for ML because it's pretty hard to beat polynomials and tables.
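
For anyone curious what that looks like, here's a rough Python sketch of the hyperbolic rotation mode (the repetition schedule follows the usual 4, 13, 40, ... rule; plain floats instead of fixed point, and no range reduction, so treat it as illustrative only):

  import math

  def cordic_exp(t, iters=24):
      # Hyperbolic CORDIC, rotation mode. Converges only for roughly
      # |t| < 1.118; real code would range-reduce first.
      # Iterations 4, 13, 40, ... must be repeated for convergence.
      schedule, i, repeat = [], 1, 4
      while len(schedule) < iters:
          schedule.append(i)
          if i == repeat:
              schedule.append(i)       # repeat this index once
              repeat = 3 * repeat + 1  # next repeated index
          i += 1
      schedule = schedule[:iters]
      # Net gain of the micro-rotations; pre-dividing it out makes the
      # final (x, y) land on (cosh t, sinh t) directly.
      K = 1.0
      for j in schedule:
          K *= math.sqrt(1.0 - 2.0 ** (-2 * j))
      x, y, z = 1.0 / K, 0.0, t
      for j in schedule:
          d = 1.0 if z >= 0 else -1.0
          x, y = x + d * y * 2.0 ** -j, y + d * x * 2.0 ** -j
          z -= d * math.atanh(2.0 ** -j)
      return x + y  # cosh t + sinh t = e**t

  print(cordic_exp(0.5), math.exp(0.5))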

KeplerBoy
0 replies
22h1m

Interesting, thanks.

Dylan16807
0 replies
10h9m

You just need a vague approximation. Even CORDIC is overkill.

clamp(x/2, -1, 1) is basically a sigmoid.
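
A quick sketch of how close that gets, compared against tanh (the zero-centered sigmoid shape it mimics):

  import math

  clamp = lambda v, lo, hi: max(lo, min(hi, v))

  # Piecewise-linear stand-in vs. a true sigmoid-shaped curve.
  for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
      print(x, clamp(x / 2, -1, 1), math.tanh(x))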

SJC_Hacker
1 replies
1d

The primary advantage of CORDIC is that it saves space in the design.

Lookup tables with interpolation are much faster.
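
For reference, a minimal sketch of the table-plus-interpolation approach (table size and float arithmetic are arbitrary choices here):

  import math

  N = 256
  TABLE = [math.sin(2 * math.pi * i / N) for i in range(N)]

  def table_sin(theta):
      pos = (theta / (2 * math.pi)) * N
      i = int(math.floor(pos)) % N       # wrap to one full turn
      frac = pos - math.floor(pos)
      a, b = TABLE[i], TABLE[(i + 1) % N]
      return a + frac * (b - a)          # linear interpolation

  print(table_sin(1.0), math.sin(1.0))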

apitman
11 replies
22h52m

While the author mentions this is mostly applicable to things like FPGAs, there's also an application in gamedev (or any distributed physics simulation). Floating point calculations are tricky to get deterministic across platforms[0]. One solution is to skip floats entirely and implement a fixed-point physics engine. You'd need something like CORDIC to implement all the trig functionality.

I started working on such a thing as a fun side project a few years ago but never finished it. One of these days I hope to get back to it.

[0]: https://randomascii.wordpress.com/2013/07/16/floating-point-...
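
For anyone wanting a starting point, here's the rough shape of an all-integer CORDIC sin/cos in Python (the 16.16 format and iteration count are arbitrary, and angles outside ±pi/2 would need range reduction first):

  import math

  FRAC = 16                      # 16.16 fixed point
  ONE = 1 << FRAC
  ITERS = 16
  ANGLES = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERS)]
  _gain = 1.0
  for _i in range(ITERS):
      _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
  K = round(ONE / _gain)         # pre-scaled start x cancels the gain

  def cordic_sincos(theta):
      # theta: angle in radians as 16.16 fixed point, |theta| <= pi/2.
      # Only integer adds, shifts, and compares: fully deterministic.
      x, y, z = K, 0, theta
      for i in range(ITERS):
          if z >= 0:
              x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
          else:
              x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
      return y, x                # (sin, cos) in 16.16 fixed point

  s, c = cordic_sincos(round(0.5 * ONE))
  print(s / ONE, c / ONE, math.sin(0.5), math.cos(0.5))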

teleforce
6 replies
21h29m

The author did mention that fixed point was very popular for gamedev before floating point took over due to the increase in hardware capability, and most likely CORDIC was used together with fixed point as well.

In fact, before IEEE 754 became the popular standard that it is today, fixed point was used all the time (go and ask any gamedev who worked on stuff between 1980 and 2000ish and they'll tell you all about it).

aappleby
4 replies
20h59m

Was gamedev between 1980 and 2000ish, can confirm. PS1 had no floating point unit.

TuringTourist
3 replies
17h28m

This was the cause of the signature jiggly textures that were pervasive in PS1 games.

fayalalebrun
2 replies
13h0m

This is a common misconception, but it is not the case. For example, look at the Voodoo 1, 2, and 3, which also used fixed-point numbers internally but did not suffer from this problem.

The real issue is that the PS1 has no subpixel precision. In other words, it rounds a triangle's coordinates to the nearest integers.

Likely they did this because it lets you avoid division and multiplication hardware entirely: with integer start and end coordinates, line rasterization can be done purely with additions and comparisons.
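
A sketch of what integer endpoints buy you: Bresenham-style line stepping needs little beyond additions and comparisons (this is the textbook algorithm, not the PS1's actual hardware):

  def raster_line(x0, y0, x1, y1):
      points = []
      dx, dy = abs(x1 - x0), -abs(y1 - y0)
      sx = 1 if x0 < x1 else -1
      sy = 1 if y0 < y1 else -1
      err = dx + dy                  # running error term
      while True:
          points.append((x0, y0))
          if x0 == x1 and y0 == y1:
              break
          e2 = 2 * err               # a shift in hardware
          if e2 >= dy:
              err += dy
              x0 += sx
          if e2 <= dx:
              err += dx
              y0 += sy
      return points

  print(raster_line(0, 0, 7, 3))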

Sharlin
1 replies
1h41m

Didn’t PS1 also lack perspective-corrected texture mapping? That would definitely make textures wobbly. AFAIK they compensated for it simply by subdividing the geometry as finely as possible (which wasn’t very fine, really).

sgtnoodle
0 replies
58m

The folk that made Crash Bandicoot were pretty clever. They figured out that the PlayStation could render untextured, shaded triangles a lot faster than textured triangles, so they "textured" the main character with pixel-scale geometry. This in turn saved them enough memory to use a higher resolution frame buffer mode.

https://all-things-andy-gavin.com/2011/02/04/making-crash-ba...

apitman
0 replies
18h26m

I believe that was mostly for performance reasons, not determinism, right?

boulos
3 replies
21h30m

That blog post is now a decade old, but includes an important quote:

The IEEE standard does guarantee some things. It guarantees more than the floating-point-math-is-mystical crowd realizes, but less than some programmers might think.

Summarizing the blog post, it highlights a few things, though some less clearly than I would like:

  * x87 was wonky

  * You need to ensure rounding modes, flush-to-zero, etc. are consistently set

  * Some older processors don't have FMA

  * Approximate instructions (mmsqrtps et al.) don't have a consistent spec

  * Compilers may reassociate expressions

For small routines and self-written libraries, it's straightforward, if painful, to ensure you avoid all of that.

Briefly mentioned in the blog post is that IEEE-754 (2008) made the spec more explicit, and effectively assumed the death of x87. It's 2024 now, so you can definitely avoid x87. Similarly, FMA is part of the IEEE-754 2008 spec, and has been built into all modern processors since (Haswell and later on Intel).

There are still cross-architecture differences like 8-wide AVX2 vs 4-wide NEON that can trip you up, but if you are writing assembly or intrinsics or just C that you inspect with Compiler Explorer or objdump, you can look at the output and say "Yep, that'll be consistent".

Someone
2 replies
20h59m

but if you are writing assembly or intrinsics or just C that you inspect with Compiler Explorer or objdump, you can look at the output and say "Yep, that'll be consistent".

Surely people have written tooling for those checks for various CPUs?

Also, is it that ‘simple’? Reading https://github.com/llvm/llvm-project/issues/62479, calculations that the compiler does and that only end up in the executable as constants can make results different between architectures or compilers (possibly even compiler runs, if it runs multi-threaded and constant folding order depends on timing, but I find it hard to imagine how exactly that would happen).

So, you’d want to check constants in the code, too, but then, there’s no guarantee that compilers do the same constant folding. You can try to get more consistent by being really diligent in using constexpr, but that doesn’t guarantee that, either.

mjcohen
0 replies
20h5m

Years ago, I was programming in Ada and ran across a case where the value of a constant in a program differed from the same constant being converted at runtime. Took a while to track that one down.

boulos
0 replies
20h21m

The same reasoning applies though. The compiler is just another program. Outside of doing constant folding on things that are unspec'ed or not required (like mmsqrtps and most transcendentals), you should get consistent results even between architectures.

Of course the specific line linked to in that GH issue is showing that LLVM will attempt constant folding of various trig functions:

https://github.com/llvm/llvm-project/blob/faa43a80f70c639f40...

but the IEEE-754 spec does also recommend correctly rounded results for those (https://en.wikipedia.org/wiki/IEEE_754#Recommended_operation...).

The majority of code I'm talking about though uses constants that are some long, explicit number, and doesn't do any math on them that would then be amenable to constant folding itself.

That said, lines like:

https://github.com/llvm/llvm-project/blob/faa43a80f70c639f40...

are more worrying, since that may differ from what people expect dynamically (though the underlying stuff supports different denormal rules).

Either way, thanks for highlighting this! Clearly the answer is to just use LLVM/clang regardless of backend :).

ykonstant
1 replies
1d5h

It is not, because the subdivisions in CORDIC do not possess the best approximation properties of Farey fractions. For that, you would have to partition the circle into major/minor arcs instead, in the sense of Hardy-Littlewood's circle method. But that would be computationally very expensive, whereas this binary subdivision is fast, and is made faster using the shift-add tricks.

1oooqooq
0 replies
1d3h

i see. thanks. assumed it was just a bit-shift tricks implementation. but indeed, no mediant.

kevindamm
1 replies
1d5h

More like a binary search, but instead of adjusting an array index you're iterating an orthonormal matrix which represents rotation about the origin.
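
In matrix form, each micro-rotation step of that "binary search" is (standard CORDIC notation; the scale-factor remark below is my addition for clarity):

  \begin{pmatrix} x_{i+1} \\ y_{i+1} \end{pmatrix}
  =
  \begin{pmatrix} 1 & -d_i\,2^{-i} \\ d_i\,2^{-i} & 1 \end{pmatrix}
  \begin{pmatrix} x_i \\ y_i \end{pmatrix},
  \qquad d_i = \pm 1

Each matrix is a rotation by \arctan(2^{-i}) scaled by \sqrt{1 + 2^{-2i}}; picking the sign d_i at every step is the search, and the accumulated scale is folded into one constant at the end.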

tapatio
0 replies
21h15m

Or a second order step response.

nico
0 replies
1d5h

That’s super cool

Thank you for that link, very interesting diagrams and structures

They look very similar to what a neural network might be in certain cases, and it seems like the structures are trying to map some sort of duality

the__alchemist
4 replies
21h20m

Side note for 2023: Some modern MCUs are low cost, but have FPUs. A good example is the STM32G4. So, unlike on an M0 MCU or w/e, you can freely use f32, if you don't want fixed point. And you can get these for ~$1-2 / MCU.

That said... The G4 also has a hardware CORDIC peripheral that implements this algo for fixed-point uses. Is this mainly to avoid floating point precision losses? You program it using registers, but aren't implementing CORDIC yourself on the CPU; dedicated hardware inside the IC does it.

kragen
2 replies
16h28m

the cheapest cortex-m4fs in stock on digi-key, trying to omit duplicates, are the 3-dollar nuvoton m481le8ae https://www.digikey.com/en/products/detail/nuvoton-technolog..., the 3-dollar maxim max32660 https://www.digikey.com/en/products/detail/analog-devices-in..., and the 5-dollar atmel atsamd51 https://www.digikey.com/en/products/detail/microchip-technol.... their cheapest stm32g4 is the stm32g441kbt6 which rounds to 4 dollars https://www.digikey.com/en/products/detail/microchip-technol...

where do you get them for under 2 dollars?

(digi-key does list the nuvoton chip for barely under 2 dollars in quantity 500)

kragen
0 replies
14h52m

fantastic, thanks!

ddingus
0 replies
15h30m

The second Parallax Propeller chip has a CORDIC engine in silicon. It's fast, and handles 64-bit intermediate products, which gives the divide and trig operations more than adequate precision for most things. One can always increase precision in software, too.

I was late to the game learning about CORDIC. I had used fixed point a lot in 8 and 16 bit assembly land for performance, and determinism.

When I found out about it, I was stunned! It was fast, and only required basic math capability to be useful.

rmorey
3 replies
21h59m

I remember in high school precalc, we learned about the Taylor series, and my teacher told us that it was how trig functions were actually implemented on calculators. Well, I looked it up and found that it was actually CORDIC, and went and had some fun implementing it in TI Basic.

rossant
2 replies
10h56m

Well, are there _any_ calculators out there that use Taylor expansions?

toolslive
0 replies
27m

well, from memory (this was > 20y ago):

  - Taylor gives you a polynomial optimized for evaluation around one point.
  - naive Newton gives you a polynomial with 0 error at the interpolation points [but the linear system is badly conditioned]
  - Chebyshev gives you a polynomial with minimal error over the interval of choice.

So there is no real reason to ever use the Taylor series to approximate a function over an interval. The high school math teachers are lying.

snovv_crash
0 replies
7h55m

I've done some work with fast approximate math libraries, and went down this path. Taylor was a dead end, but Chebyshev polynomials work great. They have the nice property of having (close to) the minimum maximum (minimax) error over the entire range you are approximating.
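
To make the Taylor-vs-Chebyshev gap concrete, a quick numpy sketch (degree and interval picked arbitrarily; Chebyshev.fit is a least-squares fit, close to but not exactly the minimax polynomial):

  import numpy as np

  xs = np.linspace(-np.pi / 2, np.pi / 2, 2001)
  cheb = np.polynomial.Chebyshev.fit(xs, np.sin(xs), deg=7)
  taylor = xs - xs**3 / 6 + xs**5 / 120 - xs**7 / 5040  # same degree

  print("chebyshev fit max err:", np.max(np.abs(cheb(xs) - np.sin(xs))))
  print("taylor fit    max err:", np.max(np.abs(taylor - np.sin(xs))))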

aftbit
2 replies
1d2h

Now, it's fairly obvious that rotating by say 22.75˚ is the same as rotating by 45˚ and then -22.5˚ - i.e. we can break up a rotation into smaller parts, with both positive and negative components.

Wouldn't that be rotating by 22.5°? Is this an error in the article or my understanding?

AlexSW
1 replies
1d2h

An error in the article.

teleforce
1 replies
20h26m

Fun fact: in addition to sine and cosine calculation and generation, CORDIC can also be used for many operations including log, exponent, square root, vector magnitude, polar-Cartesian conversion and vector rotation. Spoiler alert: the author provides a teaser about these prospects in the conclusion.

I've got the feeling that by using quaternions instead of conventional orthonormal matrices, CORDIC-based operations can be executed more efficiently (fewer compute cycles and less memory) and effectively (fewer errors) [1].

[1] CORDIC Framework for Quaternion-based Joint Angle Computation to Classify Arm Movements:

https://core.ac.uk/works/8439118

nimish
0 replies
20h23m

It can be extended to arbitrary Lie groups IIRC

mchinen
1 replies
2h55m

Nice to read this.

Curious how CORDIC performs against cubic or other polynomial interpolation with a small table. I was taught that resource-constrained synthesizers sometimes used cubic interpolation, and this was presumably when CORDIC was relatively new.

At a glance, the single bit of precision you get with each iteration of CORDIC means it would be more expensive in compute but cheaper in space than a polynomial.

But on the space note, it's worth highlighting that tables can be cheaper than the article's 4096-entry sin(x) lookup might suggest: only a quarter of the full circle is needed, due to symmetry.
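
A sketch of the quarter-wave trick (table size is arbitrary, and this does nearest-entry lookup where a real implementation would likely interpolate):

  import math

  QUARTER = 1024
  TABLE = [math.sin((math.pi / 2) * i / QUARTER) for i in range(QUARTER + 1)]

  def quarter_sin(theta):
      # Store sin only on [0, pi/2]; recover the rest from symmetry.
      t = theta % (2 * math.pi)
      if t < math.pi / 2:
          return TABLE[round(t / (math.pi / 2) * QUARTER)]
      if t < math.pi:
          return TABLE[round((math.pi - t) / (math.pi / 2) * QUARTER)]
      if t < 3 * math.pi / 2:
          return -TABLE[round((t - math.pi) / (math.pi / 2) * QUARTER)]
      return -TABLE[round((2 * math.pi - t) / (math.pi / 2) * QUARTER)]

  print(quarter_sin(2.5), math.sin(2.5))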

Sharlin
0 replies
1h38m

Back in the day gamedevs (and demosceners) used a mere 256-entry LUT for sin and cos – it was convenient to have byte-sized angles that wrap around automatically, and for rotation in 2D games, 2^8 steps was quite enough. Wouldn’t get you very far in 3D though if you want smooth motion.

teleforce
0 replies
8h54m

Thanks for the links.

It is very strange that, for such a pervasive and hugely popular computing technique, CORDIC does not get a proper detailed treatment and coverage in books. Since IoT and machine-to-machine communication are taking off, and given the efficiency of CORDIC implementation and operation, its usage will most probably keep increasing, hence good references are needed for correct and optimized implementation.

The two exceptions are books by Prof. Omondi and Prof. Deschamps [1][2].

[1]Computer-Hardware Evaluation of Mathematical Functions:

https://www.worldscientific.com/worldscibooks/10.1142/p1054

[2]Guide to FPGA Implementation of Arithmetic Functions:

http://www.arithmetic-circuits.org/guide2fpga/vhdl_codes.htm

brcmthrowaway
1 replies
22h57m

This used to be all the rage when folks were into DSP

tapatio
0 replies
21h15m

cos + jsin baby.

sunpazed
0 replies
1d4h

I have a few vintage programmable HP calculators that implement CORDIC on their 4-bit CPUs. You can also program them to calculate the Taylor expansion of sin() as another approximate method.

If you liked this, it’s worth reading Donald Knuth’s seminal “The Art of Computer Programming”, which explains a number of mathematical algorithms by example.

srean
0 replies
6h59m

I got reminded of a rather cute snippet of code I had a part in.

One needed to find the coordinates of the bisector of an angle subtended by an arc of a unit circle. The (x,y) coordinates of the 'arms' were available. The existing implementation was a mass of trigonometry - going from the x,y coordinates to polar (r,θ), then making sure that the θ computed was in the correct quadrant, halving the θ and converting back to x,y coordinates. All in all, many calls to trigonometric functions and inverse functions.

This was in Python and thus we had first-class access to complex numbers. So all that was needed was to define two complex numbers, z1 from (x1,y1) and z2 from (x2,y2), and then take their geometric mean: √(z1*z2). Done.

No explicit trigonometry and no explicit conversion back and forth in the new code.
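
Roughly, the whole fix boils down to this (made-up points; note the principal square root can land on the opposite bisector when the product's angle wraps past π, so real code may need a sign check):

  import cmath

  def bisector(p1, p2):
      # sqrt of the product of two unit complex numbers bisects their angle
      z1 = complex(*p1)
      z2 = complex(*p2)
      return cmath.sqrt(z1 * z2)

  print(bisector((1.0, 0.0), (0.0, 1.0)))  # ~0.707+0.707j, the 45° point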

pintor-viejo
0 replies
3h56m

The first place I personally saw CORDIC mentioned was:

"How Many Ways Can You Draw a Circle?", the first *Jim Blinn's Corner* article:

https://ieeexplore.ieee.org/document/4057251

He found 15 ways, which generalized to methods like CORDIC, etc.

A couple years ago I was wondering if SDF shape representation could use something like a taxi cab distance metric to move floating point operations to integer, although I haven't spent too much time investigating it.

pintor-viejo
0 replies
19h40m

Meagher's octree system notably used only integer arithmetic with no integer multiplication/division:

Efficient (linear time) algorithms have been developed for the Boolean operations (union, intersection and difference), geometric operations (translation, scaling and rotation), N-dimensional interference detection, and display from any point in space with hidden surfaces removed. The algorithms require neither floating-point operations, integer multiplications, nor integer divisions.

https://doi.org/10.1016/0146-664X(82)90104-6

This facilitated creation of fast, custom VLSI graphics acceleration hardware for octree representation.

kazinator
0 replies
12h54m

If, having it in your head, you could use it to calculate trig functions in your head, that would be a way of it paying rent.

So, yeah.

femto
0 replies
7h1m

sin and cos are often used to rotate vectors. In these cases, a trick with a CORDIC is to avoid the traditional sin/cos/multiply calculation by using the vector to be rotated as the input to the CORDIC. The CORDIC will directly produce the rotated vector, without having to compute sin/cos or do a complex multiplication.

CORDIC really shines when latency doesn't matter so much. You can pipeline every stage of the computation and get a huge throughput. It's well suited for digital mixing in radio systems.
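
A float sketch of that vector-in, vector-out usage (plain Python; hardware would use fixed point and fold the constant gain into an earlier stage):

  import math

  ITERS = 24
  ANGLES = [math.atan(2.0 ** -i) for i in range(ITERS)]
  GAIN = 1.0
  for _i in range(ITERS):
      GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * _i))

  def cordic_rotate(x, y, theta):
      # Rotate (x, y) by theta directly; no sin/cos, no complex multiply.
      # Converges for |theta| <= ~1.74 rad without range reduction.
      z = theta
      for i, a in enumerate(ANGLES):
          d = 1.0 if z >= 0 else -1.0
          x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
          z -= d * a
      return x / GAIN, y / GAIN  # one constant scale instead of sin/cos

  print(cordic_rotate(1.0, 0.0, math.pi / 4))  # ~ (0.707, 0.707)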

azubinski
0 replies
7h42m

The most important keywords are:

BLDC + motor control + sensorless

CORDIC is a de facto standard algorithm in the relevant application area.

Infineon explains algorithms compactly and clearly:

https://www.infineon.com/dgdl/AP1610510_CORDIC_Algorithm.pdf

Microchip gives an idea of where and what the algorithm is used for:

https://ww1.microchip.com/downloads/aemDocuments/documents/O...

CORDIC hardware accelerators are found in many microcontrollers designed to control electric motors.

Waterluvian
0 replies
1h11m

I think the love for this algorithm feels similar to my love for Raycasting. You get a 3D game on barely any hardware with no trigonometry. Like… how?! It makes complete sense once I implemented it, but it still feels like wizardry.