
A 100x speedup with unsafe Python

Animats
27 replies
1d16h

Oh, array striding.

This is a classic bikeshedding issue. When Go and Rust were first being designed, I brought up support for multidimensional arrays. For both cases, that became lost in discussions over what features arrays should have. Subarrays, like slices but multidimensional? That's the main use case for striding. Flattening in more than one dimension? And some people want sparse arrays. Stuff like that. So the problem gets pushed off onto collection classes, there's no standard, and everybody using such arrays spends time doing format conversion. This is why FORTRAN and Matlab still have strong positions in number-crunching.

yosefk
17 replies
1d13h

From everything I'm seeing, it follows that Matlab & Fortran have been decimated by Python and C/C++ around 1990s and 2010s, respectively. Of course I could be wrong; any evidence of their still strong position will be greatly appreciated, doubly so evidence that this position is due to stride issues.

(Of course Python provides wrappers to Fortran code, e.g. FITPACK, but this code specifically was mostly written in the 80s, with small updates in this century, and is probably used more through the numpy wrappers, stride conversion issues and all, than through its native Fortran interface.)

MobiusHorizons
6 replies
1d12h

I think pretty much all linear algebra libraries are still Fortran and are unlikely to ever be C because Fortran is legitimately faster than C for this stuff. I don't know if it has anything to do with strong opinions about how values are represented, I think it has more to do with lower overhead of function calling, but that is just repeating what someone told me about Fortran vs C in general, not necessarily applicable to BLAS libraries.

Fortran at least used to be very common in heavy scientific computing, but I would bet that relies on GPUs or other accelerators these days.

yosefk
4 replies
1d12h

The question is how much new numeric code is written in Fortran vs C/C++. My guess is way below 10%, certainly if we measure by usage as opposed to LOC and I would guess by LOC as well.

Is Fortran legitimately faster than C with the restrict keyword? Regardless of the function call cost diffs between the two - meaning, even if we assume Fortran is somehow faster here - there's no way numeric production code does enough function calls for this to matter. If Fortran was faster than C at any point in time I can only imagine pointer aliasing issues to have been the culprit and I can't imagine it still being relevant, but I am of course ready to stand corrected.

eyegor
1 replies
1d12h

I don't think Fortran is faster by language virtue, but it's certainly easier to scrape together high-performance numeric Fortran code. And ifort/aocc are amazing at making code that runs well on clusters, which is not a priority for any other domain. Fortran is absolutely still on top for modern numerics research code that involves clusters; talk to anyone who works in simulations at a national lab. If your code is mostly matrix math, modern Fortran is very nice to work with.

Emphasis on modern because maintaining 70s-80s numeric mathematician cowboy code is a ninth circle of hell. It has a bad rep for that reason.

semi-extrinsic
0 replies
1d2h

When you speak about code from the 1970s in particular, you need to appreciate the extremely limited language features that were available.

They did not even have while-loops in the language until Fortran 77. While-loop functionality used to be implemented using DO-loops mixed with GOTOs. You can't fault the people of that era for using the only tools that were available to them.

josefx
0 replies
1d11h

Restrict probably helps. I can't say much about Fortran, but C still has warts that can significantly impact its math library. For example, almost every math function may set errno; that is a side effect that isn't easy for the compiler to eliminate and can bloat the code significantly. With gcc, a single call to sqrt turns into a sqrt instruction, followed by several checks to see if it succeeded, followed by a call to the sqrt library function just to set errno correctly. I just started disabling math-errno completely once I realized that C allows several alternative ways to handle math errors, which basically makes any code relying on it non-portable.

ant6n
0 replies
1d11h

At the core, Fortran uses multidimensional numerical arrays, with the shapes defined in the code. So the compiler knows much more about the data that's being operated on, which in theory allows better optimization.

I thought blas/lapack is still written in Fortran, so most numerical code would still be built on top of Fortran.

Galanwe
0 replies
1d11h

I think pretty much all linear algebra libraries are still Fortran and are unlikely to ever be C

That is more of an urban legend than reality in 2024. The fact is, although the original BLAS implementation was Fortran, for at least a decade every Linux distribution has shipped OpenBLAS, ATLAS, or the MKL, all of which are written in a mix of C and assembly. All modern processor support is only available in these implementations.

LAPACK itself is still often built from the Fortran code base, but it's just glue over BLAS, and it gets performance solely from its underlying BLAS. Fortran doesn't bring any value to LAPACK, it's just that rewriting millions of algebraic tricks and heuristics for no performance gain is not enticing.

xdavidliu
2 replies
1d2h

Matlab & Fortran have been decimated by Python and C/C++ around 1990s and 2010s, respectively.

Nit: this failed to be respective; the years are the other way around

kbelder
1 replies
1d2h

Interesting... maybe the respective comparison was Matlab to Python and Fortran to C/C++? This sentence actually had three parallel clauses.

But that was a great nit to find.

xdavidliu
0 replies
9h9m

yes that's what I originally meant. Matlab to Python was 2010's and Fortran to C/C++ was 1990s. (Probably not the exact time frame, but that's close)

StableAlkyne
2 replies
1d3h

any evidence of their still strong position will be greatly appreciated

Fortran still dominates certain areas of scientific computing / HPC, primarily computational chemistry and CFD: https://fortran-lang.org/packages/scientific. You don't hear about most of these packages because they're generally run on HPC centers by scientists in niche fields. But you do get their benefit if you buy things that have the chemical sector in their supply chain.

The common thread is generally historical codes with a lot of matrix math. Fortran has some pretty great syntax for arrays and their operations. And the for-loop parallelization syntax in parallel compilers (like openmp) is also easy to use. The language can even enforce function purity for you, which removes some of the foot guns from parallel code that you get in other languages.

The kinds of problems those packages solve tend to bottleneck at matrix math, so it's not surprising a language that is very ergonomic for vector math found use there.

Same for Matlab, it's mostly used by niche fields and engineers who work on physical objects (chemical, mechanical, etc). Their marketing strategy is to give discounts to universities to encourage classes that use them. Like Fortran, it has good syntax for matrix operations. Plus it has a legitimately strong standard library. Great for students who aren't programmers and who don't want to be programmers. They then only know this language and ask their future employer for a license. If you don't interact with a lot of engineers at many companies, you aren't going to see Matlab.

moregrist
0 replies
21h1m

While it's not as ubiquitous as it used to be, Matlab is still very heavily used within shops that do a lot of traditional engineering (ME, EE, Aero, etc.).

This surprises people who just write software, but consider:

- The documentation and support is superb. This alone can justify the cost for many orgs.

- Did I mention the support? MathWorks has teams of application support engineers to help you use the tool. The ability to pay to get someone on the phone can also justify the price.

- The toolkits tend to do what specific fields want, and they tend to have a decent api. In contrast, you might end up hacking together something out of scipy and 3-4 weirdly incompatible data science packages. That's fine for me, but your average ME/EE would rather just have a tool that does what they need.

- It also has SIMULINK, which can help engineers get from spec to simulation very quickly, and seems deeply embedded in certain areas (eg: it's pretty common for a job ad with control systems work to want SIMULINK experience).

Is Python gradually displacing it? Probably.

(Honestly, I wish it would happen faster. I've written one large program in Matlab, and I have absolutely no desire to write another.)

embwbam
0 replies
1d2h

I work for the National Solar Observatory, creating Level 2 data for the DKIST Telescope’s observations of the sun. (For example, an image of the temperature that lines up with the observation.)

Just the other day, the solar physicist I work with said “yeah, that code that runs on the supercomputer needs to be rewritten in Fortran” (from C, I think).

He’s nearing retirement, but it’s not that he’s behind the times. He knows his stuff, and has a lot of software experience in addition to physics.

aragilar
1 replies
1d5h

Matlab has the issue of not being open source (similarly IDL), but I still see it popping up (though naturally, if I've got any choice in the matter, it's going to be ported to Python). I've also seen new Fortran codebases, mainly where the (conceptual, not computational) overhead of using helper libraries isn't worth it.

I'd suggest Python (via the various memoryview-related PEPs, plus numpy) does provide most of the required information (vs. e.g. Go or Rust or Ruby). So I'm not sure it proves much, other than that when the combined value of a more general language plus an array library beats Matlab or Fortran (likely due to other ecosystem or licensing effects), people will switch.
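For the record, the buffer protocol (PEP 3118) is what exposes that information: any conforming object advertises its shape and strides, which is precisely what arrays in Go or Rust don't standardize. A small sketch:

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)
m = memoryview(a)                # buffer protocol (PEP 3118) view of the array

assert m.shape == (2, 3)
assert m.strides == (12, 4)      # row-major int32: 12 bytes per row, 4 per item
assert m.itemsize == 4
```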

RugnirViking
0 replies
1d5h

we still get new Matlab codebases at my work, also R. In my case it comes from academics in the power systems and renewables fields who move into industry and keep using the tools they used there. We try to gently ease them into python and version control and sensible collaboration but we get new hires all the time

samatman
0 replies
1d4h

Julia has gained a lot of mindshare in array-heavy coding, with a lot of hard work put into making it possible to get comparable speeds to C++ and Fortran. The manual[0] offers a good introduction to what's possible: this includes sparse and strided arrays, as well as the sort of zero-copy reinterpretation you accomplish with pointer wizardry in the Fine Article.

[0]: https://docs.julialang.org/en/v1/manual/interfaces/#man-inte...

chillfox
0 replies
1d12h

If everyone is just using the Fortran libraries instead of reimplementing it in a modern language, then that's evidence that it's still being used for that purpose.

lifthrasiir
4 replies
1d15h

But format conversion is inevitable because C and FORTRAN have different axis orderings anyway, isn't it?

blt
3 replies
1d14h

Not really, you can always change the indexing to account for it. For example, the GEMM matrix multiplication subroutines from BLAS can transpose their arguments [1]. So if you have A (m x n) and B (n x p) stored row-major, but you want to use a column-major BLAS to compute A*B, you can instead tell BLAS that A is n x m, B is p x n, and you want to compute A' * B'.

As the article mentions, NumPy can handle both and do all the bookkeeping. So can Eigen in C++.

[1] https://www.math.utah.edu/software/lapack/lapack-blas/dgemm....
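The same bookkeeping can be sketched in NumPy without an actual BLAS call, by reinterpreting the row-major buffers as Fortran-order (column-major) arrays:

```python
import numpy as np

m, n, p = 2, 3, 4
A = np.arange(m * n, dtype=float).reshape(m, n)   # row-major m x n
B = np.arange(n * p, dtype=float).reshape(n, p)   # row-major n x p

# A column-major library reading A's buffer as n x m sees A.T:
A_cm = A.ravel().reshape((n, m), order='F')
assert (A_cm == A.T).all()

# To get C = A @ B, ask it for (B's buffer as p x n) @ (A's buffer as n x m),
# i.e. B.T @ A.T, which is C.T in the library's column-major view...
B_cm = B.ravel().reshape((p, n), order='F')
C_cm = B_cm @ A_cm
# ...and C.T's column-major buffer, read back row-major, is exactly C:
C = C_cm.ravel(order='F').reshape(m, p)
assert np.allclose(C, A @ B)
```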

lifthrasiir
2 replies
1d14h

That's conceptually still a format conversion, though the actual conversion might not happen. Users have to track which format is being used for which matrix, and I believe that's what the GP was originally complaining about.

yosefk
1 replies
1d13h

Agreed, realistically you're going to either convert the data or give up on using the function rather than reimplementing it to support another data layout efficiently; while converting the code instead of the data, so to speak, is more efficient from the machine resource use POV, it can be very rough on human resources.

(The article is basically about not having to give up in some cases where you can tweak the input parameters and make things work without rewriting the function; but it's not always possible)

semi-extrinsic
0 replies
1d2h

Talking about rough on human resources, I take it you've never written iterative Krylov methods where matrices are in compressed sparse row (CSR) format?

Because that's the kind of mental model capacity scientific programmers needed to have back in the days. People still do similarly brain-twisting things today. In this context, converting your logic between C and Fortran to avoid shuffling data is trivial.

sa-code
3 replies
1d14h

What kind of support would you have hoped for?

infogulch
2 replies
1d12h

A way to reinterpret a slice of size N as a multidimensional array with strides that are a factorization of N, including optional reverse order strides. Basically, do the stride bookkeeping internally so I can write an algorithm only considering the logic and optimize the striding order independently.
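NumPy's (itself unsafe) `as_strided` is roughly this API, so it can serve as a sketch of the requested semantics: reinterpret a flat buffer of N = 24 elements as 2x3x4, with strides that are a factorization of N, in forward or reverse order.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

buf = np.arange(24)
s = buf.itemsize

# Forward order: last axis varies fastest (strides 12, 4, 1 in elements)
fwd = as_strided(buf, shape=(2, 3, 4), strides=(12 * s, 4 * s, 1 * s))
assert fwd[1, 2, 3] == buf[1 * 12 + 2 * 4 + 3]

# "Reverse order" striding: first axis varies fastest (strides 1, 2, 6)
rev = as_strided(buf, shape=(2, 3, 4), strides=(1 * s, 2 * s, 6 * s))
assert rev[1, 2, 3] == buf[1 * 1 + 2 * 2 + 3 * 6]
```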

Animats
1 replies
1d11h

That's where you end up after heavy bikeshedding. Lots of features, terrible performance, as the OP points out.

infogulch
0 replies
1d2h

I agree with you on sparse arrays and multidimensional slices, but this is basically the same as what you'd do manually. Saying that "track strides for me" is "lots of features" is a bit uncharitable.

idkdotcom
20 replies
1d17h

Safety is one of Python's greatest advantages over C or C++. Why anyone would use unsafe Python, when Python doesn't have other features such as type safety and all the debugging tools that have been built for C and C++ over time, is beyond me.

Python is a great language, but just as the generation of kids who got out of computer science programs in the 2000s were clueless about anything that wasn't Java, it seems this generation is clueless about anything that isn't Python.

There was life before Python and there will be life after Python!

bongodongobob
8 replies
1d15h

I don't use Python because of type safety, I use it because it's easy to write. I couldn't give a single fuck about some missed pointers that weren't cleaned up in the 2.5 seconds my program ran. I'm not deploying public code.

I tend to prototype in Python and then just rewrite the whole thing in C with classes if I need the 10000x speedup.

SunlitCat
7 replies
1d14h

That "easy to write" part is something I am not so sure about. It took me ages to understand why my first try at writing a Blender plugin didn't work.

It was because of white spaces not lining up. I was like, really? Using white spaces as a way to denote the body of a function? Really?

nottorp
6 replies
1d10h

I'm curious... what rock have you been living under in the past 33 years since python was launched?

SunlitCat
5 replies
1d

Let's see, I was living under the:

- Amiga E
- Assembler (on Amiga)
- perl
- tcl/tk
- javascript
- vbscript
- java
- vba
- c
- c++
- python

rock. Granted, at some of those rocks I just took a quick peek (more like a glance), but a rock where whitespace is important was, at that moment, new to me!

nottorp
4 replies
1d

Oh cmon, I can still write some z80 assembly from memory and remember the zx spectrum memory layout somewhat, but I check new programming languages now and then :)

SunlitCat
3 replies
1d

:)

I just meant that for someone coming from languages where various kinds of delimiters denote function bodies, a language that puts such a huge emphasis on whitespace kind of threw me off: I got an error message slapped at me even though I was replicating the example letter by letter (although, obviously, I missed the importance of the whitespace).

nottorp
2 replies
1d

It's all fuzzy, but I'm sure I knew about the importance of whitespace in python long before I actually tried python.

Even Raymond's article, which was one of the first, whined about it [1]. I definitely remember reading that one and thinking I had to check out that language later.

[1] https://www.linuxjournal.com/article/3882

SunlitCat
1 replies
1d

Thank you for the article. I have to put it on my "to read" pile! :)

nottorp
0 replies
7h40m

It's from 2000; Python has grown up in the meantime and added complexity. Take Raymond's praise with a (small) grain of salt.

jandrese
3 replies
1d16h

Seems to me if you could do most of the work in Python and then just make the critical loop unsafe and 100x faster then that would certainly have some appeal.

gibolt
2 replies
1d16h

Plenty of people would gladly not have to learn another language (especially C).

You could also benefit from testing blocks of code with safety enabled to have more confidence when safety is removed.

TylerE
0 replies
1d11h

Or if you’re going to learn another language, might as well learn Nim. Keep most of the Python syntax, ditch the performance problems and the packaging insanity.

KeplerBoy
0 replies
1d12h

It was explained pretty well in the blog: Installing the OpenCV python package is easy, fast and painless as long as you're happy with the default binary. Building OpenCV from source to get it into a C/C++ program can quickly turn into a multi-hour adventure, if you're not familiar with building sizeable C++ projects.

omoikane
1 replies
1d15h

The "unsafe" in the title appears to be used in the sense of "exposing memory layout details", but not in the sense of "direct unbounded access to memory". It's probably not the memory safety issue you are thinking of.

yosefk
0 replies
1d12h

You can absolutely get direct unbounded access to memory with ctypes, with all the bugs that come from this. I just think/hope the code I show in TFA happens to have no such bugs.

SunlitCat
1 replies
1d14h

The thing about academia (computer science and the like) is that you need a programming language that people can pick up easily and quickly, and that gets them to satisfying results in no time.

Back then it might have been Java, then Python, till the next language comes around.

The java thing was so apparent, that when you looked at c++ code, you were able to spot the former java user pretty easily. :)

About Python: some time ago I was looking into Arduino programming in C and found someone offering a library (I think) in C (or C++, can't remember). What I remember is that this person stopped working on it because they got so many requests and questions about using their library from Python.

pjmlp
0 replies
1d8h

The java thing was so apparent, that when you looked at c++ code, you were able to spot the former java user pretty easily. :)

Well, Java was designed based on C++ GUI development patterns typical in the 10 years of C++ GUI frameworks that predated Java.

Yet, people keep using these kind of remarks.

kragen
0 replies
1d13h

while python is eminently criticizable, yosef kreinin has designed his own cpu instruction set and gotten it fabbed in a shipping hardware product, and is the author of the c++fqa, which is by far the best-informed criticism of c++; criticizing him with 'this generation is clueless about anything that isn't Python' seems off the mark

gibolt
0 replies
1d16h

"Why would anyone use" -> "when" usage generally means lots of use cases are being ignored / swept under the rug.

1 billion people exist. Each has a unique opinion / viewpoint.

eru
0 replies
1d16h

Why would anyone use unsafe Python when Python doesn't have other features such as type safety and all the debugging tools that have been built for C and C++ over time is beyond me.

C and C++ 'type safety' is barely there (compared to more sane languages like OCaml or Rust or even Java etc). As to why would anyone do that? The question is 'why not?' It's fun.

intelVISA
16 replies
1d17h

Isn't all Python, by design, unsafe?

eru
6 replies
1d16h

Is this supposed to be a joke?

Have a look at the linked article to see what meaning of 'unsafe' the author has in mind.

intelVISA
5 replies
1d16h

Safety is catching (more) errors ahead of time ...for which Python is grossly unsuitable imo.

Fun lang though, just not one that ever comes to mind when I hear 'safe'.

eru
3 replies
1d16h

That's one definition of safety. But it's not the one the author uses in this case.

The generic snide about Python that has nothing to do with the article ain't all that informative.

You could make the same remark you just made about all Python being unsafe about Rust: 'Isn't all Rust, by design, unsafe?'. And compared to eg Agda or Idris that would be true, but it wouldn't be a very useful nor interesting comment when talking about specifically 'unsafe' Rust vs normal Rust.

intelVISA
2 replies
1d16h

I would agree, Rust is not safe. We need to encourage more formal rigor in our craft and avoid misconceptions like 'safe Rust' or 'safe Python'. Thus my original comment :P

eru
0 replies
1d16h

Eh, even Agda is only safe in the sense of 'does what you've proven it to do'. That doesn't mean that if you eg write a machine learning library in Agda, your AI won't come and turn us all into paperclips.

So it doesn't really make sense to pretend there's some single meaning of 'safe' vs 'unsafe' that's appropriate in all contexts.

dahart
0 replies
1d16h

It’s a term of art in this case. Or someone proposed the term improper noun https://news.ycombinator.com/item?id=32673100

Fine and good to advocate rigor, whether or not it’s specifically relevant to this post, but maybe be careful with the interpretation and commentary lest people decide the misconception is on your part?

recursive
0 replies
1d16h

Memory corruption vs type safety.

"Safety" is an overloaded term, but that happens a lot in software. You'll probably get best results if you try to understand what people are talking about, rather than just assuming everyone else is an idiot.

westurner
5 replies
1d16h

ctypes, c extensions, and CFFI are unsafe.

Python doesn't have an unsafe keyword like Rustlang.

In Python, you can dereference a null pointer and cause a Segmentation fault given a ctypes import.

lancedb/lance and/or pandas' dtype_backend="pyarrow" might work with pygame

Retr0id
4 replies
1d16h

You don't even need a ctypes import for it, you'll get a segfault from this:

   eval((lambda:0).__code__.replace(co_consts=()))
cpython is not memory safe.

westurner
3 replies
1d15h

Static analysis of Python code should include review of "unsafe" things like exec(), eval(), ctypes, C strings, and memcpy.

Containers are considered nearly sufficient to sandbox Python, which cannot be effectively sandboxed using Python itself. Isn't that actually true for all languages though?

There's a RustPython, but it does support CFFI and __builtins__.eval, so

maple3142
2 replies
1d13h

The example given by parent does not need eval to trigger though. Just create a function and replace its code object then call it, it will easily segfault.

jwilk
1 replies
1d8h

Complete example without eval:

  def f(): pass
  f.__code__ = f.__code__.replace(co_consts=())
  f()

Retr0id
0 replies
1d8h

yup, eval was just there for golfing purposes

emmanueloga_
2 replies
1d14h

As others commented "safety" is a heavily overloaded term, for instance there's "type safety" [1] and "memory safety" [2], and then there's "static vs dynamic typing" [3], and "weak vs strong typing" [4], so talking about types and safety offered by a language can be very nuanced.

I suspect when you say "unsafe by design" you may be referring to the dynamic type checking aspect of Python, although Python has supported type annotations for a while now, which can be statically checked by linter-like tools such as Pyright.

--

1: https://en.wikipedia.org/wiki/Type_safety

2: https://en.wikipedia.org/wiki/Memory_safety

3: https://en.wikipedia.org/wiki/Type_system#Type_checking

4: https://en.wikipedia.org/wiki/Strong_and_weak_typing

d0mine
1 replies
1d12h

Python is both type and memory safe, e.g. ""+1 in Python is a TypeError and [][0] is an IndexError.

But you can perform unsafe operations e.g.,

    import ctypes
    ctypes.string_at(0)  # access 0 address in memory -> segfault

eru
0 replies
13h57m

Yes, though `ctypes' is kind of marked as `unsafe' in Python by convention.

ggm
15 replies
1d15h

Why does numpy do column order data?

Is it because in much of the maths domain it turns out you manipulate columns more often than rows?

(not a mathematician but I could believe if the columns represent "dimensions" or "qualities" of some dataset, and you want to apply a function to a given dimension, having data in column-natural order makes that faster.)

Obviously you want to believe naively there is no difference between the X and Y in the X,Y plane, but machines are not naive, and sometimes there IS a difference between doing things to the set of verticals and the set of horizontals.

bee_rider
7 replies
1d13h

It is possible that my brain had been damaged by convention, but it looks much tidier to have your vector multiplicand on the right of the matrix when doing a matrix vector multiplication y=Ax, where A is the matrix and x is the vector, y is the result.

So, x and y must be columns. So, it is nice if our language has column-order as the default, that way a vector is also just a good old 1d array.

If we did y=xA, x and y would be rows, the actual math would be the same but… I dunno, isn’t it just hideous? Ax is like a sentence, it should start with an upper case!

It also fits well with how we tend to learn math. First we learn about functions, f(x). Then we learn that a function can be an operator. And a matrix can represent a linear operator. So, we tend to have a long history of stuff sitting to the left of x getting ready to transform it.
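For what it's worth, NumPy sidesteps the aesthetic question for plain 1-d arrays; x works on either side of A with no row/column distinction. A small sketch:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
x = np.array([1., 1.])       # a plain 1-d array: neither row nor column

assert (A @ x == np.array([3., 7.])).all()   # y = A x  (x acts as a column)
assert (x @ A == np.array([4., 6.])).all()   # y = x A  (x acts as a row)
```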

planede
2 replies
1d6h

Your reasoning starts out nicely with y=Ax, but you reach the wrong conclusion. The layout of x and y is not affected at all by row or column order; they are just consecutive numbers in either case. So you have to look at the layout of A.

For the row-major layout the multiplication Ax ends up being more cache-efficient than for the column major layout, as for calculating each component of y you need to scan x and a row of A.
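The stride picture behind that claim, as a sketch: in row-major (C) order a row of A is contiguous, which is exactly what each dot product in y = Ax scans.

```python
import numpy as np

A_c = np.zeros((1000, 1000))          # C (row-major) order, float64
A_f = np.asfortranarray(A_c)          # same values, column-major layout

# Scanning a row of the row-major array touches consecutive bytes:
assert A_c[0].strides == (8,)         # 8-byte steps along a row
assert A_f[0].strides == (8000,)      # 8000-byte jumps between row elements
```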

bee_rider
1 replies
1d4h

I’m pretty sure BLAS will tile out the matrix multiplication internally anyway, so it doesn’t matter. At least for matmat it would; is matvec special?

planede
0 replies
1d2h

However matmat is done, row-major vs column-major for both matrices shouldn't make a difference (for square matrices at least).

I don't know if tiling is done for matvec. I don't think it makes sense there, but I didn't think about it too hard.

Joker_vD
1 replies
1d6h

Well, there used to be a tradition in universal algebra (and category theory) of putting functions/operators at the right side of the arguments but it seems to have ultimately died out. And "y = xA" is the standard notation in the linear coding theory, even to this day: message vectors are bit strings, not bit columns.

bee_rider
0 replies
1d2h

I think there’s also some other pretty big field that tends to put the vector on the left… statistics? Economics? For some reason I couldn’t find any links, so maybe I just got this from, like, one random statistician. If only they’d told me about sample sizes.

montebicyclelo
0 replies
1d6h

y = x @ W + b

Is how you'd write it in NumPy and most (/all?) deep learning libraries, with x.shape=[batch, ndim]

I'd personally prefer that to be the convention, for consistency.
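The shapes in that convention, as a sketch (names `x`, `W`, `b` as in the comment above):

```python
import numpy as np

batch, ndim, nout = 32, 8, 4
x = np.ones((batch, ndim))    # rows are samples, as in deep learning libraries
W = np.ones((ndim, nout))
b = np.zeros(nout)

y = x @ W + b                 # the (batch of) vector(s) sits on the left
assert y.shape == (batch, nout)
assert (y[0] == ndim).all()   # each output is a sum of ndim ones
```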

SailorJerry
0 replies
1d13h

I think the reason I prefer columns is I do the mental expansion into large bracketed expressions. If x is a row and kept inline, then the expansion gets really wide. To keep it compact and have the symbols oriented the same as their expansion, you'd have to put the x above A and that's just silly.

thayne
1 replies
1d14h

Numpy was originally designed for mathematicians and scientists, and in those domains convention often lines up better with column-major order. For example, vectors are a single column, and the column index comes before the row index in many notations. So using column-major order meant familiar formulas and algorithms (including translations from Fortran code, which is column-major, or Matlab for that matter) are easy to carry over to numpy code.

Also, numpy is built on some Fortran libraries, like BLAS and LAPACK, that assume column-major order. If it took row-major input, it would need to transpose matrices or change the order in some cases, or rewrite those libraries to use row-major form.

kolbusa
0 replies
1d14h

Copying is usually not necessary. Often times you can swap data and/or shape arguments and get a transposed result out. While it is true that Fortran BLAS only supports col-major, CBLAS supports both row- and col-major. Internally, all the libraries I worked on use col-major, but that is just a convention.

sdeer
1 replies
1d14h

Probably because Fortran stores matrices and other multidimensional arrays in column order. Traditionally most numerical computation software was written in Fortran and numpy calls those under the hood. Storing in row order would have meant copying the data to column major order and back for any call to Fortran.

Thrymr
0 replies
1d3h

NumPy can _store_ arrays in either row-major (C-style) or column-major (Fortran-style) order. Row-major is the default. Index order is always row, column.

Pinus
1 replies
1d12h

Hang on, doesn’t numpy use C (row major) array ordering by default? checks docs Seems like it does. However, numpy array indexing also follows the maths convention where 2D arrays are indexed by row, column (as does C, by the way), so to access pixel x, y you need to say im[y, x]. And the image libraries whose numpy integration I have toyed with (only Pillow, to be honest) seem to work just fine like this — a row of pixels is stored contiguously in memory. So I don’t quite see why the author claims that numpy stores a column of pixels contiguously, but I have only glanced at the article, so quite probably I have missed something.

KeplerBoy
0 replies
1d11h

You're right. Numpy stores arrays in row-major order by default. One can always just have a look at the flags (ndarray.flags returns some information about the order and underlying buffer).
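A quick sketch of checking this, via the flags and strides:

```python
import numpy as np

a = np.zeros((3, 4))               # default: C (row-major) order
f = np.zeros((3, 4), order='F')    # explicit Fortran (column-major) order

assert a.flags['C_CONTIGUOUS'] and not a.flags['F_CONTIGUOUS']
assert f.flags['F_CONTIGUOUS'] and not f.flags['C_CONTIGUOUS']

# The strides (in bytes, float64) make the layout explicit:
assert a.strides == (32, 8)        # rows are contiguous
assert f.strides == (8, 24)        # columns are contiguous
```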

rdtsc
0 replies
1d13h

I think in math columns as vectors is a more common representation. Especially if we talk about matrix multiplication and linear systems.

qaq
7 replies
1d16h

Well, with Mojo it looks like you will have both safety and performance.

grandma_tea
6 replies
1d16h

Closed source and not relevant.

viraptor
4 replies
1d14h

Those are examples and docs. Mojo is still closed.

viraptor
2 replies
1d11h

Yeah, that's some minimal progress, but really not that interesting. (As in, it's cool that it exists, but given we've got the Python stdlib and numpy already open, it's not really new/exciting.) And it doesn't allow you to port to platforms they don't care enough about.

qaq
1 replies
1d7h

It might not be interesting to you. Having a lot of Rust features with a much smarter compiler and Python syntax is pretty interesting to me.

viraptor
0 replies
1d6h

No, Mojo itself is interesting. The stdlib that's posted isn't. I mean, it's not very interesting code beyond "what are the implementation details". Any decent programmer could redo it based on Python's stdlib.

quietbritishjim
3 replies
1d15h

That article fails to touch on the fundamental issue with its title "The Secrets of Colour Interpolation": RGB values are a nonlinear function of the light emitted (because we are better at distinguishing dark colours, so it's better to allow representing more of those), so to interpolate properly you need to invert that function, interpolate, then reapply. The difference that makes to colour gradients is really striking.
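A minimal sketch of the idea, using a plain power-law gamma as a hedged stand-in for the actual sRGB transfer function:

```python
def lerp_gamma_aware(c0, c1, t, gamma=2.2):
    # linearize each endpoint, interpolate in linear light, re-encode
    lin = (1 - t) * c0**gamma + t * c1**gamma
    return lin ** (1 / gamma)

# Midpoint of black (0.0) and white (1.0): a naive lerp gives 0.5,
# but interpolating in linear light gives the encoded value 0.5**(1/2.2) ~ 0.73.
mid = lerp_gamma_aware(0.0, 1.0, 0.5)
assert abs(mid - 0.5 ** (1 / 2.2)) < 1e-12
assert mid > 0.7
```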

topherclay
2 replies
1d14h

Do you mean converting the RGB value to LAB values in the CIELAB color space and doing the interpolation there?

Is there a better way to do it?

CarVac
1 replies
1d14h

No, you just need to linearize the brightness.

drjasonharrison
0 replies
22h36m

Typically this is done by using a look-up table to convert the 8-bit gamma encoded intensities to 10-bit (or more) linear intensities. You can use the same look-up table for R, G, and B. Alpha should be linear.
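A sketch of such a table in numpy, using the exact piecewise sRGB decode and a 16-bit linear output (the sizes are just one common choice):

```python
import numpy as np

# 8-bit sRGB-encoded value -> 16-bit linear intensity, via the
# piecewise sRGB decode (a 10-bit table would just scale differently)
enc = np.arange(256) / 255.0
lin = np.where(enc <= 0.04045, enc / 12.92, ((enc + 0.055) / 1.055) ** 2.4)
lut = np.round(lin * 65535).astype(np.uint16)

# the same table linearizes R, G and B; alpha is left as-is (linear)
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
print(lut[pixels])
```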

yosefk
0 replies
1d12h

While this is a valid point, cv2.resize doesn't actually implement color conversion in a way addressing this issue; at least in my testing I get identical results whether I interpret the data as RGB or BGR. So if you want to use cv2.resize, AFAIK you can count on it not caring which channel is which. And if you need fast resizing, you're quite likely to settle for the straightforward interpolation implemented by cv2.resize.

planede
0 replies
1d6h

Even if the RGB components correspond to sRGB, to linearize you apply the same non-linear function to each component value, independently. So even if you do the interpolation in a linear colorspace, the order of the sRGB components does not matter.

comex
4 replies
1d16h

Do you actually need ctypes here, or can you just use np.reshape? That would cut out the unsafety.

yosefk
0 replies
1d12h

You... probably shouldn't, but, very interesting stuff in that link! I didn't know CPython had no bounds checks in the load_const bytecode op

yosefk
0 replies
1d12h

I think you'll have a problem with BGR data absent ctypes, since you have a numpy array with the base pointing at the first red channel value, and you want an array with the base pointing 2 bytes before the first red channel value. This is almost definitely "unsafe"?.. or does numpy have functions knowing that due to the negative z stride, the red value before the blue value is within the bounds of the original array? I somehow doubt it though it would be very impressive. And I think the last A value is hopelessly out of reach absent ctypes since a safe API has no information that this last value is within the array bounds; more strictly speaking all the alpha values are out of bounds according to the shape and strides of the BGRA array.
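To make the layout concrete, here's a toy version of the situation (a tiny fake BGRA buffer standing in for the surface memory; as_strided and the raw-address cast are exactly the "unsafe" parts):

```python
import ctypes
import numpy as np

# a 2-pixel BGRA buffer standing in for the surface memory
buf = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=np.uint8)

# the pygame-style "RGB view": base points at the first R byte,
# channel stride is -1, so B lives 2 bytes *before* the base pointer
rgb = np.lib.stride_tricks.as_strided(buf[2:], shape=(2, 3), strides=(4, -1))
print(rgb.tolist())  # [[30, 20, 10], [70, 60, 50]]

# recover the full BGRA view from the raw address: step 2 bytes back
# from rgb's base -- this is the part that needs ctypes
addr = rgb.ctypes.data - 2
bgra = np.ctypeslib.as_array(
    ctypes.cast(addr, ctypes.POINTER(ctypes.c_uint8)), shape=(2, 4))
print(bgra.tolist())  # [[10, 20, 30, 40], [50, 60, 70, 80]]
```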

im3w1l
0 replies
1d16h

I think what you want is a combination of swapaxes and flip (reshape does not turn rows into columns, it turns m input rows into n output rows), but yeah.

Actually let me include a little figure

  original  reshape  swapaxes

  abc       ab       ad
  def       cd       be
            ef       cf

londons_explore
3 replies
1d9h

What can we do about this? We can't change the layout of pygame Surface data. And we seriously don't want to copy the C++ code of cv2.resize, with its various platform-specific optimizations,

Or... you could have sent a ~25 line pull request to opencv to fix this performance problem not just for you, but for thousands of other developers and millions of users.

I think your fix would go here:

https://github.com/opencv/opencv/blob/ba65d2eb0d83e6c9567a25...

And you could have tracked down that slow code easily by running your slow code in gdb, hitting Ctrl+C while it's doing the slow thing, and then "bt" to get a stack trace of what it's doing and you'd see it constructing this new image copy because the format isn't correct.

yosefk
2 replies
1d9h

25 line pull request doing what? Supporting this format efficiently is probably more LOC and unlikely to get merged as few need this and it complicates the code. Doing the thing I did in Python inside OpenCV C++ code instead (reinterpreting the data) is not really possible since you have less knowledge about the input data at this point (like, you're going to assume it's an RGBA image and access past the area defined as the allocated array data range for the last A value? The user gives a 3D array and you blithely work on the 4th dimension?)

And what about the other examples I measured, including pure numpy code? More patches to submit?

londons_explore
1 replies
1d6h

inside the python opencv bindings?

Or just special case resize, since there the channel order doesn't matter. If you interpret R as A you'll still get the right answer.

yosefk
0 replies
1d6h

The pygame API gives you a 3D RGB array and a separate 2D alpha array, which happen to reference the same RGBA/BGRA memory. Are you suggesting that OpenCV Python bindings (in most of their functions, and likewise other libraries including numpy operators themselves) should have code assuming that a WxHx3 RGB array really references the memory of a WxHx4 RGBA array?..

If you really want a solution which is a patch rather than a function in your own code, I guess the way to go would be to persuade the maintainers of pygame and separately, the maintainers of its fork, pygame-ce, to add a pixels4d_maybe_rgba_and_maybe_bgra() function returning a WxHx4 array. Whether they would want to add something like that I don't know. But in any case, I think that since closed source projects, open source projects, and code by sister teams exist that will not accept patches, or will not accept them in a timely manner, it's interesting what you can do without changing someone else's code.

jofer
3 replies
1d4h

I'm confused on the need for ctypes/etc here. You can directly modify somearr.strides and somearr.shape. And if you need to set them both together, then there's numpy.lib.stride_tricks.as_strided. Unless I'm missing something, you can do the same assignment that ctypes is used for here directly in numpy. But I might also be missing a lot - I'm still pre-coffee.

On a different note, I'm surprised no one has mentioned Cython in this context. This is distinct from, but broadly related to things like @cython.boundscheck(False). Cython is _really_ handy, and it's a shame that it's kinda fallen out of favor in some ways lately.
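The stride-twiddling part mentioned above can be sketched like this (toy arrays; assigning .strides mutates in place and may warn on newer numpy, while as_strided hands back a view — and neither, as it turns out, can move the base pointer):

```python
import numpy as np

a = np.arange(12, dtype=np.uint8).reshape(3, 4)
print(a.strides)  # (4, 1): row-major, strides in bytes

# assigning .strides in place reinterprets the same buffer; here the
# "rows" overlap, each starting one byte after the previous one
a.strides = (1, 1)
print(int(a[2, 0]), int(a[0, 2]))  # 2 2 -- both read buffer byte 2

# as_strided sets shape and strides in one go, returning a view instead
b = np.lib.stride_tricks.as_strided(
    np.arange(12, dtype=np.uint8), shape=(2, 4), strides=(8, 1))
print(int(b[1, 3]))  # 11
```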

yosefk
2 replies
1d4h

You need to modify the base pointer in this example, specifically to point 2 bytes before the current base pointer (moving back from the first red pixel value to the first blue value.) I don't think you can do it without ctypes, maybe I'm wrong.

What does Cython do better than numbs except static compilation? Honest q, I know little about both

jofer
1 replies
1d3h

You actually can do the "offset by 2 bytes back" with a reshape + indexing + reshape back. But I suspect I'm still missing something. I need to read things in more depth and try it out locally.

By "numbs" here, I'm assuming you mean numba. If that's not correct, I apologize in advance!

There are several things where Cython is a better fit than numba (and to be fair, several cases where the opposite is true). There are two that stick out in my mind:

First off, the optimizations for cython are explicit and a matter of how you implement the actual code. It's a separate language (technically a superset of python). There's no "magic". That's often a significant advantage in and of itself. Personally, I often find larger speedups with Cython, but then again, I'm more familiar with it, and understand what settings to turn on/off in different situations. Numba is much more of a black box given that it's a JIT compiler. With that said, it can also do things that Cython can't _because_ it's a JIT compiler.

If you want to run python code as-is, then yeah, numba is the better choice. You won't see a speedup at all with Cython unless you change the way you've written things.

The second key thing is one that's likely the most overlooked. Cython is arguably the best way to write a python wrapper around C code where you want to expose things as numpy arrays. It takes a _huge_ amount of the boilerplate out. That alone makes it worth learning, i.m.o.

jofer
0 replies
1d1h

Ah, right, you mean _outside_ the memory block of the array! Sorry, my mind was just foggy this morning.

That's not strictly possible, but "circular" references are possible with "stride tricks". Those can accomplish similar things in some circumstances. But with that said, I don't think that would work in this case.

antirez
2 replies
1d7h

MicroPython lets you do even more than that; it's fantastic, and I wish it were part of standard Python as well. Using a decorator you can write functions in Viper, a subset of Python that has pointers into bytearrays and only fast integer types that behave like C's (fixed size, wrapping around and so forth). It's a very nice way to speed things up 10 or 100x without resorting to C extensions.
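A minimal Viper sketch from memory, so treat the details as approximate (ptr8 is Viper's raw byte pointer type, resolved by the Viper compiler; this only runs under MicroPython):

```python
import micropython

@micropython.viper
def brighten(buf: ptr8, n: int, delta: int):
    # ptr8 loads/stores raw bytes: no bounds checks, C-like integer behavior
    for i in range(n):
        v = buf[i] + delta
        if v > 255:
            v = 255
        buf[i] = v
```

Called with a bytearray, e.g. brighten(pixels, len(pixels), 16); Viper converts the buffer argument to a raw pointer on entry.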

yosefk
1 replies
1d7h

I dunno if it's "more than that" since they're not directly comparable? What you describe sounds more like numba or Cython than what TFA describes, which is a different use case?..

antirez
0 replies
1d7h

Indeed, I don't mean they are directly comparable, but it's a great incarnation of the "Unsafe Python" idea, because it does not go too far with the subset of Python that you can use, like introducing a completely different syntax or the like. There are strict rules, but if you follow them you still kinda write Python that goes super fast.

ruined
1 replies
1d14h

oh THIS is why image byte order and dimensions are so confusing every time i fuck with opencv and pygame

well, half of why. for some reason i keep doing everything directly on /dev/fb0

TeMPOraL
0 replies
1d10h

Image byte order, axis directions, coordinate system handedness when in 3D... after enough trying, you eventually figure out the order of looping at any given stage of your program, and then it Just Works, and then you never touch it again.

C4stor
1 replies
1d10h

All of this seems unnecessary, and easily replaced in the provided benchmark by :

i2 = np.ascontiguousarray(pg.surfarray.pixels3d(isurf))

Which does the 100x speedup too and is a "safe" way to adjust memory access to numpy strides.

Whether the output is correct or not is left as an exercise to the author; since the provided benchmark only uses np.zeros(), it's kind of hard to verify

yosefk
0 replies
1d9h

I just measured this with the np.ascontiguousarray call (inside the loop of course, if you do it outside the loop when i2 is assigned to, then ofc you don't measure the overhead of np.ascontiguousarray) and the runtime is about the same as without this call, which I would expect given everything TFA explains. So you still have the 100x slowdown.

TFA links to a GitHub repo with code resizing non-zero images, which you could use both to easily benchmark your suggested change, and to check whether the output is still correct.

sylware
0 replies
1d5h

I wonder how it would perform with the latest openstreetmap tile renderer which was released not long ago.

Too
0 replies
1d13h

Clickbait title. The speedup only applies to one particular interaction between SDL and numpy.

Joker_vD
0 replies
1d7h

Yep, welcome to the world of RGBA and ARGB storage and memory representation formats, with little- and big-endianness thrown in for the mix, it's all very bloody annoying for very little gain.