More than seeing it land in main, I'm happy about the "python thread slow" meme officially going away now.
Although this is nice, the problems with the GIL are often blown out of proportion: people claim you couldn't do efficient (compute-bound) multiprocessing, which was never the case, as the `multiprocessing` module works just fine.
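For reference, a minimal sketch of the kind of compute-bound fan-out this handles fine (the work function is just a stand-in, not from any real project): each worker is a separate process with its own interpreter and its own GIL, so all cores can be busy at once.

    from multiprocessing import Pool

    def cpu_heavy(n: int) -> int:
        # pure-Python number crunching; would be GIL-bound if run in threads
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool() as pool:                      # defaults to os.cpu_count() workers
            results = pool.map(cpu_heavy, [10_000_000] * 8)
        print(results)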
multiprocessing only works fine when you're working on problems that don't require 10+ GB of memory per process. Once you have significant memory usage, you really need to find a way to share that memory across multiple CPU cores. For non-trivial data structures partly implemented in C++ (as an optimization, because pure Python would be too slow), that means messing with allocators and shared memory. Such GIL workarounds have easily cost our company several man-years of engineering time, and we still have a bunch of embarrassingly parallel stuff that we cannot parallelize because of the GIL and because we don't yet support shared-memory allocation for it.
Once the Python ecosystem supports either subinterpreters or nogil, we'll happily migrate to those and get rid of our hacky interprocess code.
Subinterpreters with independent GILs, released with 3.12, theoretically solve our problems but practically are not yet usable, as none of Cython/pybind11/nanobind support them yet. In comparison, nogil feels like it'll be easier to support.
And I guess what I don't understand is why people choose Python for these use cases. I am not in the "Rustify" everything camp, but Go + C, Java + JNI, Rust, and C++ all seem like more suitable solutions.
Notably, all of those are static languages and none of them have array types as nice as PyTorch or NumPy, among many other packages in the Python ecosystem. Those two facts are likely closely related.
Python is just the more popular language. Julia array manipulation is mostly better (better syntax, better integration, larger standard library) or as good as python. Julia is also dynamically typed. It is also faster than Python, except for the jit issues.
It is also faster than Python, except for the jit issues.
I was intrigued by Julia a while ago, but didn't have time to properly learn it.
So just out of curiosity: what are the issues with the JIT in Julia?
Julia's JIT compiles code when it's first executed, so Julia has a noticeable delay between starting the program and it actually running. This is anywhere from a few hundred milliseconds for small scripts to tens of seconds or even minutes for large packages.
I wonder why they don't just have an optional pre-compilation, so once you have a version you're happy with and want to run in production, you just have a fully compiled version of the code that you run.
Effectively, it does - one of the things recent releases of Julia have done is to add more precompilation caching on package install. Julia 1.10 feels considerably snappier than 1.0 as a result - that "first time to plot" is now only a couple of seconds thanks to this (and subsequent plots are, of course, much faster than that).
The "issue" is Julia is not Just-in-Time, but a "Just-Ahead-of-Time" language. This means code is compiled before getting executed, and this can get expensive for interactive use.
The famous "Time To First Plot" problem was about taking several minutes to do something like `using Plots; Plots.plot(sin)`.
But to be fair, recent Julia releases have improved this a lot: the code above takes 1.5s in Julia 1.10 on my 3-year-old laptop.
Preaching to the choir here.
Julia’s threading API is really nice. One deficiency is that it can be tricky to maintain type stability across tasks / fetches.
If only there were a dynamic language which performs comparably to C and Fortran, and was specifically designed to have excellent array processing facilities.
Unfortunately, the closest thing we have to that is Julia, which fails to meet none of the requirements. Alas.
If only there was a car that could fly, but was still as easy and cheap to buy and maintain :D
Why do people use python for anything beyond glue code? Because it took off, and machine learning and data science now rely on it.
I think Python is a terrible language that exemplifies the maxim "worse is better".
Some speculate that universities adopted it as an introductory language for its expressiveness and flat learning curve. Scientific / research projects in those unis started picking Python, since all students already knew it. And now we're here.
I have no idea if this is verifiably true in a broad sense, but I work at the university and this is definitely the case. PhD students are predominantly using Python to develop models across domains - transportation, finance, social sciences etc. They then transition to industry, continuing to use Python for prototyping.
To quote from Eric Raymond's article about python, ages ago:
"My second [surprise] came a couple of hours into the project, when I noticed (allowing for pauses needed to look up new features in Programming Python) I was generating working code nearly as fast as I could type.
When you're writing working code nearly as fast as you can type and your misstep rate is near zero, it generally means you've achieved mastery of the language. But that didn't make sense, because it was still day one and I was regularly pausing to look up new language and library features!"
Source: https://www.linuxjournal.com/article/3882
It doesn't go for large code bases, but if you need quick results using existing well tested libraries, like in machine learning and data science, I think those statements are still valid.
Obviously not when you're multiprocessing, that is going to bite you in any language.
but Go + C, Java + JNI, Rust, and C++ all seem like more suitable solutions.
apart from go (maybe java) those are all "scary" languages that require a bunch of engineering to get to the point that you can prototype.
even then you can normally pybind the bits that are compute bound.
If Microsoft had been better back in the day, then C# would have been the go-to language of choice. It has the best tradeoff of speed/handholding/rapid prototyping. It's also statically typed, unless you tell it not to be.
#pragma omp parallel for
gets you 90% of the potential performance of a full multithreaded producer/consumer setup in C++. C++ isn't as scary as it used to be.
People choose Python for the use case, regardless of what that is, because it's quick and easy to work with. When Python can't realistically be extended to a use case it's lamented; when it can, it's celebrated. Even Go, while probably the friendliest of that bunch when it comes to parallel work, is on a different level.
"Ray" can share python objects memory between processes. It's also much easier to use than multi processing.
How does that work? I'm not familiar with Ray, but I'm assuming you might be referring to actors [1]? Isn't that basically the same idea as multiprocessing's Managers [2], which also allow client processes to manipulate a remote object through message-passing? (See also DCOM.)
[1] https://docs.ray.io/en/latest/ray-core/walkthrough.html#call...
[2] https://docs.python.org/3/library/multiprocessing.html#manag...
Shared memory:
According to the docs, those shared memory objects have significant limitations: they are immutable and only support numpy arrays (or must be deserialized).
Sharing arrays of numbers is supported in multiprocessing as well: https://docs.python.org/3/library/multiprocessing.html#shari...
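A minimal sketch of that stdlib facility (the worker and sizes are made up for illustration): a shared ctypes array of doubles that child processes mutate in place, so nothing gets pickled back and forth.

    from multiprocessing import Array, Process

    def square_slice(shared, start, stop):
        for i in range(start, stop):
            shared[i] = shared[i] ** 2

    if __name__ == "__main__":
        arr = Array("d", range(1000))             # 'd' = C double, lives in shared memory
        workers = [Process(target=square_slice, args=(arr, i * 250, (i + 1) * 250))
                   for i in range(4)]
        for w in workers: w.start()
        for w in workers: w.join()
        print(arr[:5])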
I think that 90 or maybe even 99% of cases have under 1GB of memory per process? At least that has been the case for me for the last 15 years.
Of course, getting threads to be actually useful for concurrency (GIL removed) adds another very useful tool to the performance toolkit, so that is great.
`multiprocessing` works fine for serving HTTP requests or for some other subset of embarrassingly parallel problems.
`multiprocessing` works fine for serving HTTP requests
Not if you use Windows, then it's a mess. I have a suspicion that people who say that the multiprocessing works just fine never had to seriously use Python on Windows.
Why is it a mess? What's wrong with it on Windows?
Adding on to the other comment, multiprocessing is also kinda broken on Linux/Mac.
1. Because global objects are refcounted, CoW effectively isn't a thing on Linux. They did add a way to avoid this [0], but you have to manually call it once your main imports are done (rough sketch after the links below).
2. On Mac, turns out a lot of the system libs aren't actually fork-safe [1]. Since these get imported inadvertently all the time, Python on Mac actually uses `spawn` [2] -- so it's roughly as slow as on Windows.
I haven't worked in Python in a couple years, but handling concurrency while supporting the major OSes was a goddamn mess and a half.
[0]: https://docs.python.org/3.12/library/gc.html#gc.freeze
[1]: https://bugs.python.org/issue33725
[2]: https://docs.python.org/3.12/library/multiprocessing.html#co...
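For anyone curious, a rough sketch of the gc.freeze() workaround from [0] (the module and entry point names are placeholders, not from the post): do the heavy imports first, disable the collector, freeze, and only then fork the workers, so the collector doesn't dirty the shared copy-on-write pages.

    import gc
    import multiprocessing as mp

    import myapp   # placeholder for whatever heavy modules/globals the workers need

    def worker(job_id: int) -> None:
        myapp.handle(job_id)   # placeholder entry point

    if __name__ == "__main__":
        gc.disable()
        gc.freeze()            # move everything allocated so far to the permanent generation
        ctx = mp.get_context("fork")
        procs = [ctx.Process(target=worker, args=(i,)) for i in range(8)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()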
Re (1), are there publicly documented cases with numbers on observed slowdowns with it?
I see this mentioned from time to time, but intuitively you'd think this wouldn't pose a big slowdown, since the system builtin objects would have been allocated at the same time (startup) and densely located on a smaller number of pages. I guess if you have a lot of global state in your app it could be more significant.
It would also be interesting to see a benchmark using hugepages; you'd think this could solve the remaining perf problems if they were due to a large number of independent CoW page faults.
Replying to myself: it seems one poster's case was Instagram and their very large Django app: https://bugs.python.org/issue40255#msg366835
* A lack of fork() makes starting new processes slow.
* All Python webservers that somewhat support multiprocessing on Windows disable the IOCP asyncio event loop when using more than one process (because it breaks in random ways), so you're left with the slower select() event loop which doesn't support more than 512 connections.
Probably a very small minority of Python codebases run on Windows, no? That's my impression. It would explain why so many people are unaware of multiprocessing issues on Windows. I've never run any serious Python code on Windows...
Managing processes is more annoying than threads, though. Incl. data passing and so forth.
The "ray" library makes running python code on multi core and clusters very easy.
Interesting - looking at their homepage they seem to lean heavily into the idea that it's for optimising AI/ML work, not multi-process generally.
You can use just ray.core to do multiprocessing.
You can do whatever you want in the workers, I parse JSONs and write to sqlite files.
Although it's great that the library helps with multicore Python, the existence of such a package shouldn't be an excuse not to improve the state of things in standard Python.
On the other hand, this particular argument also gets overused. Not all compute-bounded parallel workloads are easily solved by dropping into multiprocessing. When you need to share non-trivial data structures between the processes you may quickly run into un/marshalling issues and inefficiency.
as the `multiprocessing` module works just fine.
Something that tripped me up when I last did `multiprocessing` was that communication between the processes requires marshaling all the data into a binary format to be unmarshaled on the other side; if you're dealing with 100s of MB of data or more, that can be quite some significant expense.
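One stdlib way around that cost for large numeric blobs, assuming numpy is available, is multiprocessing.shared_memory (3.8+): only the small segment name crosses the process boundary, not the hundreds of MB of data. A rough sketch:

    import numpy as np
    from multiprocessing import Process, shared_memory

    def worker(shm_name: str, shape, dtype) -> None:
        shm = shared_memory.SharedMemory(name=shm_name)
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        arr *= 2.0                                # mutates the parent's buffer in place
        shm.close()

    if __name__ == "__main__":
        data = np.ones(50_000_000, dtype="float64")          # ~400 MB
        shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
        view = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
        view[:] = data
        p = Process(target=worker, args=(shm.name, data.shape, "float64"))
        p.start(); p.join()
        print(view[:3])                           # [2. 2. 2.]
        shm.close(); shm.unlink()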
While the title is correct, it is a bit misleading, because disabling the GIL breaks the asyncio tests. It's like saying the engine can be removed from my car. Sure, it can, but the car won't work.
Well this release will break any code that uses threads. The goal of this particular release is to work for thread-free programs.
How do single-threaded programs benefit from a lack of GIL?
Speed. Admittedly not quite as much with the way this patch is implemented, since it just short-circuits the extra function calls rather than omitting them entirely.
Removing the GIL results in slower execution. Without the guarantees of single thread action, the interpreter needs to utilize more locks under the hood.
Not in single threaded code.
Umm, yes it does? For the longest time, Guido’s defense for the GIL was that all previous efforts resulted in an unacceptable hit to single threaded performance.
Read PEP-703 (https://peps.python.org/pep-0703/#performance) where the performance hit is currently 5-8%
That's to make it thread safe without the GIL.
If you only care about single thread there's all kinds of stuff you can do.
How to ensure there are no other threads, confidently enough that one can turn thread safety off?
From when I was reading the proposal, the idea is that until a C extension is loaded, you can assume that there are no other threads. Then when a module is loaded, by default you assume that it uses threads but modules that are thread free can indicate that using a flag, so if a module indicates it's thread free then you continue running without the thread safety features.
Disabling the GIL can unlock true multi-core parallelism for multi-threaded programs, but this requires code to be restructured for safe concurrency, which isn't that difficult it seems:
When we found out about the “nogil” fork of Python it took a single person less than half a working day to adjust the codebase to use this fork and the results were astonishing. Now we can focus on data acquisition system development rather than fine-tuning data exchange algorithms.
We frequently battle issues with the Python GIL at DeepMind. In many of our applications, we would like to run on the order of 50-100 threads per process. However, we often see that even with fewer than 10 threads the GIL becomes the bottleneck. To work around this problem, we sometimes use subprocesses, but in many cases the inter-process communication becomes too big of an overhead. To deal with the GIL, we usually end up translating large parts of our Python codebase into C++. This is undesirable because it makes the code less accessible to researchers.
Maybe they should look into translating parts of their code base to Shedskin Python. It compiles (a subset of) Python to C++.
How's it different from Cython, which compiles a subset of Python to C or C++?
Shedskin has stricter typing, and about 10-100 times performance vs Cython.
It could remove the locking/unlocking operations.
Doesn't removing the GIL imply adding back new, more granular locks?
Sort of, but the biased reference counting scheme they’re using avoids a lot of locks for the common case.
Removing the GIL requires more locking/unlocking operations. For single-threaded program, it's a performance penalty on average: https://peps.python.org/pep-0703/#performance
They don't.
They don't benefit much from a lack of GIL, perhaps a small reduction in overhead. This feature is a first step towards being able to disable the GIL completely. It is intended to be implemented in a very conservative manner, bit by bit and so for this first step it should work for thread free code.
This isn't correct. TFA said that small threaded programs had been run successfully, but that the test suite broke in asyncio.
Async I/O and threads are two different things, and either can be present in real code without the other.
"small threaded programs had been run successfully"
I have run a lot of programs containing race conditions successfully many times, until I ran into an issue.
Not quite sure what your comment means exactly or how it implies what I said is incorrect.
At any rate, test_asyncio contains a lot of tests that involve threads and specifically thread safety between coroutines and those tests fail. As far as async I/O and threads being distinct, I mean sure that is true of a lot of features but people mix features together and mixing asyncio with threads will not work with this particular release.
Really, any code? I thought they were adding fine-grained locks to the python objects themselves? Are you saying that if I share a python list between two threads and modify it on one and read it on the other, I can segfault python?
With this particular release, yes it will segfault. But down the road what you state is correct, this is just a first step towards that goal.
Couldn’t it work if each threads only touch thread-specific data structures?
Being able to remove the engine from my car with the push of a button would be a pretty amazing feature!
Analogy breaking down and all, but …
Only as long at it’s as easy to put back in
While the title is correct, it is a bit misleading, because disabling the GIL breaks the asyncio tests. It's like saying the engine can be removed from my car. Sure, it can, but the car won't work.
You're not supposed to drive a car that hasn't got out of the research and development laboratory either, so there's that.
You also need to compile Python with a special flag activated. It’s not only an environment variable or a command line option.
I mean, you're not wrong, but also it's a huge feat to provide a toggle for a major feature like the GIL. Though, if it's just asyncio that's broken, perhaps it's not like removing your engine, but rather your antilock brakes :)
EDIT:
[the test synchronous programs] all seem to run fine, and very basic threaded programs work, sometimes
Perhaps this is closer to removing the oil pan
This comment feels very disingenuous because non-threaded programs do in fact work.
I've been programming in Python for over 6 years now and every week I learn something new. But recently I've been thinking about moving to a more capable language with proper concurrency for backend API requests (FastAPI sucked).
I also want types, so Elixir is not in the picture. I dabbled in Rust a bit. Although I was able to get the hang of things and build a CLI tool pretty quickly, I'm worried I'll have to deal with numerous quirks later if I keep using Rust (like numerous string types). Is that something to be worried about if all I want from Rust is Python+Types+Concurrency?
Python supports types! https://www.mypy-lang.org/
This seems a bit like saying "JavaScript supports types!" because of typescript.
It's not a separate language, you can just start typing your programs right now.
Except nothing enforces your types at run time; you can have type hints all you want and everyone else can ignore them.
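Tiny illustration of that point (made-up function, obviously): CPython runs this without complaint, since annotations are just metadata at run time; only an external checker like mypy or pyright would flag it.

    def add(a: int, b: int) -> int:
        return a + b

    print(add("type", "hints"))   # prints 'typehints'; mypy would reject the call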
The same is true of laws. And in most languages, if I want to avoid the type system and hand you a goldfish instead of an int, I can; it just may take more effort. Other languages will blow up too if you hand them strange things. Just like Python, those languages have ways to verify you're not passing a goldfish; you just may need more or less effort to use them.
That's a lot of words for: yeah, you are right.
Everyone else can ignore every coding guideline your team sets up, that’s why code reviews and managers exist.
Python actually has type safety though, as you can't do `'1' + 1` like in JS (not that a linter wouldn't scream at you). If I hear another "I compile <insert language> so I know it will work, but you can't do that in Python" I'll lose it. Having the compiler not complain that the types match is not effing "testing".
Having the compiler not complain that the types match is not effing "testing".
It absolutely is - it's just testing at a _very_ low level of correctness, and is not sufficient for testing actual high-level functionality.
Both true. What’s wrong with them?
Types are way more mainstream in the JS ecosystem than they are in the Python ecosystem. If you want a "scripty" language with types, then TypeScript is a reasonable choice.
What sucked about FastAPI?
Not fast enough! I used it to call llama.cpp server but it would crash if requests were "too fast". Calling the llama.cpp server directly solved the issue.
Did you get "connection reset by peer" when you sent a bit too many requests perchance? I've never found the source of that in my programs. There's no server logging about it, connections are just rejected. None of the docs talk about this.
I don't remember the exact error name but the FastAPI server would just freeze.
a freeze is not a crash.
You ran FastAPI app proper server like Uvicorn?
This sounds like a layer 8 problem
Honestly, that doesn't sound right. I'm curious what you mean by crash though.
Interesting, I've used fastAPI to serve many thousands of requests a second (per process) for a production system. How were you buffering the requests?
Swift or Kotlin might be what you’re looking for but nobody uses Swift for backend really, and I’m not sure about Kotlin.
Kotlin has basically replaced java for many spring shops. It's really common in backend nowadays.
As someone in a similar boat, just pick Golang. People often dislike the basic syntax, explicit error checking and the lack of algebraic data types. I did. Rust just seems like it offers so much more and you fear missing out on something really cool.
But once you get over it, you realize Golang has a good type system, concurrency model, package manager that's not pip, fast compile times, and static binaries. For most cases it will also offer great performance.
It has everything you need to build APIs, CLI tools, web servers, microservices - pieces which will form the building blocks of your software infrastructure. I have heard numerous stories of people being productive in Go in a few days, sometimes even hours.
If Python is 0 quality of life and Rust is 100, Golang gets you all the way up to 80-90. That last bit is something you might never need.
Rust is a great language and something I hope to be proficient in someday, but I'll save it for when I actually need that last microsecond of performance.
Golang? Pretty easy to pick up coming from Python and proper concurrency.
GoLang as others suggested is a good pick. If you're in the "I don't like coloured functions" camp it's likely your best bet for web workloads.
If you're not a fan of GoLang's spartan syntax and are cool with async/await, C#/dotnet core is a great experience on all platforms. IMO it has the best async/await implementation (it originated there) on top of a multi-threaded event loop. ASP.NET is a great web framework and it has great library support for everything else. As someone who avoids "traditional" ORMs (Hibernate, Django) I really like Dapper.
Python+Types+Concurrency
Sounds like Groovy. But I wouldn't recommend it. Also the career padding hype is gone.
Strong recommendation to look into ASP.NET Core with C#. It gives static typing, easy first-class concurrency, it runs on all platforms and now can be AOT compiled if that's your thing, if not - can still produce just a single executable if you choose to. Also its CLI is nice and similar to cargo. Naturally, you won't be having performance issues.
Sounds like you want Julia. It looks like Python, but also has what you ask for.
You can even run Python from Julia, so that alleviates the problem with a lack of libraries somewhat.
I moved from Python to Rust and I really like it. Moving to a statically typed language is a bit of a mind fuck, but cargo is amazing and so is the speed and portability of Rust.
You might wanna try Nim [0].
Nim is a statically-typed compiled language with very pythonic syntax. It's easy to learn, especially if you already know Python, because Nim's stdlib is heavily inspired by it.
For multithreading in Nim see:
Weave - https://github.com/mratsim/weave
Malebolgia - https://github.com/Araq/malebolgia
[0] - https://nim-lang.org
DevX in Rust for backend development is quite good nowadays. Things are WAY easier than it was a couple of years ago.
I migrated a couple of projects from Java, TS and Go to Rust and honestly, I couldn't be happier.
It will be very exciting to see how much faster they'll be able to make vanilla python. The value proposition is being challenged by the plethora of tools aiming to alleviate those issues. Speed improvers like Mojo, pytorch, triton, numba, taichi come to mind.
There are so many different attempts at solving this problem that the last time I wanted to try one of them, I found myself overwhelmed with options. I chose taichi, which is pretty fun and easy to use, although somewhat limited in scope.
Mojo should be viewed as an attack on the Python ecosystem due to it being a superset. It can consume Python, but it itself is not Python.
Taichi is really underrated, it works across all platforms (including Metal), has tons of examples and the code is easy to write. And lastly, it integrates with the ecosystem and doesn't displace it.
great demo reel of what Taichi can do, https://www.youtube.com/watch?v=oXRJoQGCYFg
I think "attack" is a bit much; C++ isn't an attack on C.
While Ken Thompson never used the word attack, he certainly didn't have a positive opinion of the language or of Bjarne Stroustrup either in terms of his technical contributions or his handling of C++ adoption:
https://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-p...
Thanks for posting that, I thought it was a great read (as someone who last used C++ probably about 25 years ago...)
Given that so many of the criticisms were about C++ being over-complicated, I do worry about languages just becoming more and more difficult over time as everyone wants their pet feature added, but due to backwards-compatibility concerns old/obsolete features are rarely removed. For example, take Java. I think that a ton of goodness has been added to Java over the decades, and for people who have been working with it throughout, it's great. But it feels like the learning curve for someone just getting involved with Java would be really steep, not just because there is just a ton of stuff, but because without having the context of the history and how things were added over time (usually with an eye towards backwards-compatibility) it feels like it would be hard to wrap your head around everything. If you're writing your own new program that's not really a problem as you can just stick to what you know, but if you're getting into an existing codebase that could use lots of different features it feels like it could be daunting.
It's been quite a while since I've programmed in Java, so I'm just speculating, but would be curious how other folks relatively new to the language in production environments find the learning curve.
I had been doing primarily Go development since it was first released, up until a few years ago when the pandemic gave me the opportunity to move into a full-time remote gig doing primarily Java development. So I can answer this: I hadn't done Java at that point for over 10 years, so I felt completely new (and what Java I did before that, I was mostly trying to avoid by using Play Framework or JRuby on Rails).
As someone in the boat you mentioned (sort of) the short answer is modern Java development for 90% of tasks is not complicated at all: it's very much like any programming language used in a bizdev/corp environment -- you are mostly using a framework and a bunch of DSLs. Almost everyone uses Intellij and Gradle for IDE and build, and Junit5 or Spock for unit testing. I passed a technical interview mostly on Spring Framework concepts knowing almost nothing about it, nor having ever used it in production by simply just having the documentation open while I was being interviewed so I could look up the answers. Any language that is popular is going to have frameworks with decent documentation that help you be productive quickly, so I just jumped in doing Spring. The java stuff came as needed, or I referenced something like Effective Java (great book), or a Baeldung article. Java world has made some great strides since the 2000's and early 2010s of XML chaos. It took a while, but I feel like it's in a really good spot and getting better.
As an aside, if it hasn't been mentioned to you before, if you like simplicity in a language, but still incredibly productive, you might enjoy Go.
I truly appreciate your answer, as someone that primarily does C#, but learned Java first back around 2003 to 2005. More so because my son is going to be learning Java next year in high school, and I know he'll be looking to me for help, and I'd like to at least be at the same language version as he's learning.
I've also been looking at Go quite a bit, especially from a lot of commentary even recently on YouTube, as well as plenty of time during work hours to experiment with languages I'm not familiar with. I've been deciding between Python and Go, and I picked up Python very quickly for the language, but ultimately I think it's the libraries and ecosystem around both that will decide it for me. Just good to see another vote for Go, especially when comparing to Java. I feel like I'm in the same boat as you seem to be, where it's all just syntax and learning frameworks, while having enough experience at this point to also pick up the smaller details while quickly being effective.
No problem. I am also a huge Dream Theater fan.
I haven't used Java since v1.5, roughly around 2005. I do use C# quite a bit, and have since v1.1 (I got into .NET through VB.NET at v1.0, as I had just learned VB through schooling). I look back at all the features that have been added over time, and since I have followed along as it has developed, I embrace the changes. They have made my code more concise and easier to read, with less boilerplate. When I think about someone brand new getting into the language, I truly hope they have a mentor. I can't imagine having to work with someone new to the language, and not being given plenty of time to get them up to speed with both current codebases, as well as newer ones (and I haven't even adopted the newest language versions unless it was suggested to me through Resharper, and I looked up the feature and found how it made my code easier to understand).
I feel like the comparison with Java is similar, as both Oracle and Microsoft are large corporations that largely control the language and ecosystem, while also having open source implementations. C# has made some more major changes to the language itself than Java, but both have diverged quite a bit from what they were back in 2005.
I'll find out next year, as my son goes into high school, and they are offering Java software development classes. I got Apple Basic on the Apple ][e, QuickBasic, and a bit of C++ in high school, and graduated in 1997. I wouldn't be surprised if he's going to be dealing with Java v1.5 instead of the latest features. At least I'll have a motivation to learn the latest features if he enjoys it and keeps working at it.
I wouldn't be surprised if he's going to be dealing with Java v1.5 instead of the latest features.
Well I certainly hope not… LTS for Java 5 probably hit end of life when he was 2. I'd say most likely the version being installed is Java 17, or 11 if they're really dated. Java 8 would likely be the absolute minimum, mainly due to the industry being so slow to migrate away from it. Newer programmers are unlikely to ever know what Java felt like before generics and lambdas existed.
Thanks for sharing.
perhaps Erik Naggum, scourge of Usenet, was right when he said: “life is too long to know C++ well.”
I feel similarly about Rust. I have read "the book". Did a couple of small projects. It is also a sort of committee-administered language that sucks up tiny features like a giant vacuum cleaner sweeping the streets and changes every hour.
that sucks up tiny features like a giant vacuum cleaner sweeping the streets and changes every hour.
Can you provide some (preferably recent) examples? My experience has been the opposite. It feels like new features are given a lot of thought and deliberation, and stabilizing even something small can take upwards of a year.
I will agree that today's Rust is different from Rust 1.0, but I don't see that as necessarily a bad thing. More that they were able to come to a stable 1.0-ready core fairly early on, and have been adding on the more tricky parts bit by bit since then.
I wasn't sure whether to agree with this or not, so I finally took a slightly closer look at Mojo just now.
This depends on how they license it going forward, and whether they make it open or use being a superset as a way to capture and then trap Python users in their ecosystem, and I don't think we have a certain answer yet as to which path they'll take.
The way they let you mix python compatible code with their similar but more performant code [1] looks interesting and provides a nice path for gradual migration and performance improvements. It looks like one of the ways they do this is by letting you define functions that only use typed variables which is something I would like to see make its way back to CPython someday (that is optionally enforcing typing in modules and getting some performance gains out of it).
[1] https://en.wikipedia.org/wiki/Mojo_(programming_language)#Pr...
The typing thing is really unnecessary and a step backwards, IMO. Added typing is maybe great for managing a code base, but it isn’t necessary for performance with a compiler that can do type inference.
It looks like one of the ways they do this is by letting you define functions that only use typed variables which is something I would like to see make its way back to CPython someday (that is optionally enforcing typing in modules and getting some performance gains out of it).
This is already how Cython and MYPYC work. You add standard PEP-484 type annotations, and they use that to infer where code can be compiled to native instructions.
https://cython.readthedocs.io/en/latest/src/tutorial/pure.ht...
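Rough illustration (not taken from the linked docs): a function like this is just plain Python under CPython, but mypyc, or Cython's pure-Python mode with annotation typing, can use the annotations when compiling it to a native extension.

    def dot(xs: list[float], ys: list[float]) -> float:
        total: float = 0.0
        for x, y in zip(xs, ys):
            total += x * y
        return total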
I didn't mention C++ at all. Was that my argument? This thread is about Python, the GIL. Mojo was brought up as a way to speed up Python code.
C++ predates on C in a similar way to how Mojo predates on Python. At least C++ has extern C.
https://docs.modular.com/mojo/manual/python/#call-mojo-from-...
As shown above, you can call out to Python modules from Mojo. However, there's currently no way to do the reverse—import Mojo modules from Python or call Mojo functions from Python.
One way street. Classic commons harvesting.
I think the person you're replying to is just trying to use an analogous example; you didn't need to bring it up.
Regardless, I think it's a bit alarmist and overly aggressive to assume nefarious intent. Have the developers acted in ways such that this reputation is deserved?
Also, little OT, but it took me unreasonably long to understand that you meant "predates" as the verb form of "predator", not as in "comes before chronologically". The phrase "preys on" may be more clear.
Never heard of taichi before; looks promising. Do you know any shop that uses it for prod code?
ETH Zurich is using it for their physics sim courses, University of Utah is using it for simulations (SIGGRAPH 2022), OPPO (they make smart devices running Android), Kuaishou uses it for liquid and gas simulation on GPUs. Lots of GPU accelerated sim stuff.
https://www.researchgate.net/publication/337118128_Taichi_a_...
Cython is also a superset, is Cython also guilty of such crimes?
Cython is dependent on CPython. Cython outputs Python extension modules, which can only be used when imported into a standard CPython environment, and which interoperate cleanly with the rest of the Python ecosystem.
Mojo explicitly does the opposite, allowing Mojo to use Python but requiring Mojo to be in control, while making it hard/impossible for code written in Mojo to benefit code written in Python:
Our long-term goal is to make Mojo a superset of Python (that is, to make Mojo compatible with existing Python programs). […] Mojo lets you import Python modules, call Python functions and interact with Python objects from Mojo code. […]
As shown above, you can call out to Python modules from Mojo. However, there's currently no way to do the reverse—import Mojo modules from Python or call Mojo functions from Python. […]
This pattern doesn't work because you can't pass Mojo callbacks to a Python module.
Since Python can't call back into Mojo, one alternative is to have the Mojo application drive the event loop and poll for updates.
No comment on whether this should be viewed as an attack.
Mojo should be viewed as an attack on the Python ecosystem due to it being a superset
A superset of pure .py code, not the numpy, cython, ctypes and stuff.
But once you get "superset" of CPython's C bindings, congratulations, you get GIL.
Mojo is not a superset. Not even close. They say they are AIMING for it to be a superset OF THE SYNTAX. This is a subtle yet enormous difference to being a superset of the language itself.
ELI5
I get in concept what the GIL is.
But what's the impact of this change?
Packages will now break, for the hope of better overall performance?
If any package depends on the GIL, it will be enabled. Packages won't break
What packages might depend on the GIL and why would they need it?
Almost all packages depend on the GIL (at least at first).
They assume they can "take the GIL", and then go and look at the various Python datastructures you passed them without worrying about them changing as they are being read.
Then later, when writing out their answer (which might involve editing something they were given), they can assume the structures are not changing. For example, you could write code which extends the length of a list to 100, then fill in all the members, without worrying that halfway through your loop another thread shrinks the list back down.
This only applies to native extensions.
Any C code that uses global variables casually.
Packages with native components that have not been updated for the semantics changes.
Previously people basically didn't bother to write multithreaded Python at all due to the GIL. Threads were primarily used when you had multiple pieces of work to do which could end up blocked on independent I/O. Which is common and useful of course, but doesn't help with the performance of CPU-bound Python code.
Even outside of high-intensity CPU work, this can be useful. A problem lately is that a lot of code is written using Python's native asyncio language features. These run single-threaded with async/await to yield execution, much like in NodeJS, and can achieve pretty good throughput even with a single thread (thousands of reqs/second).
However, a big problem is that any time you do _any_ CPU work, you block all other coroutines, which causes all kinds of obscure issues and ruins your reqs/second. For example, you might see random IO timeouts in one coroutine which are actually caused by a totally different coroutine hogging the CPU for a bit. It can be very hard to get observability into why this is happening. asyncio provides a `asyncio.to_thread()` function [1] which can help to take blocking work off the main thread, but because of the GIL it doesn't truly allow the CPU-bound to avoid interfering with other coroutines.
[1] https://docs.python.org/3/library/asyncio-task.html#asyncio....
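A minimal sketch of that escape hatch (the hashing loop is just a stand-in for real CPU work): the blocking function runs in a worker thread so the event loop keeps servicing other coroutines, but under the GIL that worker still competes with the loop whenever it executes pure-Python bytecode.

    import asyncio
    import hashlib

    def cpu_bound(data: bytes) -> str:
        digest = data
        for _ in range(200_000):                   # deliberately chew CPU
            digest = hashlib.sha256(digest).digest()
        return digest.hex()

    async def handler(payload: bytes) -> str:
        # runs off the event loop thread, so other coroutines keep running
        return await asyncio.to_thread(cpu_bound, payload)

    async def main() -> None:
        results = await asyncio.gather(*(handler(b"req%d" % i) for i in range(4)))
        print(results)

    asyncio.run(main())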
Would the GIL affect how Python runs across multiple Python processes?
I'm asking because I encountered a weird phenomenon before.
I use a simple Python lib called "schedule" which runs some tasks periodically (not precisely). And I often run a script multiple times (with different arguments) to monitor something, say, every 30 seconds. So they're in three separate Python interpreter processes.
What I've noticed is that while when I initiated them, they were something like 5 seconds apart, they eventually will end up running in sync. Probably not related to GIL at all, but I guess do no harm to ask.
The GIL is per interpreter/process, so this wouldn't be related as far as I know.
The GIL only really kicks in if you use threads in a single process. Then, the GIL will only let one single thread do actual work at a time, and will trade off which thread gets to do work. The other threads can wait on IO stuff (web requests, the file system, etc) but they can't do number crunching or data processing at the same time.
That's a really interesting observation though, I wonder what _is_ causing your separate processes to sync up?
Will this also support threads spanning multiple cores, or is that unrelated?
Edit: https://peps.python.org/pep-0703/ suggests it will support multiple cores, unless the current work does not yet achieve that.
First I read the news of tranched bread, and now this?! What a time!
I was a bit disheartened when the Unladen Swallow project [1] fizzled out. Great to see Python back on the core optimization track.
tranched bread?
I could be wrong, but I think it’s a clever alternative to the expression “best thing since sliced bread”.
ahahah. I was thinking that it was a new python library or something that I hadn't heard of and was coming up short with Google.
"tranced bread" is a fun name for some sort of library that breaks up files into pieces for better resilience for sending, like over BitTorrent.
tranced bread sounds like what you go home with after a goa festival
OP is a subprime mortgage seller.
GP is joking that this is the best thing since sliced (tranched) bread.
It's neat that this will eventually improve some python code, but at the end of the day, it's still a badly typed language, which will still be slower and less safe than more modern languages.
Learn golang or rust instead. Impressive that they are managing this though!
Learn golang or rust instead.
Bad take. Learn golang and rust and python.
You should use the language which is suited to the task, sometimes that's golang sometimes that's python and sometimes it's rust.
It's impressive that the python team as a whole continues to improve in such big ways after more than 30 years of development. It's more impressive that the python team managed to navigate 2to3 and come out stronger.
Learn golang
Having to "if err != nil" every single function call is a big put off - imagine having to "try catch" everything in a language like C#!
Python is a bad programming language, but a great scripting one. You use it when you either have to do scripting or need a framework that is only offered in Python (the entirety of ML domain). Otherwise, the only reason to pick it is if you have no choice, you don't want to do a rewrite or you are being held at a gun point.
Instead, pick C#/F#, Kotlin/Clojure or Rust depending on the use case.
Rust, sure, but typing in golang isn't superior to python's type annotations.
Also, Rust has become a quite popular tool for Python extensions, where you can offload performance-critical code to Rust and keep business logic in Python.
Bad take. One doesn't just migrate massive ecosystems to "go or rust". Python, like it or not, is the lingua franca of ML/AI and science. Worse, as a user of Go/Rust it's very disingenuous to expect the same kind of iteration ability in those languages as you get in Python.
With tools like pyright now + the work on nogil everyone benefits from this using Python.
but Python has my libraries, and go and rust do not.
> 99% of my compute is offloaded to compiled BLAS or CUDA.
types are enough.
and memory and type safety are not terribly relevant for my use cases.
Does anyone know why the biased reference counting approach described in https://peps.python.org/pep-0703/ just has a single thread affinity requiring atomic increments/decrements when accessed from a different thread? What I’ve seen other implementations do (e.g. various Rust crates implementing biased reference counting) is that you only increment atomically when moving to a new thread & then that thread does non-atomic increments/decrements until 0 is hit again and then an atomic decrement is done. Is it because it’s being retrofitted into an existing system where you have a single PyObject & can’t exchange to point to a new thread-local object?
We could implement ownership transfer in CPython in the future, but it's a bit trickier. In Rust, "move" to transfer ownership is part of the language, but there isn't an equivalent in C or Python, so it's difficult to determine when to transfer ownership and which thread should be the new owner. We could use heuristics: we might give up or transfer ownership when putting an object in a queue.SimpleQueue, but even there it's hard to know ahead of time which thread will "get" the enqueued object.
I think the performance benefit would also be small. Many objects are only accessed by a single thread, some objects are accessed by many threads, but few objects are exclusively accessed by one thread and then exclusively accessed by a different thread.
I think you would do it on first access - “if new thread, increment atomic & exchange for a new object reference that has the local thread id affinity”. That way you don’t care about whether an object actually has thread affinity or not and you solve the “accessed by many threads” piece. But thanks for answering - I figured complexity was the reason a simpler choice was made to start with.
But this would now make the reference count increment require a conditional? It’s a very hot path, and this would cause a slowdown for single-threaded Python code.
It's already taking a conditional. Take a look at the PEP:
    if (op->ob_tid == _Py_ThreadId())
        op->ob_ref_local = new_local;
    else
        atomic_add(&op->ob_ref_shared, 1 << _Py_SHARED_SHIFT);
So you're either getting a correct branch prediction or an atomic operation which will dominate the overhead of the branch anyway. All this is saying is: in the else branch where you're doing the atomic add, create a new PythonObj instance that has `ob_tid` equal to `_Py_ThreadId`. This presumes that Py_INCREF changes the return type from void to `PythonObj*` and this propagates out so that further on-thread references use the newer affinity (the branch condition is always taken to the non-atomic add instead of the atomic one). It's easier said than done and there may be technical reasons why that's difficult / not possible, but worth exploring eventually so that access by multiple threads of a single object doesn't degrade to taking atomic reference counts constantly.
Extra links on the nogil work for anyone else curious about this: [0], [1].
[0] Multithreaded Python without the GIL https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsD...
[1] Github repo https://github.com/colesbury/nogil
Further context on noGIL in general: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
Those links are both fairly old. See PEP 703 [0] and Sam’s nogil-3.12 repo [1] for more current versions.
Is there a good overview of the bigger picture here?
Related:
Intent to approve PEP 703: making the GIL optional - https://news.ycombinator.com/item?id=36913328 - July 2023 (499 comments)
Finally, looking forward to the benchmarks of many tools!
PEP-703 predicted in June 2023 an overhead of 15% when running with NoGIL: https://discuss.python.org/t/pep-703-making-the-global-inter...
A great video tour of the GIL - https://www.youtube.com/watch?v=Obt-vMVdM8s
If anyone wondered GIL = Global Interpreter Lock
This is exciting, can't wait
We are now one step closer to PythonOTP
I wish I had your optimism. Thoughtless bandwagon-y "criticism" is extraordinarily persistent.
There's no need to pretend Python has virtues which it lacks. It's not a fast language. It's fast enough for many purposes, sure, but it isn't fast, and this work is unlikely to change that. Faster, sure, and that's great.
Although true, it doesn't mean they can't improve its performance.
Working with threads is a pain in Python. If you want to spawn +10-20 threads in a process, it can quickly become way slower than running a single thread.
Removing the GIL and refactoring some of the core will unlock levels of concurrency that are currently not feasible with Python. And that's a great deal, in my opinion. Well worth the trouble they're going through.
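If you want to see the effect for yourself, a quick sketch (numbers will obviously vary by machine): on a GIL build the threaded run is no faster than the serial one for CPU-bound work, and often slower once switching overhead kicks in.

    import time
    from concurrent.futures import ThreadPoolExecutor

    def spin(n: int) -> int:
        total = 0
        for i in range(n):
            total += i * i
        return total

    N, WORKERS = 2_000_000, 16

    t0 = time.perf_counter()
    for _ in range(WORKERS):
        spin(N)
    serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        list(pool.map(spin, [N] * WORKERS))
    threaded = time.perf_counter() - t0

    print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")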
Working with threads is a pain regardless of which language you use.
Some might say: "Use Go!" Alas: https://songlh.github.io/paper/go-study.pdf
After a couple of decades of coding, I can say that threading is better if it's tightly controlled and limited to tight parallelism within an algorithm.
Where it doesn't work is in a generic worker pool where you need to put mutex locks around everything -- and then prod randomly deadlocks in ways the developer boxes can't recreate.
That's not true at all. F#, Elixir, Erlang, LabVIEW, and several other languages make it very easy. Python makes it incredibly tough.
I disagree, Python makes it incredibly easy to work with threads in many different ways. It just doesn't make threads faster.
The whole purpose of threads is to improve overall speed of execution. Unless you're working with a very small number of threads (single digits), that's a very hard goal to achieve in Python. I wouldn't count this as easy to use. It's easy to program, yes, but not easy to get working with reasonably acceptable performance.
And the python people would just point to multiprocessing...which works pretty well.
Which has its own set of challenges and yet another implementation of queue.
In what way? Threading, asyncio, tasks, event loops, multiprocessing, etc. are all complicated and interact poorly if at all. In other languages, these are effectively the same thing, lighter weight, and actually use multicore.
If I launch 50 threads with runaway while loops in Python, it takes minutes to launch and barely works after. I can run hundreds of thousands and even millions of runaway processes in Elixir/Erlang that launch very fast, and the processes keep chugging along just fine.
This is where Python's GIL bit me: I was more than familiar with how to shoot myself in the foot using threads in other languages, and careful to avoid those traps. Threads spun up only in situations where they had their own work to do and well-defined conditions for how both failure and success would be reported back to the thread that requested it, along with a pool that wouldn't exceed available resources.
Like every other language I've used this approach with, nothing bad happened - the program ran as expected and produced correct results. Unlike every other language, spreading calculations across multiple cores didn't appreciably improve performance. In some cases, it got slower.
Eventually scrapped it all, and went with an approach closer to what I'd have done with C and fork() decades ago... Which, to Python's credit, was fairly painless and worked well. But it caught me off-guard, because with asyncio for IO-bound stuff, it didn't seem like threads really have much of a purpose in Python, other than to be a tripwire for unwary and overconfident folks like myself!
Not disagreeing. The only case for threading in python is for spinning something to handle IO.
But now with async even that goes away.
It's not such a big pain in every language. And certainly not as hard to get working with acceptable performance in many languages.
Even if you have zero shared resources, zero mutexes, no communication whatsoever between threads, it's a huge pain in Python if you need +10-ish threads going. And many times the GIL is the bottleneck.
This may be a case of violent agreement, but there are a few clear cases where multithreading is easily viable. The best case is some sort of parallel-for construct, even if you include parallel reductions, although there may need to be some smarts around how to do the reduction (e.g., different methods for reduce-within-thread versus reduce-across-thread). You can extend this to heterogeneous parallel computations, a general, structured fork-join form of concurrency. But in both cases, you essentially have to forbid inter-thread communication between the fork and the join parameters. There's another case you might be able to make work, where you have a thread act as an internal server that runs all requests to completion before attempting to take on more work.
What the paper you link to is pointing out, in short, is that message passing doesn't necessarily free you from the burden of shared-mutable-state-is-bad concurrency. The underlying problem is largely that communication between different threads (or even tasks within a thread) can only safely occur at a limited number of safe slots, and any communication outside of that is risky, be it an atomic RMW access, a mutex lock, or waiting on a message in a channel.
Concurrency with rayon in Rust isn't pain, I'd say. It's basically hidden away from the user.
As you know, that's mostly threads in general. Any optimisation has a drawback, so you need to choose wisely.
I once made a horror of a thing that synced S3 with another S3-like (but not quite) object store. I needed to move millions of files, but on the S3-like store every metadata operation took 3 seconds.
So I started with async (pro tip: it's never a good idea to use async; it's basically gotos with two dimensions of surprise: 1. when the function returns, 2. when you get an exception). I then moved to threads, which got a tiny bit of extra performance, but much easier debuggability. Then I moved to multiprocess pools of threads (fuck yeah super fast), but then I started hitting network IO limits.
So then I busted out an Airflow-like system with operators spawning 10 processes with 500 threads.
It wasn't very memory efficient, but it moved many thousands of files a second.
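For anyone wanting the shape of that final setup, a stripped-down sketch (the copy function is a placeholder, not the original code): each worker process runs its own thread pool, so any one GIL only serializes its own slice of the threads, and the network I/O releases it anyway.

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def copy_object(key: str) -> str:
        # placeholder for the per-object copy; real network I/O releases the GIL
        return key

    def process_chunk(keys: list[str]) -> int:
        with ThreadPoolExecutor(max_workers=500) as pool:
            list(pool.map(copy_object, keys))
        return len(keys)

    def run(all_keys: list[str], n_procs: int = 10) -> None:
        chunks = [all_keys[i::n_procs] for i in range(n_procs)]
        with ProcessPoolExecutor(max_workers=n_procs) as procs:
            total = sum(procs.map(process_chunk, chunks))
        print("copied", total, "objects")

    if __name__ == "__main__":
        run([f"bucket/key-{i}" for i in range(100_000)])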
You seem to be implying that there is something inherently slow to Python. What?
This topic is an example: a detail of one particular implementation, since GIL is definitely not inherent to the language. Just the usual worry about looseness of types?
CPython is slow. That's not really something you can dispute.
It is a non-optimizing bytecode interpreter and it makes no use of JIT compilation.
JavaScript with V8 or any other modern JIT JS engine runs circles around it.
Go, Java, and C# are an order of magnitude faster but they have type systems that make optimizing compilation much easier.
There's no language-inherent reason why Python can't be at least as fast as JavaScript.
I've read that it can't even be as fast as JS, because everything is monkey-patchable at runtime. Maybe they can optimize for the case when that doesn't happen, but that remains to be seen.
I've heard similar claims but I don't think it's true.
JavaScript is just as monkey-patchable. You can reassign class methods at runtime. You can even reassign an object's prototype.
Existing Python JIT runtimes and compilers are already pretty fast.
Python is probably much more monkey patchable. Almost any monkey patching that JavaScript supports also works in Python (e.g. modifying class prototype = assigning class methods), but there are a few things that only Python can do: accessing local variables as dict, access other stack frames, modifying function bytecode, read/write closure variables, patching builtins can change how the language works (__import__, __build_class__). Many of them can make a language hard to optimize.
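A couple of the dynamic features listed above, as a minimal sketch (nothing here beyond the stdlib): reading another stack frame's locals and patching the import builtin, both of which are hard for an optimizer to reason about.

    import builtins
    import sys

    def peek_caller_locals():
        # reach into the *caller's* frame and read its local variables
        return dict(sys._getframe(1).f_locals)

    def show():
        secret = 42
        print(peek_caller_locals())      # {'secret': 42}

    show()

    # patching a builtin changes how every subsequent import behaves
    _real_import = builtins.__import__
    def noisy_import(name, *args, **kwargs):
        print("importing", name)
        return _real_import(name, *args, **kwargs)
    builtins.__import__ = noisy_import

    import json                           # prints "importing json"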
You can always use optimistic optimization strategies where you profile the fast path and optimize that. When someone does something slow, you tell them to stop doing it if they want better performance.
JavaScript doesn't have to contend with a plethora of native extensions (which, to be fair, are generally a workaround for python slowness).
JavaScript, at least on the Node.JS side, make plenty use of native extensions written in C++ https://nodejs.org/api/addons.html
In any case, that should be irrelevant to getting a reasonably performant JIT running. Lots of AOT and JIT compiled languages have robust FFI functionality.
The native extensions are more relevant when we talk about removing the GIL, since lots of Python code may call into non thread safe C extension code.
There are worse hills to die on than this. But the Python ecosystem is very slow. It's a cultural thing.
The biggest impact would be completely redoing package discovery. Not in some straightforward sense of "what if PyPi showed you a Performance Measurement?" No, that's symptomatic of the same problem: harebrained and simplistic stuff for the masses.
But who's going to get rid of PyPi? Conda tried and it sucks, it doesn't change anything fundamental, they're too small and poor to matter.
Meta should run its own package index and focus on setuptools. This is a decision PyTorch has already taken, maybe the most exciting package in Python today, and for all the headaches that decision causes, look: torch "won," it is high performance Python with a vibrant high performance ecosystem.
These same problems exist in NPM too. It isn't an engineering or language problem. Poetry and Conda are not solutions, they're symptoms. There are already too many ideas. The ecosystem already has too much manic energy spread way too thinly.
Golang has "fixed" this problem as well as it could for non-commercial communities.
The "Python ecosystem" includes packages like numpy, pytorch & derivatives which are responsible for a large chunk of HPC and research computing nowadays.
Or did you mean to say the "Python language"?
The "& derivatives" part is the problem! Torch does not have derivatives. It won. You just use it and its extensions, and you're done. That is what people use to do exciting stuff in Python.
It's the manic developers writing manic derivatives that make the Python ecosystem shitty. I mean, I hate ragging on those guys, because they're really nice people who care a lot about X, but if only they could focus all their energy into working together! Python has like 20 ideas for accelerated computing. They all abruptly stopped mattering because of Torch. If the numba and numpy and scikit-learn and polars and pandas and... all those people would focus on working on one package together, instead of reinventing the same thing over and over again (high-level cross-compilers or an HPC DSL or whatever), the ecosystem would be so much nicer and performance would be better.
This idea that it's a million little ideas incubating and flourishing, it's cheerful and aesthetically pleasing but it isn't the truth. CUDA has been around for a long time, and it was obviously the fastest per dollar & watt HPC approach throughout its whole lifetime, so most of those little flourishing ideas were DOA. They should have all focused on Torch from the beginning instead of getting caught up in little manic compiler projects. We have enough compilers and languages and DSLs. I don't want another DataFrame DSL!
I see this in new, influential Python projects made even now, in 2024. Library authors are always, constantly, reinventing the wheel because the development is driven by one person's manic energy more than anything else. Just go on GitHub and look at how many packages are written by one person. GitHub, Git, and PyPi are just not adequate ways to coordinate the energies of these manic developers on a single valuable task. They don't merge PRs, they stake out pleasing names on PyPi, and they complain relentlessly about other people's stuff. It's NIH syndrome at the 1M+ repository scale.
yeah. like xkcd 927 to the nth degree.
Python is inherently slow. That’s why people tend to rewrite bits that need high performance in C/C++. Removing the GIL is a massively welcome change, but it isn’t going to make C extensions go away.
This is entirely fair, and I wish I'd been a little less grumpy in my initial reply (I assign some blame to just getting over an illness). Thank you for the gentle correction!
That said - I think it's fair to be irritated by people who write Python off as entirely useless because it is not _the fastest_ language. As you rightly say - it's fast enough for many purposes. It does bother me to see Python immediately counted out of discussions because of its speed when the app in question is extremely insensitive to speed.
It’s all about values.
I have been on teams where Python based approaches were discounted due to “speed” and “industry best practice” and then had the very same engineers create programs that are slow by design in a “fast” language and introduce needless complexity (and bugs) through “faster” database processes.
Like you said, it’s the thoughtless criticism. The meme. I am happy for Python to lose in a design analysis because it’s too slow for what we are building; I am loathe to let it lose because whoever is doing the analysis with me has heard it’s slow.
Which is to say, I get what you’re saying. I think people have been a little ungenerous with your comment.
Eh - I engaged with a fraught topic in a snarky way without clarifying that I meant the unintuitive-but-technically-literally-accurate interpretation of my words. Maybe some people have been less-generous than they could have been, but I don't begrudge it - if I look sufficiently like a troll, I won't complain when I get treated like one. Not everyone has the time and mental fortitude to treat everyone online with infinite patience and kindness - I know I sure don't.
Thank you for the support, though!
In some ways the weakness even was a virtue. Because Python threads are slow, Python has incredible toolsets for multiprocess communication, task queues, job systems, etc.
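For context, a minimal sketch of the multiprocess style those toolsets grew up around: sidestep the GIL by giving each worker its own process (illustrative only; the function and workload sizes are made up):

```python
# Spread a CPU-bound, pure-Python function across processes.
# Threads would be serialized by the GIL here; processes are not.
from multiprocessing import Pool

def cpu_bound(n):
    # Stand-in for real work.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool() as pool:
        results = pool.map(cpu_bound, [1_000_000] * 8)
    print(results[:2])
```

The price, as discussed elsewhere in this thread, is that every process has its own memory, so anything shared has to be pickled across or placed in shared memory explicitly.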
Maybe it'll shut up "architects" who hack up a toy example in <new fast language hotness>, drop it on a team to add all the actual features, tests, and deployment strategy and to maintain it, then fly away to swoop and poop on someone else. Gee, thanks for your insight; this API serves maybe 1 request a second, tops. Glad we optimized for SPEEEEEED of service over speed of development.
"Faster, sure" seems unnecessarily dismissive. That's the whole point of all this work.
It isn't thoughtless. I'm working in Python after having come from more deliberately designed languages, and concurrency in Python is an absolute nightmare. It feels like using a language from the 60s. An effectively single-threaded language in 2024! That's really astonishing.
Most software doesn't need multithreading. Most of the time, people cry about Python's performance and then write trivial shit programs that take milliseconds to run in Python as well.
Nearly every time I've interacted with Python, its execution speed has absolutely been an issue.
Please do give an example.
What I see is people crying about how slow Python is, and then using a "proper fast" programming language to write code that gets executed so few times that even if Python were 100x slower it wouldn't matter, or that is so trivial that Python's speed definitely isn't an issue.
I have even seen people stop using a tool once they find out it was written in Python - now all of a sudden it's unusably slow. Then they try to justify it by writing some loop in their favourite "proper fast" language and telling me how fast that tight loop is, or claiming some function is X times faster; but when I actually compile it and run something like hyperfine against both it and the Python version, the difference is hardly ever X, because there is already so much other overhead in the real world.
Python being slow, and having to work to speed Python programs up, helped me immensely in building a mental model of what makes programs slow. After learning C in school, when I first learned how Python was implemented, I was shocked that it was even usable.
If your criticism isn't thoughtless, then that's not what I'm complaining about. Specifically, I'm annoyed about people who _just_ say "Python isn't fast enough, therefore it's not suitable to our use-case", when their use-case doesn't require significant speed or concurrency. If you thoughtfully discount Python as being unsuitable for a use-case that it's _actually_ unsuitable for, then good luck to you!
Python has been too often just a -bit- too slow for my use cases; the ability to throw a few cores at problems more easily is not going to eliminate this criticism from me but it's sure going to diminish it by a large factor.
I still hear the "java slow" meme from time to time... Memes are slow to die, sadly. Some people just won't catch on to the fact that Java has had just-in-time compilation for well over two decades (it was one of the first major platforms to get it), has had a fully concurrent garbage collector for a number of releases (ZGC since Java 11), and can be slimmed down a lot (jlink).
I work on low-latency stuff and we routinely get server-side latencies in the order of single to low double-digit microseconds of latency.
If Python ever becomes fully concurrent (Python threads free of any kind of GIL), we'll still see the "python slow" meme for a number of years... It also doesn't help that Python gets updated very, very slowly in industry (although things are getting better).
I feel Java deserves better. When Python finally gets true thread concurrency, a JIT (numba and the like), comprehensive static analysis (type hints), sophisticated GC, and better performance, people will realise Java has had all of them this whole time.
GraalVM is a pretty magical tool
I think java being slow has less to do with the implementation (which is pretty good) and more to do with the culture of overengineering (including in the standard library). Everything creates objects (which the JIT cannot fully eliminate, escape analysis is not magic), cache usage is abysmal. Framework writers do their best to defeat the compiler by abusing reflection. And all these abstractions are far from zero cost, which is why even the JDK has to have hardcoded special cases for Streams of primitives and ByteBuffers.
Of course, if you have a simple fast path you can make it fast in any language with a JIT. Latency is also generally not an issue anymore; credit where credit is due, Java GCs are light years ahead of everything else.
Regarding jlink - my main complaint is that everything requires java.base, which is already 175M. And that's not counting the VM, etc. But I don't actively work with Java anymore, so please correct me if there is a way to get smaller images.
I doubt that Python will ditch the meme. The fundamental model of dynamic dispatch using dictionaries on top of a byte code interpreter is pretty slow. I wouldn't expect it to get within 2x of JavaScript.
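For anyone who hasn't looked under the hood, a rough sketch of what "dispatch using dictionaries" means at the Python level (a simplification; CPython has inline caches and fast paths, but this is the model):

```python
# Attribute access is, conceptually, a run-time search through ordinary
# dictionaries on the instance and its class.
class Greeter:
    def greet(self):
        return "hi"

g = Greeter()
print(g.greet())                      # -> "hi"

# The method lives in a plain dict on the class...
print(type(g).__dict__["greet"])      # -> <function Greeter.greet ...>

# ...and can be swapped out at any time, so every call has to repeat
# the lookup instead of being resolved once ahead of time.
Greeter.greet = lambda self: "hello"
print(g.greet())                      # -> "hello"
```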
Javascript may not have an official bytecode, but is it not also based on the same concept of using dictionaries to dispatch code and slow as a result? I certainly had always filed it away as "about as fast as python" in my head. Why else would it rely on evented i/o?
You are correct, but they have (1) all of the money in the world as the fundamental programming language of the Internet and as a result (2) they have a state of the art tiered JIT for dynamic languages. The blood of countless PhD students flows through v8. I don't know if python will get the same treatment.
OK, but given enough cores, even Python code will run into memory bandwidth problems rather than being bottlenecked by memory latency.
Well, technically it still won't be able to use the full power of threads in many situations because (I assume) it doesn't have shared memory. It'll presumably be like Web Workers / isolates, so Go, C++, Rust, Zig, etc. will still have a fundamental advantage for most applications even ignoring Python's inherent slowness.
Probably the right design though.
Why would you think it's not shared memory? Maybe I'm wrong here but by default Python's existing threading implementation uses shared memory.
AFAIK we're just talking about removing the global interpreter lock. I'm pretty sure the threading library uses system threads. So running without the GIL means actual parallelism across system threads with shared memory access.
Yeah I think you're right actually. Seems like they do per-object locking instead.
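For what it's worth, a minimal illustration of the shared-memory point: with the existing `threading` module, workers mutate the very same objects, with no copying or message passing involved (the lock is there to coordinate access, not to share data):

```python
import threading

shared = []                      # one list object, visible to all threads
lock = threading.Lock()

def worker(tag):
    for i in range(3):
        with lock:               # coordinate access to the shared object
            shared.append((tag, i))

threads = [threading.Thread(target=worker, args=(t,)) for t in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared)                    # interleaved entries from both threads
```

As I understand it, free-threaded CPython keeps exactly this model and just lets the threads actually run in parallel, which is why per-object locking inside the interpreter matters.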