
gh-116167: Allow disabling the GIL

rnmmrnm
53 replies
1d1h

More than seeing it in main, I'm happy for the "python thread slow" meme officially going away now.

scubbo
41 replies
1d1h

I wish I had your optimism. Thoughtless bandwagon-y "criticism" is extraordinarily persistent.

samatman
33 replies
1d

There's no need to pretend Python has virtues which it lacks. It's not a fast language. It's fast enough for many purposes, sure, but it isn't fast, and this work is unlikely to change that. Faster, sure, and that's great.

rmbyrro
13 replies
1d

Although true, it doesn't mean they can't improve its performance.

Working with threads is a pain in Python. If you want to spawn +10-20 threads in a process, it can quickly become way slower than running a single thread.

Removing the GIL and refactoring some of the core will unlock levels of concurrency that are currently not feasible with Python. And that's a great deal, in my opinion. Well worth the trouble they're going through.

bb88
11 replies
1d

Working with threads is a pain regardless of which language you use.

Some might say: "Use Go!" Alas: https://songlh.github.io/paper/go-study.pdf

After a couple decades of coding, I can say that threading is better if it's tightly controlled, limited to usages of tight parallelism of an algorithm.

Where it doesn't work is in a generic worker pool where you need to put mutex locks around everything -- and then prod randomly deadlocks in ways the developer boxes can't recreate.

bmitc
5 replies
22h53m

Working with threads is a pain regardless of which language you use.

That's not true at all. F#, Elixir, Erlang, LabVIEW, and several other languages make it very easy. Python makes it incredibly tough.

amethyst
4 replies
22h40m

Python makes it incredibly tough.

I disagree, Python makes it incredibly easy to work with threads in many different ways. It just doesn't make threads faster.

rmbyrro
2 replies
22h35m

The whole purpose of threads is to improve overall speed of execution. Unless you're working with a very small number of threads (single digits), that's a very hard to achieve goal in Python. I wouldn't count this as easy to use. It's easy to program, yes, but not easy to get working with reasonably acceptable performance.
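A minimal sketch of that gap (illustrative only; on a stock GIL build of CPython, splitting a pure-Python CPU-bound loop across threads usually shows no speedup at all, and often a slowdown):

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop; the GIL serializes it across threads
    while n > 0:
        n -= 1

TOTAL = 4_000_000

# Baseline: all the work on one thread
start = time.perf_counter()
count_down(TOTAL)
single = time.perf_counter() - start

# Same total work split across four threads
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(TOTAL // 4,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
multi = time.perf_counter() - start

# With the GIL, `multi` is typically no better than `single`
print(f"1 thread: {single:.2f}s, 4 threads: {multi:.2f}s")
```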

bb88
1 replies
13h23m

And the python people would just point to multiprocessing...which works pretty well.

bmitc
0 replies
10h33m

Which has its own set of challenges and yet another implementation of queue.

bmitc
0 replies
22h30m

In what way? Threading, asyncio, tasks, event loops, multiprocessing, etc. are all complicated and interact poorly if at all. In other languages, these are effectively the same thing, lighter weight, and actually use multicore.

If I launch 50 threads with runaway while loops in Python, it takes minutes to launch them and they barely work after. I can run hundreds of thousands or even millions of runaway processes in Elixir/Erlang; they launch very fast and keep chugging along just fine.

Shog9
1 replies
17h56m

This is where Python's GIL bit me: I was more than familiar with how to shoot myself in the foot using threads in other languages, and careful to avoid those traps. Threads spun up only in situations where they had their own work to do and well-defined conditions for how both failure and success would be reported back to the thread that requested it, along with a pool that wouldn't exceed available resources.

Like every other language I've used this approach with, nothing bad happened - the program ran as expected and produced correct results. Unlike every other language, spreading calculations across multiple cores didn't appreciably improve performance. In some cases, it got slower.

Eventually scrapped it all, and went with an approach closer to what I'd have done with C and fork() decades ago... Which, to Python's credit, was fairly painless and worked well. But it caught me off-guard, because with asyncio for IO-bound stuff, it didn't seem like threads really have much of a purpose in Python, other than to be a tripwire for unwary and overconfident folks like myself!

bb88
0 replies
13h29m

Not disagreeing. The only case for threading in python is for spinning something to handle IO.

But now with async even that goes away.
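e.g., a sketch of the asyncio replacement for a pile of IO threads:

```python
import asyncio

async def fetch(i):
    # Stand-in for an IO-bound call; while one task awaits, others proceed
    await asyncio.sleep(0.1)
    return i * 2

async def main():
    # 50 concurrent "requests" finish in roughly 0.1s total, no threads needed
    return await asyncio.gather(*(fetch(i) for i in range(50)))

results = asyncio.run(main())
print(results[:5])  # [0, 2, 4, 6, 8]
```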

rmbyrro
0 replies
22h33m

It's not such a big pain in every language. And certainly not as hard to get working with acceptable performance in many languages.

Even if you have zero shared resources, zero mutexes, no communication whatsoever between threads, it's a huge pain in Python if you need +10-ish threads going. And many times the GIL is the bottleneck.

jcranmer
0 replies
23h25m

After a couple decades of coding, I can say that threading is better if it's tightly controlled, limited to usages of tight parallelism of an algorithm.

This may be a case of violent agreement, but there are a few clear cases where multithreading is easily viable. The best case is some sort of parallel-for construct, even if you include parallel reductions, although there may need to be some smarts around how to do the reduction (e.g., different methods for reduce-within-thread versus reduce-across-thread). You can extend this to heterogeneous parallel computations, a general, structured fork-join form of concurrency. But in both cases, you essentially have to forbid inter-thread communication between the fork and the join parameters. There's another case you might be able to make work, where you have a thread act as an internal server that runs all requests to completion before attempting to take on more work.
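The parallel-for-plus-reduction shape, sketched in Python with `concurrent.futures` (structure only; under the GIL the thread version pays off only when the work releases it, e.g. IO or native code):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Reduce-within-thread: each worker reduces its own slice privately
    return sum(x * x for x in chunk)

data = list(range(1_000))
n_workers = 4
# Fork: carve the iteration space into disjoint chunks, nothing shared between them
chunks = [data[i::n_workers] for i in range(n_workers)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    partials = list(pool.map(partial_sum, chunks))

# Join, then reduce-across-threads in the parent
total = sum(partials)
print(total)  # equals sum(x * x for x in data)
```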

What the paper you link to is pointing out, in short, is that message passing doesn't necessarily free you from the burden of shared-mutable-state-is-bad concurrency. The underlying problem is largely that communication between different threads (or even tasks within a thread) can only safely occur at a limited number of safe slots, and any communication outside of that is risky, be it an atomic RMW access, a mutex lock, or waiting on a message in a channel.

heinrich5991
0 replies
23h42m

Concurrency with rayon in Rust isn't pain, I'd say. It's basically hidden away from the user.

KaiserPro
0 replies
20h37m

If you want to spawn +10-20 threads in a process, it can quickly become way slower than running a single thread.

As you know, that's mostly threads in general. Any optimisation has a drawback, so you need to choose wisely.

I once made a horror of a thing that synced S3 with another S3-like (but not quite) object store. I needed to move millions of files, but on the S3-like store every metadata operation took 3 seconds.

So I started with async (pro tip: it's never a good idea to use async. It's basically gotos with two dimensions of surprise: 1. when the function returns, 2. when you get an exception). I then moved to threads, which got a tiny bit of extra performance but much easier debuggability. Then I moved to multiprocess pools of threads (fuck yeah, super fast), but then I started hitting network IO limits.

So then I busted out an airflow-like system with operators spawning 10 processes with 500 threads each.

It wasn't very memory efficient, but it moved many thousands of files a second.

markhahn
12 replies
1d

You seem to be implying that there is something inherently slow to Python. What?

This topic is an example: a detail of one particular implementation, since GIL is definitely not inherent to the language. Just the usual worry about looseness of types?

sneed_chucker
6 replies
1d

CPython is slow. That's not really something you can dispute.

It is a non-optimizing bytecode interpreter and it makes no use of JIT compilation.

JavaScript with V8 or any other modern JIT JS engine runs circles around it.

Go, Java, and C# are an order of magnitude faster but they have type systems that make optimizing compilation much easier.

There's no language-inherent reason why Python can't be at least as fast as JavaScript.

mixmastamyk
3 replies
23h32m

I've read that it can't even be as fast as JS, because everything is monkey-patchable at runtime. Maybe they can optimize for that when it doesn't happen, but remains to be seen.

sneed_chucker
1 replies
21h39m

I've heard similar claims but I don't think it's true.

JavaScript is just as monkey-patchable. You can reassign class methods at runtime. You can even reassign an object's prototype.

Existing Python JIT runtimes and compilers are already pretty fast.

maple3142
0 replies
17h45m

Python is probably much more monkey patchable. Almost any monkey patching that JavaScript supports also works in Python (e.g. modifying class prototype = assigning class methods), but there are a few things that only Python can do: accessing local variables as dict, access other stack frames, modifying function bytecode, read/write closure variables, patching builtins can change how the language works (__import__, __build_class__). Many of them can make a language hard to optimize.
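A couple of those, made concrete (illustrative only; the frame and builtins tricks below are exactly the kind of thing a JIT has to pessimize around):

```python
import builtins
import sys

# Patching a builtin changes behavior everywhere, defeating static assumptions
_original_len = builtins.len
builtins.len = lambda obj: _original_len(obj) + 100

assert len([1, 2, 3]) == 103
builtins.len = _original_len  # restore sanity

# Reading another stack frame's locals, something JavaScript cannot do
def inner():
    frame = sys._getframe(1)  # the caller's frame
    return frame.f_locals["secret"]

def outer():
    secret = 42
    return inner()

print(outer())  # 42
```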

imtringued
0 replies
9h24m

You can always use optimistic optimization strategies where you profile the fast path and optimize that. When someone does something slow, you tell them to stop doing it if they want better performance.

cozzyd
1 replies
14h48m

JavaScript doesn't have to contend with a plethora of native extensions (which, to be fair, are generally a workaround for python slowness).

sneed_chucker
0 replies
3h37m

JavaScript, at least on the Node.js side, makes plenty of use of native extensions written in C++: https://nodejs.org/api/addons.html

In any case, that should be irrelevant to getting a reasonably performant JIT running. Lots of AOT and JIT compiled languages have robust FFI functionality.

The native extensions are more relevant when we talk about removing the GIL, since lots of Python code may call into non thread safe C extension code.

doctorpangloss
3 replies
1d

There are worse hills to die on than this. But the Python ecosystem is very slow. It's a cultural thing.

The biggest impact would be completely redoing package discovery. Not in some straightforward sense of "what if PyPI showed you a performance measurement?" No, that's symptomatic of the same problem: harebrained and simplistic stuff for the masses.

But who's going to get rid of PyPI? Conda tried and it sucks; it doesn't change anything fundamental, and they're too small and poor to matter.

Meta should run its own package index and focus on setuptools. This is a decision PyTorch has already taken, maybe the most exciting package in Python today, and for all the headaches that decision causes, look: torch "won," it is high performance Python with a vibrant high performance ecosystem.

These same problems exist in NPM too. It isn't an engineering or language problem. Poetry and Conda are not solutions, they're symptoms. There are already too many ideas. The ecosystem already has too much manic energy spread way too thinly.

Golang has "fixed" this problem as well as it could for non-commercial communities.

pphysch
2 replies
1d

The "Python ecosystem" includes packages like numpy, pytorch & derivatives which are responsible for a large chunk of HPC and research computing nowadays.

Or did you mean to say the "Python language"?

doctorpangloss
1 replies
21h58m

The "Python ecosystem" includes packages like numpy, pytorch & derivatives which are responsible for a large chunk of HPC and research computing nowadays.

The "& derivatives" part is the problem! Torch does not have derivatives. It won. You just use it and its extensions, and you're done. That is what people use to do exciting stuff in Python.

It's the manic developers writing manic derivatives that make the Python ecosystem shitty. I mean I hate ragging on those guys, because they're really nice people who care a lot about X, but if only they could focus all their energy to work together! Python has like 20 ideas for accelerated computing. They all abruptly stopped mattering because of Torch. If the numba and numpy and scikit-learn and polars and pandas and... all those people, if they would focus on working on one package together, instead of reinventing the same thing over and over again - high level cross compilers or an HPC DSL or whatever, the ecosystem would be so much nicer and performance would be better.

This idea that it's a million little ideas incubating and flourishing, it's cheerful and aesthetically pleasing but it isn't the truth. CUDA has been around for a long time, and it was obviously the fastest per dollar & watt HPC approach throughout its whole lifetime, so most of those little flourishing ideas were DOA. They should have all focused on Torch from the beginning instead of getting caught up in little manic compiler projects. We have enough compilers and languages and DSLs. I don't want another DataFrame DSL!

I see this in new, influential Python projects made even now, in 2024. Library authors are always, constantly, reinventing the wheel, because development is driven by one person's manic energy more than anything else. Just go on GitHub and look at how many packages are written by one person. GitHub, Git, and PyPI are just not adequate ways to coordinate the energies of these manic developers on a single valuable task. They don't merge PRs, they stake out pleasing names on PyPI, and they complain relentlessly about other people's stuff. It's NIH syndrome on the 1m+ repository scale.

fragmede
0 replies
21h41m

yeah. like xkcd 927 to the nth degree.

oivey
0 replies
1d

Python is inherently slow. That’s why people tend to rewrite bits that need high performance in C/C++. Removing the GIL is a massively welcome change, but it isn’t going to make C extensions go away.

scubbo
3 replies
23h56m

This is entirely fair, and I wish I'd been a little less grumpy in my initial reply (I assign some blame to just getting over an illness). Thank you for the gentle correction!

That said - I think it's fair to be irritated by people who write Python off as entirely useless because it is not _the fastest_ language. As you rightly say - it's fast enough for many purposes. It does bother me to see Python immediately counted out of discussions because of its speed when the app in question is extremely insensitive to speed.

Affric
1 replies
18h54m

It’s all about values.

I have been on teams where Python based approaches were discounted due to “speed” and “industry best practice” and then had the very same engineers create programs that are slow by design in a “fast” language and introduce needless complexity (and bugs) through “faster” database processes.

Like you said, it’s the thoughtless criticism. The meme. I am happy for Python to lose in a design analysis because it’s too slow for what we are building; I am loathe to let it lose because whoever is doing the analysis with me has heard it’s slow.

Which is to say, I get what you’re saying. I think people have been a little ungenerous with your comment.

scubbo
0 replies
18h18m

I think people have been a little ungenerous with your comment.

Eh - I engaged with a fraught topic in a snarky way without clarifying that I meant the unintuitive-but-technically-literally-accurate interpretation of my words. Maybe some people have been less-generous than they could have been, but I don't begrudge it - if I look sufficiently like a troll, I won't complain when I get treated like one. Not everyone has the time and mental fortitude to treat everyone online with infinite patience and kindness - I know I sure don't.

Thank you for the support, though!

wongarsu
0 replies
23h38m

In some ways the weakness even was a virtue. Because Python threads are slow Python has incredible toolsets for multiprocess communication, task queues, job systems, etc.

nick238
0 replies
21h22m

Maybe it'll shut up "architects" who hack up a toy example in <new fast language hotness>, drop it on a team to add all the actual features, tests, deployment strategy, and maintain, and fly away to swoop and poop on someone else. Gee thanks for your insight; this API serves maybe 1 request a second, tops. Glad we optimized for SPEEEEEED of service over speed of development.

fragmede
0 replies
19h32m

"Faster, sure" seems unnecessarily dismissive. That's the whole point of all this work.

bmitc
6 replies
22h55m

It isn't thoughtless. I'm working in Python after having come from more designed languages, and concurrency in Python is an absolute nightmare. It feels like using a language from the 60s. An effectively single threaded language in 2024! That's really astonishing.

nextlevelwizard
3 replies
21h30m

Most software doesn't need multithreading. Most times, people cry about Python's performance and then write trivial shit programs that take milliseconds to run in Python as well.

bmitc
1 replies
21h3m

Nearly every time I've interacted with Python, its execution speed is absolutely an issue.

nextlevelwizard
0 replies
11h43m

Please do give an example.

What I see is people crying about how Python is slow and then using a "proper fast" programming language to write code that gets executed so few times that even if Python were 100x slower it wouldn't matter, or the program is so trivial that Python's speed definitely isn't an issue.

I have even sometimes seen people stop using a tool when they find out it was written in Python: now all of a sudden it's unusably slow. Then they try to justify it by writing some loop in their favourite "proper fast" language and telling me how fast that tight loop is, or they claim that some function is X times faster. But when I actually compile it and run something like hyperfine on it and on the Python version, the difference is hardly ever X, since there is already so much more overhead in the real world.

cwalv
0 replies
12h39m

Python being slow, and working to speed python programs up, helped me immensely to build a mental model for what makes programs slow. After learning C in school, when I first learned how python was implemented, I was shocked that it was even usable.

scubbo
1 replies
20h2m

If your criticism isn't thoughtless, then that's not what I'm complaining about. Specifically, I'm annoyed about people who _just_ say "Python isn't fast enough, therefore it's not suitable to our use-case", when their use-case doesn't require significant speed or concurrency. If you thoughtfully discount Python as being unsuitable for a use-case that it's _actually_ unsuitable for, then good luck to you!

mlyle
0 replies
17h45m

Python has been too often just a -bit- too slow for my use cases; the ability to throw a few cores at problems more easily is not going to eliminate this criticism from me but it's sure going to diminish it by a large factor.

znpy
3 replies
1d

I still hear the "java slow" meme from time to time... Memes are slow to die, sadly. Some people just won't catch on to the fact that Java has had just-in-time compilation for like 15 years now (it was one of the first major platforms to get that), has had a fully concurrent garbage collector for a number of releases (ZGC since Java 11), and can be slimmed down a lot (jlink).

I work on low-latency stuff and we routinely get server-side latencies in the order of single to low double-digit microseconds of latency.

If python ever becomes fully concurrent (python threads being free of any kind of GIL) we'll see the "python slow" meme for a number of years... Also doesn't help that python gets updated very very slowly in the industry (although things are getting better).

RyEgswuCsn
1 replies
23h57m

I feel Java deserves better. When Python finally gets true thread concurrency, JIT compilation (Numba and the like), comprehensive static analysis (type hints), sophisticated GC, and better performance, people will realise Java has had them all this time.

thorncorona
0 replies
21h46m

GraalVM is a pretty magical tool

PhilipRoman
0 replies
22h16m

I think java being slow has less to do with the implementation (which is pretty good) and more to do with the culture of overengineering (including in the standard library). Everything creates objects (which the JIT cannot fully eliminate, escape analysis is not magic), cache usage is abysmal. Framework writers do their best to defeat the compiler by abusing reflection. And all these abstractions are far from zero cost, which is why even the JDK has to have hardcoded special cases for Streams of primitives and ByteBuffers.

Of course, if you have a simple fastpath you can make it fast in any language with a JIT, latency is also generally not an issue anymore, credit where credit is due - java GCs are light years ahead of everything else.

Regarding jlink - my main complaint is that everything requires java.base, which is already 175M. And that's not counting the VM, etc. But I don't actively work with Java anymore, so please correct me if there is a way to get smaller images.

agent281
3 replies
18h16m

I doubt that Python will ditch the meme. The fundamental model of dynamic dispatch using dictionaries on top of a bytecode interpreter is pretty slow. I wouldn't expect it to get within 2x of JavaScript.

DinaCoder98
1 replies
17h28m

Javascript may not have an official bytecode, but is it not also based on the same concept of using dictionaries to dispatch code and slow as a result? I certainly had always filed it away as "about as fast as python" in my head. Why else would it rely on evented i/o?

agent281
0 replies
15h31m

You are correct, but they have (1) all of the money in the world as the fundamental programming language of the Internet and as a result (2) they have a state of the art tiered JIT for dynamic languages. The blood of countless PhD students flows through v8. I don't know if python will get the same treatment.

imtringued
0 replies
9h28m

Ok but given enough cores even python code will run into memory bandwidth problems rather than be bottlenecked by memory latency.

IshKebab
2 replies
23h53m

Well, technically it still won't be able to use the full power of threads in many situations because (I assume) it doesn't have shared memory. It'll presumably be like Web Workers / isolates, so Go, C++, Rust, Zig, etc. will still have a fundamental advantage for most applications even ignoring Python's inherent slowness.

Probably the right design though.

Difwif
1 replies
22h16m

Why would you think it's not shared memory? Maybe I'm wrong here but by default Python's existing threading implementation uses shared memory.

AFAIK we're just talking about removing the global interpreter lock. I'm pretty sure the threading library uses system threads. So running without the GIL means actual parallelism across system threads with shared memory access.

IshKebab
0 replies
21h14m

Yeah I think you're right actually. Seems like they do per-object locking instead.

protomikron
39 replies
1d1h

Although this is nice, the problems with the GIL are often blown out of proportion: people stating that you couldn't do efficient (compute-bound) multiprocessing, which was never the case, as the `multiprocessing` module works just fine.

ynik
23 replies
1d1h

multiprocessing only works fine when you're working on problems that don't require 10+ GB of memory per process. Once you have significant memory usage, you really need a way to share that memory across multiple CPU cores. For non-trivial data structures partly implemented in C++ (as an optimization, because pure Python would be too slow), that means messing with allocators and shared memory. Such GIL workarounds have easily cost our company several man-years of engineering time, and we still have a bunch of embarrassingly parallel stuff that we cannot parallelize because the GIL-workaround code doesn't yet support shared memory allocation for it.

Once the Python ecosystem supports either subinterpreters or nogil, we'll happily migrate to those and get rid of our hacky interprocess code.

Subinterpreters with independent GILs, released with 3.12, theoretically solve our problems but practically are not yet usable, as none of Cython/pybind11/nanobind support them yet. In comparison, nogil feels like it'll be easier to support.

ebiester
17 replies
1d1h

And I guess what I don't understand is why people choose Python for these use cases. I am not in the "Rustify" everything camp, but Go + C, Java + JNI, Rust, and C++ all seem like more suitable solutions.

oivey
9 replies
1d1h

Notably, all of those are static languages and none of them have array types as nice as PyTorch or NumPy, among many other packages in the Python ecosystem. Those two facts are likely closely related.

abdullahkhalids
6 replies
1d

Python is just the more popular language. Julia array manipulation is mostly better (better syntax, better integration, larger standard library) or as good as python. Julia is also dynamically typed. It is also faster than Python, except for the jit issues.

znpy
4 replies
23h49m

It is also faster than Python, except for the jit issues.

I was intrigued by Julia a while ago, but didn't have time to properly learn it.

So just out of curiosity: what's the issues with jit and Julia ?

jakobnissen
2 replies
23h24m

Julia's JIT compiles code when it's first executed, so Julia has a noticeable delay from when you start the program until it starts running. This is anywhere from a few hundred milliseconds for small scripts to tens of seconds or even minutes for large packages.

shiroiushi
1 replies
13h9m

I wonder why they don't just have an optional pre-compilation, so once you have a version you're happy with and want to run in production, you just have a fully compiled version of the code that you run.

aoanla
0 replies
10h12m

Effectively, it does - one of the things recent releases of Julia have done is to add more precompilation caching on package install. Julia 1.10 feels considerably snappier than 1.0 as a result - that "first time to plot" is now only a couple of seconds thanks to this (and subsequent plots are, of course, much faster than that).

cjalmeida
0 replies
23h1m

The "issue" is Julia is not Just-in-Time, but a "Just-Ahead-of-Time" language. This means code is compiled before getting executed, and this can get expensive for interactive use.

The famous "Time To First Plot" problem was about taking several minutes to do something like `using Plots; Plots.plot(sin)`.

But to be fair, recent Julia releases improved a lot of it; the code above takes 1.5s in Julia 1.10 on my 3-year-old laptop.

oivey
0 replies
1d

Preaching to the choir here.

Julia’s threading API is really nice. One deficiency is that it can be tricky to maintain type stability across tasks / fetches.

samatman
1 replies
1d

If only there were a dynamic language which performs comparably to C and Fortran, and was specifically designed to have excellent array processing facilities.

Unfortunately, the closest thing we have to that is Julia, which fails to meet none of the requirements. Alas.

rmbyrro
0 replies
1d

If only there was a car that could fly, but was still as easy and cheap to buy and maintain :D

esafak
3 replies
1d

Why do people use python for anything beyond glue code? Because it took off, and machine learning and data science now rely on it.

I think Python is a terrible language that exemplifies the maxim "worse is better".

https://en.wikipedia.org/wiki/Worse_is_better

rmbyrro
1 replies
1d

Some speculate that universities adopted it as an introductory language for its expressiveness and flat learning curve. Scientific / research projects in those unis started picking Python, since all students already knew it. And now we're here.

spprashant
0 replies
22h49m

I have no idea if this is verifiably true in a broad sense, but I work at the university and this is definitely the case. PhD students are predominantly using Python to develop models across domains - transportation, finance, social sciences etc. They then transition to industry, continuing to use Python for prototyping.

nottorp
0 replies
18h52m

To quote from Eric Raymond's article about python, ages ago:

"My second [surprise] came a couple of hours into the project, when I noticed (allowing for pauses needed to look up new features in Programming Python) I was generating working code nearly as fast as I could type.

When you're writing working code nearly as fast as you can type and your misstep rate is near zero, it generally means you've achieved mastery of the language. But that didn't make sense, because it was still day one and I was regularly pausing to look up new language and library features!"

Source: https://www.linuxjournal.com/article/3882

It doesn't go for large code bases, but if you need quick results using existing well tested libraries, like in machine learning and data science, I think those statements are still valid.

Obviously not when you're multiprocessing, that is going to bite you in any language.

KaiserPro
1 replies
20h33m

but Go + C, Java + JNI, Rust, and C++ all seem like more suitable solutions.

Apart from Go (and maybe Java), those are all "scary" languages that require a bunch of engineering to get to the point where you can prototype.

Even then, you can normally pybind the bits that are compute bound.

If Microsoft had been better back in the day, then C# should have been the go-to language of choice. It has the best tradeoff of speed/handholding/rapid prototyping. It's also statically typed, unless you tell it not to be.

snovv_crash
0 replies
11h52m

#pragma omp parallel for

gets you 90% of the potential performance of a full multithreaded producer/consumer setup in C++. C++ isn't as scary as it used to be.

zamadatix
0 replies
1d

People choose Python for the use case, regardless of what that is, because it's quick and easy to work with. When Python can't realistically be extended to a use case, it's lamented; when it can, it's celebrated. Even Go, while probably the friendliest of that bunch when it comes to parallel work, is on a different level.

pillusmany
3 replies
1d1h

"Ray" can share python objects memory between processes. It's also much easier to use than multi processing.

ptx
0 replies
1d

According to the docs, those shared memory objects have significant limitations: they are immutable and only support numpy arrays (or must be deserialized).

Sharing arrays of numbers is supported in multiprocessing as well: https://docs.python.org/3/library/multiprocessing.html#shari...

jononor
0 replies
17h34m

I think that 90 or maybe even 99% of cases use under 1GB of memory per process? At least that has been the case for me for the last 15 years.

Of course, getting threads to be actually useful for concurrency (GIL removed) adds another very useful tool to the performance toolkit, so that is great.

liuliu
7 replies
1d1h

`multiprocessing` works fine for serving HTTP requests or for some other subset of embarrassingly-parallel problems.

skrause
6 replies
1d

`multiprocessing` works fine for serving HTTP requests

Not if you use Windows, then it's a mess. I have a suspicion that people who say that the multiprocessing works just fine never had to seriously use Python on Windows.

ptx
4 replies
1d

Why is it a mess? What's wrong with it on Windows?

colatkinson
2 replies
20h41m

Adding on to the other comment, multiprocessing is also kinda broken on Linux/Mac.

1. Because global objects are refcounted, CoW effectively isn't a thing on Linux. They did add a way to avoid this [0], but you have to manually call it once your main imports are done.

2. On Mac, turns out a lot of the system libs aren't actually fork-safe [1]. Since these get imported inadvertently all the time, Python on Mac actually uses `spawn` [2] -- so it's roughly as slow as on Windows.

I haven't worked in Python in a couple years, but handling concurrency while supporting the major OSes was a goddamn mess and a half.

[0]: https://docs.python.org/3.12/library/gc.html#gc.freeze

[1]: https://bugs.python.org/issue33725

[2]: https://docs.python.org/3.12/library/multiprocessing.html#co...
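The `gc.freeze` workaround from [0], sketched (assumption: it's called in the parent after imports, before forking workers):

```python
import gc
import json, decimal  # stand-ins for an application's heavy imports

# Move every object tracked so far into a "permanent generation" that the
# collector never scans, so GC passes in fork()ed children don't rewrite
# those objects' headers and dirty their copy-on-write pages. (Plain
# refcount updates can still dirty pages; freeze only removes the GC's part.)
gc.freeze()
print(gc.get_freeze_count())  # number of objects made permanent

# ... fork worker processes here; gc.unfreeze() restores normal collection.
```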

fulafel
1 replies
14h23m

Re (1), are there publicly documented cases with numbers on observed slowdowns with it?

I see this mentioned from time to time, but intuitively you'd think this wouldn't pose a big slowdown since the system builtin objects would have been allocated at the same time (startup) and densely located on smaller nr of pages. I guess if you have a lot of global state in your app it could be more significant.

Would also be interesting to see a benchmark using hugepages, you'd think this could solve remaining perf problems if they were due to large number of independent CoW page faults.

skrause
0 replies
1d

* A lack of fork() makes starting new processes slow.

* All Python webservers that somewhat support multiprocessing on Windows disable the IOCP asyncio event loop when using more than one process (because it breaks in random ways), so you're left with the slower select() event loop which doesn't support more than 512 connections.

rmbyrro
0 replies
1d

Probably a very small minority of Python codebases run on Windows, no? That's my impression. It would explain why so many people are unaware of multiprocessing issues on Windows. I've never ran any serious Python code on windows...

kroolik
4 replies
1d1h

Managing processes is more annoying than threads, though. Incl. data passing and so forth.

pillusmany
3 replies
1d1h

The "ray" library makes running python code on multi core and clusters very easy.

smcl
1 replies
1d1h

Interesting - looking at their homepage they seem to lean heavily into the idea that it's for optimising AI/ML work, not multi-process generally.

pillusmany
0 replies
1d

You can use just ray.core to do multi process.

You can do whatever you want in the workers, I parse JSONs and write to sqlite files.

kroolik
0 replies
23h37m

Although its great the library helps with multicore Python, the existence of such package shouldnt be an excuse not to improve the state of things in std python

vita7777777
0 replies
1d1h

On the other hand, this particular argument also gets overused. Not all compute-bounded parallel workloads are easily solved by dropping into multiprocessing. When you need to share non-trivial data structures between the processes you may quickly run into un/marshalling issues and inefficiency.

jcranmer
0 replies
1d

as the `multiprocessing` module works just fine.

Something that tripped me up when I last did `multiprocessing` was that communication between the processes requires marshaling all the data into a binary format to be unmarshaled on the other side; if you're dealing with 100s of MB of data or more, that can be quite some significant expense.
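A rough illustration of that cost: everything sent between processes through a queue or pipe is pickled on one side and unpickled on the other, so large payloads pay a serialization tax both ways (the payload here is illustrative).

```python
import pickle

# Stand-in for a large object handed to another process.
payload = list(range(1_000_000))

blob = pickle.dumps(payload)   # what the sending side does
restored = pickle.loads(blob)  # what the receiving side does

print(len(blob) > 1_000_000)  # True: the wire format is itself megabytes
print(restored == payload)    # True: correct, but both steps cost CPU time
```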

helsinki
32 replies
1d1h

While the title is correct, it is a bit misleading, because disabling the GIL breaks the asyncio tests. It's like saying the engine can be removed from my car. Sure, it can, but the car won't work.

Kranar
25 replies
1d1h

Well this release will break any code that uses threads. The goal of this particular release is to work for thread-free programs.

ollien
18 replies
1d1h

How do single-threaded programs benefit from a lack of GIL?

TylerE
6 replies
1d

Speed. Admittedly not quite as much so the way this patch is implemented, since it just short circuits the extra function calls, doesn’t omit them entirely.

0cf8612b2e1e
5 replies
1d

Removing the GIL results in slower execution. Without the guarantees of single thread action, the interpreter needs to utilize more locks under the hood.

TylerE
4 replies
22h12m

Not in single threaded code.

0cf8612b2e1e
3 replies
22h5m

Umm, yes it does? For the longest time, Guido’s defense for the GIL was that all previous efforts resulted in an unacceptable hit to single threaded performance.

Read PEP-703 (https://peps.python.org/pep-0703/#performance) where the performance hit is currently 5-8%

TylerE
2 replies
18h26m

That's to make it thread safe without the GIL.

If you only care about single thread there's all kinds of stuff you can do.

jononor
1 replies
17h29m

How to ensure there are no other threads, confidently enough that one can turn thread safety off?

Kranar
0 replies
16h42m

From when I was reading the proposal, the idea is that until a C extension is loaded, you can assume there are no other threads. When a module is loaded, by default you assume that it uses threads, but modules that are thread-free can indicate that with a flag; if a module indicates it's thread-free, you continue running without the thread-safety features.

kjqgqkejbfefn
4 replies
1d1h

Disabling the GIL can unlock true multi-core parallelism for multi-threaded programs, but this requires code to be restructured for safe concurrency, which isn't that difficult it seems:

When we found out about the “nogil” fork of Python it took a single person less than half a working day to adjust the codebase to use this fork and the results were astonishing. Now we can focus on data acquisition system development rather than fine-tuning data exchange algorithms.

https://peps.python.org/pep-0703/

kjqgqkejbfefn
3 replies
1d

We frequently battle issues with the Python GIL at DeepMind. In many of our applications, we would like to run on the order of 50-100 threads per process. However, we often see that even with fewer than 10 threads the GIL becomes the bottleneck. To work around this problem, we sometimes use subprocesses, but in many cases the inter-process communication becomes too big of an overhead. To deal with the GIL, we usually end up translating large parts of our Python codebase into C++. This is undesirable because it makes the code less accessible to researchers.
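A minimal sketch of the bottleneck being described: CPU-bound work split across threads still executes one bytecode stream at a time under the GIL, so the threads produce correct results but (historically) no CPU parallelism.

```python
import threading

def burn(n: int, out: list, idx: int) -> None:
    # stand-in for CPU-bound work
    acc = 0
    for i in range(n):
        acc += i
    out[idx] = acc

results = [0] * 4
threads = [threading.Thread(target=burn, args=(100_000, results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # four correct sums -- but computed serially under the GIL
```

With the GIL disabled, the same code can actually occupy four cores.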
actionfromafar
2 replies
1d

Maybe they should look in to translating parts of their code base to Shedskin Python. It compiles (a subset of) Python to C++.

logicchains
1 replies
22h45m

How's it different from Cython, which compiles a subset of Python to C or C++?

actionfromafar
0 replies
21h58m

Shedskin has stricter typing, and about 10-100 times performance vs Cython.

gtirloni
3 replies
1d1h

It could remove the locking/unlocking operations.

Retr0id
1 replies
1d1h

Doesn't removing the GIL imply adding back new, more granular locks?

fiddlerwoaroof
0 replies
1d1h

Sort of, but the biased reference counting scheme they’re using avoids a lot of locks for the common case.

sapiogram
0 replies
1d1h

Removing the GIL requires more locking/unlocking operations. For single-threaded program, it's a performance penalty on average: https://peps.python.org/pep-0703/#performance

protomikron
0 replies
1d1h

They don't.

Kranar
0 replies
1d1h

They don't benefit much from a lack of GIL, perhaps a small reduction in overhead. This feature is a first step towards being able to disable the GIL completely. It is intended to be implemented in a very conservative manner, bit by bit and so for this first step it should work for thread free code.

neilkk
2 replies
1d1h

This isn't correct. TFA said that small threaded programs had been run successfully, but that the test suite broke in asyncio.

Async I/O and threads are two different things, and either can be present in real code without the other.

wolletd
0 replies
1d

"small threaded programs had been run successfully"

I have run a lot of programs containing race conditions successfully many times, until I ran into an issue.

Kranar
0 replies
1d

Not quite sure what your comment means exactly or how it implies what I said is incorrect.

At any rate, test_asyncio contains a lot of tests that involve threads and specifically thread safety between coroutines and those tests fail. As far as async I/O and threads being distinct, I mean sure that is true of a lot of features but people mix features together and mixing asyncio with threads will not work with this particular release.

OskarS
1 replies
1d1h

Really, any code? I thought they were adding fine-grained locks to the python objects themselves? Are you saying that if I share a python list between two threads and modify it on one and read it on the other, I can segfault python?

Kranar
0 replies
1d1h

With this particular release, yes it will segfault. But down the road what you state is correct, this is just a first step towards that goal.

thibaut_barrere
0 replies
1d1h

Couldn’t it work if each threads only touch thread-specific data structures?

pharrington
1 replies
1d

Being able to remove the engine from my car with the push of a button would be a pretty amazing feature!

jagged-chisel
0 replies
1d

Analogy breaking down and all, but …

Only as long at it’s as easy to put back in

znpy
0 replies
1d1h

While the title is correct, it is a bit misleading, because disabling the GIL breaks the asyncio tests. It's like saying the engine can be removed from my car. Sure, it can, but the car won't work.

You're not supposed to drive a car that hasn't got out of the research and development laboratory either, so there's that.

petters
0 replies
1d

You also need to compile Python with a special flag activated. It’s not only an environment variable or a command line option.

ollien
0 replies
1d1h

I mean, you're not wrong, but also it's a huge feat to provide a toggle for a major feature like the GIL. Though, if it's just asyncio that's broken, perhaps it's not like removing your engine, but rather your antilock brakes :)

EDIT:

[the test synchronous programs] all seem to run fine, and very basic threaded programs work, sometimes

Perhaps this is closer to removing the oil pan

TylerE
0 replies
1d1h

This comment feels very disingenuous because non-threaded programs do in fact work.

behnamoh
31 replies
1d1h

I've been programming in Python for over 6 years now and every week I learn something new. But recently I've been thinking about moving to a more capable language with proper concurrency for backend API requests (FastAPI sucked).

I also want types, so Elixir is not in the picture. I dabbled in Rust a bit. Although I was able to get the hang of things and build a CLI tool pretty quickly, I'm worried I'll have to deal with numerous quirks later if I keep using Rust (like numerous string types). Is that something to be worried about if all I want from Rust is Python+Types+Concurrency?

cmeacham98
9 replies
1d

This seems a bit like saying "JavaScript supports types!" because of typescript.

ralphist
4 replies
1d

It's not a separate language, you can just start typing your programs right now.

nextlevelwizard
3 replies
21h33m

Except nothing enforces your types at run time; you can have type hints all you want and everyone else can ignore them.

aunderscored
1 replies
20h27m

The same is true of laws. And in most languages, if I want to avoid the type system and hand you a goldfish instead of an int, I can; it just may take more effort. Other languages will blow up too if you hand them strange things. Just like Python, those languages have ways to verify you're not passing a goldfish; you just may need more or less effort to use them.

nextlevelwizard
0 replies
11h51m

That's a lot of words for: yeah, you are right.

ralphist
0 replies
7h26m

Everyone else can ignore every coding guideline your team sets up, that’s why code reviews and managers exist.

nick238
1 replies
21h9m

Python actually has type safety though, as you can't do `'1' + 1` like in JS (not that a linter wouldn't scream at you). If I hear another "I compile <insert language> so I know it will work, but you can't do that in Python" I'll lose it. Having the compiler not complain that the types match is not effing "testing".
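For example, this raises at runtime rather than coercing the way JavaScript would:

```python
caught = None
try:
    result = '1' + 1   # JS would coerce this to the string "11"
except TypeError as exc:
    caught = type(exc).__name__

print(caught)  # TypeError
```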

scubbo
0 replies
19h52m

Having the compiler not complain that the types match is not effing "testing".

It absolutely is - it's just testing at a _very_ low level of correctness, and is not sufficient for testing actual high-level functionality.

wiseowise
0 replies
1d

Both true. What’s wrong with them?

posix_monad
0 replies
23h48m

Types are way more mainstream in the JS ecosystem than they are in the Python ecosystem. If you want a "scripty" language with types, then TypeScript is a reasonable choice.

willcipriano
8 replies
1d

What sucked about FastAPI?

behnamoh
7 replies
1d

Not fast enough! I used it to call llama.cpp server but it would crash if requests were "too fast". Calling the llama.cpp server directly solved the issue.

ralphist
2 replies
1d

Did you get "connection reset by peer" when you sent a bit too many requests perchance? I've never found the source of that in my programs. There's no server logging about it, connections are just rejected. None of the docs talk about this.

behnamoh
1 replies
1d

I don't remember the exact error name but the FastAPI server would just freeze.

dekhn
0 replies
23h14m

a freeze is not a crash.

jononor
0 replies
17h24m

You ran the FastAPI app under a proper server like Uvicorn?

dumdedum
0 replies
23h30m

This sounds like a layer 8 problem

dekhn
0 replies
1d

Honestly, that doesn't sound right. I'm curious what you mean by crash though.

KaiserPro
0 replies
20h27m

Interesting, I've used fastAPI to serve many thousands of requests a second (per process) for a production system. How were you buffering the requests?

jondwillis
1 replies
1d1h

Swift or Kotlin might be what you’re looking for but nobody uses Swift for backend really, and I’m not sure about Kotlin.

kuschku
0 replies
1d

Kotlin has basically replaced java for many spring shops. It's really common in backend nowadays.

spprashant
0 replies
22h35m

As someone in a similar boat, just pick Golang. People often dislike the basic syntax, explicit error checking and the lack of algebraic data types. I did. Rust just seems like it offers so much more and you fear missing out on something really cool.

But once you get over it, you realize Golang has a good type system, concurrency model, package manager that's not pip, fast compile times, and static binaries. For most cases it will also offer great performance.

It has everything you need to build APIs, CLI tools, web servers, microservices - pieces which will form the building blocks of your software infrastructure. I have heard numerous stories of people being productive in Go in a few days, sometimes even hours.

If Python is 0 quality of life and Rust is a 100, Golang gets you all they way up to 80-90. That last bit is something you might never need.

Rust is a great language and something I hope to be proficient in someday, but I'll save it for where I actually need that last microsecond of performance.

sparks1970
0 replies
1d1h

Golang? Pretty easy to pick up coming from Python and proper concurrency.

seabrookmx
0 replies
19h8m

GoLang as others suggested is a good pick. If you're in the "I don't like coloured functions" camp it's likely your best bet for web workloads.

If you're not a fan of GoLang's spartan syntax and are cool with async/await, C#/dotnet core is a great experience on all platforms. IMO it has the best async/await implementation (it originated there) on top of a multi-threaded event loop. ASP.NET is a great web framework and it has great library support for everything else. As someone who avoids "traditional" ORMs (Hibernate, Django) I really like Dapper.

rightbyte
0 replies
1d

Python+Types+Concurrency

Sounds like Groovy. But I wouldn't recommend it. Also the career padding hype is gone.

neonsunset
0 replies
17h59m

Strong recommendation to look into ASP.NET Core with C#. It gives static typing, easy first-class concurrency, it runs on all platforms and now can be AOT compiled if that's your thing, if not - can still produce just a single executable if you choose to. Also its CLI is nice and similar to cargo. Naturally, you won't be having performance issues.

imbusy111
0 replies
1d1h

Sounds like you want Julia. It looks like Python, but also has, what you ask for.

You can even run Python from Julia, so that alleviates the problem with a lack of libraries somewhat.

docc
0 replies
23h56m

I moved from Python to Rust and I really like it. Moving to a statically typed language is a bit of a mind fuck, but cargo is amazing, and so is the speed and portability of Rust.

agacera
0 replies
4h38m

DevX in Rust for backend development is quite good nowadays. Things are WAY easier than it was a couple of years ago.

I migrated a couple of projects from Java, TS and Go to Rust and honestly, I couldn't be happier.

Hugsun
22 replies
1d1h

It will be very exciting to see how much faster they'll be able to make vanilla python. The value proposition is being challenged by the plethora of tools aiming to alleviate those issues. Speed improvers like Mojo, pytorch, triton, numba, taichi come to mind.

There are so many different attempts at solving this problem that the last time I wanted to try one of them, I found myself overwhelmed with options. I chose taichi which is pretty fun and easy to use, although somewhat limited in scope.

nerpderp82
21 replies
1d

Mojo should be viewed as an attack on the Python ecosystem due to it being a superset. It can consume Python, but it itself is not Python.

Taichi is really underrated, it works across all platforms (including Metal), has tons of examples and the code is easy to write. And lastly, it integrates with the ecosystem and doesn't displace it.

https://github.com/taichi-dev

great demo reel of what Taichi can do, https://www.youtube.com/watch?v=oXRJoQGCYFg

https://www.youtube.com/watch?v=WNh4Q7-OSJs

https://www.taichi-lang.org/

KerrAvon
14 replies
23h10m

I think "attack" is a bit much; C++ isn't an attack on C.

Kranar
8 replies
22h56m

While Ken Thompson never used the word attack, he certainly didn't have a positive opinion of the language or of Bjarne Stroustrup either in terms of his technical contributions or his handling of C++ adoption:

https://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-p...

hn_throwaway_99
5 replies
22h6m

Thanks for posting that, I thought it was a great read (as someone who last used C++ probably about 25 years ago...)

Given that so many of the criticisms were about C++ being over-complicated, I do worry about languages just becoming more and more difficult over time as everyone wants their pet feature added, but due to backwards-compatibility concerns old/obsolete features are rarely removed. For example, take Java. I think that a ton of goodness has been added to Java over the decades, and for people who have been working with it throughout, it's great. But it feels like the learning curve for someone just getting involved with Java would be really steep, not just because there is just a ton of stuff, but because without having the context of the history and how things were added over time (usually with an eye towards backwards-compatibility) it feels like it would be hard to wrap your head around everything. If you're writing your own new program that's not really a problem as you can just stick to what you know, but if you're getting into an existing codebase that could use lots of different features it feels like it could be daunting.

It's been quite a while since I've programmed in Java, so I'm just speculating, but would be curious how other folks relatively new to the language in production environments find the learning curve.

dimitrios1
2 replies
21h39m

I was doing primarily Go development since it was first released, up until a few years ago when the pandemic allowed me the opportunity to move into a full-time remote gig doing primarily Java development, so I can answer this. I hadn't done Java at that point for over 10 years, so I felt completely new. (What Java I did do before that, I was mostly trying to avoid by using Play Framework or JRuby on Rails.)

As someone in the boat you mentioned (sort of) the short answer is modern Java development for 90% of tasks is not complicated at all: it's very much like any programming language used in a bizdev/corp environment -- you are mostly using a framework and a bunch of DSLs. Almost everyone uses Intellij and Gradle for IDE and build, and Junit5 or Spock for unit testing. I passed a technical interview mostly on Spring Framework concepts knowing almost nothing about it, nor having ever used it in production by simply just having the documentation open while I was being interviewed so I could look up the answers. Any language that is popular is going to have frameworks with decent documentation that help you be productive quickly, so I just jumped in doing Spring. The java stuff came as needed, or I referenced something like Effective Java (great book), or a Baeldung article. Java world has made some great strides since the 2000's and early 2010s of XML chaos. It took a while, but I feel like it's in a really good spot and getting better.

As an aside, if it hasn't been mentioned to you before, if you like simplicity in a language, but still incredibly productive, you might enjoy Go.

progmetaldev
1 replies
16h9m

I truly appreciate your answer, as someone that primarily does C#, but learned Java first back around 2003 to 2005. More so because my son is going to be learning Java next year in high school, and I know he'll be looking to me for help, and I'd like to at least be at the same language version as he's learning.

I've also been looking at Go quite a bit, especially from a lot of commentary even recently on YouTube, as well as plenty of time during work hours to experiment with languages I'm not familiar with. I've been deciding between Python and Go, and I picked up Python very quickly for the language, but ultimately I think it's the libraries and ecosystem around both that will decide it for me. Just good to see another vote for Go, especially when comparing to Java. I feel like I'm in the same boat as you seem to be, where it's all just syntax and learning frameworks, while having enough experience at this point to also pick up the smaller details while quickly being effective.

dimitrios1
0 replies
14h18m

No problem. I am also a huge Dream Theater fan.

progmetaldev
1 replies
16h36m

I haven't used Java since v1.5, roughly around 2005. I do use C# quite a bit, and have since v1.1 (I got into .NET through VB.NET at v1.0, as I had just learned VB through schooling). I look back at all the features that have been added over time, and since I have followed along as it has developed, I embrace the changes. They have made my code more concise and easier to read, with less boilerplate. When I think about someone brand new getting into the language, I truly hope they have a mentor. I can't imagine having to work with someone new to the language, and not being given plenty of time to get them up to speed with both current codebases, as well as newer ones (and I haven't even adopted the newest language versions unless it was suggested to me through Resharper, and I looked up the feature and found how it made my code easier to understand).

I feel like the comparison with Java is similar, as both Oracle and Microsoft are large corporations that largely control the language and ecosystem, while also having open source implementations. C# has made some more major changes to the language itself than Java, but both have diverged quite a bit from what they were back in 2005.

I'll find out next year, as my son goes into high school, and they are offering Java software development classes. I got Apple Basic on the Apple ][e, QuickBasic, and a bit of C++ in high school, and graduated in 1997. I wouldn't be surprised if he's going to be dealing with Java v1.5 instead of the latest features. At least I'll have a motivation to learn the latest features if he enjoys it and keeps working at it.

flakes
0 replies
11h10m

I wouldn't be surprised if he's going to be dealing with Java v1.5 instead of the latest features.

Well I certainly hope not… LTS for Java 5 probably hit end of life when he was 2. I'd say most likely the version being installed is Java 17, or 11 if they're really dated. Java 8 would likely be the absolute minimum, mainly due to the industry being so slow to migrate away from it. Newer programmers are unlikely to ever know what Java felt like before generics and lambdas existed.

richrichie
1 replies
17h30m

Thanks for sharing.

Perhaps Erik Naggum, scourge of Usenet, was right when he said: "life is too long to know C++ well."

I feel similarly about Rust. I have read "the book" and did a couple of small projects. It is also a sort of committee-administered language that sucks up tiny features like a giant vacuum cleaner sweeping the streets, and changes every hour.

kelnos
0 replies
12h53m

that sucks up tiny features like a giant vacuum cleaner sweeping the streets and changes every hour.

Can you provide some (preferably recent) examples? My experience has been the opposite. It feels like new features are given a lot of thought and deliberation, and stablizing even something small can take upwards of a year.

I will agree that today's Rust is different from Rust 1.0, but I don't see that as necessarily a bad thing. More that they were able to come to a stable 1.0-ready core fairly early on, and have been adding on the more tricky parts bit by bit since then.

android42
2 replies
22h30m

I wasn't sure whether to agree with this or not, so I finally took a slightly closer look at Mojo just now.

This depends on how they license it going forward, and whether they make it open, or use being a superset as a way to capture then trap python users in their ecosystem, and I don't think we have a certain answer which path they'll take yet.

The way they let you mix python compatible code with their similar but more performant code [1] looks interesting and provides a nice path for gradual migration and performance improvements. It looks like one of the ways they do this is by letting you define functions that only use typed variables which is something I would like to see make its way back to CPython someday (that is optionally enforcing typing in modules and getting some performance gains out of it).

[1] https://en.wikipedia.org/wiki/Mojo_(programming_language)#Pr...

oivey
0 replies
12h47m

The typing thing is really unnecessary and a step backwards, IMO. Added typing is maybe great for managing a code base, but it isn’t necessary for performance with a compiler that can do type inference.

Intralexical
0 replies
17h25m

It looks like one of the ways they do this is by letting you define functions that only use typed variables which is something I would like to see make its way back to CPython someday (that is optionally enforcing typing in modules and getting some performance gains out of it).

This is already how Cython and MYPYC work. You add standard PEP-484 type annotations, and they use that to infer where code can be compiled to native instructions.

https://cython.readthedocs.io/en/latest/src/tutorial/pure.ht...
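For illustration, the same PEP 484 annotations stay valid, runnable Python while giving compilers like Cython (pure-Python mode) or mypyc the type information they need; this function is a hypothetical example, not taken from either project's docs.

```python
def dot(xs: list[float], ys: list[float]) -> float:
    """Annotated pure Python: mypy checks it, Cython/mypyc can compile it."""
    total: float = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```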

nerpderp82
1 replies
22h38m

I didn't mention C++ at all. Was that my argument? This thread is about Python, the GIL. Mojo was brought up as a way to speed up Python code.

C++ predates on C in a similar way to how Mojo predates on Python. At least C++ has extern C.

https://docs.modular.com/mojo/manual/python/#call-mojo-from-...

As shown above, you can call out to Python modules from Mojo. However, there's currently no way to do the reverse—import Mojo modules from Python or call Mojo functions from Python.

One way street. Classic commons harvesting.

kelnos
0 replies
12h48m

I think the person you're replying to is just trying to use an analogous example; you didn't need to bring it up.

Regardless, I think it's a bit alarmist and overly aggressive to assume nefarious intent. Have the developers acted in ways such that this reputation is deserved?

Also, little OT, but it took me unreasonably long to understand that you meant "predates" as the verb form of "predator", not as in "comes before chronologically". The phrase "preys on" may be more clear.

objektif
1 replies
1d

Never heard of taichi before looks promising. Do you know any shop that uses it for prod code?

its-summertime
1 replies
19h58m

Cython is also a superset, is Cython also guilty of such crimes?

Intralexical
0 replies
17h30m

Cython is dependent on CPython. Cython outputs Python extension modules, which can only be used when imported into a standard CPython environment, and which interoperate cleanly with the rest of the Python ecosystem.

Mojo explicitly does the opposite, allowing Mojo to use Python but requiring Mojo to be in control, while making it hard/impossible for code written in Mojo to benefit code written in Python:

Our long-term goal is to make Mojo a superset of Python (that is, to make Mojo compatible with existing Python programs). […] Mojo lets you import Python modules, call Python functions and interact with Python objects from Mojo code. […]

As shown above, you can call out to Python modules from Mojo. However, there's currently no way to do the reverse—import Mojo modules from Python or call Mojo functions from Python. […]

This pattern doesn't work because you can't pass Mojo callbacks to a Python module.

Since Python can't call back into Mojo, one alternative is to have the Mojo application drive the event loop and poll for updates.

No comment on whether this should be viewed as an attack.

https://docs.modular.com/mojo/manual/python/

est
0 replies
8h12m

Mojo should be viewed as an attack on the Python ecosystem due to it being a superset

A superset of pure .py code, not the numpy, cython, ctypes and stuff.

But once you make it a superset of CPython's C bindings, congratulations: you get the GIL.

boxed
0 replies
10h48m

Mojo is not a superset. Not even close. They say they are AIMING for it to be a superset OF THE SYNTAX. This is a subtle yet enormous difference to being a superset of the language itself.

tiffanyh
10 replies
22h46m

ELI5

I get in concept what the GIL is.

But what's the impact of this change?

Packages will now break, for the hope of better overall performance?

nextaccountic
5 replies
22h25m

If any package depends on the GIL, the GIL will be re-enabled, so packages won't break.

hsbauauvhabzb
4 replies
18h17m

What packages might depend on the GIL and why would they need it?

CJefferson
1 replies
13h51m

Almost all packages depend on the GIL (at least at first).

They assume they can "take the GIL", and then go and look at the various Python data structures you passed them without worrying about those structures changing as they are being read.

Then later, when writing out their answer (which might involve editing something they were given), they can assume the structures are not changing. For example, you could write code that extends the length of a list to 100 and then fills in all the members, without worrying that halfway through your loop another thread shrinks the list back down.

imtringued
0 replies
9h30m

This only applies to native extensions.

whiterknight
0 replies
17h28m

Any C code that uses global variables casually.

mlyle
0 replies
17h47m

Packages with native components that have not been updated for the semantics changes.

dathery
3 replies
20h47m

Previously people basically didn't bother to write multithreaded Python at all due to the GIL. Threads were primarily used when you had multiple pieces of work to do which could end up blocked on independent I/O. Which is common and useful of course, but doesn't help with the performance of CPU-bound Python code.

Even outside of high-intensity CPU work, this can be useful. A problem lately is that a lot of code is written using Python's native asyncio language features. These run single-threaded with async/await to yield execution, much like in NodeJS, and can achieve pretty good throughput even with a single thread (thousands of reqs/second).

However, a big problem is that any time you do _any_ CPU work, you block all other coroutines, which causes all kinds of obscure issues and ruins your reqs/second. For example, you might see random IO timeouts in one coroutine which are actually caused by a totally different coroutine hogging the CPU for a bit. It can be very hard to get observability into why this is happening. asyncio provides a `asyncio.to_thread()` function [1] which can help to take blocking work off the main thread, but because of the GIL it doesn't truly allow the CPU-bound to avoid interfering with other coroutines.

[1] https://docs.python.org/3/library/asyncio-task.html#asyncio....
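A minimal sketch of that pattern (function names here are illustrative, not from any particular codebase): CPU-bound work is pushed off the event loop with `asyncio.to_thread()`, so coroutines keep making progress at their await points, though under the GIL the worker thread still competes with the loop for the interpreter.

```python
import asyncio

def crunch(n):
    # Simulated CPU-bound work that would otherwise starve coroutines.
    total = 0
    for i in range(n):
        total += i * i
    return total

async def heartbeat():
    # Stands in for a latency-sensitive I/O coroutine.
    for _ in range(3):
        await asyncio.sleep(0.01)
    return "alive"

async def main():
    # Run both concurrently; crunch() no longer blocks heartbeat()
    # outright, but GIL contention can still slow it down.
    result, beat = await asyncio.gather(
        asyncio.to_thread(crunch, 100_000),
        heartbeat(),
    )
    return result, beat

result, beat = asyncio.run(main())
print(beat)  # alive
```

With free threading, the `to_thread` worker can genuinely run on another core instead of time-slicing with the event loop.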

fireattack
1 replies
15h8m

Would the GIL affect how Python runs across multiple Python processes?

I'm asking because I encountered a weird phenomenon before.

I use a simple Python lib called "schedule" to run some tasks periodically (not precisely timed). I often run a script multiple times (with different arguments) to monitor something every, say, 30 seconds, so they're in three separate Python interpreter processes.

What I've noticed is that while they started something like 5 seconds apart, they eventually end up running in sync. Probably not related to the GIL at all, but I guess it does no harm to ask.

Austizzle
0 replies
12h51m

The GIL is per interpreter/process, so this wouldn't be related as far as I know.

The GIL only really kicks in if you use threads in a single process. Then, the GIL will only let one single thread do actual work at a time, and will trade off which thread gets to do work. The other threads can wait on IO stuff (web requests, the file system, etc) but they can't do number crunching or data processing at the same time.

That's a really interesting observation though, I wonder what _is_ causing your separate processes to sync up?
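To illustrate the trade-off described above (hypothetical names; a toy sketch, not a benchmark): both workers below are pure-bytecode CPU work, so under the GIL they take turns on one core, whereas threads blocked on I/O would overlap fine. On a free-threaded build the same code can use two cores.

```python
import threading

results = {}

def count_squares(name, n):
    # Pure bytecode work -- exactly the kind of thing the GIL serializes.
    total = 0
    for i in range(n):
        total += i * i
    results[name] = total

threads = [
    threading.Thread(target=count_squares, args=(f"t{i}", 50_000))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # ['t0', 't1']
```

The results are correct either way; what the GIL costs you is wall-clock parallelism, not correctness of individual operations.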

hsbauauvhabzb
0 replies
18h17m

Will this also support threads spanning multiple cores, or is that unrelated?

Edit: https://peps.python.org/pep-0703/ suggests it will support multiple cores, unless the current work does not yet achieve that.

tommiegannert
6 replies
1d

First I read the news of tranched bread, and now this?! What a time!

I was a bit disheartened when the Unladen Swallow project [1] fizzled out. Great to see Python back on the core optimization track.

[1] https://en.wikipedia.org/wiki/CPython#Unladen_Swallow

fragmede
5 replies
23h38m

tranched bread?

KMnO4
2 replies
23h6m

I could be wrong, but I think it’s a clever alternative to the expression “best thing since sliced bread”.

fragmede
1 replies
20h38m

ahahah. I was thinking that it was a new python library or something that I hadn't heard of and was coming up short with Google.

"tranced bread" is a fun name for some sort of library that breaks up files into pieces for better resilience for sending, like over BitTorrent.

pas
0 replies
4h9m

tranced bread sounds like what you go home with after a goa festival

oblio
0 replies
15h3m

OP is a subprime mortgage seller.

maest
0 replies
21h31m

GP is joking that this is the best thing since sliced (tranched) bread.

purpleidea
6 replies
15h16m

It's neat that this will eventually improve some python code, but at the end of the day, it's still a badly typed language, which will still be slower and less safe than more modern languages.

Learn golang or rust instead. Impressive that they are managing this though!

AdamJacobMuller
2 replies
13h55m

> Learn golang or rust instead.

Bad take. Learn golang and rust and python.

You should use the language which is suited to the task, sometimes that's golang sometimes that's python and sometimes it's rust.

It's impressive that the python team as a whole continues to improve in such big ways after more than 30 years of development. It's more impressive that the python team managed to navigate 2to3 and come out stronger.

thisismyswamp
0 replies
8h8m

> Learn golang

Having to "if err != nil" every single function call is a big turn-off - imagine having to "try catch" everything in a language like C#!

neonsunset
0 replies
7h4m

Python is a bad programming language, but a great scripting one. You use it when you either have to do scripting or need a framework that is only offered in Python (the entirety of the ML domain). Otherwise, the only reason to pick it is if you have no choice: you don't want to do a rewrite, or you are being held at gunpoint.

Instead, pick C#/F#, Kotlin/Clojure or Rust depending on the use case.

takeda
0 replies
14h34m

Rust, sure, but typing in golang isn't superior to python's type annotations.

Also, Rust has become quite a popular tool for Python extensions, where you can offload performance-critical work to Rust and keep business logic in Python.

alfalfasprout
0 replies
14h55m

Bad take. One doesn't just migrate massive ecosystems to "go or rust". Python, like it or not, is the lingua franca of ML/AI and science. Worse, as a user of Go/Rust it's very disingenuous to expect the same kind of iteration ability in those languages as you get in Python.

With tools like pyright now + the work on nogil everyone benefits from this using Python.

Cacti
0 replies
15h0m

but Python has my libraries, and go and rust do not.

> 99% of my compute is offloaded to compiled BLAS or CUDA.

types are enough.

and memory and type safety are not terribly relevant for my use cases.

vlovich123
4 replies
1d1h

Does anyone know why the biased reference counting approach described in https://peps.python.org/pep-0703/ just has a single thread affinity requiring atomic increments/decrements when accessed from a different thread? What I’ve seen other implementations do (e.g. various Rust crates implementing biased reference counting) is that you only increment atomically when moving to a new thread & then that thread does non-atomic increments/decrements until 0 is hit again and then an atomic decrement is done. Is it because it’s being retrofitted into an existing system where you have a single PyObject & can’t exchange to point to a new thread-local object?

colesbury
3 replies
23h56m

We could implement ownership transfer in CPython in the future, but it's a bit trickier. In Rust, "move" to transfer ownership is part of the language, but there isn't an equivalent in C or Python, so it's difficult to determine when to transfer ownership and which thread should be the new owner. We could use heuristics: we might give up or transfer ownership when putting an object in a queue.SimpleQueue, but even there it's hard to know ahead of time which thread will "get" the enqueued object.

I think the performance benefit would also be small. Many objects are only accessed by a single thread, some objects are accessed by many threads, but few objects are exclusively accessed by one thread and then exclusively accessed by a different thread.

vlovich123
2 replies
23h49m

I think you would do it on first access - “if new thread, increment atomic & exchange for a new object reference that has the local thread id affinity”. That way you don’t care about whether an object actually has thread affinity or not and you solve the “accessed by many threads” piece. But thanks for answering - I figured complexity was the reason a simpler choice was made to start with.

orf
1 replies
22h28m

But this would now make the reference count increment require a conditional? It’s a very hot path, and this would cause a slowdown for single-threaded Python code.

vlovich123
0 replies
21h31m

It's already taking a conditional. Take a look at the PEP:

    if (op->ob_tid == _Py_ThreadId())
      op->ob_ref_local = new_local;
    else
      atomic_add(&op->ob_ref_shared, 1 << _Py_SHARED_SHIFT);
So you're either getting a correct branch prediction or an atomic operation which will dominate the overhead of the branch anyway. All this is saying is: in the else branch where you're doing the atomic add, create a new PythonObj instance that has `ob_tid` equal to `_Py_ThreadId`. This presumes that Py_INCREF changes its return type from void to `PythonObj*` and that this propagates out so that further on-thread references use the newer affinity (the branch condition always takes the non-atomic add instead of the atomic one). It's easier said than done, and there may be technical reasons why that's difficult or not possible, but it's worth exploring eventually so that access to a single object from multiple threads doesn't degrade to taking atomic reference counts constantly.

https://peps.python.org/pep-0703/
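A toy Python model of the bookkeeping under discussion (field names mirror the PEP: `ob_tid`, `ob_ref_local`, `ob_ref_shared`; the re-biasing step is the variation proposed in this thread, not what CPython actually does, and a `Lock` stands in for the atomic operations):

```python
import threading

class BiasedRef:
    def __init__(self):
        self.ob_tid = threading.get_ident()  # owning thread
        self.ob_ref_local = 1    # fast, non-atomic count (owner only)
        self.ob_ref_shared = 0   # slow, "atomic" count (everyone else)
        self._lock = threading.Lock()  # stands in for atomic ops

    def incref(self):
        if self.ob_tid == threading.get_ident():
            self.ob_ref_local += 1           # owner: plain add
        else:
            with self._lock:                 # foreign thread: atomic add,
                self.ob_ref_shared += 1      # then re-bias to this thread
                self.ob_tid = threading.get_ident()

obj = BiasedRef()
obj.incref()                      # owner path: non-atomic
t = threading.Thread(target=obj.incref)
t.start(); t.join()               # foreign path: object re-biased to t
obj.incref()                      # main thread is now "foreign" again
print(obj.ob_ref_local, obj.ob_ref_shared)
```

Note how the last `incref()` goes through the shared path because ownership ping-ponged away, which illustrates colesbury's point: re-biasing only pays off for objects that are handed off and then used exclusively by the new thread.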

Retr0id
2 replies
1d1h

Is there a good overview of the bigger picture here?

misoukrane
1 replies
1d1h

Finally, looking forward to the benchmarks of many tools!

pgraf
0 replies
21h29m

If anyone wondered: GIL = Global Interpreter Lock.

karmasimida
0 replies
1d

This is exciting, can't wait

Dowwie
0 replies
1d1h

We are now one step closer to PythonOTP