Some people on Codeforces (the competitive programming platform this was tested on) are discussing the model: https://codeforces.com/blog/entry/123035
Seems like they don't believe it solved the 3200-rated problem (https://codeforces.com/contest/1810/problem/G) without data leakage.
For context, only around 20 humans in the world are rated above 3200. During the contest, that problem got only 21 successful submissions out of ~25k participants.
It doesn't code like a human, so you would expect it to be better at some kinds of tasks. It brute-forces problems by generating a million candidate solutions and then trying to trim that down; a few problems might be vulnerable to that style of approach.
Are you sure? "brute forces the problems by generating a million solutions and then tries to trim that down" isn't how I would describe the way an LLM works.
The original AlphaCode paper in Science explains the approach: they generate many potential solutions with the LLM and do a lot of post-processing to select candidates. Here's where the probabilistic nature of LLMs hurts, I think.
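For anyone curious, the selection stage the paper describes is roughly sample, then filter on the example tests, then cluster by behavior. Here's a minimal sketch of that pipeline, not DeepMind's actual code: `run_program` is a placeholder for executing a candidate in a sandbox, and in the real system the extra clustering inputs come from a separately trained test-input generator.

```python
import collections

def passes_examples(program, examples, run_program):
    """True iff the candidate reproduces every example output from the statement."""
    return all(run_program(program, inp) == out for inp, out in examples)

def select_candidates(samples, examples, extra_inputs, run_program, k=10):
    # 1. Filter: discard every sampled program that fails the
    #    example tests given in the problem statement.
    survivors = [p for p in samples if passes_examples(p, examples, run_program)]

    # 2. Cluster: programs that produce identical outputs on extra
    #    inputs are (probably) semantically equivalent, so group them.
    clusters = collections.defaultdict(list)
    for p in survivors:
        behavior = tuple(run_program(p, inp) for inp in extra_inputs)
        clusters[behavior].append(p)

    # 3. Submit one representative from each of the k largest clusters
    #    (their evaluation allowed 10 submissions per problem).
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:k]]
```

The point is that the vast majority of the sampled programs die at the filtering step (the paper reports filtering removes on the order of 99% of samples), and clustering collapses the survivors into a handful of behaviorally distinct candidates.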
That is how it works; read the paper.
AdamantChicken2 (aka AlphaCode2) replied to the thread! https://codeforces.com/blog/entry/123035?#comment-1091379
I guess we'll know in a few months (whenever the model is available and the next competition is run).