Doom Captcha (2021)

SmartHypercube
28 replies
15h23m

Who else is clicking "click to start" like me? It turns out you have to choose one of the buttons. I thought they were there to let me enable/disable the sound, but they also both act as start buttons.

Didn't know a simple interface with a sound switch and a game start button can be designed this badly.

nycdatasci
10 replies
12h42m

Who else is missing the forest for the trees? It turns out you have to focus on the merit of the contribution instead of inconsequential UI design optimization.

Didn’t know a simple demo (with disclaimers) from someone who is clearly doing something novel could be commented on this badly.

Thorrez
3 replies
12h27m

inconsequential UI design optimization

I certainly was confused and had a hard time starting it. If a significant number of people can't even figure out how to start the game, the problem isn't inconsequential.

nycdatasci
1 replies
12h14m

I agree with you, but this is distracting from the merits of the demo. Also, this is currently #2 on the front page so clearly many people are able to navigate the demo UI, even if it is suboptimal.

wruza
0 replies
10h43m

I decided to leave only a secondary comment at the bottom of the thread for the same reason as yours and still got 14 ups (i.e. thanks) in a short time before this branch bubbled up. People definitely get confused and that's worth talking about before the merits of the demo, cause you have to run it somehow. I almost left too thinking it's broken, hugged or something. It is distracting and we'll live through it :)

ikari_pl
0 replies
12h20m

maybe the bots won't know either

ryanjshaw
1 replies
7h30m

inconsequential UI design optimization

I tapped "click to start" on my phone a few times, saw nothing happened and assumed it didn't work on mobile and tapped back to come read the comments. I am neuroatypical, though, maybe I don't count as human.

littlestymaar
0 replies
5h49m

I tapped "click to start" on my phone a few times, saw nothing happened and assumed it didn't work on mobile

Same reaction here.

iopq
0 replies
11h57m

I couldn't get it started for a while because I clicked start to start like it says on the tin

ghnws
0 replies
11h1m

There is bad UI, and then there is UI so bad that you lose focus on the actual thing and just wonder how a UI can be so bad. This is the latter.

JayNitram
0 replies
4h30m

Agreed, I really like this demo, seems like a fun concept that adds some sparkle to a typically mundane thing.

Getting so pedantic about a minor point seems like it does more to stifle creativity and innovation than it does to help.

Arisaka1
0 replies
10h55m

I'd argue that if it confuses the user it's not inconsequential. And also, something can be both innovative and at the same time have room for improvement. Companies are literally chasing down user feedback.

A user's feedback is one of the best things that can ever happen to your program. The worst is to never ever get used by anyone, and the second worst is to have the users walk away with no idea why.

notpushkin
6 replies
15h8m

I think the easiest way to fix would be to add a colon, so that you see you have to pick an option:

    Click to start:
    [sound on] [sound off]

kqr
3 replies
11h43m

Or make the "click to start" text clickable and start the game with sound. Anyone who wants it muted will make sure to first click the mute symbol, and then the ambiguity resolves itself anyway.

wruza
2 replies
10h34m

MathDoku does that and I hate it, because sometimes cookies expire and it plays loud music in the middle of the night when I start it. What's wrong with:

    [  CLICK TO START  ]
    [x] Allow sound

Keep it simple.

kqr
1 replies
9h54m

I think most people would agree your solution is preferable, but the spirit of this subthread was "what's the smallest change that would improve things" rather than "how could it be redesigned from scratch?"

I would also argue the MathDoku problem is different. That sounds like a mode confusion type issue, where the user expects a certain level of automation but it has been disabled by the system without adequate feedback.

maxcoder4
0 replies
5h7m

What's wrong with "start with sound" and "start without sound"? That's a guaranteed single click, whereas with a checkbox you need either one or two clicks.

medstrom
1 replies
11h21m

Even better:

    [Start with sound] [Start without sound]

sublinear
0 replies
5h18m

100% this. Buttons represent verbs.

kfarr
2 replies
15h10m

Yeah it doesn't even need the option IMHO, I don't think sound is needed here...

taneq
0 replies
8h24m

E1M1 is absolutely a part of the experience.

ghnws
0 replies
11h1m

Doom without sound is not Doom. Sound is absolutely needed.

burrish
2 replies
9h58m

skill issue, literally filtered by two buttons on the screen

TeMPOraL
1 replies
9h31m

You mean the buttons are the real CAPTCHA?

burrish
0 replies
8h21m

That's a funny idea lmao

robofanatic
0 replies
4h20m

Human Intelligence eventually figures it out, no matter how bad the interface is.

SuchAnonMuchWow
0 replies
5h9m

Twist: the real captcha is detecting whether the user first presses "click to start".

Kreutzer
0 replies
5h33m

Not me.

Carlseymanh
0 replies
5h5m

If I were you I'd change my name into Hypercube only

sugarkjube
20 replies
18h5m

Absolutely love it. Unusual captchas are great.

Reminded me of this one: http://random.irb.hr/signup.php

evgpbfhnr
4 replies
16h5m

You don't actually need much. For a form that I used to get spam in, I just added a "write 42 here" field, so anyone who actually cared to read would be able to fill it in. Spam fell to 0.

(for a site with a slightly higher profile this wouldn't be enough, but for a minor corner of the internet with no ill intent actually aimed at it that turned out to be enough to block the fuzzing "fill all the forms" spam)
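Here is a minimal sketch of that kind of check on the server side. It assumes the form has a visible "Write 42 here" prompt next to an input. The field name `human_check` is hypothetical, not taken from any real site.

```python
# Hypothetical field name; the form shows a visible prompt such as
# "Write 42 here" next to this input.
EXPECTED_ANSWER = "42"


def passes_human_check(form: dict) -> bool:
    """Return True if the submitted form contains the expected answer.

    Generic "fill all the forms" bots put junk in every field and will
    almost never type this exact string; a human who reads the prompt will.
    """
    answer = form.get("human_check", "")
    return answer.strip() == EXPECTED_ANSWER
```

Surrounding whitespace is tolerated, so `{"human_check": " 42 "}` passes.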

kqr
3 replies
11h2m

Similarly an empty input field that is css'd to be outside the viewport is often filled by spambots but not humans. But I like the edge case UX of your idea more.
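The server side of such a hidden "honeypot" field can be sketched in a few lines. This sketch makes two assumptions: the input is named `website` (a hypothetical name), and it is pushed off-screen with CSS so sighted users never fill it.

```python
# Hypothetical honeypot field, positioned outside the viewport with CSS
# so that humans using a normal browser never see or fill it.
HONEYPOT_FIELD = "website"


def looks_like_bot(form: dict) -> bool:
    """Flag submissions where the hidden honeypot field was filled in."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

As the replies point out, browser autofill and CSS overrides can make real users trip this check. So a hit here is a signal, not proof.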

jeffhuys
2 replies
10h59m

Just watch out that Chrome’s autofill doesn’t fill it in. Cost us a huge chunk of new signups until we found out. Chrome ignores autofill directives under some circumstances.

kqr
1 replies
10h8m

It's also visible for users with CSS overrides and/or other browser impairments. The more I think about it, the more strongly I prefer the "type 42" explicit input field.

thih9
0 replies
3h51m

You can label it “leave this field empty”, with a placeholder or similar - then it’s the same explicit instruction as “type 42”.

koito17
3 replies
15h37m

The question I got was surprisingly simple: it asked to find "the least real root of the polynomial p(x) = (x+5)(x-4)(x+1)". A determined attacker can quickly hack together something with Tesseract and feed it into even GPT-3.5 to get the correct answer to questions like these.

I guess that means the captcha is doing its job, since running LLMs isn't very cheap or scalable. But any harder problem means you start filtering out a significant chunk of human users. Based on the other replies to your comment, it seems that the questions at their current difficulty already stop a lot of human users, yet allow a determined attacker with the setup I described to pass through easily.
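For reference, here is how the quoted question works out. All three roots are real, so "least real root" simply means the smallest of them:

```latex
p(x) = (x+5)(x-4)(x+1) = x^3 + 2x^2 - 19x - 20
p(x) = 0 \iff x \in \{-5,\ -1,\ 4\}
\min\{-5,\ -1,\ 4\} = -5
```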

explaininjs
2 replies
14h45m

I'm not sure how you'd determine the least real root to that, given all three have equally zero imaginary component.

wnoise
0 replies
13h2m

They of course mean the minimum out of the set of the real roots.

cwillu
0 replies
14h4m

I suppose the square root of negative infinity has the property of being unreal in several distinct ways, but yeah, the least real? I dunno /s

em-bee
2 replies
17h33m

after reloading a dozen times i finally got one that i could solve:

-3 * 3 + (-3) = ?

jakderrida
1 replies
17h12m

I just got one I think I can solve: 0 + 7 + 0 = ?

Where's my calculator?

defrost
0 replies
17h5m

Bond, Jim Bond ?

onlyrealcuzzo
1 replies
16h28m

Can I play by an audio call if I'm visually impaired?

Keyframe
0 replies
11h51m

Yes, when you hear a monster roar you say BANG!

esaym
1 replies
17h43m

Funny. I made a captcha challenge of calculus problems for a comment section on my personal blog page. But 5 years after college, I couldn't remember how to even do them myself so I changed it :-/

iopq
0 replies
11h56m

wolfram alpha can do it for you

nottorp
0 replies
3h46m

I got "find the last real zero of the polynomial..." but what does last mean? Largest? Last as the polynomial's factors are given? Something else?

Edit: oh wait. It's "least". I really have no idea then :)

marvinborner
0 replies
7h33m

Or the one on esolangs.org where you need to evaluate some random Befunge code.

baud147258
0 replies
9h0m

I remember an old (and now defunct) fan site that hit you with lore questions as a captcha. Though I'd guess an LLM could answer those.

Kwpolska
0 replies
3h7m

The first one I got was 7 * 7 + (-3). That’s trivial, elementary-school-level math, and did they really need LaTeX to render that?

Then I refreshed the page, and was hit with calculus involving trig functions.

pushedx
11 replies
18h23m

wouldn't do much to prevent bots

frozenlettuce
7 replies
17h20m

If they switch to canvas rendering and include some twist (e.g. shoot x but not y, limit input rate, etc.), then I think that a considerable computing effort would be necessary to break the lock.

enlyth
3 replies
16h38m

I don't think it's that considerable, I made a script to defeat it with vision in only a few minutes:

https://gist.github.com/enlyth/a177e4102b0da37a73587e15dbd68...

This could be further optimized to not scan the whole screen, and faking some human-like mouse movements shouldn't be that hard either.

Reubend
2 replies
16h15m

Wow, that's pretty impressive to me and I think it's awesome that you were able to put this together quickly. I admit that I don't have a CV background, so maybe this is easier for a programmer who's already experienced in that area.

lloeki
1 replies
9h24m

To be fair I don't think you need CV in this specific case where the problem space is very limited.

1. There's no lighting, so the enemies have specific, fixed pixel colours that don't appear in any of the backgrounds. Scan and target these.

2. Enemies appear in a specific zone in the canvas. Makes scan faster, combines with below.

If there's expected ambiguity, one can a. detect a few interesting background properties by looking at pixels where enemies never appear (e.g. corners), and/or b. use a couple of other pixels relative to the candidate match (maybe neighbours, maybe not, could just as well be 20px down, 10 left) to discriminate.
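Points 1 and 2 can be sketched in a few lines. The sketch assumes you already have the canvas pixels as RGB tuples indexed `[y][x]`, for example from a screenshot. The enemy colours and the spawn zone are placeholders you would read off the actual sprites; nothing here comes from the captcha's source.

```python
from __future__ import annotations

# Frame layout assumed here: frame[y][x] -> (r, g, b) tuple.
Pixel = tuple[int, int, int]


def find_enemy_pixels(frame: list[list[Pixel]], enemy_colors: set[Pixel],
                      zone: tuple[int, int, int, int]) -> list[tuple[int, int]]:
    """Return (x, y) of every pixel inside zone whose colour is an enemy colour.

    zone is (x0, y0, x1, y1), half-open: the only region where enemies appear,
    so the rest of the frame is never scanned.
    """
    x0, y0, x1, y1 = zone
    hits = []
    for y in range(y0, y1):
        row = frame[y]
        for x in range(x0, x1):
            if row[x] in enemy_colors:
                hits.append((x, y))
    return hits


def aim_point(hits: list[tuple[int, int]]) -> tuple[float, float] | None:
    """Average the matching pixels into one point to click.

    With several enemies on screen this merges them into one point; a real
    solver would cluster the hits first.
    """
    if not hits:
        return None
    n = len(hits)
    return (sum(x for x, _ in hits) / n, sum(y for _, y in hits) / n)
```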

Side story: one day my team was tasked with doing textual document content recognition for some biz. Everyone was like "oh it's going to be $$$ to pull out CV+OCR and have the OCR learn the specific font".

Turns out the document in question was:

    - an extremely standardised gov format
    - produced only by gov administration
    - of a known fixed, overall size with clear identifiable boundaries
    - printing known, standardised list of fields at fixed position
    - with a known, standard font specifically made for quick automatic recognition
    - containing only /[A-Za-z0-9]/ chars (plus a few I can't recall, but essentially dash, plus, slash...)
    - on a known, standardised background
    - the only variable is the quality of the scan and the size parameters

So I put up a file upload form and piped the image through a reasonable sequence of imagemagick filters to turn it into a monochrome image with no background. Then I looked for corners/borders, resized and rotated it, scanned through the image until I hit a black pixel, and then looked at pixel lit/unlit patterns (think 7 segment display in reverse).

Cobbled the thing together in a couple of afternoons, with a quick, simple UI to have the user crop/rotate the doc (putting it mostly upright). It was stupidly fast to run and the success rate was very high. Interestingly enough, the failure mode was very good: it could reliably tell "ok, I can't make any sense out of this", whereas OCR claimed success but output gibberish.

You can get surprisingly far with very little when you have known knowns.
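The "7 segment display in reverse" step can be sketched like this. Sample a handful of fixed probe positions inside each character cell of an already cleaned-up, deskewed, monochrome image, then look the lit/unlit pattern up in a table. The 3x5 cell, the probe positions and the tiny table are all made up for illustration. They are not the actual gov font or the code described above.

```python
# Probe points (x, y) inside a hypothetical 3x5 glyph cell, one per segment
# of a seven-segment layout: top, upper-right, lower-right, bottom,
# lower-left, upper-left, middle.
PROBES = [(1, 0), (2, 1), (2, 3), (1, 4), (0, 3), (0, 1), (1, 2)]

# Hypothetical lookup table from lit/unlit probe pattern to character.
PATTERNS = {
    (True, True, True, True, True, True, False): "0",
    (False, True, True, False, False, False, False): "1",
    (True, True, True, False, False, False, False): "7",
    (True, True, True, True, True, True, True): "8",
}


def read_glyph(cell):
    """cell: 5 strings of width 3 from a cleaned-up monochrome image,
    '#' for a black (lit) pixel and '.' for background."""
    pattern = tuple(cell[y][x] == "#" for x, y in PROBES)
    # Unknown pattern -> None, i.e. "I can't make any sense out of this".
    return PATTERNS.get(pattern)
```

Returning `None` for an unknown pattern, rather than guessing the closest match, is what gives this approach the clean failure mode mentioned above.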

justsomehnguy
0 replies
6h9m

Nah, a proper anecdote should end with 'and you could check a single checkbox on the gov site and, instead of the scan, receive the "printed" PDF/A with the text layer intact'.

But yeah, there is always a way to optimize. Even if you're making a clean-room implementation (i.e. not looking at the source of that DOOM captcha), you can easily narrow the recognition down to a couple of 2x2 blocks and just pattern-match them against a known background (i.e. not a monster).
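Here is a minimal sketch of that block-matching idea. It assumes you already hold a capture of the empty background at the same size as the frame. The frame is walked in 2x2 tiles, and the tiles that differ from the background are reported. Pixels can be any comparable values (grayscale ints or RGB tuples). The function name and tile size are illustrative.

```python
def changed_blocks(frame, background, block=2):
    """Return the top-left (x, y) of every block x block tile where frame
    differs from the known background. Both are indexed [y][x]."""
    height, width = len(frame), len(frame[0])
    changed = []
    # Step tile by tile; a trailing partial tile at the right/bottom edge is ignored.
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            if any(frame[by + dy][bx + dx] != background[by + dy][bx + dx]
                   for dy in range(block) for dx in range(block)):
                changed.append((bx, by))
    return changed
```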

duskwuff
1 replies
17h2m

And if you analyzed the user's cursor movements (on desktop), reaction time, and positional accuracy, it could be a genuinely decent CAPTCHA.
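This sketch illustrates the kind of signals meant here. It turns recorded mouse samples into two features. The first is reaction time since the enemy appeared. The second is path straightness: the straight-line distance divided by the distance actually travelled. A scripted cursor that moves in a straight line scores exactly 1.0. The `(t, x, y)` sample format and the choice of features are assumptions, and how to score or threshold them is left open.

```python
import math


def cursor_features(samples, spawn_time):
    """samples: list of (t_seconds, x, y) mouse positions, the last one being
    the click. Returns (reaction_time, straightness)."""
    t_click = samples[-1][0]
    reaction = t_click - spawn_time
    # Total distance travelled along the recorded path.
    path = sum(math.dist(a[1:], b[1:]) for a, b in zip(samples, samples[1:]))
    # Straight-line distance from first to last sample.
    direct = math.dist(samples[0][1:], samples[-1][1:])
    straightness = direct / path if path else 1.0
    return reaction, straightness
```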

RockRobotRock
0 replies
15h29m

I'm in awe at the late stages of this cat and mouse game. I write a lot of bots and scrapers, and I feel thoroughly out-gunned against a bunch of PhD data scientists.

DataDome talk about detection: https://youtu.be/xJGBfSGIsjw

RockRobotRock
0 replies
15h28m

I know this is just for fun, but I think this could be a genuinely good solution if it was heavily obfuscated, and the enemy positions were streamed from the server.

seattle_spring
0 replies
15h10m

This comment made me vividly think of that "no silly hats!" cartoon by Don Hertzfeldt from 20-ish years ago.

darby_eight
0 replies
17h45m

...what are you comparing to?

brink
0 replies
17h48m

The author knows, it's just a bit of fun. Read the page.

brcmthrowaway
7 replies
15h51m

How could this possibly be in the training set?

corysama
6 replies
15h41m

It’s not. The fine tuning taught the LLM how to give single-character responses (move/fire keyboard controls) in response to a sequence of ASCII-art-ized frames of the game being played.

Zambyte
5 replies
14h13m

Is it actually ASCII art or just a textual encoding? The art representation is nice for looking under the hood and seeing something pretty, but I feel like it is very far from an optimal way to textually encode Doom for a language model to process. Especially since there is no pitching the camera, you can encode all of the information you need to represent a frame in a single line of ASCII. If they are actually using an ASCII art representation, I bet they would get way better performance encoding the frame as a single line of text.

kqr
2 replies
11h14m

I never realised you could encode each column of Doom as a single character, but of course you can! I suppose the one thing missing would be distance, but if you get 8 bits per character you could reserve the upper bits to represent approximate distance.

That's weirdly inspiring! What other games can I make where the visuals are conceptually no more than a line of characters, but which can get macroexpanded into immersive graphics?
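The "upper bits for distance" idea could look like this. Each column is packed into one byte: the low 5 bits hold a content code and the upper 3 bits hold a coarse distance bucket. The content codes, the 64-unit maximum distance and the 5/3 bit split are all arbitrary assumptions for illustration, not anything the demo does.

```python
from __future__ import annotations

# Hypothetical content codes; anything up to 31 fits in the low 5 bits.
CONTENT = {"empty": 0, "wall": 1, "enemy": 2, "door": 3}
CONTENT_NAMES = {code: name for name, code in CONTENT.items()}


def encode_column(content: str, distance: float, max_distance: float = 64.0) -> int:
    """Pack one screen column into a byte: 3 bits of distance, 5 bits of content.

    distance is assumed to be non-negative.
    """
    # Quantise distance into 8 buckets (0..7); at or beyond max_distance -> 7.
    bucket = min(int(distance / max_distance * 8), 7)
    return (bucket << 5) | CONTENT[content]


def decode_column(byte: int) -> tuple[str, int]:
    """Inverse of encode_column: returns (content, distance bucket)."""
    return CONTENT_NAMES[byte & 0x1F], byte >> 5


def encode_frame(columns: list[tuple[str, float]]) -> bytes:
    """One byte per column, left to right: the whole frame as a single line."""
    return bytes(encode_column(c, d) for c, d in columns)
```

Bytes above 127 fall outside ASCII, so how a given model would tokenize such a line is an open question.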

firewolf34
0 replies
4h20m

I suppose the save states of a game are a compressed representation of the world to a degree.

Zambyte
0 replies
2h13m

Another point to note is that you aren't stuck with a single character to encode a column of Doom as text. You could also do something like a letter to represent the content, followed by a number to represent the distance.

I think the only weird part about that is that certain letter-number pairs may be a single token with some other semantics in the model, and other letter-number pairs would be a pair of tokens. I think that could impact the performance of the model (but probably not by a huge amount).
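A sketch of the letter-plus-number variant keeps everything in printable ASCII. Each column becomes one letter for its content followed by one digit (0-9) for its approximate distance. The letter codes and the 64-unit maximum distance are hypothetical. Whether a given tokenizer splits `W0E6` into one token per pair is exactly the uncertainty raised above.

```python
# Hypothetical one-letter codes for what occupies a column.
LETTERS = {"empty": ".", "wall": "W", "enemy": "E", "door": "D"}


def encode_frame_text(columns, max_distance=64.0):
    """Encode each (content, distance) column as a letter followed by one
    distance digit, concatenated left to right into a single line."""
    parts = []
    for content, distance in columns:
        digit = min(int(distance / max_distance * 10), 9)  # clamp far distances to 9
        parts.append(f"{LETTERS[content]}{digit}")
    return "".join(parts)
```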

corysama
1 replies
13h41m

If you just click through the links you’ll see the actual input to the LLM https://twitter.com/SammieAtman/status/1772075251297550457

Nothing you are saying is technically incorrect. But, optimal performance was not the goal. The goal was to see if this crazy stupid concept would actually work. And, it does!

Zambyte
0 replies
1h35m

Ah, I think I clicked the actual post link and saw nothing, and backed out. Thanks for the direct link to the video.

And yeah, I totally get not aiming for optimal performance. I think it would be interesting to see how a language model could perform with a format that is less visually oriented, though. Like, textually there is little association between columns, it's just a string of characters, and some of them happen to be newline characters. A more densely packed encoding would play more into the logic and reasoning encoded into the model, rather than just trying to parse out ASCII art.

wahnfrieden
0 replies
15h53m

any models fine tuned for playing an open src game that is non-GPL so that it can be deployed to the app store for interesting bot play ideas?

paulryanrogers
0 replies
15h19m

For very modest definitions of playing. Perhaps it'd be more impressive if they recorded a demo file and let that play back without the realtime overhead? Even so, it can only move forward, move back, turn, and fire. And it only knows to face away from the wall it has collided with. This is so far below even basic Doom bots that I'd be afraid to call it playing.

The ASCII intermediate interpretation also seems unnecessary and very limiting. But perhaps that's to keep it near realtime, looks like 1 FPS?

And why run on a Mac? Why not a beefy PC with a GPU that can do the calculations faster?

Still, does seem like a fun challenge. Maybe with further tuning or training it can level up

modeless
5 replies
16h28m

Why isn't it actually Doom? Surely there are multiple JS Dooms to choose from.

kadoban
2 replies
14h2m

Doom is still under copyright protection last I knew. The source is GPL, but have the assets ever been liberally licensed? I think they're more abandonware.

I'm sure you could still do it, but personally I try to respect copyright strictly for any projects I'm going to share. It just feels annoying to have copyright nonsense hanging over me otherwise.

modeless
1 replies
13h55m

Well certainly we don't need the full game assets for a captcha. The shareware version would do just fine and that's always been free.

chungy
0 replies
13h10m

Even better, Freedoom.

tiltowait
0 replies
15h28m

"Finish UV Hangar in < 13 seconds."

Easily achievable[0], thoroughly obnoxious[1]. Just like all captchas.

[0] God help you if you're on a touchscreen. [1] For most people. Especially after the novelty wears off.

Solvency
0 replies
15h49m

Yeah kind of bummed me out.

explaininjs
5 replies
18h3m

Now I want Men In Black mode, where your job is to identify the threat posed by the popup and shoot accordingly:

Alien doing pull ups? Fine. 8 year old girl holding a Quantum Physics book in a dark alley? That's sus...

girvo
4 replies
16h49m

Having re-watched that movie recently, he's not wrong -- that's a deeply odd book for an apparent 8 year old girl to be holding. And with the amount of aliens that look like humans across the movies...

cwillu
1 replies
14h2m

Typical cop assuming any behaviour they can't explain must be malevolent.

explaininjs
0 replies
11h17m

They call it entrapment: the officials put him in a position where he believes he's required to shoot in order to pass a test, but he sees no reason to. So finally he has to go with his gut and shoot the most probable target, even though he wouldn't have shot anyone if he hadn't been placed in that situation with those expectations.

canjobear
1 replies
13h31m

I always thought he passed the test there, and the guys that just mindlessly shot failed.

explaininjs
0 replies
11h19m

Well of course he passed: immediately afterwards they offer him the job and neuralize everyone else.

Apreche
4 replies
18h33m

This is a fun idea, but it doesn't seem to work in any browser I tried. Maybe adblock is breaking it?

wruza
2 replies
18h28m

You have to click on "ON" or "OFF" to start. Unintuitive.

Apreche
1 replies
17h21m

Thanks. That was the issue. I was clicking on the text that says "click to start"

binary132
0 replies
17h7m

I did that a few times myself :)

nntwozz
0 replies
18h28m

Works for me on iOS Safari with AdGuard.

wanderer2323
2 replies
14h16m

Absolute banger. But the auto-aim on vertical axis is missing. You should be able to have the crosshair under an enemy and still hit them. But in any case, nicely done!

evrimoztamur
0 replies
5h22m

Here's the real Doom player!

daveslash
0 replies
3h19m

Funny enough, when I've tried to introduce (indoctrinate) friends to DOOM, "how do I aim up" has consistently been the biggest hangup.

This makes sense when I try to indoctrinate my teenager who grew up on Halo and Call of Duty. But I began noticing this hangup in the late 90s with friends my own age.

paulryanrogers
2 replies
15h36m

Not really Doom, a few years old, and now broken apparently. IIRC it was basically just a mouse only shooting gallery mini-game.

EDIT: Not broken, just not obvious one must click the sound options to start. Still just a mouse gallery mini-game. Doubtful you'd even need AI to solve it.

justinator
1 replies
14h0m

Well let's be honest, a human (YOU I assume) couldn't even figure out how to start the game, so if AI can solve it, we're in real trouble.

paulryanrogers
0 replies
12h47m

So a CAPTCHA that keeps humans out? Sadly that is all too common

jml7c5
2 replies
18h22m

You should try for a full 3D implementation of Doom! I'm sure it's been ported to JavaScript at least a dozen times.

taneq
1 replies
8h2m

Why stop there when you could just use a WebAssembly port of the actual game with a hacked-in portal to the actual site somewhere... :P

nottorp
0 replies
3h42m

For bonus points fire up a Windows VM that will run the original Doom files...

Or maybe a remote desktop into an OS with a sandboxed browser that runs a Windows VM that ...

jgalt212
1 replies
16h15m

this crashed my firefox. anyone else?

NamTaf
0 replies
15h53m

Nope, worked fine for me on 124.0.1 w/ several extensions

edpichler
1 replies
7h45m

Amazing. I wish it was claimed to be secure!

internetter
0 replies
1h55m

I'm not sure it's possible to make it secure. To render the positions of the enemies, the browser receives 4 coords. To submit the captcha, the browser submits 4 coords, the same ones it received. Perhaps you could track the variance between the exact position and the position the user selected, as well as timing. But would it be enough?
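The check suggested in that comment might look roughly like this on the server. It tests how far the click landed from the true position and how quickly it came. The thresholds (0.15 s and 25 px) are made-up placeholders, not tuned values. As the comment suspects, a bot that adds random jitter and delay would still pass; this only stops a naive replay of the exact coordinates.

```python
import math


def plausible_hit(target, click, shown_at, clicked_at,
                  min_reaction=0.15, max_error=25.0):
    """Server-side sanity check for one shot.

    target/click are (x, y) in canvas pixels; shown_at/clicked_at are
    timestamps in seconds. Both thresholds are placeholder values.
    """
    reaction = clicked_at - shown_at
    error = math.dist(target, click)
    if reaction < min_reaction:
        return False  # quicker than the assumed human reaction floor
    if error > max_error:
        return False  # missed the enemy
    if error == 0:
        return False  # exact echo of the coordinates the client was sent
    return True
```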

colonwqbang
1 replies
8h8m

Don't take this too seriously, this is a little project for fun; if you know how to code, it's pretty easy to break the security of this.

As opposed to standard "click the traffic light" type captchas which are almost impossible for modern AI to break.

I think the doom captcha is probably more secure than standard captchas simply by virtue of its obscurity.

nottorp
0 replies
3h38m

which are almost impossible for modern AI to break.

... and for humans, sometimes :)

"Standard" captchas sometimes also bring up major philosophical questions like "what is a bicycle?".

wutwutwat
0 replies
2h21m

Google has been contracting for the military doing AI for over a decade, I'm pretty sure targeting objects w/ a computer in a combat type situation isn't going to stop anyone. They have aim bots for most FPS games too

Still cool and unique though

wmil
0 replies
14h21m

Can you make one based on the WoW fishing minigame? ie they need to click on the bobber at the right time.

I'm not expecting it to last longer, but there really should be some decent fishing bots at this point.

sunnybeetroot
0 replies
16h46m

Missing 2021 tag

pkrefta
0 replies
17h9m

Best captcha I've ever seen <3

major505
0 replies
6h11m

love the super shotgun code.

jelder
0 replies
6h0m

It let me through despite trying to attack a cacodemon with a pistol.

With it being so famously portable, I was expecting this to actually run Doom in the browser and complete a simple map.

deadbabe
0 replies
13h21m

There needs to be hostages or barrels that you shouldn’t shoot because you’ll die.

avsteele
0 replies
18h30m

This is fun. I have been having trouble with Google captchas recently, so I'd be happy if more were like this.

airtonix
0 replies
18h27m

just spam click... autowin.

Dowwie
0 replies
4h51m

I want a doom progress window that allows a user to play doom while waiting for a task to complete