Dogbolt Decompiler Explorer

danielwmayer
26 replies
22h2m

The name of this is a reference to the incredibly useful godbolt compiler explorer. If you are interested in this you will likely enjoy the other as well:

https://godbolt.org/

riffraff
17 replies
21h49m

and for those who don't know it, that one is named after the author, Matt Godbolt.

I thought for a long time it was some joke I wasn't getting related to deities smithing people.

Waterluvian
10 replies
21h29m

Damn. To just name something your last name.

I thought it was the sibling part to the Jesus Nut. https://en.wikipedia.org/wiki/Jesus_nut

mattgodbolt
6 replies
20h57m

It's never been called anything but either "GCC Explorer" or "Compiler Explorer", by me, anyway... The URL it's accessible at is an accident of the one I had hanging around :) (it's now available at compiler-explorer.com too, but...the name other people use has stuck so I'll never be able to reclaim my own domain...)

joemi
3 replies
20h31m

I think you _could_ reclaim your own domain if you wanted. You'd want to have a banner at the top with a clear note directing people to the new domain for the compiler explorer, so that people realize immediately that you're not domain squatting. A few people might put up a stink, but I'm pretty confident that most people wouldn't mind, especially since the tool itself is so useful. The name, for those who don't know it as your last name, is fun, but it isn't the reason people use the tool. Eventually, over enough time, people would start remembering the new URL, and you could shrink or remove the banner (and/or put a note elsewhere on the page).

johannes1234321
1 replies
17h31m

Even then the internet (and even books) are full of "godbolt" links, to the tool itself and to specific code samples. It will take quite some time until all of those become irrelevant.

As a data point: a search on Stack Overflow yields "500" hits. https://stackoverflow.com/search?q=godbolt

account42
0 replies
5h38m

Links to specific examples are less of a problem as he could redirect those to compiler-explorer.com and just keep that redirect up forever. Really the only URL that would need to be "reclaimed" is https://godbolt.org/ and having a prominent link to compiler-explorer.com there would solve that issue.

OTOH the godbolt domain is at least not actively used for a number of other TLDs, so getting one of those might be an easier option.

bombcar
0 replies
18h47m

Honestly "godbolt" is so memorable I can find it instantly even though I rarely use it; but "compiler-explorer" sounds like some generic SEO spam site that I'd probably never click on.

nhatcher
0 replies
19h31m

It is a fantastic name for an otherwise fantastic tool. The day I found out it was your last name, I chuckled and liked it even more. And since I am here, thank you very much for it!

I always call it the compiler explorer but the url, as a sibling comment says, is memorable.

Waterluvian
0 replies
20h44m

It’s such a memorable name for a tool like that. Other than losing your domain name to the topic, how do you feel about the de facto name?

To a far far lesser degree, I’ve experienced many examples of “you named it X but everyone at work calls it Y and now you have to live with that.” It used to really irk me for some reason.

jchw
1 replies
21h22m

Could be misremembering, but IIRC it was called Compiler Explorer and used to live only on a subdomain of godbolt.org. But, it was so useful that it became presumably vastly higher traffic than the personal homepage part and people often referred to it as just "Godbolt" probably because it sounds cooler and is shorter than saying "Compiler Explorer" (and it may not be obvious the domain name is a last name rather than just a cool name for something.)

Waterluvian
0 replies
21h17m

Now that’s a pretty cool origin story for a name. What a compliment!

jjoonathan
0 replies
19h9m

To be fair it's an amazing last name and it feels like there probably is a story, it just has to do with this guy's ancestors rather than the assembler tool we all know and love.

insulanus
5 replies
21h20m

deities smithing people.

That's "deities smiting people.", but I really like the idea of deities smithing people :)

pjmorris
2 replies
21h1m

There's a joke about Adam and Eve in here somewhere. Genesis 2 for reference.

reactordev
1 replies
19h47m

Sculpty terracotta would be a fitting choice. It's pretty easy to sculpt when kneaded, bakes in a traditional oven, keeps its details. Perfect for silicone mold making.

TeMPOraL
0 replies
18h7m

bakes in a traditional oven

Now that reminds me of a verse from a song I heard on the radio as a teenager:

  Had a meeting with my maker
  The superhuman baker
  He popped me in the oven
  And set the dial to lovin'

stcredzero
0 replies
19h51m

This happens in the Norse myths.

WJW
0 replies
17h13m

The Dwarven god in DnD is so good at crafting he can literally make new souls in his forge. :)

reaperman
4 replies
21h52m

It might also be a bit of a portmanteau with a second reference to dogpile.com which was a pre-Google "search engine" that compiled search results from multiple search engines. Back in the day you often had to separately search altavista.com, lycos.com, askjeeves.com, yahoo.com, etc. because some of them would work for your query but others would not and it was difficult to predict the performance of any particular search engine, but usually at least one of them would have the result you wanted/needed.

Dogpile was an automated way to search all of the search engines at the same time with one query.

https://web.archive.org/web/19990429194414/http://dogpile.co...

psifertex
2 replies
21h42m

I do remember dogpile, but as one of the folks who named it, nope, that wasn't a conscious influence!

borski
1 replies
19h0m

Oh, it's you! Hi Jordan, I miss you, let’s hang out sometime :)

psifertex
0 replies
16h29m

Yes, let's! And before hacker summer camp when we're way way too busy! :-)

codetrotter
0 replies
21h33m

Look no further than https://dogbolt.org/faq

It's meant to be the reverse of the amazing Compiler Explorer.

With a link to https://godbolt.org/

It’s very obvious that Dogbolt Decompiler Explorer is primarily named after Godbolt Compiler Explorer.

29athrowaway
2 replies
18h1m

There's also RMSbolt, which is a Compiler explorer for Emacs, where Richard M. Stallman is regarded as the "creator".

extraduder_ire
0 replies
13h1m

It makes for a nice parallel, since the original version of godbolt was just a split tmux session with vim running on one side, and "watch 'gcc -S -o /dev/stdout'" on the other. The main advantage of putting it online is not needing all of the compilers locally.
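
The local setup described above can be approximated in a few lines. This is a minimal sketch, not how Compiler Explorer itself is implemented: the helper names `asm_command` and `emit_asm` and the default `-O2` flag are assumptions added for illustration, while `gcc -S -o /dev/stdout` comes straight from the comment.

```python
import shutil
import subprocess


def asm_command(source_path, compiler="gcc", opt="-O2"):
    """Build the argument list that asks the compiler to write assembly to stdout.

    `-S` stops after the compile step and emits assembly; `-o /dev/stdout`
    mirrors the command quoted in the comment (Unix-like systems only).
    The optimisation flag is an assumption, not part of the quoted command.
    """
    return [compiler, "-S", opt, "-o", "/dev/stdout", source_path]


def emit_asm(source_path, compiler="gcc", opt="-O2"):
    """Run the compiler and return the generated assembly as a string."""
    if shutil.which(compiler) is None:
        raise RuntimeError(f"{compiler} is not installed or not on PATH")
    result = subprocess.run(
        asm_command(source_path, compiler, opt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if compilation fails
    )
    return result.stdout
```

Calling `emit_asm("example.c")` after each save is roughly what the `watch` half of the tmux session did, and it also shows the limitation the comment mentions: every compiler you want to compare has to be installed locally.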

account42
0 replies
5h33m

Richard M. Stallman

That's St IGNUcius to you. [0]

[0] https://stallman.org/saint.html

Carbocarde
10 replies
21h4m

All submitted binaries are saved and made available to any of the authors of the tools used so they may improve their decompilers. If you're such an author who would like access, let us know!.

oof

einpoklum
4 replies
19h52m

If you believe that content you submit to websites is not examined by interested parties associated with that website, then - I have a bridge to sell you... or perhaps I should say a Google account to give you, free of charge.

Carbocarde
3 replies
17h51m

Compare this policy to godbolt’s policy:

In short: your source code is stored in plaintext for the minimum time feasible to be able to process your request. After that, it is discarded and is inaccessible. In very rare cases your code may be kept for a little longer (at most a week) to help debug issues in Compiler Explorer.

saagarjha
1 replies
16h37m

Pretty sure links work basically forever

extraduder_ire
0 replies
13h10m

I think they changed it recently, but all of the code you submit is embedded in the URL (after an anchor). So, it's stored by Google's link shortening service, but is resubmitted to the site every time you load it.
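
To illustrate the general idea of carrying source code in the part of a URL after the `#`, here is a minimal sketch. The encoding (zlib plus URL-safe base64), the function names, and the `example.invalid` host are all assumptions for illustration; this is not Compiler Explorer's actual link format.

```python
import base64
import zlib
from urllib.parse import urlsplit


def encode_share_url(base_url, source):
    """Pack source text into the URL fragment (hypothetical scheme)."""
    packed = zlib.compress(source.encode("utf-8"))
    token = base64.urlsafe_b64encode(packed).decode("ascii")
    return f"{base_url}#{token}"


def decode_share_url(url):
    """Recover the source text from a URL built by encode_share_url."""
    token = urlsplit(url).fragment
    packed = base64.urlsafe_b64decode(token)
    return zlib.decompress(packed).decode("utf-8")


source = "int square(int x) { return x * x; }\n"
url = encode_share_url("https://example.invalid/", source)
restored = decode_share_url(url)
```

Browsers do not send the fragment in the HTTP request, so with a scheme like this the server only sees the code when the page's script reads the fragment and submits it, which matches the "resubmitted every time you load it" behavior described above.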

boneitis
0 replies
2h22m

My bias may be showing, being a CTF-scene enthusiast. Most of these (tools on dogbolt) look like FOSS utilities you can run yourself. For the rest, I'd imagine you are welcome to pay for licenses. Binary Ninja in particular, while maybe not cheap for everybody, isn't sky-high.

smegsicle
0 replies
18h33m

so like vscode?

saagarjha
0 replies
18h17m

Sweet, free file hosting

marcellus23
0 replies
15h33m

They make it very clear. If you don't notice that before uploading some private binaries, that's on you.

account42
0 replies
5h45m

Yep. Remember that that means you are not allowed to submit any binaries for which you don't have the license to redistribute.

CaliforniaKarl
0 replies
20h57m

Good that this is clearly mentioned up-front on their site.

Arch-TK
8 replies
22h29m

HexRays online? Is that allowed?

sonicanatidae
3 replies
22h23m

Not anymore!

angrily writes a letter to his congressman who won't understand a word of it

quickthrower2
2 replies
20h10m

Your congressman doesn’t yet have hexrays to decompile your letter

sonicanatidae
0 replies
2h37m

From what I can tell in observation, they don't parse English either.

exikyut
0 replies
19h29m

His brain is relegated to spewing out the Matrix unparsed as he receives it. He gets none of the blondes, brunettes or redheads.

alright2565
2 replies
22h19m

From the FAQ, Hex-Rays actually sponsors the project:

Vector 35 and Hex-Rays jointly sponsor the hosting on Digital Ocean as a community service.

cristeigabriel
1 replies
22h16m

It makes sense, it's a perfect advertisement of their superiority.

Fabricio20
0 replies
21h39m

Indeed, looking at the samples HexRays really did a great job compared to the others, much more readable code.

rychco
0 replies
22h20m

When this first came out a year(ish?) ago, I remember seeing somewhere that they had received permission from Hexrays/Ilfak Guilfanov.

hoosieree
7 replies
21h51m

Wow, I really could have used this for my Ph.D. research (deep learning for obfuscated code).

I ditched Ghidra in my experiments in favor of angr early on because Ghidra did not play nicely with multiprocessing and I had a lot of data to process. Well maybe it does but it was much easier for me to achieve the same thing with angr.

Love the name! Although I feel compelled to point out that Compiler Explorer is the name of the project and Godbolt is its author's last name, but I suppose if people are to the point of using Godbolt as a verb the ship has sailed.

mvelbaum
2 replies
12h17m

Has there been any good progress in deobfuscating/decompiling machine code using Machine Learning techniques?

hoosieree
1 replies
2h10m

Short answer: not where it counts.

My work focuses on recognizing known functions in obfuscated binaries, but there are some papers you might want to check out related to deobfuscation, if not necessarily using ML for deobfuscation or decompilation.

My take is that ML can soundly defeat the "easy" and more static obfuscation types (encodings, control flow flattening, splitting functions). It's low hanging fruit, and it's what I worked on most, but adoption is slow. On the other hand, "hard" obfuscations like virtualized functions or programs which embed JIT compilers to obfuscate at runtime... as far as I know, those are still unsolved problems.

This is a good overview of the subject, but pretty old and doesn't cover "hard" obfuscations: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1566145.

https://www.jinyier.me/papers/DATE19_Obf.pdf uses deobfuscation for RTL logic (FPGA/ASIC domain) with SAT solvers. Might be useful for a point of view from a fairly different domain.

https://advising.cs.arizona.edu/~debray/Publications/generic... uses "semantics-preserving transformations" to shed obfuscation. I think this approach is the way to go, especially when combined with dynamic/symbolic analysis to mitigate virt/jit types of transformations.

I'll mention this one as a cautionary tale: https://dl.acm.org/doi/pdf/10.1145/2886012 has some good general info but glosses over the machine learning approach. It considers Hex-rays' FLIRT to be "machine learning", but FLIRT just hashes signatures, can be spoofed (i.e. https://siliconpr0n.org/uv/issues_with_flirt_aware_malware.p...), and is useless against obfuscation.
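
A toy illustration of why signature hashing is brittle against obfuscation. This is a minimal sketch under stated assumptions: the `signature` function, the wildcard scheme, and the SHA-256 hash are invented for the example and are not FLIRT's actual signature format.

```python
import hashlib


def signature(code, wildcards=frozenset(), length=32):
    """Hash the leading bytes of a function, zeroing wildcard positions.

    Wildcards stand for bytes that legitimately vary between builds,
    such as the operand of a relocated call.
    """
    head = code[:length]
    masked = bytes(0 if i in wildcards else b for i, b in enumerate(head))
    return hashlib.sha256(masked).hexdigest()


# push rbp; mov rbp, rsp; call rel32; pop rbp; ret
known = bytes.fromhex("554889e5e8000000005dc3")
call_operand = frozenset({5, 6, 7, 8})  # the four rel32 bytes after 0xe8

database = {signature(known, call_operand): "helper"}  # hypothetical name

# Same function linked at a different address: only the call operand differs.
relocated = bytes.fromhex("554889e5e8123456785dc3")
# Trivially "obfuscated" copy: a single nop inserted in front.
padded = b"\x90" + known

match_relocated = database.get(signature(relocated, call_operand))
match_padded = database.get(signature(padded, call_operand))
```

The relocated copy still resolves to `helper`, while one inserted `nop` shifts every byte and the lookup returns `None`. An exact-match hash has no notion of "almost the same function", which is the gap the comment is pointing at.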

Eventually I think SBOM tools like Black Duck[1] and SLSA[2] will incorporate ML to improve the accuracy of even figuring out what dependencies a piece of software actually has.

[1]: https://www.synopsys.com/software-integrity/software-composi...

[2]: https://slsa.dev/

mvelbaum
0 replies
34m

Very cool - thank you very much!

My take is that ML can soundly defeat the "easy" and more static obfuscation types (encodings, control flow flattening, splitting functions). It's low hanging fruit, and it's what I worked on most, but adoption is slow.

If I wanted to implement my own toy HexRays-like decompiler using a few of these techniques to decompile x86-64 binaries is there any high quality up-to-date paper/resource you would recommend?

Or do you think that "A Generic Approach to Automatic Deobfuscation of Executable Code" paper is a good enough start?

Also, what do you think about https://tigress.wtf/ ?

tomcam
1 replies
15h45m

Sometimes we must look back in angr

hoosieree
0 replies
2h9m

That better be a Bowie reference and not an Oasis reference.

psifertex
1 replies
21h41m

We know! Similarly, the GH repo is actually the Decompiler Explorer:

https://github.com/decompiler-explorer/decompiler-explorer/

account42
0 replies
5h31m

I like the name, it's cute and a nice homage.

aidenfoxivey
4 replies
13h3m

Speaking of decompilers, would Binary Ninja be a safe bet to pick? I've been told IDA is the gold standard, but it's also expensive for someone who wants to recreationally reverse engineer.

IAmLiterallyAB
2 replies
12h41m

Honestly just use Ghidra. It has its quirks but it's pretty good. And open source. If it's good enough for the NSA it's probably good enough for recreational use.

codedokode
1 replies
7h49m

If Ghidra is made by NSA, does it mean that it can have backdoors for non-US users?

dddnzzz334
0 replies
5h0m

The code is open source and has been looked at by several people over the years. It would be quite hard for the NSA to sneak in a backdoor but it is never out of the question. However, the risk is so extremely minuscule when compared to other alternatives since they are not even open source.

kdbg
0 replies
9h52m

Binja decompiler is more-or-less fine. It's not as mature as IDA or Ghidra but it's not a bad decompiler.

Though for me the big selling point on Binja is the Intermediate Languages (ILs). High-level IL is the decompiler but you also get Low-level and Medium-level ILs as steps between assembly and source. If the decompiler is a bit funky you can look at the ILs to get a better idea of what is happening. The ILs are also just much nicer to read than plain assembly so I tend to use them a lot.

It's a feature that isn't really matched on any other platform. Ghidra and IDA both have a single IL that is more machine readable compared to Binja's human-readable ones.

rixtox
3 replies
21h59m

I really wish there were a similar tool for exploring binary lifting to different IRs. Like Ghidra p-code with sleigh, LLVM Machine IR, Qemu TCG etc

JonChesterfield
1 replies
17h53m

Qemu works by translating a binary to an IR then doing stuff with it. Valgrind likewise. There's an optimiser called bolt (associated with facebook) which has the same idea.

psifertex
0 replies
16h35m

Yup, I'm aware of both of those, but none of the tools listed so far intend their IR to be human-consumable, unlike disassemblers and decompilers. You think disassembly is verbose compared to a decompiler? Go look at the equivalent Vex (Valgrind's IR) for any non-trivial disassembly. It's suuuper verbose.

As far as I know, BNIL (https://docs.binary.ninja/dev/bnil-overview.html) is the only one that is designed to be readable and it still wouldn't make sense to include it in an IL comparison such as the one done here for decompilation in my opinion.

psifertex
0 replies
21h40m

IRs aren't generally suited to examination of small snippets by a human when you're starting with a full binary. I would imagine something like that would only work well when done for very small bits of assembly. Likewise, you might be interested in BNIL which is an entire stack of ILs that Binary Ninja is based on. (You can see it exposed in the cloud.binary.ninja UI or the demo)

29athrowaway
3 replies
18h0m

Now take the output of dogbolt and feed into godbolt.

staunton
1 replies
17h28m

And reinforcement-train an LLM to reconstruct the original code...

29athrowaway
0 replies
17h0m

That would be dogebolt

userbinator
0 replies
12h25m

Machine translation, for machine code.

Theoretically, a fixed point should be reached.
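
The fixed-point idea can be sketched as a loop that repeats a round trip until the output stops changing. This is a minimal sketch: `step` stands in for one decompile-then-recompile pass, and the whitespace normalizer used here is a toy placeholder, not a real decompiler. Nothing guarantees that real tools converge; they could just as well cycle or keep drifting, hence the round limit.

```python
def iterate_to_fixed_point(step, start, max_rounds=100):
    """Apply `step` repeatedly until the output equals the input.

    Returns the stable value and the number of rounds that changed it.
    Raises RuntimeError if nothing stabilises within `max_rounds`.
    """
    current = start
    for rounds in range(max_rounds):
        nxt = step(current)
        if nxt == current:
            return current, rounds
        current = nxt
    raise RuntimeError("no fixed point within max_rounds")


def normalize(text):
    """Toy stand-in for a round trip: collapse runs of whitespace."""
    return " ".join(text.split())


stable, rounds = iterate_to_fixed_point(normalize, "int  x =  1 ;")
```

The toy step is idempotent, so it settles after a single changing round. Whether a real compile/decompile pair behaves that way is an empirical question.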

w10-1
1 replies
21h14m

OMG I am so happy

Of note: HexRays is not only cleaner, but right now their queue is mostly empty while others are backed up.

psifertex
0 replies
16h31m

Binary Ninja likewise is empty and keeps up just fine as well. It's not a coincidence that the two commercial products that are funding it are both confident enough to put their stuff online like this.

And it's no conspiracy theory or intentional sandbagging, you can see the implementation: https://github.com/decompiler-explorer/decompiler-explorer

and if anyone can improve the other tools' performance we'd be happy to accept it. We reached out to the Ghidra devs: https://github.com/NationalSecurityAgency/ghidra/issues/5228 but they didn't have any silver bullets for us either.

iBotPeaches
1 replies
22h0m

Love this - I can almost imagine the convincing for other companies wasn't even needed when they realized a small binary size and comparison to competitors would net them more business. A perfect little solution for triaging issues between services and comparing solutions.

psifertex
0 replies
21h38m

That was indeed the logic. The two main commercial solutions included, Binary Ninja (made by Vector 35, where I'm one of the founders) and Hex-Rays, both pay for all the hosting costs. And it's not particularly cheap -- there's a fair amount of compute to drive the decompilers, especially as some of them are... not very efficient.

cristeigabriel
1 replies
22h18m

Very nice. A parallel, I've been working on an emulator project recently, implementing my own disassembler, and I keep thinking about how I would turn patterns of machine code into a generalized form, which could then be turned into something like C-like pseudo-code, so it's been really compelling me lately to implement my own toy decompiler

withzombies
0 replies
2h24m

Binary Ninja does this. They have several layers of intermediate representations[1], which they build their decompiler on top of. Ghidra does something similar with their PCode. They disassemble to PCode and then decompile the PCode[2].

[1] https://docs.binary.ninja/dev/bnil-overview.html

[2] https://riverloopsecurity.com/blog/2019/05/pcode/ (an example)

psifertex
0 replies
16h27m

Can I just say, thanks to the person who posted this for waiting until this week to do so. (Side note: I suspect it was due to the recent coverage from C++ Weekly which is a great resource: https://www.youtube.com/watch?v=h3F0Fw0R7ME)

As recently as last week we had some horrible performance problems but it looks like the queue (https://dogbolt.org/queue) is mostly still fine! Other than the long pole of a few of the decompilers being backed up, things are humming along quite smoothly! Josh + Glenn have done some great work on it! (https://github.com/decompiler-explorer/decompiler-explorer/c...)

fritzo
0 replies
17h50m

Is there a similar project for javascript? That is, de-obfuscating large javascript codebases?

dang
0 replies
22h11m

Related:

Decompiler Explorer - https://news.ycombinator.com/item?id=32079227 - July 2022 (82 comments)

costco
0 replies
15h13m

I wish I saw this when it was posted last year. This is awesome and really convenient.

T3RMINATED
0 replies
22h11m

nice

DrNosferatu
0 replies
1h25m

Any good and thorough decompiler tutorials for non-expert users?