The name of this is a reference to the incredibly useful godbolt compiler explorer. If you are interested in this you will likely enjoy the other as well:
All submitted binaries are saved and made available to any of the authors of the tools used so they may improve their decompilers. If you're such an author who would like access, let us know!
oof
If you believe that content you submit to websites is not examined by interested parties associated with that website, then - I have a bridge to sell you... or perhaps I should say a Google account to give you, free of charge.
Compare this policy to godbolt’s policy:
In short: your source code is stored in plaintext for the minimum time feasible to be able to process your request. After that, it is discarded and is inaccessible. In very rare cases your code may be kept for a little longer (at most a week) to help debug issues in Compiler Explorer.
Pretty sure links work basically forever
I think they changed it recently, but all of the code you submit is embedded in the URL (after the anchor). So it's stored by Google's link-shortening service, but it is resubmitted to the site every time you load the link.
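To illustrate the general idea of keeping state in the URL fragment, here is a minimal sketch. It is not Compiler Explorer's actual link format; the JSON layout, the compression step, the function names, and the example.invalid base URL are all assumptions made for illustration.

```python
import base64
import json
import zlib
from urllib.parse import urlsplit


def make_share_url(base_url: str, source: str) -> str:
    """Pack source code into the URL fragment (the part after '#')."""
    # Hypothetical payload layout: a JSON object with a single "source" key.
    payload = json.dumps({"source": source}).encode("utf-8")
    packed = base64.urlsafe_b64encode(zlib.compress(payload)).decode("ascii")
    return f"{base_url}#{packed}"


def read_share_url(url: str) -> str:
    """Recover the source code from a URL built by make_share_url."""
    fragment = urlsplit(url).fragment
    payload = zlib.decompress(base64.urlsafe_b64decode(fragment))
    return json.loads(payload)["source"]
```

Browsers do not send the fragment to the server as part of the request, so in a scheme like this the page itself has to read the fragment and submit the code again on every load.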
My bias may be showing, being a CTF-scene enthusiast. Most of these (tools on dogbolt) look like FOSS utilities you can run yourself. The rest, I'd imagine you are welcome to pay for licenses. Binary Ninja in particular, while maybe not cheap for everybody, isn't sky-high.
so like vscode?
Sweet, free file hosting
They make it very clear. If you don't notice that before uploading some private binaries, that's on you.
Yep. Remember that that means you are not allowed to submit any binaries for which you don't have the license to redistribute.
Good that this is clearly mentioned up-front on their site.
HexRays online? Is that allowed?
Not anymore!
angrily writes a letter to his congressman who won't understand a word of it
Your congressman doesn’t yet have hexrays to decompile your letter
From what I can tell in observation, they don't parse English either.
His brain is relegated to spewing out the Matrix unparsed as he receives it. He gets none of the blondes, brunettes or redheads.
From the FAQ, Hex-Rays actually sponsors the project:
Vector 35 and Hex-Rays jointly sponsor the hosting on Digital Ocean as a community service.
It makes sense, it's a perfect advertisement of their superiority.
Indeed, looking at the samples HexRays really did a great job compared to the others, much more readable code.
When this first came out a year(ish?) ago, I remember seeing somewhere that they had received permission from Hexrays/Ilfak Guilfanov.
Wow, I really could have used this for my Ph.D. research (deep learning for obfuscated code).
I ditched Ghidra in my experiments in favor of angr early on because Ghidra did not play nicely with multiprocessing and I had a lot of data to process. Well maybe it does but it was much easier for me to achieve the same thing with angr.
Love the name! Although I feel compelled to point out that Compiler Explorer is the name of the project and Godbolt is its author's last name, but I suppose if people are to the point of using Godbolt as a verb the ship has sailed.
Has there been any good progress in deobfuscating/decompiling machine code using Machine Learning techniques?
Short answer: not where it counts.
My work focuses on recognizing known functions in obfuscated binaries, but there are some papers you might want to check out related to deobfuscation, if not necessarily using ML for deobfuscation or decompilation.
My take is that ML can soundly defeat the "easy" and more static obfuscation types (encodings, control flow flattening, splitting functions). It's low hanging fruit, and it's what I worked on most, but adoption is slow. On the other hand, "hard" obfuscations like virtualized functions or programs which embed JIT compilers to obfuscate at runtime... as far as I know, those are still unsolved problems.
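For readers who haven't seen control flow flattening, here is a toy illustration of what the transformation does, written in Python rather than machine code. The function names and state numbering are made up for this sketch; real obfuscators apply the same idea to basic blocks in a binary.

```python
def gcd_plain(a: int, b: int) -> int:
    """Ordinary structured control flow."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_flattened(a: int, b: int) -> int:
    """Same logic, with every block routed through one dispatcher loop."""
    state = 0
    while True:
        if state == 0:        # block 0: the loop test
            state = 1 if b != 0 else 2
        elif state == 1:      # block 1: the loop body
            a, b = b, a % b
            state = 0
        elif state == 2:      # block 2: the exit
            return a
```

Both functions compute the same result, but in the flattened version the original loop structure is no longer visible in the control flow graph: every block just jumps back to the dispatcher.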
This is a good overview of the subject, but pretty old and doesn't cover "hard" obfuscations: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1566145.
https://www.jinyier.me/papers/DATE19_Obf.pdf uses deobfuscation for RTL logic (FPGA/ASIC domain) with SAT solvers. Might be useful for a point of view from a fairly different domain.
https://advising.cs.arizona.edu/~debray/Publications/generic... uses "semantics-preserving transformations" to shed obfuscation. I think this approach is the way to go, especially when combined with dynamic/symbolic analysis to mitigate virt/jit types of transformations.
I'll mention this one as a cautionary tale: https://dl.acm.org/doi/pdf/10.1145/2886012 has some good general info but glosses over the machine learning approach. It considers Hex-Rays' FLIRT to be "machine learning", but FLIRT just hashes signatures, can be spoofed (e.g. https://siliconpr0n.org/uv/issues_with_flirt_aware_malware.p...), and is useless against obfuscation.
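To make the "just hashes signatures" point concrete, here is a toy signature matcher in the same spirit. It is not FLIRT's real file format or algorithm; the dictionary layout, the wildcard handling, and the use of CRC32 are assumptions for illustration only.

```python
import zlib


def make_signature(name, func_bytes, variable_offsets, prefix_len):
    """Build a toy signature: a wildcarded prefix plus a CRC of the remainder."""
    # Bytes at variable_offsets (e.g. relocated call targets) become wildcards.
    prefix = [
        None if i in variable_offsets else b
        for i, b in enumerate(func_bytes[:prefix_len])
    ]
    return {
        "name": name,
        "prefix": prefix,
        "crc": zlib.crc32(func_bytes[prefix_len:]),
        "length": len(func_bytes),
    }


def matches(sig, candidate):
    """Return True if candidate bytes start with a function matching sig."""
    if len(candidate) < sig["length"]:
        return False
    body = candidate[: sig["length"]]
    n = len(sig["prefix"])
    for want, got in zip(sig["prefix"], body[:n]):
        if want is not None and want != got:
            return False
    return zlib.crc32(body[n:]) == sig["crc"]


# push rbp; mov rbp, rsp; call rel32; pop rbp; ret
original = bytes.fromhex("554889e5e8000000005dc3")
sig = make_signature("hypothetical_helper", original, {5, 6, 7, 8}, prefix_len=9)
```

A copy whose call target differs still matches, because those bytes are wildcards. Inserting a single nop after the first instruction shifts every byte and the match fails, which is why this style of matching breaks down against even trivial obfuscation.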
Eventually I think SBOM tools like Black Duck[1] and SLSA[2] will incorporate ML to improve the accuracy of even figuring out what dependencies a piece of software actually has.
[1]: https://www.synopsys.com/software-integrity/software-composi...
[2]: https://slsa.dev/
Very cool - thank you very much!
My take is that ML can soundly defeat the "easy" and more static obfuscation types (encodings, control flow flattening, splitting functions). It's low hanging fruit, and it's what I worked on most, but adoption is slow.
If I wanted to implement my own toy HexRays-like decompiler using a few of these techniques to decompile x86-64 binaries is there any high quality up-to-date paper/resource you would recommend?
Or do you think that "A Generic Approach to Automatic Deobfuscation of Executable Code" paper is a good enough start?
Also, what do you think about https://tigress.wtf/ ?
Sometimes we must look back in angr
That better be a Bowie reference and not an Oasis reference.
We know! Similarly, the GH repo is actually the Decompiler Explorer:
I like the name, it's cute and a nice homage.
Speaking of decompilers, would Binary Ninja be a safe bet to pick? I've been told IDA is the gold standard, but it's also expensive for someone who wants to recreationally reverse engineer.
Honestly just use Ghidra. It has its quirks but it's pretty good. And open source. If it's good enough for the NSA it's probably good enough for recreational use.
If Ghidra is made by NSA, does it mean that it can have backdoors for non-US users?
The code is open source and has been looked at by several people over the years. It would be quite hard for the NSA to sneak in a backdoor but it is never out of the question. However, the risk is so extremely minuscule when compared to other alternatives since they are not even open source.
Binja decompiler is more-or-less fine. It's not as mature as IDA or Ghidra but it's not a bad decompiler.
Though for me the big selling point on Binja is the Intermediate Languages (ILs). High-level IL is the decompiler but you also get Low-level and Medium-level ILs as steps between assembly and source. If the decompiler is a bit funky you can look at the ILs to get a better idea of what is happening. The ILs are also just much nicer to read than plain assembly so I tend to use them a lot.
It's a feature that isn't really matched on any other platform. Ghidra and IDA both have a single IL that is more machine readable compared to Binja's human-readable ones.
I really wish there were a similar tool for exploring binary lifting to different IRs. Like Ghidra p-code with SLEIGH, LLVM Machine IR, QEMU TCG etc
QEMU works by translating a binary to an IR then doing stuff with it. Valgrind likewise. There's an optimiser called BOLT (associated with Facebook) which has the same idea.
Yup, I'm aware of both of those, but none of the tools listed so far intend their IR to be human-consumable, unlike disassemblers and decompilers. You think disassembly is verbose compared to a decompiler? Go look at the equivalent VEX (Valgrind's IR) for any non-trivial disassembly. It's suuuper verbose.
As far as I know, BNIL (https://docs.binary.ninja/dev/bnil-overview.html) is the only one that is designed to be readable and it still wouldn't make sense to include it in an IL comparison such as the one done here for decompilation in my opinion.
IRs generally aren't suited to examination by a human in small snippets when you're starting with a full binary. I would imagine something like that would only work well when done for very small bits of assembly. Likewise, you might be interested in BNIL, which is an entire stack of ILs that Binary Ninja is based on. (You can see it exposed in the cloud.binary.ninja UI or the demo.)
Now take the output of dogbolt and feed into godbolt.
And reinforcement-train an LLM to reconstruct the original code...
That would be dogebolt
Machine translation, for machine code.
Theoretically, a fixed point should be reached.
OMG I am so happy
Of note: HexRays is not only cleaner, but right now their queue is mostly empty while others are backed up.
Binary Ninja likewise is empty and keeps up just fine as well. It's not a coincidence that the two commercial products that are funding it are both confident enough to put their stuff online like this.
And it's no conspiracy theory or intentional sandbagging, you can see the implementation: https://github.com/decompiler-explorer/decompiler-explorer
and if anyone can improve the other tools performance we'd be happy to accept it. We reached out to the Ghidra devs: https://github.com/NationalSecurityAgency/ghidra/issues/5228 but they didn't have any silver bullets for us either.
Love this - I can almost imagine the convincing for other companies wasn't even needed when they realized a small binary size and comparison to competitors would net them more business. A perfect little solution for triaging issues between services and comparing solutions.
That was indeed the logic. The two main commercial solutions included, Binary Ninja (made by Vector 35, where I'm one of the founders) and Hex-Rays, both pay for all the hosting costs. And it's not particularly cheap -- there's a fair amount of compute to drive the decompilers especially as some of them are... not very efficient.
Very nice. On a parallel note, I've been working on an emulator project recently, implementing my own disassembler, and I keep thinking about how I would turn patterns of machine code into a generalized form, which could then be turned into something like C-like pseudo-code, so lately I've felt really compelled to implement my own toy decompiler.
Binary Ninja does this. They have several layers of intermediate representations[1], which they build their decompiler on top of. Ghidra does something similar with their PCode. They disassemble to PCode and then decompile the PCode[2].
[1] https://docs.binary.ninja/dev/bnil-overview.html [2] https://riverloopsecurity.com/blog/2019/05/pcode/ (an example)
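As a rough picture of the "lift to an IR, then decompile the IR" pipeline, here is a toy sketch. The tuple-based instruction format, the tiny IR, and both function names are invented for this example; they do not correspond to BNIL or P-code.

```python
def lift(instructions):
    """Lift (mnemonic, dst, src) tuples into IR tuples (dst, op, lhs, rhs)."""
    ops = {"add": "+", "sub": "-", "imul": "*"}
    ir = []
    for mnemonic, dst, src in instructions:
        if mnemonic == "mov":
            ir.append((dst, None, src, None))       # plain copy: dst = src
        else:
            ir.append((dst, ops[mnemonic], dst, src))  # dst = dst op src
    return ir


def to_expression(ir, result):
    """Forward-substitute register values to get one expression for result."""
    env = {}  # register name -> expression in terms of the initial registers
    for dst, op, lhs, rhs in ir:
        left = env.get(lhs, lhs)
        if op is None:
            env[dst] = left
        else:
            right = env.get(rhs, rhs)
            env[dst] = f"({left} {op} {right})"
    return env.get(result, result)


# mov eax, edi; add eax, esi; imul eax, eax
program = [("mov", "eax", "edi"), ("add", "eax", "esi"), ("imul", "eax", "eax")]
pseudo_c = to_expression(lift(program), "eax")
```

Here pseudo_c ends up as "((edi + esi) * (edi + esi))". A real decompiler adds control flow recovery, type inference, and a great deal of cleanup on top of this kind of substitution, which is where the layered ILs earn their keep.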
Can I just say, thanks to the person who posted this for waiting until this week to do so. (Side note: I suspect it was due to the recent coverage from C++ Weekly which is a great resource: https://www.youtube.com/watch?v=h3F0Fw0R7ME)
As recently as last week we had some horrible performance problems but it looks like the queue (https://dogbolt.org/queue) is mostly still fine! Other than the long pole of a few of the decompilers being backed up, things are humming along quite smoothly! Josh + Glenn have done some great work on it! (https://github.com/decompiler-explorer/decompiler-explorer/c...)
Is there a similar project for javascript? That is, de-obfuscating large javascript codebases?
Related:
Decompiler Explorer - https://news.ycombinator.com/item?id=32079227 - July 2022 (82 comments)
I wish I saw this when it was posted last year. This is awesome and really convenient.
nice
Any good and thorough decompiler tutorials for non-expert users?
and for those who don't know it, that one is named after the author, Matt Godbolt.
I thought for a long time it was some joke I wasn't getting related to deities smithing people.
Damn. To just name something your last name.
I thought it was the sibling part to the Jesus Nut. https://en.wikipedia.org/wiki/Jesus_nut
It's never been called anything but either "GCC Explorer" or "Compiler Explorer", by me, anyway... The URL it's accessible for is an accident of the one I had hanging around :) (it's now available at compiler-explorer.com too, but...the name other people use has stuck so I'll never be able to reclaim my own domain...)
I think you _could_ reclaim your own domain if you wanted. You'd want to have a banner at the top with a clear note directing people to the new domain for the compiler explorer, so that people realize immediately that you're not domain squatting. A few people might put up a stink, but I'm pretty confident that most people wouldn't mind, especially since the tool itself is so useful. The name, for those who don't know it as your last name, is fun, but it isn't the reason people use the tool. Eventually, over enough time, people would start remembering the new URL, and you could shrink or remove the banner (and/or put a note elsewhere on the page).
Even then the internet (and even books) are full of "godbolt" links, to the tool itself, to specific code samples. It will take quite some time until all of those become irrelevant.
As a data point: Search on stack overflow yields "500" hits. https://stackoverflow.com/search?q=godbolt
Links to specific examples are less of a problem as he could redirect those to compiler-explorer.com and just keep that redirect up forever. Really the only URL that would need to be "reclaimed" is https://godbolt.org/ and having a prominent link to compiler-explorer.com there would solve that issue.
OTOH the godbolt domain is at least not actively used for a number of other TLDs; getting one of those might be an easier option.
Honestly "godbolt" is so memorable I can find it instantly even though I rarely use it; but "compiler-explorer" sounds like some generic SEO spam site that I'd probably never click on.
It is a fantastic name for an otherwise fantastic tool. The day I found out it was your last name, it made me chuckle and I liked it even more. And since I am here, thank you very much for it!
I always call it the compiler explorer but the url, as a sibling comment says, is memorable.
It’s such a memorable name for a tool like that. Other than losing your domain name to the topic, how do you feel about the de facto name?
To a far far lesser degree, I’ve experienced many examples of “you named it X but everyone at work calls it Y and now you have to live with that.” It used to really irk me for some reason.
Could be misremembering, but IIRC it was called Compiler Explorer and used to live only on a subdomain of godbolt.org. But, it was so useful that it became presumably vastly higher traffic than the personal homepage part and people often referred to it as just "Godbolt" probably because it sounds cooler and is shorter than saying "Compiler Explorer" (and it may not be obvious the domain name is a last name rather than just a cool name for something.)
Now that’s a pretty cool origin story for a name. What a compliment!
To be fair it's an amazing last name and it feels like there probably is a story, it just has to do with this guy's ancestors rather than the assembler tool we all know and love.
That's "deities smiting people.", but I really like the idea of deities smithing people :)
There's a joke about Adam and Eve in here somewhere. Genesis 2 for reference.
Sculpty terracotta would be a fitting choice. It's pretty easy to sculpt when kneaded, bakes in a traditional oven, keeps its details. Perfect for silicone mold making.
Now that reminds me of a verse from a song I heard on the radio as a teenager:
This happens in the Norse myths.
The Dwarven god in DnD is so good at crafting he can literally make new souls in his forge. :)
It might also be a bit of a portmanteau with a second reference to dogpile.com which was a pre-Google "search engine" that compiled search results from multiple search engines. Back in the day you often had to separately search altavista.com, lycos.com, askjeeves.com, yahoo.com, etc. because some of them would work for your query but others would not and it was difficult to predict the performance of any particular search engine, but usually at least one of them would have the result you wanted/needed.
Dogpile was an automated way to search all of the search engines at the same time with one query.
https://web.archive.org/web/19990429194414/http://dogpile.co...
I do remember dogpile, but as one of the folks who named it, nope, that wasn't a conscious influence!
Oh, it you! Hi Jordan I miss you let’s hang out sometime :)
Yes, let's! And before hacker summer camp when we're way way too busy! :-)
Look no further than https://dogbolt.org/faq
With a link to https://godbolt.org/
It’s very obvious that Dogbolt Decompiler Explorer is primarily named after Godbolt Compiler Explorer.
There's also RMSbolt, which is a compiler explorer for Emacs, where Richard M. Stallman is regarded as the "creator".
It makes for a nice parallel, since the original version of godbolt was just a split tmux session with vim running on one side, and "watch 'gcc -S -o /dev/stdout'" on the other. The main advantage of putting it online is not needing all of the compilers locally.
That's St IGNUcius to you.
[0] https://stallman.org/saint.html