
Fuzzing Ladybird with tools from Google Project Zero

DustinBrett
23 replies
1d2h

I love that this project keeps showing how possible it is for a small group to make something amazing. This would be very hard to do in a company with stakeholders.

pvg
22 replies
1d1h

The project is cool, but this post makes me wonder whether this particular approach - starting with something that "does an okay job with well-formed web content" and then working backwards to fix spec behaviour, de facto browser behaviour, and potential security issues - can actually result in a production browser. Which is fine; one can always go back and redo things, especially in a hobby project, but it's hard to escape the vague feeling that some of this stuff might need to be architected in from the get-go.

ramijames
15 replies
1d1h

I don't know. It kind of feels like they are replicating real user (developer) behavior by producing lots and lots of weird, low-quality, and not-to-spec code that a parser will likely have to deal with. By doing so they are simply exposing bugs that real users (bad developers) would have triggered anyway. Seems like a totally legit way to test a complex product. No assumptions. Just lots of randomized nonsense that shows reality.

pvg
7 replies
1d1h

I'm not talking about the fuzzing but the design approach. As in, can you make a real browser by starting with a kind of 'happy path' implementation and then retrofitting it to be a real browser? That part I'm somewhat skeptical of. It's a totally sensible way to learn to make a real browser, no doubt.

Sammi
4 replies
23h41m

"real browser" is doing a lot of work in your comment. Feels like you're about to make a no true scotsman argument.

After all what is a browser other than something that browses? What other characteristics make it "real"?

If Ladybird browses, then it must be a browser.

derefr
2 replies
22h18m

I would say that a "real browser" — which I think is being used here to mean a "production-quality" browser, in contrast to a "toy" browser — would be a robust and efficient browser with a maintainable codebase.

jcelerier
1 replies
22h10m

robust and efficient browser with a maintainable codebase.

I would say neither Chrome nor Firefox scores particularly high on any of these.

refulgentis
0 replies
22h1m

We're well past absurdity on this line of argument.

Given:

A = a goal of implementing just the latest and most important specs

B = shipping something they want people to use

There is no browser team, Ladybird or otherwise, that is A and not B, or A and B.

For clarity's sake: Ladybird doesn't claim A.

Let's pretend they do, as I think it'll be hard for people arguing in this thread to accept that they don't.

Then we know they most certainly aren't claiming B. The landing page says it's too unstable to provide builds for. Beyond that, it's well understood that it's not "shipping" or intended to be seen as such.

pvg
0 replies
23h14m

"real browser" is doing a lot of work in your comment.

It's not doing nearly as much work as real browsers do!

After all what is a browser other than something that browses? What other characteristics make it "real"?

A real browser is a browser that aspires to be reasonably usable by a (let's say even fairly technical) user browsing the real web. That means handling outright adversarial inputs, and my point is that this is so central to a real browser that it might be hard to retrofit in later.

I gave one example with the null thing; another would be the section on how the JS API can break the assumptions made by the DOM parser. It similarly sounds like a bug that's really a bug class, and a real browser would need a systemic/architectural fix for it.

viraptor
0 replies
13h53m

The spec is so complex at this point that I'm not sure you can go the other way. It would also force you to implement weird things nobody will ever use before letting people work with a basic page.

I'd love someone to prove me wrong, but I feel like you'd end up with "you can't display a paragraph of basic text, because we're not even done implementing JS interface to conic gradients in HSL space in a fully compliant way".

DontSignAnytng
0 replies
1d1h

What a weird comment on their progress and transparency. Better to have a demo working and iterate on it, right? The way you're suggesting, how would one ever finish anything?

l72
6 replies
22h12m

As a developer I would love to have a browser that strictly follows specs and doesn’t deal with any historic compatibility issues. I would focus on making sure my web app works best there, which _should_ give the best compatibility across a wide range of browsers.

ramijames
4 replies
21h49m

ABSOLUTELY.

But, and this is the crucial part, AS A USER YOU WOULD NOT because a large portion of the web is broken.

We don't live in a perfect, sanitary world, and the software we build and use reflects that.

geysersam
3 replies
16h53m

I kind of don't buy that argument. The web is not fundamentally different from other programming environments, say Python or Java. It might sometimes be practical to have a Python interpreter accept syntactically invalid input because it kinda knows what you mean anyway, but most programming languages don't work that way, because it makes things harder in the long run and the benefits are pretty minuscule.

shiomiru
0 replies
9h4m

The problem is that this kind of philosophy is fundamentally incompatible with HTML5.

There was an attempt at a "strict-mode" HTML - it was XML - but it failed (on the web) for various reasons (including IE). HTML5 specifies the exact behavior of what every browser must do upon encountering tag-soup, which is useful because real-world HTML has been tag-soup for a very long time.
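To make "exact behavior" concrete, take a misnested snippet like this (my own illustration, not from the article):

    <p><b>one</p><p>two</p>

Every HTML5-conforming parser must recover from this the same way, producing a tree roughly equivalent to:

    <p><b>one</b></p><p><b>two</b></p>

The unclosed <b> stays in the parser's list of active formatting elements and gets reconstructed inside the second paragraph.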

I guess the strictest thing you could do is die upon encountering "validation errors", but I don't think that would help much to simplify your job. (Maybe you could drop the adoption agency algorithm?) But now your parser chokes on a lot of websites - likely on hand-written HTML, which has a greater potential for validation errors but also typically simpler layout.

And HTML parsing is still the easy part of writing a browser! Layout is much harder to do, partly because layout is hard, but also because it's under-specified. Implement "undefined behavior" in a way that other browsers don't, and your browser won't work on a lot of pages.

(There have been improvements, but HTML is still miles ahead. e.g. CSS 2 has no automatic table layout algorithm, and AFAICT the CSS 3 version is still "not yet ready for implementation".)

ramijames
0 replies
16h19m

I think of the web like I think about Windows. Decades of backwards compatibility. Dubious choices that get dragged along because it is useful for people who can't or won't let go of stuff that works for them. It's a for better or for worse situation.

CJefferson
0 replies
13h51m

Why would you want a web browser which can't open Facebook, X, or half of the other top websites?

And why would they bother to "fix" their websites when they work fine in Chrome, Edge and Firefox, but not in your very unpopular but super-strict browser?

trashburger
3 replies
1d

it's hard to escape the vague feeling some of this stuff might need to be architected in from the get go.

When I'm developing something, work or otherwise, I find that I often write my worst code when I'm writing something bottom-up, i.e. designed in advance, because it usually turns out that the user of that particular code has completely different needs, and the point of integration becomes a point of refactoring. I think the top-down approach applied at the project level is much nicer because it allows you to _start from somewhere_ and then iteratively improve things.

That is not to say you shouldn't take precautions. In Ladybird, stuff like image decoding and webpage rendering/JS execution is isolated in its own processes, with OpenBSD-style pledge/unveil sandboxing. These aren't perfect, of course, but they allow for the kind of development Ladybird does without much worry about those aspects.
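Roughly, that style of sandboxing looks like this - a minimal sketch with a made-up path and promise set, not Ladybird's actual code:

    // Sketch of OpenBSD-style pledge/unveil sandboxing for a
    // decoder-like helper process. The "/res" path and the promise
    // sets are illustrative, not taken from Ladybird.
    #include <cstdio>   // perror
    #include <unistd.h> // pledge(), unveil() on OpenBSD (and SerenityOS)

    int main()
    {
        // Expose only a hypothetical read-only resource directory...
        if (unveil("/res", "r") < 0) { perror("unveil"); return 1; }
        // ...then lock the list so no further paths can be unveiled.
        if (unveil(nullptr, nullptr) < 0) { perror("unveil"); return 1; }

        // Restrict the process to stdio and read-only file access;
        // any other syscall from here on kills the process.
        if (pledge("stdio rpath", nullptr) < 0) { perror("pledge"); return 1; }

        // ... decode untrusted input here; a compromise is now boxed
        // into a process that can only read /res and use stdio.
        return 0;
    }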

pvg
2 replies
1d

I'm not really suggesting Ladybird is doing something "wrong" or should do something else. Reading something like:

The fix is to make Document::window() return a nullable value, and then handle null in a bajillion places.

makes me think you're going to find something like this and do this kind of fix maybe once, twice, five times, and then probably decide you need a more fundamental fix of some sort. Another way of thinking about it is: what would, say, the Google Chrome team wish they could do were they starting from scratch? I.e. aiming for the state of the art, rather than trying to catch up to it later, which may turn out to be overwhelming.
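To illustrate the shape of the quoted fix, here's a hypothetical sketch (names modeled on the quote, not the actual Ladybird code):

    // An accessor that assumed a window always exists becomes
    // nullable, and every caller must now guard against null.
    class Window;

    class Document {
    public:
        // Previously something like: Window& window();
        Window* window() { return m_window; } // may be null for detached documents
    private:
        Window* m_window { nullptr };
    };

    void some_caller(Document& document)
    {
        // The "handle null in a bajillion places" part: each call
        // site now needs a check like this one.
        if (auto* window = document.window()) {
            // ... safe to use *window here ...
        }
    }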

UncleEntity
1 replies
20h50m

Even if they did 'something else' and produced a bullet-proof implementation they are still dealing with a buggy spec in the first place.

If someone thought their dev chops were 100% infallible why would they bother to fuzz the spec?

pvg
0 replies
19h50m

I think you're misunderstanding my point; it's not about implementation or spec bugs but design. Forget Ladybird for a moment and think of Firefox. Its core design was something along the lines of 'a cross-platform toolkit for making enterprise groupware apps', where one of the apps was a web browser. Kind of neat for 1998; by 2008 it was clear that was no longer a good fit for making a browser. Despite heroic efforts and many advances, Firefox has never really been able to close the gap to more recent browsers. And (statistically) nobody makes new browsers based on Firefox; it's effectively a design dead end.

It can be hard to retrofit a 'complicated but decent parser with a JS runtime attached' into something like a 'safe parser of arbitrarily adversarial inputs connected to an open RCE' (i.e. something akin to a modern browser) if the latter wasn't a fundamental design goal to start with.

zimbatm
1 replies
6h29m

Who said the goal was to create a production browser?

This seems like a pure passion project: to return to the pleasure of building something just for the sake of it. Design and explore. Hack.

Not every endeavour has to become a product. As soon as you get users, you get obligations, and this tends to destroy these feelings.

tptacek
0 replies
1h31m

Nobody said that. It's an interesting conversation, not an adjudication.

yafetn
8 replies
1d2h

A little off topic: what happened to the hacking videos on YouTube? Used to look forward to them but I haven’t seen a new one in a while.

awesomekling
7 replies
1d2h

To be perfectly honest, after uploading well over 1000 videos, I got a little tired of it. I still post monthly update videos, but it's been months since the last hacking video.

I'm still working on Ladybird every day, and I also manage two full time engineers now, thanks to the generous sponsorships we got from Shopify & others last year. :)

slekker
1 replies
1d

I absolutely loved the JIT series, but fair enough!

yafetn
0 replies
1d1h

Fair enough, and that totally makes sense. I guess I just miss the “Well, hello friends…” :)

tredre3
0 replies
20h34m

Thank you for all the videos! I particularly enjoyed the porting and profiling/optimization videos and I still occasionally rewatch them to this day. :)

Your overall pragmatism and no-nonsense C++ style are something more developers should aim to replicate, imho.

pixard
0 replies
2h39m

Add me as another vote that misses them. I totally understand you need a break and other obligations take more time, but I hope you can still find the time to do them occasionally. :)

dsshakey
0 replies
23h28m

Glad to hear you're doing the right thing by yourself. I regularly go back and watch some of the mini-series, or the porting videos. I refer many graduate engineers to your videos to learn from the clarity and pragmatism you constantly display.

If the hacking comes back some day, I'll be delighted, but just wanted to say thanks for the fact that we have such a wonderful backlog thanks to your long term efforts.

LeFantome
0 replies
6h30m

That is totally understandable.

That said, I think those videos are a significant contributor to the project's success. I hope they do not go away completely.

In fact, I think the videos are as important a contribution as the project itself. I remember seeing a quote once from a musician who said he was inspired by both the Beatles and the Rolling Stones: the Beatles showed him what a band could be, and the Rolling Stones made him feel like he could do it too. I see that in Linux and Serenity.

Your videos make me feel like I could solve any problem by just starting it and breaking it down into smaller, more solvable chunks. They are inspiration, and I am not surprised SerenityOS has attracted people to contribute other ambitious aspects. The PDF browser, the GPU stack, and the RISC-V ports are examples of amazing projects in their own right. I think one of the reasons we see such ambitious contributions in such a young project is the inspiration provided by your leadership and the example set in those videos.

Regardless, thank you for the contribution so far. With the recent improvements to HTMLInputElement, I was able to use Ladybird to leave a comment on the OSnews site recently and it gave me a huge thrill.

classichasclass
7 replies
1d3h

And thus demonstrated is the value of lots of different implementations of a spec. Already one hole found in the spec in just this article, and I'm sure there will be/were more.

spencerchubb
2 replies
19h0m

Why couldn't the fuzzer be used to discover the bug in the popular browsers?

summerlight
0 replies
14h51m

https://github.com/google/clusterfuzz

At least Chromium has integrated multiple different fuzzers into their regular development workflow and found lots of bugs even before going public.

TomNomNom
0 replies
4h30m

A bug in the spec doesn't necessarily mean there will be a noticeable bug in the browsers; e.g. a crash.

The browsers may have been written to "work" / not crash over adhering strictly to the spec.

awesomekling
2 replies
1d2h

Yes indeed! We've found and reported lots of issues in the various HTML, CSS and JS specs.

Multiple independent implementations are crucial for the long-term health of the web platform, so we're trying to do our part! :)

de4np
0 replies
20h37m

Awesome! Thank you for being the change you want to see. Inspiring to say the least, great work!

Avamander
0 replies
1d1h

Multiple independent implementations are crucial for the long-term health of the web platform, so we're trying to do our part! :)

It's really great that you're doing this work. This principle also applies to many other specs. I've implemented a few and found multiple issues with real-world impact.

dataflow
0 replies
14h50m

And thus demonstrated is the value of lots of different implementations of a spec. Already one hole found in the spec

That's a bit of a... non-sequitur. Imagine if you had tweeted "eggplants are my favorite vegetables", someone corrected you "actually they're fruits", and then you declared: "And thus demonstrated is the value of Twitter! Someone already made me a better-informed citizen in response to my tweet." This feels kind of similar.

This isn't to say what they're doing isn't valuable, or that there isn't value in having lots of implementations of a spec. Just saying that implication isn't there (yet) with this particular example.

tetris11
4 replies
1d3h

They've implemented SVG? This project is coming along faster than I thought. I watch, enraptured.

awesomekling
3 replies
1d3h

Yes, we have implemented a decent chunk of the SVG specification, although lots of things are still missing (animations is a big one) :)

jancsika
2 replies
1d2h

I'm curious how you handle the things that are between SVG specs 1.1 and 2. Because AFAICT both Chrome and Firefox decided not to implement SVG 2. Yet both have grabbed a common selection of changes from SVG 2 and implemented them.

E.g., myRect.style.x = '50px' will work in both Chrome and Firefox, even though SVG 1.1 doesn't allow for this because "x" isn't a presentation attribute (and only presentation attributes are supposed to have corresponding CSS properties).

Relevant to animations-- the fact that Chrome and Firefox allow most (all?) SVG attributes as CSS props lets the user do a nice end run around SVG animations. They can just treat the SVG objects as if they were HTML and use the Web Animations API to animate them.

awesomekling
1 replies
1d2h

We're working based on SVG 2 and basically ignoring SVG 1.1.

I was unsure about the best approach here, so I asked Nikolas Zimmermann (original author of SVG support in WebKit) and his advice was to do exactly this. :)

jancsika
0 replies
1d

That makes sense.

I was going to ask if you were prioritizing the SVG 2 features that are already implemented in Chrome and Firefox. But it appears the W3C has removed a lot of the new ones I remember from the spec (path data bearings, mesh gradients), and that both Chrome and Firefox have implemented a good amount of the existing spec like tabindex and friends.

(Ok, here's one-- "inline-size" and others for doing auto-wrapping text in SVG. Looks to be unimplemented anywhere.)

tflol
1 replies
1d2h

"fuzzing ladybird" is such a delightfully barbaric combination of words

riwsky
0 replies
1d2h

Like some vaguely un-PC insult from an alternate-reality Scotland

efitz
1 replies
16h59m

For issue #3, it might also be a good idea to have a max-depth mechanism for gradients that point to other gradients; this would be a defense-in-depth control against some error or limitation in your "have I seen this reference before" logic. I'm not familiar with SVG gradients; maybe there is a reason to have reference chains 1000 links long, but I'd bet that if you ever encounter one in the wild, it's an attack or a fuzzer.
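A minimal sketch of that idea (illustrative types and limit, not actual SVG or Ladybird code): keep the cycle check, but also cap the walk, so a bug in either guard alone still can't make it unbounded.

    // Defense-in-depth for gradient reference chains: cycle detection
    // plus a hard depth cap. Types and the limit are made up.
    #include <string>
    #include <unordered_set>

    struct Gradient {
        std::string id;
        Gradient const* href { nullptr }; // the gradient this one references
    };

    constexpr int max_reference_depth = 32; // arbitrary, far beyond real-world use

    Gradient const* resolve_base(Gradient const& start)
    {
        std::unordered_set<std::string> seen;
        Gradient const* current = &start;
        for (int depth = 0; current->href; ++depth) {
            if (depth >= max_reference_depth)
                return nullptr; // chain too deep: reject as hostile/broken
            if (!seen.insert(current->id).second)
                return nullptr; // cycle detected
            current = current->href;
        }
        return current; // the root gradient with no further reference
    }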

efitz
0 replies
16h58m

Btw, in the anti-malware space I saw this type of structure abuse all the time, and I never saw a legitimate case more than 5 units deep.

holsta
0 replies
1d3h

I am secretly hopeful Ladybird can take over the world some day. Don't tell anyone.

beefnugs
0 replies
20h20m

Interesting, thanks. What bothers me though is that almost all developers do exactly what you see in issue #1: "We found it! Fix committed, done!" Nope: you should understand exactly what went wrong - assuming parents must exist - and then search the entire codebase for the same kind of mistake. Use your creative brain to figure out where else the same thing can happen. It will never be in just one place. All modern software is an unreliable, bug-ridden nightmare, mostly because of capitalism constraints, yes... but it is possible to do better.

aapoalas
0 replies
1d

Will Ladybird make an appearance in Web Engines Hackfest this year?