HN comments for: Difftastic, a structural diff tool that understands syntax

mlavrent

15 replies

4h2m

2024-03-21 14:23:14 UTC

I’m not quite sure why tools like git don’t ship with this as the default. I’ve been using difft for about a year now, and my main complaint is that it makes it hard to go back to other diff tools when I don’t have difft available :).

I am curious if there’s been any work on _semantic_ diff tools as well (for when, e.g., the syntax changes but the meaning is the same). It seems like an intractable problem in general, but maybe it’s doable and/or useful for smaller DSLs or subsets of some languages?

ruined

8 replies

3h45m

2024-03-21 14:40:29 UTC

I am curious if there’s been any work on _semantic_ diff tools as well (for when eg the syntax changes but the meaning is the same).

if you do this your difftool becomes a compiler

mlavrent

3 replies

3h29m

2024-03-21 14:56:25 UTC

Sorry, I should've been clearer. I'm interested if there's any tool that does this kind of thing statically, without running the code. I guess a simple approach is to compile both programs and see if the generated code is the same, but I'd guess reasoning at the generated-code level will probably produce a lot more false positives (i.e. tool will report a change when there isn't one) than if you reason about the original program.

jerf

1 replies

1h54m

2024-03-21 16:30:49 UTC

This gets really hard, really fast. That is, yes, reasonably obviously doing this completely 100% accurately requires a solution to the halting problem, but even getting to "useful" is really really hard. Even the Haskell world doesn't try to solve the "equivalence of functions" problem, and it's even more complicated in imperative languages.

You probably have a mental image of catching something really simple, and, yeah, "1 + 1" -> "2" is reasonably easy, but in reality there aren't a lot of those super easy changes. Most of the time there is something confounding the situation.

Truly neutral refactorings are pretty uncommon in their own right. You can see that when someone is discussing semantic versioning and pointing out that if you define a "major version" as "there exists at least one possible use of the code whose behavior will be changed as a result of this library change", almost any API change is automatically a major version change, which isn't really what anyone wants. E.g., in Python, the mere fact that introspecting on an object's methods will show one more method than it used to isn't really what we want a major version change for. In general, proving refactorings are actually 100% safe is equally difficult; even simple arithmetic changes can result in things overflowing at different times or in different ways, it's virtually impossible to rewrite an expression involving floats without the change being witnessable somehow, extracting a function could make it so that code that previously didn't overflow the stack now does, memory allocation changes can be the difference between OOMing and not and may interact with GC in unpredictable ways if you get really precise, etc.
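To make the floating-point point concrete: addition of IEEE 754 doubles is not associative, so merely regrouping a sum can change its result. A minimal Python illustration (the variable names are just for this example):

```python
# The same three doubles, summed with two different groupings.
left = (0.1 + 0.2) + 0.3   # 0.1 + 0.2 first rounds to 0.30000000000000004
right = 0.1 + (0.2 + 0.3)  # 0.2 + 0.3 first rounds to exactly 0.5

print(left, right)    # 0.6000000000000001 0.6
print(left == right)  # False
```

A semantic diff that declared these two expressions equivalent would be wrong about observable behavior.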

kstrauser

0 replies

1h0m

2024-03-21 17:24:50 UTC

Here's a fun related article on Indistinguishability Obfuscation: https://cacm.acm.org/research/indistinguishability-obfuscati...

TL;DR verifying that 2 functions have the same output is really freaking hard.

brabel

0 replies

55m

2024-03-21 17:29:59 UTC

The Unison language (https://www.unison-lang.org/) knows how to compute whether the semantic meaning of the code has changed (though I don't think it's possible to get the actual diff to visualize it).

You can edit a function you've committed into the Unison code repo, and if you didn't change the semantics of the function, it's actually stored under the exact same hash... All places using the function refer to it by its hash, so nothing needs to be recompiled either, and no tests need to be rerun.

Things like renaming variables, reordering code whose order doesn't matter (common in functional programming) and things like that do NOT change the hash.

I believe this is only possible because Unison is a pure functional language. If it's not, deciding whether two programs are exactly equivalent becomes undecidable in general (it reduces to the halting problem).

I wonder if Unison could provide the actual semantic diff you're thinking of; it's probably not much more complex than knowing that the meaning of the code changed. Maybe create a feature request :) https://github.com/unisonweb/unison
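A rough Python analogy for hashing code so that renaming variables or reformatting doesn't change the result. This is not how Unison works internally (Unison hashes its own syntax trees); `structural_hash` and `_Normalizer` are hypothetical helpers built on Python's standard `ast` and `hashlib` modules, and they only normalize a function's name, its parameters, and names assigned inside it:

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Rename bound identifiers to positional placeholders (v0, v1, ...)."""

    def __init__(self, bound):
        self.bound = bound  # names bound in the code: parameters and assignment targets
        self.mapping = {}   # original name -> placeholder, in first-seen order

    def _canon(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = "_"  # identify the definition by its content, not its name
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        self.generic_visit(node)  # still visit any annotation
        return node

    def visit_Name(self, node):
        # Free names such as builtins or globals (len, print) are left untouched.
        if node.id in self.bound:
            node.id = self._canon(node.id)
        return node


def structural_hash(source: str) -> str:
    tree = ast.parse(source)
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
    normalized = _Normalizer(bound).visit(tree)
    # ast.dump omits line/column info by default, and comments never reach the AST.
    return hashlib.sha256(ast.dump(normalized).encode()).hexdigest()


original = "def add(x, y):\n    total = x + y\n    return total\n"
renamed = "def plus(a, b):\n    # same logic, new names\n    s = a + b\n    return s\n"
changed = "def add(x, y):\n    total = x - y\n    return total\n"

print(structural_hash(original) == structural_hash(renamed))  # True
print(structural_hash(original) == structural_hash(changed))  # False
```

The sketch only captures renaming and formatting or comment changes. Rewriting `x + y` as `y + x` still produces a different hash, which is a small taste of why deciding real semantic equivalence is so much harder.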

hobs

2 replies

3h42m

2024-03-21 14:43:31 UTC

That's exactly what I have done when diffing SQL in lazy mode: just use a server and diff the AST/plan.

slotrans

1 replies

3h28m

2024-03-21 14:57:23 UTC

Two semantically equivalent SQL statements can plan differently...
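A small way to see this with SQLite from Python's standard library: for this integer column the two queries return the same rows, but writing `a + 0` instead of `a` keeps SQLite from using the index on `a`, so `EXPLAIN QUERY PLAN` reports different plans. The table, index, and data are made up for the example, and the exact plan text varies between SQLite versions, so the sketch only compares plans for equality:

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("CREATE INDEX idx_a ON t (a)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, str(i)) for i in range(100)])

q1 = "SELECT b FROM t WHERE a = 1"
q2 = "SELECT b FROM t WHERE a + 0 = 1"  # same rows for integer a, but not indexable

same_rows = conn.execute(q1).fetchall() == conn.execute(q2).fetchall()
same_plan = query_plan(conn, q1) == query_plan(conn, q2)
print(same_rows, same_plan)
```

Diffing plans therefore flags a change here even though the statements are equivalent.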

rrrrrrrrrrrryan

0 replies

2h32m

2024-03-21 15:53:32 UTC

The exact same SQL statement can plan differently if table statistics change.

Chris_Newton

0 replies

3h24m

2024-03-21 15:01:07 UTC

if you do this your difftool becomes a compiler

Some linters and formatters are effectively compilers already, so that doesn’t seem completely implausible in itself. Finding canonical representations of common coding patterns so you can quickly and reliably determine that they are equivalent is a different question, though.
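One cheap canonical form, in the spirit of formatters doing compiler work: parse the code and print it back out, which erases differences in whitespace, redundant parentheses, quote style, and comments. A minimal Python sketch using the standard `ast.unparse` (Python 3.9+); `canonical` is a hypothetical helper name:

```python
import ast


def canonical(source: str) -> str:
    """Parse and re-print code; layout, comments and redundant parentheses disappear."""
    return ast.unparse(ast.parse(source))


a = "total=(price*qty)  # subtotal\nlabel = \"sum\"\n"
b = "total = price * qty\nlabel = 'sum'\n"
print(canonical(a) == canonical(b))  # True
print(canonical(b))
```

This only canonicalizes layout; recognizing that two *different* coding patterns are equivalent is the hard part.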

rob74

2 replies

2h50m

2024-03-21 15:34:49 UTC

I am curious if there’s been any work on _semantic_ diff tools as well (for when eg the syntax changes but the meaning is the same)

So when using such a diff tool you can spend hours refactoring something, and then git will refuse to commit your changes because your refactoring was successful in not changing the behavior of the code? I understand what you mean, but if we arrive at that point maybe we should stop calling it "diff", to avoid confusion...

kstrauser

1 replies

2h34m

2024-03-21 15:50:52 UTC

Git doesn't use the output of `diff` to determine whether anything has changed.
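For context, Git identifies file contents by hashing them: a blob's ID is the SHA-1 of a short header (`blob <size>` plus a NUL byte) followed by the raw bytes, so any byte-level change yields a new object regardless of what a diff tool displays. A minimal Python sketch of that computation for SHA-1 repositories; `git_blob_id` is a hypothetical helper name:

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object ID Git assigns to a file with these exact bytes (SHA-1 object format)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's well-known empty blob
print(git_blob_id(b"x = 1\n") == git_blob_id(b"x=1\n"))  # False: a formatting-only change is still a change
```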

samatman

0 replies

38m

2024-03-21 17:47:00 UTC

True, although not widely known it would seem.

It does use diff to generate patches, however. I know in today's GitHub-dominated landscape, that's considered a bit of a dusty feature, but it would be a pity to break it.

otherjason

1 replies

3h29m

2024-03-21 14:56:23 UTC

Difftastic is a useful tool, but in my experience, it's far too slow to be suitable as the default selection for a ubiquitous tool like git.

drcongo

0 replies

22m

2024-03-21 18:03:33 UTC

I'm finding it instantaneous here on a large dirty codebase. In what way is it slow for you?

kstrauser

0 replies

3h15m

2024-03-21 15:10:12 UTC

I think shipping good ol' diff as the default makes sense. It's going to be there already on any system you might want to run git on, it's fast, it's tiny, and everyone knows the basics of how to use it.

But I'm glad it's easy to change that default.

kstrauser

8 replies

3h8m

2024-03-21 15:17:05 UTC

For those who don't already know, this is built on tree-sitter (https://tree-sitter.github.io/tree-sitter/) which does for parsing what LSP does for analysis. That is, it provides a standard interface for turning code into an AST and then making that AST available to clients like editors and diff tools. Instead of a neat tool like this having to support dozens of languages, it can just support tree-sitter and automatically work with anything that tree-sitter supports. And if you're developing a new language, you can create a tree-sitter parser for it, and now every tool that speaks tree-sitter knows how to support your language.

Those 2 massive innovations are leading to an explosion of tooling improvements like this. Now every editor, diff tool, or whatever can support dozens or hundreds of languages without having to duplicate all the work of every other similar tool. That's freaking amazing.

bfrog

2 replies

1h42m

2024-03-21 16:42:57 UTC

While I agree tree-sitter is an amazing tool, I found that writing the grammar can be incredibly difficult. I tried writing a grammar and highlighting query set for VHDL with tree-sitter, and ran into a lot of difficulties expressing the VHDL grammar in tree-sitter.

kstrauser

1 replies

1h40m

2024-03-21 16:45:43 UTC

No argument from me on that. The upside is that one person, somewhere, has to get it right one time and then we can all use it.

grub5000

0 replies

45m

2024-03-21 17:39:47 UTC

Seems like something LLMs should be useful for, if not now then soon enough

ievans

1 replies

1h41m

2024-03-21 16:44:23 UTC

Absolutely agreed, and copying from a comment I wrote last year: I think the fact that tree-sitter is dependency-free is worth highlighting. For context, some of my teammates maintain the OCaml tree-sitter bindings and often contribute to grammars as part of our work on Semgrep (Semgrep uses tree-sitter for searching code and parsing queries that are code snippets themselves into AST matchers).

Often when writing a linter, you need to bring along the runtime of the language you're targeting. E.g., in python if you're writing a parser using the builtin `ast` module, you need to match the language version & features. So you can't parse Python 3 code with Pylint running on Python 2.7, for instance. This ends up being more obnoxious than you'd think at first, especially if you're targeting multiple languages.

Before tree-sitter, using a language's built-in AST tooling was often the best approach because it is guaranteed to keep up with the latest syntax. IMO the genius of tree-sitter is that it's made it way easier than with traditional grammars to keep the language parsers updated. Highly recommend Max Brunsfeld's Strange Loop talk if you want to learn more about the design choices behind tree-sitter: https://www.youtube.com/watch?v=Jes3bD6P0To

And this has resulted in a bunch of new tools built on tree-sitter; off the top of my head, in addition to difftastic: Neovim, Zed, Semgrep, and GitHub code search!
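The version coupling described above is easy to see with Python's built-in `ast` module: it can only parse the syntax of the interpreter it is running on. A small sketch; `parses` is a hypothetical helper:

```python
import ast
import sys


def parses(source: str) -> bool:
    """True if the running interpreter's own parser accepts the source."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(sys.version_info[:2])
print(parses("print('hello')"))          # True on Python 3
print(parses("print 'hello'"))           # False: Python 2 print statement
print(parses("if (n := 10) > 5: pass"))  # True only on Python 3.8+, which added :=
```

A standalone grammar such as a tree-sitter parser avoids tying the tool's parsing ability to the version of the runtime it happens to run on.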

drcongo

0 replies

44m

2024-03-21 17:40:54 UTC

Don't forget Zed! https://zed.dev

epistasis

1 replies

1h33m

2024-03-21 16:52:04 UTC

I'm imagining what I could have done in my compilers class with something like tree-sitter...

It feels kind of as foundational as YACC.

ivanjermakov

0 replies

1h28m

2024-03-21 16:56:50 UTC

It is literally an alternative to YACC and other parser generators.

duped

0 replies

2024-03-21 18:23:00 UTC

I don't believe this is correct - there's no such thing as "speaking tree-sitter." Every tree-sitter parser emits a different concrete syntax tree, not a standard abstract syntax tree.
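The concrete-versus-abstract distinction can be illustrated with Python's standard library, using the token stream as a rough stand-in for a concrete tree (tree-sitter's trees are not token streams, so this is only an analogy): the tokens keep parentheses and comments, while the AST discards them.

```python
import ast
import io
import tokenize

src = "x = (1 + 2)  # sum\n"

# Concrete view: every token as written, including parentheses and the comment.
skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
tokens = [t.string for t in tokenize.generate_tokens(io.StringIO(src).readline)
          if t.type not in skip]

# Abstract view: only the structure; grouping parentheses and the comment are gone.
tree = ast.dump(ast.parse(src))

print(tokens)  # ['x', '=', '(', '1', '+', '2', ')', '# sum']
print(tree)
```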

bloopernova

8 replies

4h7m

2024-03-21 14:18:21 UTC

Related, updating difftastic and friends if you installed via cargo:

  cargo install cargo-update
  cargo install-update --list
  cargo install-update --all

Other fun Rust projects available via cargo:

https://mise.jdx.dev/ mise-en-place, a drop-in replacement for asdf https://asdf-vm.com/ that is really fast and flexible.

https://github.com/ajeetdsouza/zoxide is a fantastic cd replacement: it remembers the directories you cd into, so a partial match like "z hel" might take you to "~/projects/helloworld".

https://github.com/bootandy/dust is a complement to "du" that shows which directories are using the most disk space.
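The "remembers where you cd" behavior can be sketched as frecency ranking: each directory's score combines its visit count with how recently it was visited, and a query picks the best-scoring directory whose last path component contains the keyword. This is a hypothetical simplification; `JumpDB` and the recency weights are illustrative assumptions, not zoxide's actual code or matching rules:

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

HOUR, DAY, WEEK = 3600, 24 * 3600, 7 * 24 * 3600


@dataclass
class Entry:
    visits: int = 0
    last_visit: float = 0.0


class JumpDB:
    """Hypothetical frecency store: remember visited directories, jump by keyword."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def add(self, path: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        entry = self.entries.setdefault(path, Entry())
        entry.visits += 1
        entry.last_visit = now

    @staticmethod
    def score(entry: Entry, now: float) -> float:
        age = now - entry.last_visit
        # Illustrative recency weights: recent visits count for more.
        if age < HOUR:
            weight = 4.0
        elif age < DAY:
            weight = 2.0
        elif age < WEEK:
            weight = 0.5
        else:
            weight = 0.25
        return entry.visits * weight

    def query(self, keyword: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        # Match against the last path component only, case-insensitively.
        matches = [p for p in self.entries
                   if keyword.lower() in p.rstrip("/").rsplit("/", 1)[-1].lower()]
        return max(matches, key=lambda p: self.score(self.entries[p], now), default=None)


db = JumpDB()
for _ in range(3):
    db.add("~/projects/helloworld", now=1000.0)
db.add("~/docs/help", now=1000.0)
print(db.query("hel", now=1000.0))  # ~/projects/helloworld (3 visits vs 1)
```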

IshKebab

2 replies

3h34m

2024-03-21 14:51:01 UTC

ncdu is the best du replacement by far.

polygamous_bat

1 replies

1h46m

2024-03-21 16:39:27 UTC

I've always used dust as a replacement, and so I am curious to know if you have tried both tools: do you have thoughts on what makes ncdu better?

IshKebab

0 replies

1h15m

2024-03-21 17:09:55 UTC

Dust is probably the best you can get without interactivity, so it's good for logs.

But ncdu is a fully interactive file browser that lets you navigate through the tree, and crucially it lets you delete things without requiring a full rescan. It's amazing for freeing up disk space by deleting things you don't need anymore, which is probably 95% of the reasons I run `du`.
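Both behaviors can be sketched in a few lines of Python: a single bottom-up walk computes a du-style total for every directory, and because those totals are cached, deleting a file only requires subtracting its size from its ancestors rather than rescanning. `scan` and `delete_and_update` are hypothetical helpers; they count apparent file sizes (not disk blocks, which `du` reports by default) and don't follow symlinks:

```python
import os
import tempfile


def scan(root: str) -> dict:
    """Map every directory under root to the total size of the files beneath it."""
    totals = {}
    # topdown=False visits children before parents, so subdirectory totals are ready.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        size = 0
        for name in filenames:
            try:
                size += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
        for name in dirnames:
            size += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = size
    return totals


def delete_and_update(totals: dict, path: str) -> None:
    """Delete a file and patch the cached totals of its ancestors, without a rescan."""
    size = os.lstat(path).st_size
    os.remove(path)
    parent = os.path.dirname(path)
    while parent in totals:  # stops once we walk above the scanned root
        totals[parent] -= size
        next_parent = os.path.dirname(parent)
        if next_parent == parent:  # reached the filesystem root
            break
        parent = next_parent


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a"))
    os.makedirs(os.path.join(root, "b"))
    with open(os.path.join(root, "a", "big.bin"), "wb") as f:
        f.write(b"\0" * 100)
    with open(os.path.join(root, "b", "small.bin"), "wb") as f:
        f.write(b"\0" * 50)

    totals = scan(root)
    before = totals[root]
    delete_and_update(totals, os.path.join(root, "a", "big.bin"))
    after = totals[root]
    consistent = totals == scan(root)

print(before, after, consistent)  # 150 50 True
```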

qmmmur

1 replies

1h51m

2024-03-21 16:33:56 UTC

Wow, I installed mise-en-place. It's exactly what I wanted asdf to be.

bloopernova

0 replies

29m

2024-03-21 17:56:29 UTC

It's so much faster than asdf, the dev did a really great job.

kstrauser

1 replies

3h36m

2024-03-21 14:49:46 UTC

I love zoxide! Also for your list: lsd, a prettier ls.

bloopernova

0 replies

2h51m

2024-03-21 15:34:33 UTC

so... many... colours!

Looks great, thank you for the recommendation.

arlort

0 replies

2024-03-21 18:20:46 UTC

Another three very neat ones are

- https://github.com/eza-community/eza (ls with some added visual sugar)

- https://github.com/ClementTsang/bottom (htop but with graphs)

- https://github.com/sharkdp/bat (cat with syntax highlighting)

hrdwdmrbl

2 replies

3h45m

2024-03-21 14:39:57 UTC

It seems like a major lapse in product innovation that GitHub has not come out with something like this. They don't even have anything to help when the indentation changes; they usually just show it as a giant add & remove. Their diff viewer can and should be smarter.

sroussey

1 replies

3h1m

2024-03-21 15:24:43 UTC

GitHub has the option to ignore whitespace in a diff.

mbork_pl

0 replies

1h8m

2024-03-21 17:17:30 UTC

Which is useful, but too crude.
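A small Python sketch of why ignoring whitespace (similar in spirit to `git diff -w`) helps with re-indentation but is also crude: with all whitespace stripped, wrapping code in a new block shrinks to a one-line change, but in a whitespace-sensitive language like Python it also hides a real change in behavior. `changed_lines` is a hypothetical helper built on the standard `difflib.SequenceMatcher`:

```python
import difflib


def changed_lines(old, new, ignore_whitespace=False):
    """Count lines in non-matching regions, optionally ignoring all whitespace."""
    def norm(line):
        return "".join(line.split()) if ignore_whitespace else line
    matcher = difflib.SequenceMatcher(a=[norm(l) for l in old],
                                      b=[norm(l) for l in new], autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal")


# Wrapping two statements in an if-block: noisy normally, tiny with whitespace ignored.
before = ["a = 1", "b = 2"]
after = ["if flag:", "    a = 1", "    b = 2"]
print(changed_lines(before, after), changed_lines(before, after, True))  # 3 1

# Moving print() into the loop body changes behavior, yet vanishes when whitespace is ignored.
loop_old = ["for x in xs:", "    total += x", "print(total)"]
loop_new = ["for x in xs:", "    total += x", "    print(total)"]
print(changed_lines(loop_old, loop_new), changed_lines(loop_old, loop_new, True))  # 1 0
```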

sanxchit

1 replies

1h48m

2024-03-21 16:36:52 UTC

What an amazing tool, wish it had a GUI version as well.

layer8

0 replies

32m

2024-03-21 17:52:51 UTC

From the screenshot examples in the readme, I’m not sure how substantial the benefits are over GUI tools like KDiff3 or WinMerge that have existed for ages.

sanity

1 replies

4h5m

2024-03-21 14:20:34 UTC

Interesting, I found Semantic Merge [1] years ago but it was never open source.

This just does diff but not merge, but at least it's open source - and the diffs look a lot nicer, I've already made it my default.

Any plans to extend it to merging?

[1] https://docs.plasticscm.com/semanticmerge

rideontime

0 replies

2h28m

2024-03-21 15:57:08 UTC

Was going to suggest this myself, this was a godsend when I was working with a big team on a C# project going through a messy refactor.

adamtaylor_13

1 replies

3h8m

2024-03-21 15:17:07 UTC

Does anyone know how to enable this for .html.erb files? It doesn't seem to work properly with Ruby .erb files, so it falls back to regular ol' diff behavior.

coldbrewed

0 replies

2h34m

2024-03-21 15:51:39 UTC

That may require a tree-sitter implementation for erb templated html; it may exist but if so it's less of a mainstream thing.

Some quick googling turns up https://github.com/tree-sitter/tree-sitter-embedded-template which may or may not meet your needs.

adamc

1 replies

2h50m

2024-03-21 15:35:11 UTC

Doesn't seem to have a Debian install.

pas

0 replies

2h47m

2024-03-21 15:37:59 UTC

https://github.com/Wilfred/difftastic/issues/560 help wanted :)

Night_Thastus

1 replies

2h41m

2024-03-21 15:44:12 UTC

No MSYS install, sadly. :(

quasarj

0 replies

22m

2024-03-21 18:03:31 UTC

It's just a cargo package. Is there a working rust/cargo toolchain under MSYS?

zokier

0 replies

3h21m

2024-03-21 15:03:49 UTC

There is also GumTree, which does AST-based diffing: https://github.com/GumTreeDiff/gumtree
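For a feel of what AST-based diffing means, here is a deliberately naive Python sketch that walks two `ast` trees in parallel and reports the smallest subtrees that differ. It is neither GumTree's matching algorithm nor difftastic's approach: real tools also align inserted and moved nodes, whereas this sketch reports any list whose length changed as changed wholesale. `ast_diff` is a hypothetical helper:

```python
import ast


def ast_diff(old, new, path="module"):
    """Yield (path, old, new) for the smallest differing pieces of two ASTs."""
    if type(old) is not type(new):
        yield path, old, new
    elif isinstance(old, ast.AST):
        for field in old._fields:
            yield from ast_diff(getattr(old, field, None), getattr(new, field, None),
                                f"{path}.{field}")
    elif isinstance(old, list):
        if len(old) != len(new):
            yield path, old, new  # naive: no alignment of inserted/deleted items
        else:
            for i, (a, b) in enumerate(zip(old, new)):
                yield from ast_diff(a, b, f"{path}[{i}]")
    elif old != new:
        yield path, old, new  # differing leaf value: identifier, constant, ...


old_tree = ast.parse("x = compute(a, b)")
new_tree = ast.parse("x  =  compute(a,c)   # reformatted, one argument changed")
for path, before, after in ast_diff(old_tree, new_tree):
    print(path, before, "->", after)  # module.body[0].value.args[1].id b -> c
```

The reformatting and the added comment produce no output at all; only the changed identifier is reported.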

xyzelement

0 replies

1h3m

2024-03-21 17:21:47 UTC

I don't write enough code / write it professionally anymore to integrate it into my life BUT MAN this is a great idea.

In general, we're overflowing in TMI which makes it hard to suss out what matters. For example at work I often read docs that describe what we do for customer X vs customer Y and it takes a ton of work to suss out the 1% of text that is different between those two, which is really what you want to understand and validate.

So anything that makes just the impactful change stand out is beyond welcome.

pmayrgundter

0 replies

3h58m

2024-03-21 14:27:07 UTC

"Do you know how to read @@ -5,6 +5,7 @@ syntax? Difftastic shows the actual line numbers from your files, both before and after."

Preach!

Just dropped it in and did a git diff... works like a charm!
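For anyone who hasn't had to decode it: in a unified diff, `@@ -5,6 +5,7 @@` means the hunk covers 6 lines starting at line 5 of the old file and 7 lines starting at line 5 of the new file. A Python sketch with the standard `difflib.unified_diff` that produces exactly that header for a one-line insertion, plus a hypothetical `parse_hunk_header` helper that recovers the numbers:

```python
import difflib
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str):
    """Return (old_start, old_count, new_start, new_count) from a unified-diff hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # An omitted count means the range is exactly one line long.
    return (int(old_start), int(old_count or 1), int(new_start), int(new_count or 1))


old = [f"line{i}\n" for i in range(1, 11)]
new = old[:7] + ["inserted\n"] + old[7:]  # add one line after line 7

patch = list(difflib.unified_diff(old, new, "a/file.txt", "b/file.txt"))
header = next(line for line in patch if line.startswith("@@"))
print(header.strip())             # @@ -5,6 +5,7 @@
print(parse_hunk_header(header))  # (5, 6, 5, 7)
```

The 3 lines of context on each side are why the hunk starts at line 5 rather than at the inserted line. This unified format is also what `patch(1)` consumes.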

pjturpeau

0 replies

2h14m

2024-03-21 16:11:18 UTC

It seems to be a great tool; however, in the few checks I did on big XML files, it shows modified lines in normal green and modified attributes in bold green, which makes them difficult to detect visually.

I didn't find in the documentation how it is possible to change the style of the diff, or to ask for another color in the bold case.

Any idea?

nibab

0 replies

2h6m

2024-03-21 16:19:07 UTC

This is great! I wish my PR review tools allowed me to plug in something like this. Hopefully one day we will go back to a world of customizable, plugin-based software. Most of my web tools are very prescriptive about the user experience and don't let me tailor them.

mnw21cam

0 replies

2h39m

2024-03-21 15:46:09 UTC

No package for Debian-like systems yet.

mihaigalos

0 replies

2h59m

2024-03-21 15:26:43 UTC

Nice tool. Also relevant: https://github.com/dandavison/delta

markrages

0 replies

29m

2024-03-21 17:56:36 UTC

Does the output work with patch(1)? Or does this use a different patch?

keybored

0 replies

1h28m

2024-03-21 16:57:13 UTC

I think I use this indirectly through the git-delta pager which is a great pager replacement for git.

jmholla

0 replies

57m

2024-03-21 17:28:25 UTC

I tried switching to this, but I found it noisy, and it uses weird formatting for things that didn't change. I went back to using icdiff[0].

[0]: https://github.com/jeffkaufman/icdiff

drcongo

0 replies

26m

2024-03-21 17:58:50 UTC

I love this so much. I hate reading cli diffs, but this is instantly understandable.

blackfawn

0 replies

2h10m

2024-03-21 16:15:38 UTC

Difftastic seems really nice! Unfortunately it shows some changed binary files which makes it sort of unusable. `file` reports these files as "ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, stripped" and the MIME type/encoding is "application/x-sharedlib; charset=binary" so not sure why difftastic is trying to show them as thousands of changed lines of text...
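For comparison, Git's own text/binary heuristic is simple: it treats a file as binary if a NUL byte appears in its first 8000 bytes. ELF objects contain NUL bytes in their header, so that check catches them. A Python sketch of the same heuristic; this is not difftastic's actual detection logic, and `looks_binary` / `file_looks_binary` are hypothetical helpers:

```python
def looks_binary(data: bytes, sniff_len: int = 8000) -> bool:
    """Git-style heuristic: any NUL byte in the first sniff_len bytes means binary."""
    return b"\x00" in data[:sniff_len]


def file_looks_binary(path: str, sniff_len: int = 8000) -> bool:
    with open(path, "rb") as f:
        return looks_binary(f.read(sniff_len), sniff_len)


elf_header = b"\x7fELF\x02\x01\x01\x00" + b"\x00" * 8  # start of a 64-bit little-endian ELF file
print(looks_binary(elf_header))                     # True
print(looks_binary(b"def main():\n    pass\n"))     # False
```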

aus10d

0 replies

2h25m

2024-03-21 15:59:52 UTC

Really cool idea!

asicsp

0 replies

3h23m

2024-03-21 15:02:43 UTC

Previous discussions:

https://news.ycombinator.com/item?id=27768861 (297 points | 3 years ago | 61 comments)

https://news.ycombinator.com/item?id=32746258 (698 points | 2 years ago | 90 comments)

https://news.ycombinator.com/item?id=30841244 (983 points | 2 years ago | 219 comments)

akkartik

0 replies

1h14m

2024-03-21 17:11:14 UTC

Is there a way to make the output more familiar to diff users? I've turned on --inline. I also mostly don't care enough about line numbers to want them on every line, so prefer the '<' and '>' leaders.

Also, on Arch there doesn't seem to be a man page.
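For reference, the classic `diff` output with `<` and `>` leaders is the POSIX "normal" format: a command line such as `2c2`, `1a2` or `2d1` (old range, action letter, new range), then old lines prefixed `< `, a `---` separator for changes, and new lines prefixed `> `. A standalone Python sketch (not part of difftastic) that renders that format from `difflib.SequenceMatcher` opcodes; `normal_diff` is a hypothetical helper:

```python
import difflib


def _range(start: int, stop: int) -> str:
    """Convert a 0-based half-open range to diff's 1-based 'n' or 'first,last' form."""
    return str(start + 1) if stop - start == 1 else f"{start + 1},{stop}"


def normal_diff(old, new):
    """Render the differences between two lists of lines in diff's normal format."""
    out = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            out.append(f"{_range(i1, i2)}c{_range(j1, j2)}")
            out += [f"< {line}" for line in old[i1:i2]]
            out.append("---")
            out += [f"> {line}" for line in new[j1:j2]]
        elif tag == "delete":
            out.append(f"{_range(i1, i2)}d{j1}")  # j1 = last new line before the gap
            out += [f"< {line}" for line in old[i1:i2]]
        elif tag == "insert":
            out.append(f"{i1}a{_range(j1, j2)}")  # i1 = old line after which to add
            out += [f"> {line}" for line in new[j1:j2]]
    return out


print("\n".join(normal_diff(["a", "b", "c"], ["a", "x", "c", "d"])))
```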

airstrike

0 replies

2h16m

2024-03-21 16:09:25 UTC

Fantastic tool. Now we just need the vscode extension ;-)

abledon

0 replies

1h30m

2024-03-21 16:55:46 UTC

Only found out about this because it was an option for viewing diffs when installing git using Nix.