Can I use this to translate one language to another?
"a Rust CLI" - "C 96.5% Rust 1.4%"
Not that I'm complaining ;)
Is it a new way to market a C tool?
The C is all generated parsers, all our actual coding is done in Rust.
this audience will also enjoy(?) the tree-sitter discussion going on here: https://news.ycombinator.com/item?id=39762495
there are very similar complaints about "yikes that thing just vomits a massive amount of loosey goosey C" so I hope you've not experienced the crashes they're talking about
The generator is written in Rust, so all emitted C code is safe and blazingly fast (rocket emoji)
I think you missed an /s
This looks great, thanks for building and sharing it.
Interested folks may also want to check out ast-grep:
ast-grep is great too, we've learned a lot from it. If you like yaml, you should definitely try ast-grep.
The main area GritQL shines is in taking simpler transformation patterns and composing them into complex migrations. For example, the OpenAI migration was built by incrementally handling the different edge cases we've seen in the wild: https://github.com/getgrit/stdlib/blob/main/.grit/patterns/p...
Consider linking this and other complex examples in your readme, with before/after blocks to demonstrate the power of your software.
I've spent the last 6 months building a static analyzer that mostly uses ASTs and couldn't understand what you were offering after skimming the readme until I saw this link.
Now I'm thinking of how I could integrate something like this into my analyzer to simplify the implementation of some of the simpler rules.
Thanks for the feedback, we're going to link to more complex examples.
Let me know if I can be helpful with integrating GritQL.
thanks for mentioning ast-grep!
apologies if this should be a discussion/issue/whatever but:
Do you envision going up against CodeQL and/or <https://www.jetbrains.com/help/qodana/about-qodana.html> by making semantic information available to the ast nodes? OT1H, I can imagine it could be an overwhelming increase in project scope, but OTOH it could also truly lead to some stunning transformation patterns
e.g. https://github.com/github/codeql/blob/v1.27.0/java/ql/exampl... or even more "textual" semantics such as
var foo = "hello".substring(1); // knowing "foo" is a String
Since you already have GitLab on your docs page, if I could pay for CodeQL over on GL that would be a game changer IMHO
One of the specific design goals we had was to make sure that the query engine isn't tightly coupled to AST nodes.
We started with the AST, since that's ultimately the foundation for code, but the goal is to add more graphs on top. So you could do a query like this to find all cases where a function called `unsafe` is called with a value of unknown type:
`unsafe($val)` where $val <: type `unknown`
Is your interest in CodeQL primarily for security scans?
security is up there, but from reading the examples in CodeQL it just seemed like it would be possible to express some truly great versions of "don't do that" rules in it. I am a total JetBrains fanboi, and their introspections are world-class, but getting Qodana to run to completion before the heat death of the universe has proven to require more glucose than I have to offer it. Thus, I'm always interested in alternate implementations, even though I am acutely aware of the computational complexity of what I'm asking
I recalled another link I wish I had included in my question from the SourceGraph folks https://github.com/sourcegraph/scip#scip-code-intelligence-p... which started out life as "Language Server Indexing Protocol" and seems to solve some similar project-wide introspection questions but TBH since their rug pull I've been a lot less willing to hitch my wagon to their train
Makes sense, we internally use GritQL as a super linter for some codebase-specific patterns. It's pretty easy to set up: https://docs.grit.io/guides/ci
We've looked at both LSIF and SCIP from SourceGraph, though I expect we'll end up building our own index to allow for more complex queries. We're also incorporating some LLM functionality to express conditions you can't program deterministically.
If you're interested in being a design partner for some of our auto-review features feel free to email morgante@grit.io.
Congrats from ast-grep team! Nice job Morgante!
GritQL shares a lot of common ground with ast-grep, which is also a code rewriter based on tree-sitter.
Interested folks may wonder about their difference. The key difference is that their surface APIs are quite different. GritQL is more like a DSL (or SQL), while ast-grep is more like a pattern language plus embedded configuration in YAML. Check it out here if you are interested: https://github.com/ast-grep/ast-grep
This is great. Thanks for the link.
I'm just starting to learn about tree-sitter but my understanding is that it can also handle nested syntax, for example code in documentation HTML pages. Does sg support this as well?
What I would like to do is use sg to update my project's code samples. Is that possible? What part of the docs should I look at for this?
Hi! This is Grit's release news so please allow me to only briefly answer your question.
> nested syntax
sg does not support it natively right now; you need to use the API to extract the relevant text.
For updating code, I recommend first reading the quick start, pattern, and rule essentials guides:
https://ast-grep.github.io/guide/quick-start.html
Are either of these exposed conveniently to build on? One scenario I dealt with recently was replacing a pattern that required rudimentary type resolution to apply properly.
Specifically,
While($conn.hasrows()){
…
$conn.commit()
}
Where $conn needed to be of a specific connection class to be a match.
I ended up taking a Java parser, adding simple type resolution myself, and then the rest of the find-replace logic from there. I wouldn’t expect a tree-sitter based tool to deal with types, but if there’s a hook to add metadata to the parse tree and replace utilizing it, then this might have saved me some trouble.
Very happy Grit customer here. Wish I'd had this instead of jscodeshift et al back when I was doing oodles of migrations at Stripe, but glad I have it now, and very excited to see them go open-source!
Thanks Alex! Your feedback has been super helpful.
Your suggestion to add the `--interactive` flag was hugely helpful, it's become very commonly used.
Delighted to hear that!
Normally I wouldn't suggest this, but have you considered integrating with AIs?
One tool I've been looking for (and I wonder if this is such a tool?) is something that can do things like rename variables in functions to "more sensible names". Now, it's not hard to get ChatGPT to read a function and suggest new names for variables, but if you ask it to also do the substitutions, there is an annoyingly high chance it breaks the function in the process.
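To make the split concrete (a model only suggests names, a deterministic tool applies them), here is a minimal Python sketch using the standard `ast` module. It is not GritQL and not how Grit works; the `suggested` mapping stands in for whatever an LLM would propose, and `rename_variables` is a name made up for this illustration. It ignores scoping (nested functions, globals) and `ast.unparse` drops comments and original formatting, so treat it as a toy.

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Rename identifiers via a fixed mapping, touching only Name and arg nodes."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Variable reads and writes inside the function body.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Function parameters.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def rename_variables(source, mapping):
    """Parse, rename, and print the code back out (requires Python 3.9+)."""
    tree = ast.parse(source)
    tree = RenameLocals(mapping).visit(tree)
    return ast.unparse(tree)


source = """
def f(a, b):
    t = a * b
    return t + a
"""

# Hypothetical suggestions, e.g. as returned by an LLM.
suggested = {"a": "width", "b": "height", "t": "area"}

renamed = rename_variables(source, suggested)
print(renamed)
```

Because the substitution happens on the syntax tree rather than on text, the function name `f`, string contents, and attribute names are left alone, which is the part that tends to go wrong when the model does the editing itself.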
I suspect that ChatGPT would be quite challenged trying to write code in any language that wasn't part of its training data. So a DSL that was just released would be a challenge.
Worth trying out though. Perhaps rule creation can be turned into function calls so it wouldn't need to understand new syntax.
Stay tuned, we're working on some deep AI integrations.
This looks great! Thank you for building it. I'm looking forward to learning to use it.
I'm just starting to learn about tree-sitter but my understanding is that it can also handle nested syntax, for example code in documentation HTML pages. Does grit support this as well?
What I would like to do is use grit to update my project's code samples with new releases or syntax changes. Is that possible? What part of the docs should I look at for this?
Yes, Grit does support embedded languages. Right now we support Vue syntax: https://github.com/getgrit/gritql/blob/4ff989e8d665cc9ffa2cb...
We don't have support for HTML yet, but if you open an issue we're happy to explore it.
++ congratulations to the team!
We've tackled some pretty gnarly migrations with GritQL and it has saved us a lot of time.
We have used ts-morph in the past for similar types of migrations but found the barrier to entry felt high, which prevented adoption. We were impressed by how complicated a migration GritQL could handle: we have been using it to migrate from mobx to react hooks.
Thanks Paul! It's been great getting your feedback as we've evolved Grit over time.
This is a great example of an open-core business model that I hope others will follow. Make something useful for everyone, and sell additional features in a hosted version (grit.io). I actually didn't even know there was a hosted version until I clicked the outgoing links from the documentation.
I've experienced how much time these migrations take (and how error prone they are), especially once you have double-digit team sizes, so I definitely see the value proposition.
Thank you! We definitely want to give useful value to everyone.
To expand more on our business model, we see GritQL as a powerful but also relatively low-level tool. It can speed up your migration, but writing the query and generating/testing the PRs still takes considerable time.
If you'd prefer to have migrations handled entirely for you, you can pay us to tackle that work. In addition to providing the migration code, we generate pull requests and hook into your CI pipeline to fix downstream test failures. [0]
Sounds like a cool tool.
The ability to write rewrites as regular code fragments with metavariables feels like some ideas I've had in the past. However, it seems to me that your tool lacks the ability to do something like (in C/C++/etc. terms) "replace every use of S::method($arg1) with S::method2(arg1, nullptr, $arg1)", where you have to do some amount of name resolution to know that x->method is a call to S::method, which seems like it is somewhat less useful for several kinds of rewrites I'm interested in.
(One of the main things I keep playing around with in my head is converting a C codebase into C++ by converting its ad-hoc vtable implementation into regular C++ virtual methods.)
We don't have semantic information yet (it's on the roadmap), but in many cases it's sufficient to get that information from the AST by looking where the variable was originally defined.
For example: https://app.grit.io/studio?key=TteyWLNGZGWFIC9EOiCRi
`$x.old($arg)` => `$x.new("method", _, $arg)` where { $program <: contains `$x = new Thing($_)` }
---
Long term we want to incorporate more semantic analysis, but for quick rewrites this has been pretty effective for us.
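For readers who want to see the "look where the variable was originally defined" trick outside of GritQL, here is a rough Python analog built on the standard `ast` module. It is only a sketch under assumptions: it works on Python source instead of JavaScript, the helper names (`names_bound_to`, `RewriteOldCalls`, `rewrite`) are invented for this example, and it only recognizes direct assignments like `x = Thing(...)` anywhere in the file, with no real scoping or type inference.

```python
import ast


def names_bound_to(tree, class_name):
    """Collect variable names assigned directly from a call to class_name(...)."""
    names = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)
                and node.value.func.id == class_name):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names


class RewriteOldCalls(ast.NodeTransformer):
    """Rewrite x.old(arg) to x.new("method", arg) for known receivers only."""

    def __init__(self, receivers):
        self.receivers = receivers

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls first
        func = node.func
        if (isinstance(func, ast.Attribute)
                and func.attr == "old"
                and isinstance(func.value, ast.Name)
                and func.value.id in self.receivers):
            func.attr = "new"
            node.args = [ast.Constant("method"), *node.args]
        return node


def rewrite(source, class_name="Thing"):
    tree = ast.parse(source)
    receivers = names_bound_to(tree, class_name)
    tree = RewriteOldCalls(receivers).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


example = """
x = Thing(1)
y = Other(2)
x.old(10)
y.old(20)
"""

rewritten = rewrite(example)
print(rewritten)
```

Only the call on `x` is rewritten; `y.old(20)` is left alone because `y` was not created from `Thing`. That is the same approximation as the `where` clause above: good enough for quick rewrites, but not a substitute for real semantic analysis.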
Very powerful approach. While I'm still grepping for stuff, and for big refactorings and renames I sometimes write Perl one-liners, I'll give your tool a try in the brief moment before we all do refactors by instructing some LLM.
grep truly is the king of tools
I'd love to see one of these for migrating from Claude's prompt API to their messaging API
Congrats on the launch!
Managing tech debt is and will be an important issue in the world of AI software. I'm glad there's a company solving this. Additionally, this is something the current generation of LLMs are already helpful at solving. GritQL is a very unique way of approaching this problem.
OT: We've been working with Grit and Morgante for the past months and I have only positive things to say about them. If GritQL looks useful to you, you should reach out.
The first tool like this I saw was [comby](https://comby.dev/). I remember it being a bit annoying to use for small-to-medium refactors. Next time I'll try `grit` and see how it does.
this definitely helped a tonne to migrate from openai <1.0 to 1.0
wild how this one also uses tree-sitter.
This really and truly looks fantastic.
Looks great, thanks for sharing.
In my words, build a tool that can trace the source of a variable's data, for every data variable in the source code. Lay them down in a timeline. Easy if it is a single-threaded language. Flatten the branches.
Applications
1. minification. 2. cross programming language code generation. 3. Source tracing complements control flow tracing.
Bringing in an AST-based solution will complicate the problem. What do you think?
This looks really interesting, especially the GritQL side of things. I remember playing around with something similar for a much smaller and more tailored use case, so I have an idea how much effort is involved here, but introducing a query language to make things smoother seems like a really great idea. Great work, can't wait to play around.
In theory, you could write a full transpiler in GritQL.
In practice, it's not really the ideal tool for translating between languages.
It does work well for cases where the differences are minor (ex. converting from JavaScript to TypeScript, or between different SQL dialects).
I have the same question. I'm particularly interested in the SQL dialects case.
Does the rewrite code also have to be syntactically correct or is that a simple text replacement because that's what I see as the main impediment; if I'm using the sqlite grammar to parse and produce postgresql output, the output won't conform to the grammar.
Also how do I dynamically load custom tree-sitter grammars like in ast-grep?
No, we don't require the output code to be syntactically correct. You can use the raw`code` modifier to indicate a raw text replacement: https://docs.grit.io/language/patterns#raw-output
We don't currently support dynamically loading grammars, mainly because our metavariable approach requires some modifications to the grammar.
Thank you. The raw"..." directive looks perfect for what I'm looking for.
@morgante Are you aware of any tools that do codegen from tree-sitter ASTs?
A simple application would be to produce a code formatter like Black or ruff for python but which could support any tree-sitter supported language.
More specifically to my use case though, if I produced tree-sitter grammars with identical node structure and node names, e.g. for sqlite and postgresql, then wouldn't this make it simple to do full transpilation between the two?
You should check out https://github.com/tweag/topiary
Yes, theoretically if you had ~identical grammars you could use it to do a full transpilation. There are a lot of challenges with that though. Writing a correct grammar for 1 language is complicated enough, but writing one for two where all your nodes and fields end up the same is likely insurmountable.
In practice, languages are either:
- Far enough apart that any pure AST transformation is insufficient and you need an AI component to produce usable output
- Close enough that you're better off just targeting the specific parts that differ and rewriting those while leaving the rest alone. I think GritQL can do well here.
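As a rough illustration of the second case (rewrite only the parts that differ, leave the rest alone), here is a small Python sketch for the SQLite to PostgreSQL direction. It is not GritQL: it uses plain regular expressions, so unlike a tree-based tool it would also rewrite matches inside string literals or comments. The `RULES` table and `sqlite_to_postgres` are hypothetical names, the three rules are just examples, and the replacements are only roughly equivalent (for instance, `datetime('now')` and `now()` return different types).

```python
import re

# Hypothetical, deliberately tiny rule table: (pattern, replacement).
# Anything not matched by a rule passes through unchanged.
RULES = [
    (re.compile(r"\bINTEGER\s+PRIMARY\s+KEY\s+AUTOINCREMENT\b", re.IGNORECASE),
     "SERIAL PRIMARY KEY"),
    (re.compile(r"\bIFNULL\s*\(", re.IGNORECASE), "COALESCE("),
    (re.compile(r"\bdatetime\s*\(\s*'now'\s*\)", re.IGNORECASE), "now()"),
]


def sqlite_to_postgres(sql):
    """Apply each targeted rewrite in order; untouched text is preserved as-is."""
    for pattern, replacement in RULES:
        sql = pattern.sub(replacement, sql)
    return sql


sqlite_sql = (
    "CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);\n"
    "SELECT IFNULL(name, 'n/a'), datetime('now') FROM t;"
)

postgres_sql = sqlite_to_postgres(sqlite_sql)
print(postgres_sql)
```

The point is the shape of the approach, not the rules themselves: a short list of targeted rewrites grows one edge case at a time, and everything the two dialects share never has to be modeled.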
Awesome, thanks. Big fan of the Tweag folks and Nix and Nickel so will definitely check it out.
But Grit should be able to handle translating a PHP dialect like Hacklang to a modern version of PHP?
In your opinion, what tools would be ideal for general purpose transpilation between languages?
Yeah, that should be doable (though GritQL doesn't have PHP support yet).
Writing a transpiler is quite complicated and requires a lot of development to get right. I don't know of any high-quality generalized transpilers; you're best off looking for tools specific to the language pair you're targeting.