LSP is pretty ok. Better than the before times I suppose. Although C++ has long had good IDE support so it hasn't affected me too much.
I have a maybe wrong and bad opinion that LSP is actually at the wrong level. Right now every language needs to implement its LSP server from scratch. These implementations are HUGE and take YEARS to develop. rust-analyzer is over 365,000 lines of code. And every language has its own massive, independent implementation.
When it comes to debugging, all native languages support common debug symbol formats: PDB for Windows and DWARF for *nix-y things. Any compiled language that uses LLVM gets debug symbols and rich debugging "for free".
I think there should be a common Intellisense Database file format for providing LSP or LSP-like capabilities. Ok sure, there will still be per-language work to be done to implement the IDB format. But you'd get like 95% of the implementation for free for any LLVM language. And generating a common IDB format should be a lot simpler than implementing a kajillion LSP servers.
My dream world has a support file that contains: full debug symbols, full source code, and full intellisense data. It should be trivial to debug old binaries with full debugging, source, and intellisense. This world could exist and is within reach!
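To make that concrete, here's a rough sketch of what one record in such a hypothetical IDB might hold. None of this exists; the names and fields are just mine for illustration:

```rust
// Purely illustrative: an "IDB" format like this doesn't exist yet,
// and these names are made up.
struct IdbSymbol {
    name: String,                // fully qualified symbol name
    kind: SymbolKind,            // function, type, field, ...
    type_signature: String,      // the resolved type, as the compiler saw it
    definition: SourceSpan,      // where goto-definition should land
    references: Vec<SourceSpan>, // every use site, for find-references
}

enum SymbolKind {
    Function,
    Type,
    Field,
    Variable,
}

struct SourceSpan {
    file: String, // path into the embedded source snapshot
    line: u32,
    column: u32,
}
```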
That has nothing to do with LSP.
Rust Analyzer is similar in scope to a second implementation of the Rust compiler.
I know. That’s really bad!
I disagree. A compiler for batch building programs and a compiler for providing as much semantic information about incomplete/incorrect/constantly changing programs are completely different tasks that require completely different architectures and design considerations.
I don’t think that’s true at all.
First of all, a compiler for a 100% correct program definitely has all the necessary information for robust intellisense. Compilers don't currently save all of that data, but it exists.
So the only real question is whether they can support the 0.01% of files that are incomplete and changing?
I’ll readily admit I am not a compiler expert. So I’m open to being wrong. But I certainly don’t see why not. Compilers already need to support incorrect code so they can print helpful error messages, including different errors spread throughout a single file.
It may be that current compilers are badly architected for incremental intellisense generation. But I don’t think that’s an intrinsic difference. I see no reason that the tasks require “completely different architectures”.
It doesn't. Intellisense is supposed to work on 100% incorrect and incomplete programs. To the point that it should work in syntactically invalid code.
Correct. I literally discussed this scenario in my comment!
If the program compiles successfully then the compiler has all the information it needs for intellisense. If the program does NOT fully compile then the compiler may or may not be able to emit sufficient intellisense information. I assert that compilers should be able to support this common scenario. It is not particularly different from needing to support good, clear error messages in the face of syntactically invalid code.
Not necessarily. These are two very different tasks quite at odds with each other.
Are they? I feel like intellisense is largely a subset of what a compiler already has to do.
I’d say the key features of an LSP are knowing the exact type of all symbols, goto definition, and auto-complete. The compiler has all of that information.
Compilers produce debug symbols which include some of the information you need for intellisense. I wrote a PDB-based LSP server that can goto definition on any function call for any language. Worked surprisingly well.
If you wanted to argue that intellisense is a subset of compiling and it can be done faster and more efficiently I could buy that argument. But if you’re going to declare the tasks are at odds with one another I’d love to hear specific details!
On the efficiency angle, I think a big difficulty here that isn’t often discussed is that many optimization strategies relevant to incremental compilation slow down batch compilation, and vice versa.
For example, arena allocation strategies (i.e. interning of identifiers and strings, as well as allocating AST nodes, etc.) are a very effective optimization in batch compilers, as the arenas can live until the end of execution and therefore don’t need “hands on” memory management.
However, this doesn’t work in an incremental environment, as you would quickly fill up the arenas with intermediary data and never be deleting anything from them. This is one reason rust-analyzer reimplements such a vast amount of the rust compiler, which makes heavy use of arenas throughout.
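A toy version of the pattern (not rustc's actual interner, just the shape): interned strings go into storage that is never freed, which is perfect for a batch compiler and exactly what you can't afford in a long-running language server.

```rust
use std::collections::HashMap;

/// Toy batch-compiler-style string interner: interned strings live in
/// `storage` until the process exits. In a long-lived IDE process, every
/// edit mints new identifiers, so this table only ever grows.
#[derive(Default)]
struct Interner {
    map: HashMap<String, u32>,
    storage: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.map.get(s) {
            return id;
        }
        let id = self.storage.len() as u32;
        self.storage.push(s.to_owned());
        self.map.insert(s.to_owned(), id);
        id
    }

    fn resolve(&self, id: u32) -> &str {
        &self.storage[id as usize]
    }
}
```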
As essentially every programming language developer writes their batch compiler first without worrying about incremental compilation, they can wind up stuck in a situation where there’s simply no way to reuse their existing compiler code for an IDE efficiently. This effect tends to scale with how clever/well-optimized the batch compiler implementation is.
I think the future definitely lies in compilers written to be “incremental first,” but this requires a major shift in mindset, as well as accepting significantly worse performance for batch compilation. It also further complicates the already very complicated task of writing compilers, especially for first-time language designers.
That's a great point about allocation/memory management. As an example, rust-analyzer needs to free memory, but rustc's `free` is simply `std::process::exit`.
If I remember correctly, the new trait solver's interner is a trait (https://doc.rust-lang.org/nightly/nightly-rustc/rustc_trait_...) that should allow rust-analyzer's implementation of it to free memory over time and not OOM people's machines.
I'm in strong agreement with you, but I will say: I've really grown to love query-based approaches to compiler-shaped problems. Makes some really tricky cache/state issues go away.
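A stripped-down sketch of the shape I mean (nothing like salsa's real machinery, and the query here is invented): every derived fact is a memoized query over inputs, and changing an input invalidates the cached results that depend on it.

```rust
use std::collections::HashMap;

/// Minimal, hypothetical query database: one input (file text) and one
/// derived query (line count). Real systems track dependencies automatically;
/// here invalidation is done by hand just to show the idea.
#[derive(Default)]
struct Db {
    files: HashMap<String, String>,      // input: path -> source text
    line_counts: HashMap<String, usize>, // memoized query results
}

impl Db {
    fn set_file(&mut self, path: &str, text: String) {
        self.files.insert(path.to_owned(), text);
        self.line_counts.remove(path); // input changed: drop the stale result
    }

    fn line_count(&mut self, path: &str) -> usize {
        if let Some(&n) = self.line_counts.get(path) {
            return n; // input unchanged: reuse the cached answer
        }
        let n = self.files.get(path).map_or(0, |t| t.lines().count());
        self.line_counts.insert(path.to_owned(), n);
        n
    }
}
```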
I thought that rust's compiler was indeed written to be incremental first. Check a sibling comment of mine for reasons why I thought so.
They are distinct! Well, not just intellisense, but pretty much everything. I'll paraphrase this blog post, but the best way to think about the difference between a traditional compiler and an IDE is that compilers are top-down (e.g., you start compiling a program from a compilation unit's entrypoint, a `lib.rs` or `main.rs` in Rust), but IDEs are cursor-centric: they're trying to compile/analyze the minimal amount of code necessary to understand the program. After all, the best way to go fast is to avoid unnecessary work!
Beyond the philosophical/architectural difference I mentioned above, compilers typically have a one-way mapping from syntax to semantics, but to support things like refactors or assists, you often need to go the opposite way: from semantics to syntax. For instance, if you want to refactor a struct into an enum, you often need to find all instances of said struct, make the semantic change, then construct the new syntax tree from the semantics. For simple transformations like a struct to an enum, a purely syntax-based approach might work (albeit at the cost of accuracy if you have two structs with the same name), but you start to run into issues when you consider traits, interfaces (for example: think about how a type implements an interface in Go!), or generics.
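As a tiny, made-up illustration of that struct-to-enum case (the "before" and "after" an assist would have to produce):

```rust
// Before: a plain struct and one use site.
struct Shape {
    radius: f64,
}

fn area(s: &Shape) -> f64 {
    std::f64::consts::PI * s.radius * s.radius
}

// After "turn Shape into an enum": the assist has to go from the semantic
// change back to syntax, rewriting the definition *and* every use site.
enum ShapeV2 {
    Circle { radius: f64 },
    Square { side: f64 },
}

fn area_v2(s: &ShapeV2) -> f64 {
    match s {
        ShapeV2::Circle { radius } => std::f64::consts::PI * radius * radius,
        ShapeV2::Square { side } => side * side,
    }
}
```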
It doesn't really make sense for a compiler to support the above use cases, but they are _foundational_ to an IDE. However, if a compiler is query-centric (as rustc is), then it's pretty feasible for rustc and rust-analyzer to share, for instance, the trait solver or the borrow checker (we're planning/scoping work on the former right now).
Nonsense. Given that the end user of both is a human, you want the compiler that builds the program to know as much as possible about semantics, to aid in fixing buggy/incomplete/incorrect programs.
Other comments have addressed many aspects of your comment. The "constantly changing" part is also important because it makes recompilation more efficient than recompiling from scratch each time. You can read about it here: https://rustc-dev-guide.rust-lang.org/queries/query-evaluati...
There is a recording of a talk on YouTube from Niko Matsakis that goes into the motivation.
In conclusion, you don't really want to optimize for the batch use case, even outside of IDE support.
No. Actually, "interactive" frontends generally have better error messages in batch compilation mode too. Yes, it may make batch compilation (the frontend part) slightly slower, but it won't turn Go into Rust (or Haskell or C++).
And there is always the possibility to stop in batch mode when the first error occurs.
I would blame Rust though. For example, Rust has macros which are way too powerful and make it very hard to write an LSP server (https://rust-analyzer.github.io/blog/2021/11/21/ides-and-mac...)
Very interesting is how Roslyn/Typescript does it: https://www.youtube.com/watch?v=qnyOHY7AiZk
Rust-analyzer is an example of what not to do, which is reimplementing a compiler frontend. Ideally it should be the same as what the "real" compiler is using. Of course this has its own problems, as the Haskell LSP this post is about shows, since compilers aren't written to be used "interactively".
That doesn't hold for C++ and much less for any language even "less C" than C++. Like languages using a GC, e.g. Roc https://www.roc-lang.org/
What do you mean? Why not? Clang PDBs for C++ work great. A GC isn’t particularly disruptive to debug symbols afaik.
You need support for each language in the debugger, as symbols do not contain semantics. As users we, for example, know that `foo::bar` and `foo::baz` are methods of the same class `foo`; the debugger doesn't.
The problem with GCs is that pointers must contain some additional information (like an extra bit for mark and sweep); they are not "just" pointing to some memory. Without "knowing" that, the debugger cannot follow the pointer to its target. Or tricks with unboxed ints, like making them 1 bit smaller and using the freed-up bit as a tag for "this is not a pointer, but an integer".
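Roughly like this (a made-up tagging scheme, just to illustrate the unboxed-int trick): the low bit says whether a word is an integer or a pointer, and a debugger that doesn't know the convention can't tell which one it's looking at.

```rust
/// Hypothetical tagged value as a GC'd runtime might represent it:
/// low bit 1 = unboxed integer (shifted left by one),
/// low bit 0 = heap pointer (fine, since heap objects are at least 2-byte aligned).
#[derive(Clone, Copy)]
struct Value(usize);

impl Value {
    fn from_int(n: isize) -> Value {
        Value(((n << 1) as usize) | 1) // costs one bit of integer range
    }

    fn from_ptr(p: *const u8) -> Value {
        Value(p as usize) // low bit is already 0 for aligned pointers
    }

    fn as_int(self) -> Option<isize> {
        if self.0 & 1 == 1 {
            Some((self.0 as isize) >> 1) // arithmetic shift restores the sign
        } else {
            None
        }
    }

    fn as_ptr(self) -> Option<*const u8> {
        if self.0 & 1 == 0 {
            Some(self.0 as *const u8)
        } else {
            None
        }
    }
}
```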
1. I use zero languages that use either PDB or DWARF that are not named "C".
2. You are either overestimating the level of detail available in PDB/DWARF or underestimating the massive amount of language-specific work needed for even basic features (e.g. methods, which lack any cross-language ABI) given just what PDB/DWARF give you.
3. What LSP provides and what PDB/DWARF offer are only very loosely related. Consider the case of writing function1, then (without compiling) writing function2 that calls function1. It is typical for an LSP to offer completion and argument information when writing out the call for function1. That's not something you get "for free" with PDB/DWARF.
Uhhh. I didn’t say PDB/DWARF already have the necessary information. In fact I even proposed a new file format! I suggest you re-read what I said.
What do you think LSP servers do in the background? They’re effectively compilers that are CONSTANTLY compiling the code.
Amusingly, rust-analyzer takes longer to bootstrap than a full clean-and-build. Maybe it’s not as parallel, I’m not sure.
I've responded on reddit before (https://www.reddit.com/r/rust/comments/1eqqwa7/comment/lhwwn...), but I'll restate and cover some other things here.
rust-analyzer is a big codebase, but it's also less problematic than the raw numbers would make you think. rust-analyzer has a bunch of advanced functionality (term search https://github.com/rust-lang/rust-analyzer/pull/16092 and refactors), assists (nearly 20% of rust-analyzer!), and tests.
I think you might be describing formats like LSIF (https://code.visualstudio.com/blogs/2019/02/19/lsif) and SCIP (https://github.com/sourcegraph/scip). I personally like SCIP a bit more than LSIF because SCIP's design makes it substantially easier to incrementally update a large index. We use SCIP with Glean (https://glean.software/) at work; it's pretty nice.
I wouldn't say 95%. SCIP/LSIF can do the job for navigation, but that's only a subset of what you want from an IDE. For example:
- Intellisense/autocomplete is extremely latency sensitive, where milliseconds count. If you have features like Rust/Haskell's traits/typeclasses that allow writing blanket implementations like `impl<T> SomeTrait for T`, it's often faster to solve that trait bound on-the-fly than to store/persist that data.
- It'd be nice to handle features like refactors/assists/lightbulbs. That's going to result in a bunch of de novo code that needs to exist outside of a standard compiler, not counting all the supporting infrastructure.
Rust tried something similar in 2017 with the Rust Language Server (RLS, https://github.com/rust-lang/rls). It worked, but most people found it too slow because it was invoking a batch compiler on every keystroke.
That sounds similar to LSIF https://microsoft.github.io/language-server-protocol/specifi...
"any LLVM language" is a lot but also not that much. You're missing Python, JS, Go, Ruby, etc.