
Greppability is an underrated code metric

db48x
152 replies
14h33m

Rust and Javascript and Lisp all get extra points because they put a keyword in front of every function definition. Searching for “fn doTheThing” or “defun do-the-thing” ensures that you find the actual definition. Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in. Some C coding conventions have you split the definition into two lines, first the return type on a line followed by a second line that starts with the function name. It looks ugly, but at least you can search for “^doTheThing” to find just the definition(s).

CGamesPlay
70 replies
14h13m

Not JavaScript. Cool kids never write “function” any more, it’s all arrow functions. You can search for const, which will typically work, but not always (it could be a let, a var, or a multi-const initializer).
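
For example (hypothetical name), each of these binds a callable doTheThing, but only the first turns up when you search for “function doTheThing”:

    function doTheThing() { /* found by "function doTheThing" */ }
    // const doTheThing = () => {};        // found by "const doTheThing"
    // let doTheThing = () => {};          // missed by the const search
    // var doTheThing = function () {};    // missed as well
    // const a = 1, doTheThing = () => {}; // multi-const initializer, missed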

lispisok
26 replies
14h11m

Am I the only one who hates arrow functions?

crabmusket
8 replies
11h45m

I don't like using them everywhere, but they're very handy for inline anonymous functions.

But it really pains me when I see

export const foo = () => {}

instead of

export function foo() {}

jappgar
2 replies
7h20m

These aren't equivalent as function foo will be hoisted but const foo will not be.

cxr
0 replies
6h51m

Sure the food that this restaurant serves is pricey, but you have to remember that it also tastes terrible.

crabmusket
0 replies
4h16m

Yep, and that usually doesn't matter at the top level.

berkes
1 replies
11h6m

I wish javascript had a built-in or at least (de facto) default linter, like gofmt or rustfmt. Or clippy even.

One that could enforce these styles. Because not only is the export const foo = () => {}

painful in itself, it will quite certainly get intermixed with the

function foo() {}

and then in the next library a

const foo = function() {}

and so on. I'd rather have a consistently irritating style than this willy-nilly yolo style that the JS community seems to embrace.

NohatCoder
1 replies
6h5m

But do they make much of a difference? You have always been able to write:

    myArray.sort(function(a,b){return a-b})
People for some reason treat this syntactic sugar like it gives them some new fundamental ability.

marcosdumay
0 replies
3h48m

Oh Javascript would be much better if it could only be syntactic sugar...

`function(a,b){return a-b;}` is different from `(a,b) => a - b`

And `function diff(a,b) {return a-b;}` is different from `const diff = (a,b) => a - b;`.
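
A minimal sketch of the `this` difference (illustrative object, not from the snippets above):

    const counter = {
      n: 41,
      // classic function: `this` is the object the method is called on
      classic: function () { return this.n + 1; },
      // arrow function: `this` is captured from the enclosing scope,
      // so it never sees counter.n
      arrow: () => this?.n + 1,
    };
    console.log(counter.classic()); // 42
    console.log(counter.arrow());   // NaN (this?.n is undefined here)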

creesch
0 replies
11h40m

Thank you, that's something I also never have understood myself. For inline anonymous functions like callbacks they make perfect sense. As long as you don't need `this`.

But everywhere else they reduce readability of the code with no tangible benefit I am aware of.

spartanatreyu
7 replies
13h48m

I did, until I used them enough where I saw where they were useful.

The bad examples of arrow functions I saw initially were of:

1. Devs trying to mix them in with OOP code as a band-aid over OOP headaches (e.g. bind/this) instead of just not using OOP in the first place.

2. Devs trying to stick functional programming everywhere because they had seen a trivial example where a `.map()` made more semantic sense than a for/for-in/for-of loop, despite the fact that for/for-in/for-of loops were easier to read for anything non-trivial and also had better performance because you had access to the `break`, `continue` and `return` keywords.

mewpmewp2
4 replies
10h59m

Another benefit of using for instead of array fns is that it is easy to add the await keyword should the fn become async.

But many teams will have it as a rule to always use array fns.
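
For instance (a sketch, assuming an `items` array and an async `process` function):

    // inside an async function: the loop takes `await` without restructuring
    for (const item of items) {
      await process(item); // serial, one at a time
    }
    // the array-fn version has to change shape instead:
    await Promise.all(items.map((item) => process(item)));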

jappgar
3 replies
7h22m

That gives you the option of making it serially async but not parallel, which can be achieved easily using Promise.all in either scenario.

adregan
2 replies
5h22m

As an aside: It’s way less ergonomic, but you likely want `Promise.allSettled` rather than `Promise.all` as the first promise that throws aborts the rest.

wruza
1 replies
3h57m

It doesn’t really abort the rest, it just prioritizes the selection of a first catch-path as a current continuation. The rest is still thenable, and there’s no “abort promise” operation in general. There are abort signals, but it’s up to an async process to accept a signal and decide when/whether to check it.
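
A small sketch of that behaviour (illustrative, using top-level await):

    const slow = new Promise((resolve) => setTimeout(() => resolve("done"), 100));
    try {
      await Promise.all([slow, Promise.reject(new Error("boom"))]);
    } catch (err) {
      // we land here immediately, but `slow` keeps running...
    }
    console.log(await slow); // ...and is still thenable: "done"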

adregan
0 replies
1h12m

Admittedly, I was being a bit hand-wavy and describing a bit more of how it feels rather than the way it is (I'm perpetually annoyed that promises can't be cancelled), but I was thinking of the code I've seen many times across many code bases:

    let results;
    try {
      results = await Promise.all(vals.map(someAsyncOp))
    } catch (err) {
      console.error(err)
    }
While you could pull that promise mapping into a variable and keep it thenable, 99% of the time I see the above instead. Promises have some rough edges because they are stateful, so I think it might be easier to recommend swapping that Promise.all for a Promise.allSettled, and using a shared utility for parsing the promise result.

I consider this issue akin to the relationship between `sort`, `reverse`, `splice` (the mutating operation APIs) and their non-mutating counterparts `toSorted`, `toReversed`, `toSpliced`. Promise.all is kind of the mutating version of allSettled.
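
A sketch of that swap, reusing the names from the snippet above (the helper shape is illustrative):

    const settled = await Promise.allSettled(vals.map(someAsyncOp));
    // shared utility: split fulfilled values from rejection reasons
    const values = settled
      .filter((r) => r.status === "fulfilled")
      .map((r) => r.value);
    const errors = settled
      .filter((r) => r.status === "rejected")
      .map((r) => r.reason);
    errors.forEach((err) => console.error(err));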

throwaway2037
1 replies
8h59m

    > also had better performance because you had access to the `break`, `continue` and `return` keywords.
This is a great point.

One more: Debugging `.map()` is also much harder than a for loop.

medstrom
0 replies
7h36m

I feel there are a few ways to invoke .map() in a readable way and many ways that make the code flow needlessly indirect.

It should be a judgment call, and the author needs to be comfortable with both looping and mapping constructs, so that they are unafraid of the bit of extra typing needed for the loop.

turboponyy
4 replies
11h55m

I like them because they reinforce the idea that functions are just values like any other; having a separate keyword feels inconsistent.

mewpmewp2
2 replies
10h57m

Why do you want to reinforce that idea?

To me arrow functions mostly just decrease readability and make functions blend in too much, when what is a function and what is not should be an important distinction.

turboponyy
0 replies
9h35m

Not to be dismissive, but because I like it - it just sits right with me.

benrutter
0 replies
16m

I'm not a javascript programmer, but I really like the arrow pattern from a distance exactly because it enforces that idea.

My experience is that newcomers are often thrown off and confused by higher-order functions. I think that's partly because, let's be honest, they just are more confusing than normal functions, but also because languages often bind functions differently from everything else.

`const cool = () => 5`

Makes it obvious and transparent that `cool` is just a variable, whereas:

`function cool() {return 5}`

looks very different from other variable bindings.

0xfffafaCrash
0 replies
11h23m

Moreover the binding and lexical scope aspects supported by classic functions are amongst the worst aspects of the language.

Arrow functions are also far more concise and ergonomic when working with higher order functions or simple expressions

The main thing to be wary of with arrow functions is when they are used anonymously inline without it being clear what the function is doing at a glance. That, and Error stack traces, but the latter is exacerbated by there being no actual standard regarding Error.prototype.stack.

nsonha
0 replies
11h16m

why the need to pronounce arbitrary preferences, who cares?

nosianu
0 replies
11h59m

A simple heuristic I use is to use arrow functions for inline function arguments, and named "function" functions for all others.

One reason is exactly the subject of discussion here: it's easier to string-search with that keyword in front of the name. But I don't need that for trivial inline functions (whenever I do, I make it an actual function that I declare normally, not inline).

Then there's the different handling of "this", depending on how you write your code this may be an important reason to use an arrow function in some places.
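
In code, the heuristic looks something like this (hypothetical names):

    const form = document.querySelector("form");
    // named and greppable: "function handleSubmit" finds the definition
    function handleSubmit(event) {
      event.preventDefault();
      // ...
    }
    // trivial inline arrow as a function argument; nothing to grep for
    form.addEventListener("submit", (event) => handleSubmit(event));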

ndnxncjdj
0 replies
13h12m

I very much prefer the way scoping is handled in arrow functions.

ajuc
0 replies
7h15m

I'm of the opinion that giving a global name to an anonymous function should result in a compilation error.

supriyo-biswas
19 replies
14h10m

You can still search for `<keyword> = \(.*\) => `, albeit a bit cumbersome.

troupo
16 replies
12h23m

All you need is a tool that actually understands the language.

It's 2024 and HN still suggests using regular expressions to search through a code base.

lukan
9 replies
10h46m

Regex is a universal tool.

Your special tool might not work on platform X, fails for some edge case - and you generally don't know how it works. With regex or simple string search I am in control, and can understand why results show up, or investigate when they don't but should.

troupo
8 replies
10h15m

> Your special tool might not work on platform X

As always, people come out with the weirdest of excuses not to use actual tools in the 99.9999% of cases when they are available and work.

When that tool doesn't work, or isn't sufficient, use another one like fuzzy text search or regexps.

> and you generally don't know how it works.

Do you know how your stove works? Or do you truly understand how the device you're typing this comment on works?

Only in programming do I see people deliberately avoid useful tools because of <some fringe edge case that comes up once in a millennium in their daily work>

lukan
3 replies
9h20m

When you specialize in one thing only, do what you want.

But I prefer tools that I can use wherever I go, to not be dependent on and chained to that environment.

"Do you know how your stove works? Or do you truly understand what the device you're typing this comment on truly works?"

Also yes, I do.

" people deliberately avoid useful tools because <some fringe edge case that comes up once in a millenium in their daily work>"

Well, I already changed tools often enough to be fed up with it; I'd rather invest in tech that does not lose its value in the next iteration of the innovation cycle.

troupo
2 replies
7h31m

> When you specialize in one thing only, do what you want.

I specialize in one thing only: programming

> But I prefer tools that I can use wherever I go.

Do you always walk everywhere, or do you use a tool available at the time, like cars, planes, bicycles, public transport?

> rather invest in tech that does not lose its value in the next iteration of the innovation cycle.

Things like "find symbol", "find usages", "find implementation" have been available in actual tools for close to two decades now.

lukan
1 replies
6h46m

I did not say I do not use what is available, but this debate is in general about having your code in a shape where simply searching for strings works.

troupo
0 replies
3h8m

Simply searching for strings rarely works well as the codebase grows larger. Because besides knowing where all things named X are, you want to actually see where X is used, or where it's called from, or where it is defined.

With search you end up grepping the code twice:

- first grepping for the name

We're literally in a thread where people invent regexes for how to search the same thing (a function) defined in two different ways (as a function or as a const)

- secondly, manually grepping through search results deducing if it's relevant to what you're looking for

It becomes significantly worse if you want to include third-party libs in your search.

There are countless times when I would just Cmd+B/Cmd+Click a symbol in IDEA and continue my exploration down to Java's own libraries. There are next to zero cases when IDEA would fail to recognise a function and find its usages if it was defined as a const, not as a function. Why would I willingly deny myself these tools as so many in this thread do?

wruza
2 replies
3h15m

It’s you who sees it as excuses. If I have a screwdriver multitool, I don’t need another one which is for d10 only. It simply creates unnecessary clutter in a toolbox. The difference between definition and mention search for a function is:

  gr<bs><bs>ion name<cr>
  vs
  grname<cr>
or for the current identifier, simply

  gr<m-w><cr>
I could even make my own useful tools like “\[fvm]gr” for function, variable or field search and brag about it watching miserable ide guys from the high balcony, but that's unnecessary as well.

troupo
1 replies
3h12m

> It simply creates unnecessary clutter in a toolbox.

And then you proceed to... invent several pale imitations of a symbol/usages search.

More here: https://news.ycombinator.com/item?id=41435862 so as not to repeat myself

wruza
0 replies
2h54m

Doesn’t really apply, ignores things just said.

kragen
0 replies
7h28m

if you think anything works in 99.9999% of cases, you’ve never programmed a computer

wruza
1 replies
3h33m

It's the current year and IDEs still can't remember how I just transformed a snippet of code and suggest transforming the rest of the files in the same way. All they can do in the "refactor" menu is "rename" and then some extract/etc nonsense which no one uses irl.

By using regexps I have an experience that opens many doors, and the fact that they aren’t automatic could make me sad, if only these doors weren’t completely shut without that experience.

troupo
0 replies
3h0m

No one is stopping you from using regexps in IDEs.

And you somehow manage to undersell the rename functionality in an IDE. And I've used move/extract functionality multiple times.

I do however agree that applicable transformations (like upgrading to new syntaxes, or ways of doing stuff as languages evolve) could be applied wholesale to large chunks of code.

throwaway2037
1 replies
8h56m

Not to move the goal posts too much, but when I am searching a huge Python or Java code base from IntelliJ, I use a mixture of symbol and text search. One good thing about text search: you get hits from comments.

troupo
0 replies
7h30m

Yup. I do, too.

I'm mostly ranting against this weird "we will never use great tools because full-text search" obsession

renox
1 replies
7h47m

The thing is: in a large codebase the tool may become slow or crash, and in a new language you may not have such a tool. Grep is far more robust!

troupo
0 replies
3h6m

When tools don't work or are unsuitable, you use different tools.

And yet people are obsessed with never using useful tools in the first place because they can invent scenarios where the tool doesn't work, even if those scenarios might never actually come up in their daily work.

post-it
1 replies
13h51m

All you need is `<keyword> =`

Really, all you need is `<keyword>` and if the first result is a call to that function, just jump to its definition.

spartanatreyu
0 replies
13h45m

Exactly.

Just search the definition.

Any time that a function doesn't have a definition, it's never the target of a search anyway.

pjerem
17 replies
11h18m

Yes, but that's an anti-pattern. Arrow functions aren't there to look cool; they're how you define lambdas / anonymous functions.

Other than that, functions should be defined by the keyword.

wiseowise
16 replies
11h5m

How is that an anti-pattern?

> Other than that, functions should be defined by the keyword.

Says who?

hansworst
12 replies
10h4m

Anonymous functions don't have names. This makes it much harder to do things like profiling (just try to find that one specific arrow function in your performance profile flame graph) and tracing. Tools like Sentry that automatically log stack traces when errors occur become much less useful if every function is anonymous.

medstrom
10 replies
8h25m

    const foo = () => {}
This function is not anonymous, it's called foo.

croes
4 replies
7h19m

But to call foo in bar you must define foo before bar.

function foo(){} is also callable if bar is defined before foo.

sestep
3 replies
6h0m

Not true at the top-level.

wruza
2 replies
4h6m

Not sure what you find not true about it. All named “function”s get hoisted just like “var”s, I use post-definitions of utility functions all the time in file scopes, function scopes, after return statements, everywhere. You’re probably thinking about

  const foo = function (){}
without its own name before (). These behave like expressions and cannot be hoisted.
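
A quick sketch of the difference:

    console.log(hoisted()); // works: function declarations are hoisted
    function hoisted() { return 1; }

    console.log(expr()); // ReferenceError: cannot access 'expr' before initialization
    const expr = function () { return 2; };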

MetaWhirledPeas
1 replies
3h37m

> I use post-definitions of utility functions all the time in file scopes, function scopes, after return statements, everywhere

I haven't figured out if people consider this a best practice, but I love doing it. To me the list of called functions is a high-level explanation of the code, and listing all the definitions first just buries the high-level logic "below the fold". Immediately diving into function contents outside of their broader context is confusing to me.

wruza
0 replies
3h3m

I don’t monitor “best” practices, so beware. But in languages like C and Pascal I also had a habit of simply declaring all interfaces at the top and then grouping implementations reasonably. It also created a nice “index” of what’s in the file.

Hoisting also enables cross-imports without helper unit extraction headaches. Many hate js/ts at the “kids hate == and null” level but in reality these languages have a very practical design that wins so many rounds irl.

mrighele
2 replies
6h1m

Interesting, it seems that the javascript runtime is smart enough to detect this pattern and actually create a named function (I tried Chrome and Node.js):

    const foo = () => {}
    console.log( foo.name );
actually outputs 'foo', and not the empty string that I was expecting.

   const test = () => ( () => {} );
   const foo = test();
   console.log( foo.name );
outputs the empty string.

Is this behavior required by the standard?

Izkata
0 replies
1h28m

You're probably remembering how it used to work. This is the example I remember from way back that we shouldn't use because (aside from being unnecessary and weird) this function wouldn't have a name in stack traces:

  var foo = function() {};
Except nowadays it too does have the name "foo".

mapcars
0 replies
5h1m

Not really, it's an anonymous function stored in a variable foo

BlarfMcFlarf
0 replies
7h23m

Does the function know it’s called foo for tracing/error logging/etc?

mostlylikeable
0 replies
6h5m

To me, arrow functions behave more like I would expect functions to behave. They don’t include all the magic bindings that the function keyword imparts. Feels more “pure” to me. Anonymous functions can be either function () {} or () => {}

lukan
1 replies
10h50m

All the wise ones. Well, except for you maybe.

Serious arguments would be:

- readability

- greppability

lukan
0 replies
1h23m

(It wasn't an insult, but a joke on the username)

tylerhou
0 replies
10h37m

As of a few years ago (not sure about now) the backtrace frame info for anonymous functions was far worse than for ones defined via the function keyword with a name.

spartanatreyu
2 replies
14h3m

Yes JavaScript.

You can search for both "function" and "=>" to find all function expressions and arrow function expressions.

All named functions are easily searchable.

All anonymous functions are throwaway functions that are only called in one place, so you don't need to search for them in the first place.

As soon as an anonymous function becomes important enough to receive a label (i.e. assigning it to a variable, being assigned to a parameter, converting to function expression), it has also become searchable by that label too.
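
E.g. (hypothetical):

    // anonymous and throwaway: never the target of a search
    [1, 2, 3].forEach((n) => console.log(n));
    // important enough to get a label: now greppable as "isEmpty"
    const isEmpty = (s) => s.trim().length === 0;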

CGamesPlay
1 replies
13h0m

The => is after the param spec, so you’re searching for foo.*=> or something more complex, but then still missing multiline signatures. This is very easy to get caught by in TypeScript, and also happens when dealing with higher-order functions (quite common in React).
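
For example (a sketch), a multiline higher-order signature that a single-line `foo.*=>` search misses:

    const foo = (
      onChange, // params on their own lines push the `=>` off the `foo` line
      onError,
    ) => (value) => {
      // higher-order: returns another function
      if (value == null) return onError(value);
      return onChange(value);
    };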

spartanatreyu
0 replies
12h9m

Why are you searching for foo.*=>

Are you searching through every function, or functions that have a very specific parameter?

And whatever you picked, why?

---------------------------------------------------------------

- If you're searching for every function, then there's no need to search for foo.*=>; you only need to search for function and =>.

- If you're searching for a specific parameter, then just search for the parameter. Searching for functions is redundant.

---------------------------------------------------------------

Arrow function expressions and function expressions can both be named or anonymous.

Introducing arrow functions didn't suddenly make JavaScript unsearchable.

JavaScript supported anonymous functions before arrow function expressions were introduced.

Anonymous functions can only ever be:

- run on the spot

- thrown away

- or passed around after they've been given a label

Which means, whenever you actually want to search for something, it's going to be labelled.

So search for the label.

albedoa
0 replies
10h49m

I want to talk to the developer who considers greppability when deciding whether to use the "function" keyword but requires his definitions to be greppable by distancing them from their call locations. I just have a few questions for him.

Pxtl
0 replies
2h58m

You can still look for `(funcname)\s*=`, can't you? I mean, it's not like functions get re-declared a lot.

koito17
30 replies
13h34m

Golang has a similar property as a side-effect of the following design decision.

  ... the language has been designed to be easy to analyze and can be parsed without a symbol table
Taken from https://go.dev/doc/faq

The "top-level declarations" in source files are exactly: package, import, const, var, type, func. Nothing else. If you're searching for a function, it's always going to start with "func", even if it's an anonymous function. Searching for methods implemented by a struct similarly only needs one to know the "func" keyword and the name of the struct.

Coming from a background of mostly Clojure, Common Lisp, and TypeScript, the "greppability" of Go code is by far the best I have seen.

Of course, in any language, Go included, it's always better to rely on static analysis tools (like the IDE or LSP server) to find references, definitions, etc. But when searching the code of some open source library, I always resort to ripgrep rather than setting up a development environment, unless I find something that I want to patch (in which case I set up the development environment and rely on LSP instead of grep to discover definitions and references).

eptcyka
18 replies
11h9m

Golang gets zero points from me because function receivers are declared between func and the name of the function. God ai hate this design choice and boy am I glad I can use golsp.

medstrom
11 replies
8h28m

Is it just hard to get used to, or does it fundamentally make something more difficult?

kragen
5 replies
7h32m

this thread is about using `grep` to find things, and this subthread is specifically about how the `func` keyword in golang makes it easy to distinguish the definition of a function from its uses, so yes, because `grep 'func lart('` will not find definitions of `lart` as a method. you might end up with something like `grep 'func .*) *lart('` which is both imprecise and enough noise that you will not want to type it; you'll have to can it in a script, with the associated losses of flexibility and transparency

medstrom
4 replies
7h10m

That's fair, I see many examples in this thread where people pass an exact string directly to grep, as you do. I'm an avid grepper, but my grep tool [1] translates spaces to ".*?", so I would just type "func lart(" in that example and it would work.

An incremental grep tool with just this one transformation rule gets you a lot more mileage out of grep.

[1] https://github.com/minad/consult/blob/screenshots/consult-li...

EDIT: Better demo https://jumpshare.com/s/zMENBSr2LwwauJVjo1wS

kragen
3 replies
6h59m

that's going to find all the functions that take an argument named lart or of a lart type too, but it also sounds like a thing i really want to try

vitus
2 replies
6h42m

Also, anything that contains "func" and "lart" as a substring, e.g. foobar(function), blart(baz).

It's not far off from my manually-constructed patterns when I want to make sure I find a function definition (and am willing to tolerate some false positives), but I personally prefer fine-grained control over when it's in use.

medstrom
1 replies
6h27m

Mmh, I type "func\ lart(" when I need the literal string. But it's less often, so it's fair that it's slightly more to type.

kragen
0 replies
6h19m

yeah!

eptcyka
3 replies
7h2m

I have to always add wildcards between func and the function name, because I can never know how the other developer has decided to specify the name of the receiver. This will always be a problem as far as grepping with primitive tools that don't parse the language.

medstrom
2 replies
6h14m

FYI, many people use thin wrappers like this, it's still a primitive tool that doesn't parse the language, but it can handle that problem: https://jumpshare.com/s/zMENBSr2LwwauJVjo1wS (GIF)

eptcyka
1 replies
5h59m

On machines where I control the tooling, this is not an issue. But I can’t take my config to my colleagues machine.

lanstin
0 replies
3h42m

If only AFS had succeeded. What would a modern version of this look like?

ljm
0 replies
8h4m

Can’t say I’ve ever had an issue with it, but it does get a bit wild when you have a function signature that takes a function and returns one, unless you clear it up with some types.

  func (s *Recv) foo(fn func(x any) err) func bar(y any) (*Recv, err)
As an exaggerated example. Easy to parse but not always easy to read at a glance.

kazinator
3 replies
3h30m

Receivers are utterly idiotic. Like how could anyone with two working brain cells sign off on something like that?

If you don't want OOP in the language, but want people to be able to write thing.function(arg), you just make function(thing, arg) and thing.function(arg) equivalent syntax.

Pxtl
2 replies
3h1m

C# did this for extension methods and it Just Works. You just add the "this" keyword to a function in a pure-static class and you get method-like calling on the first param of that function.

kazinator
1 replies
2h54m

If the function has to be modified in any way in order to grant permission to be used that way, then it is not quite "did this".

Equivalent means that there is no difference at the AST level between o.f(a) and f(o, a), like there is no difference in C among (a + i), a[i], i[a] and (i + a).

However, a this keyword is way better than making the programmers fraction off a parameter and move it to the other side of the function name.

sethammons
0 replies
6h44m

I search ") myFunc" to find member functions. It would be nice to search "c myFunc", but a parentheses works

executesorder66
0 replies
4h15m

How many God AI's have expressed their hate for this design? /s

madeofpalk
8 replies
9h50m

The culture of single letter variables in golang, at least in the codebases I've seen, undoes this.

lelanthran
5 replies
9h5m

> The culture of single letter variables in golang, at least in the codebases I've seen, undoes this.

The convention, not just in Go, is that the smaller the scope, the shorter the variable name.

So, sure, you're going to see single-letter variables in short functions, inside short block scopes, etc, but that is true of almost any language.

I haven't seen single-letter variables in Go that are in a scope that isn't short.

Of course, this could just mean that I haven't seen enough of other people's Go source.

kazinator
1 replies
3h9m

E.g. food and art are very important in Japan, so stomach is i and a drawing/painting is e.

BrandoElFollito
0 replies
1h10m

Food is very very important in France so we call it nourriture :)

marcosdumay
0 replies
3h55m

> that is true of almost any language

You'd be surprised how often language-local cultures break that rule on either side. And a few times it's even an improvement.

lanstin
0 replies
3h45m

Zipf's law, right - these rules are a formalization of our brain's functionality with language.

Of course, with enough code, someone does everything.

iudqnolq
0 replies
8h3m

I like using l for logger and db for database client/pool/handle even if there's a wider scope. And if the bulk of a file is interacting with a single client I might call that c.

alienchow
0 replies
7h47m

Single letter variables in Golang are to be used in small, local contexts. Akin to the throwaway i var in for loops. You only grep the struct methods, the same way no one greps 'this' or 'self'.

The code bases you've been reading, and even some of the native libraries, don't do it properly. Probably due to legacy reasons that wouldn't pass readability approvals nowadays.

VonGallifrey
0 replies
9h30m

The way I have seen this is that single letter variables are mostly used when declaration and (all) usages are very close together.

If I see a loop with i or k, v, then I can be fairly confident that those are an index or a key/value pair. Also I probably don't need to grep them since everything interacting with these variables is probably already on my screen.

Everything that has a wider scope or which would be unclear with a single letter is named with a more descriptive name.

Of course this is highly dependent on the people you work with, but this is the way it works on projects I have worked on.

vitus
1 replies
6h50m

I'm not so sure about greppability in the context of Go. At least at Google (where Go originates, and whose style guide presumably has strong influence on other organizations' use of the language), we discourage "stuttering":

> A piece of Go source code should avoid unnecessary repetition. One common source of this is repetitive names, which often include unnecessary words or repeat their context or type. Code itself can also be unnecessarily repetitive if the same or a similar code segment appears multiple times in close proximity.

https://google.github.io/styleguide/go/decisions#repetitive-...

(see also https://google.github.io/styleguide/go/best-practices#avoid-...)

This is the style rule that motivates the sibling comment about method names being split between method and receiver, for what it's worth.

I don't think this use case has received much attention internally, since it's fairly rare at Google to use grep directly to navigate code. As you suggest, it's much more common to either use your IDE with LSP integration, or Code Search (which you can get a sense of via Chromium's public repository, e.g. https://source.chromium.org/search?q=v8&sq=&ss=chromium%2Fch...).

klodolph
0 replies
30m

The thing about stuttering is that the first part of the name is fixed anyway, MOST of the time.

If you want to search for `url.Parse`, you can find most of the usages just by searching for `url.Parse`, because the package will generally be imported as `url` (and you won’t import Parse into your namespace).

It’s not as good as find references via LSP but it is like 99% accurate and works with just grep.

zarzavat
9 replies
11h39m

C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {. In order to get a complete list of symbols you need to preprocess it, and that requires knowing what the compiler flags are. Nobody really knows what their compiler flags are because they are hidden behind multiple levels of indirection and a variety of build systems.

skissane
6 replies
8h8m

> C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {.

That’s not really C; that’s a C-based DSL. The same problem exists with Lisp, except even worse, since its preprocessor is much more powerful and hence encourages DSL-creation much more than C does. But in fact it can happen with any language: even if a language lacks any built-in preprocessor or macro facility, you can always build a custom one, or use a general-purpose macro processor such as M4.

If you are creating a DSL, you need to create custom tooling to go along with it. In the ideal scenario, your tools are so customisable that supporting a DSL is more about configuration than coding something from scratch.

kragen
4 replies
7h30m

the issue is that the c preprocessor is always available and usually used

skissane
3 replies
6h54m

Other languages have preprocessors or macro facilities too.

C's is very weak. Languages with more powerful preprocessors/macros than C's include many Lisp dialects, Rust, and PL/I. If you think everyone using a weak preprocessor is bad, wait until you see what people will do when you give them a powerful one.

Micro Focus COBOL has an API for writing custom COBOL preprocessors in COBOL (the Integrated Preprocessor Interface). (Or some other language, if you insist.) I bet there are some bizarre abominations hidden in the bowels of various enterprises based on that ("our business doesn't just run on COBOL, it runs on our own custom dialect of COBOL!")

kragen
2 replies
6h31m

c's macro system is weak on purpose, based on, i suspect, bad experiences with m6 and m4. i think they thought it was easier to debug things like ratfor, tmg, lex, and (much later) protoc, which generate code in a more imperative paradigm for which their existing debugging approaches worked

i can't say i think they were wholly wrong; paging through compiler error messages is not my favorite part of c++ templates. but i have a certain amount of affection for what used to be called gasp, the gas macro system, which i've programmed for example to compute jump offsets for compiling a custom bytecode. and i think m4 is really a pathological case; most hairy macro systems aren't even 10% as bad as m4, due to a combination of several tempting but wrong design decisions. lots of trauma resulted

so when they got a do-over they eliminated the preprocessor entirely in golang, and compensated with reflection, which makes debugging easier rather than harder

probably old hat to you, but i just learned last month how to use x-macros in the c preprocessor to automatically generate serialization and deserialization code for record types (speaking of cobol): http://canonical.org/~kragen/sw/dev3/binmsg_cpp.c (aha, i see you're linking to a page that documents it)

skissane
1 replies
6h26m

C's is weak yet not weak – you can do various advanced things (like conditional expansion or iteration), but using esoteric voodoo with extreme performance cost. Whereas other preprocessors let you do that using builtins which are fast and easy to grok.

See for example https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,...

Poor C preprocessor performance has a negative real world impact, for example recently with the Linux kernel – https://lwn.net/Articles/983965/ – a more powerful preprocessor would enable people to do those things they are doing anyway much more cheaply

lanstin
0 replies
1h49m

I've always suspected the powerful macro facilities in Lisp are why it's never been very common - the ability to do proper macros means all the very smart programmers create code that has to be read like a maths paper. It's too bespoke to the problem domain and too tempting to make it short rather than understandable.

I like Rust (tho I have not yet programmed in it) but I think if people get too into macro generated code, there is a risk there to its uptake.

It's hard for smart programmers to really believe this, but the old "if you write your code as cleverly as possible, you will not be able to debug it" is a useful warning.

kazinator
0 replies
3h0m

If your Lisp macro starts with a symbol whose name begins with def, and the next symbol is a name, then good old Exuberant Ctags will index it, and you get jump to definition.

Not so with DEFINE_FUNCTION(foo) {, I think.

  $ cat > foo.lisp
  (define-musical-scale g)
  $ ctags foo.lisp
  $ grep scale tags
  g       foo.lisp        /^(define-musical-scale g)$/;"  f
Exuberant Ctags is not even a tool from the Lisp culture. I suspect it is mostly shunned by Lisp programmers. Except maybe for the Emacs one, which is different. (Same ctags command name, completely different software and tag file format.)

db48x
1 replies
7h25m

Yes, the usefulness of macros always has to be balanced against their cost. I know of only one codebase that does this particular thing though, Emacs. It is used to define Lisp functions that are implemented in C.

shadowgovt
0 replies
3h55m

It's a common pattern for just about any binding of C-implementation to a higher-level language. Python has a similar pattern, and I once had to re-invent it from scratch (not knowing any of this) for a game engine.

jampekka
4 replies
8h24m

Rust though does lose some of those points by more or less forcing[1] snake_case. It's really annoying to navigate bindings which are converted from camelCase.

I don't care which case is used. It's a trivial superficial thing, and tribal zealotry about such doesn't reflect well on the language and community.

[1] The warnings can be turned off, but in some cases it requires ugly hacks, and the community seems to be actively hostile to making it easier.

kibwen
3 replies
5h35m

The Rust community is no more zealous about naming conventions than any other language community with naming conventions. Perhaps you're arguing against the concept of naming conventions in general, but that's not a Rust thing; every language of the past 20 years suggests naming conventions, if for no other reason than that every language provides a standard library which needs to follow some sort of naming conventions itself. Turning off the warnings emitted by the Rust compiler takes two lines of code, either at the root of the crate or in the crate manifest.

jampekka
2 replies
5h12m

I've yet to encounter another compiler that warns about naming conventions, by default at least. So it's at least the most enforced zealotry I've encountered.

Yes, it can be turned off. But for e.g. bindgen-generated code it was not trivial to find out how.

kibwen
1 replies
2h58m

The Rust compiler doesn't produce warnings out of zealotry, but rather as a consequence of pre-1.0 historical decisions. Note that Rust doesn't use any syntax in pattern matching contexts to distinguish between bindings and enum variants. In pre-1.0 versions of Rust, this created footguns where an author might think they were matching on an enum, but the compiler was actually parsing it as a catch-all binding that would cause any following match arms to never be executed. This was exacerbated by the prevailing naming conventions of the time (which you can see in this 2012 blog post: https://pcwalton.github.io/_posts/2012-06-03-maximally-minim... (note the lower-cased enum variants)). So at some point the naming conventions were changed in an attempt to prevent this footgun, and the lint was implemented to nudge people over to the new conventions. However, as time went on the footgun was otherwise fixed by instead causing the compiler to prioritize parsing enum variants rather than bindings, in conjunction with other errors and warnings about non-exhaustive patterns and dead code (which are all desirable in their own right). At this point it's mostly just vestigial, and I highly doubt that anybody really cares about it beyond "our users are accustomed to this warning-by-default, so they might be surprised if we stopped doing this".

jampekka
0 replies
1h5m

Ah, thanks for the info! I do think this default has some ramifications, especially in that binding casings are typically changed because of it even for "non-native" wrappers, which I find materially makes things more difficult.

sva_
3 replies
9h36m

People don't use LSP?

gregjor
1 replies
9h19m

That’s right, not everyone uses an LSP. Nothing wrong with LSPs, very useful tools. I use ripgrep, or plain grep if I have to, far more often than an LSP.

Working with legacy code — the scenario the author describes — I often can’t install anything on the server.

menaerus
0 replies
3h48m

LSP doesn't always work without issues with large C and C++ codebases, which is why one needs to fall back to grep techniques.

semiinfinitely
2 replies
13h54m

python also!

jsjohnst
1 replies
6h47m

Python is the only one mentioned that "actually works" without endless exceptions to the rule in the normal case. The others mentioned (Rust/Javascript/Lisp/Go) all have specific syntax that is commonly enough used that it makes them harder to search. Possible, absolutely, but still harder.

zbentley
0 replies
5h17m

I'd say Python works well at greppability because community conventions generally discourage concealing certain kinds of definitions (e.g. function definitions are usually "def whatever").

However, that's just convention. Lots of modules do metaprogramming tricks that obscure greppability, which can be a pain. This is particularly acute when searching for code that is "import-time polymorphic"--that is, code which picks one of several implementations for a piece of functionality at import time at the module scope. That frequently ends up with some hanky-panky a la "exported_function_name = _implementation1 if platform_supported else _implementation2" at the module scope.

While sometimes annoying, that type of thing is usually done for understandable reasons (picking an optimized/platform-supported implementation of an interface--think select or selectors in the stdlib, or any pypi implementation of filesystem monitoring using fsnotify/fanotify/kqueue/fsevents/ReadDirectoryChangesW). Additionally, good type annotations help with greppability, though they can't fully mitigate this issue.

Much less defensible in Python is code that abuses locals/globals to indirect symbol access, or code that abuses star imports to provide interfaces/implementation switching.

Those, fortunately, are rare, but the elephant in the "no greppability ever" room is not: getattr bullshit in OO code is so often utterly obscure, unnecessary and terrible. And it's distressingly common on PyPi. At first I thought this was Ruby's encouragement of method_missing in the bad old days bleeding into the Python community, but the number of programmers for whom getattr magic is catnip seems to be disproportionate to the number of folks with Ruby experience, and, more concerningly, seems to me to be growing over time.

dan-robertson
2 replies
12h7m

Not sure this is very true for Common Lisp. A classic example is accessor functions, where the generic function is created by whichever class is defined first and the method where the class is defined. Other macros will construct new symbols for function names (or take them from the macro arguments).

f1shy
0 replies
12h2m

Still, you could extend the concept without a lot of work, couldn't you?

db48x
0 replies
7h29m

That’s true, but I regard it as fairly minor. Accessor functions don't have any logic in them, so in practice you don’t have to grep for them. But it can be confusing for new players, since they don't know ahead of time which ones are accessors and which are not.

wpollock
1 replies
2h24m

In the bygone days of ctags, C function definitions included a space before opening parenthesis, while function calls never had that space. I have a hard time remembering that modern coding styles never have that space and my IDE complains about it. (AFAIK, the modern gtags doesn't rely on that space to determine definitions.) Even without *tags, the convention made it easy to grep for definitions.

mzs
0 replies
1h37m

space after builtin was recommended instead:

  if (x == 0) { ...
  sizeof (buf);
  return (-1);
  exit(0);

hgomersall
1 replies
11h44m

Though glob imports in rust can hide a source, so those should be avoided.

andersa
1 replies
3h54m

Do people really use text search for this rather than an IDE that parses all of the code and knows exactly where each declaration is, able to instantly jump to them from a key press on any usage...? Wild.

iamwil
0 replies
3h53m

Yes. Not everyone uses or likes an IDE. Also, when you lean on an IDE for navigation, there is a tendency to write more complicated code: since it feels easy to navigate, you don't feel the pain.

wruza
0 replies
7h30m

> Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in

That’s why in my personal projects I follow classic “type\nname” and grep with “^name\>”.

> looks ugly

Single line definitions with long, irregular type names and unaligned function names look ugly. Col 1 names are not only greppable but skimmable. I can speedscroll through code and still see where I am.

veltas
0 replies
13h3m

For most functions ^\S.*name( will find declarations and definitions.

Most of us use exuberant ctags to allow jumping to definitions.

throwawayffffas
0 replies
8h41m

> Meanwhile C lacks any such keyword

It's a hassle. But not the end of the world.

I usually search for "doTheThing\(.+?\) \{" first.

If I don't get a hit, or too many hits I move to "doTheThing\([^\)]*?\) \{" and so on.

suprjami
0 replies
6h6m

> Meanwhile C lacks any such keyword, so the best you can do is...

...use source code tagging or LSP.

skywal_l
0 replies
12h52m

Yet you reply to an article that defines functions as variables, which I've seen a lot of developers do, usually for no good reason at all.

To me, that's a much more common and worse practice with regards to greppability than splitting identifiers using strings, which I haven't seen much in the wild.

mav3ri3k
0 replies
9h4m

Although in Rust, function-like macros make it super hard to trace code. I like them when I am writing the code and hate them when I have to read others' macros.

marcosdumay
0 replies
3h58m

Those also make your language easier to parse, and to read.

Many people insist that IDEs make the entire point moot, but that's the kind of thing that makes IDEs easier to write and debug, so I disagree.

leogout
0 replies
48m

Javascript is a bit trickier nowadays, I think, with the fat arrow notation: const myFunc = () => console.log("can't find me :p");

kazinator
0 replies
12h14m

C has "classical" tooling like Cscope and Exuberant Ctags. The stuff works very well, except on the odd weird code that does idiotic things that should not be done with preprocessing.

Even for Lisp, you don't want to be grepping, or at least not all the time for basic things.

For TXR Lisp, I provide a program that will scan code and build (or add to) your tags file (either a Vim or Emacs compatible one).

Given

  (defstruct point ()
    x
    y)
it will let your editor jump to the definition of point, x and y.

johannes1234321
0 replies
3h57m

One thing which works for C is to search for something like `[a-z] foo\(.+\) \{`, assuming the spacing matches the coding style. Often the shorter form `[a-z] foo\(` works well, which tries to ensure there is a type before the name rather than an assignment or something. Then there are only a handful of false positives.

gregjor
0 replies
9h28m

ctags.

fsckboy
0 replies
3h40m

C, starting with K&R, has all declarations and definitions on lines at the left margin, and little else. This is easy to grep for.

eddieh
0 replies
14h10m

I used to define functions as `funcname (arglist)`

And always call the function as `funcname(args)`

So definitions have a space between the name and arg parentheses, while calls do not. Seemed to work well, even in languages with extraneous keywords before definitions since space + paren is shorter than most keywords.

Nowadays I don't bother since it really isn't that useful, especially with tags or LSP.

I still put the return type on a line of its own, not for search/grep, but because it is cleaner and looks nice to me; overly long lines are the ugliest part of coding IMO. Well, that and excessive nesting.

drewg123
0 replies
1h34m

In terms of C, that's one reason I prefer the BSD coding style:

    int
    foo(void) { }

vs the Linux coding style:

    int foo(void) { }

The BSD style allows me to find function definitions using git grep ^foo.

darepublic
0 replies
32m

There is arrow syntax with js

bryanrasmussen
0 replies
13h9m

JavaScript has multiple ways to define a function so you sort of lose that getting the actual definition benefit.

on edit: I see someone discussed that you can grep for both arrow functions and named function at the same time and I suppose you can also construct a query that handles a function constructor as well - but this does not really handle curried functions or similar patterns - I guess at that point one is letting the perfect become the enemy of the good.

Most people grepping know the code base and the patterns in use, so they probably only need to grep for one type of function declaration.

bionsystem
0 replies
14h13m

Doesn't cscope fit this usecase ?

akritid
0 replies
10h36m

Looks fine (subjective) and there is also ctags

akira2501
0 replies
7h28m

> so the best you can do is search for the name

This is why in C projects libs go in "lib/" and sources go in "src/". If your header files have the same directory structure as libs, then "include/" is also a decent way to find definitions.

lucumo
96 replies
12h40m

Grepping for symbols like function names and class names feels so anemic compared to using a tool that has a syntactic understanding of the code. Just "go to definition" and "find usages" alone reduce the need for text search enormously.

For the past decade-plus I have mostly only searched for user facing strings. Those have the advantage of being longer, so are more easily searched.

Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.

laserbeam
18 replies
10h11m

Scenarios where an IDE with full syntactic understanding is better:

- It's your day to day project and you expect to be working in it for a long time.

Scenarios where grepping is more useful:

- Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

- You just opened the project for the first time.

- It's in a language you don't daily drive (you write backend but have to delve in frontend code, it's a 3rd party library, it's configuration files, random json/xml files or data)

- You're editing or searching through documentation.

- You haven't even downloaded the project and are checking things out in github (or some similar site for your project).

- You're providing remote assistance to someone and you are not at your main development machine.

- You're remoting via SSH and have access to code there (say it's a python server).

Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other usecases.

cxr
4 replies
6h24m

- You're fully aware that it would be better to be able to use tooling for $THING, but tooling doesn't exist yet or is immature.

kragen
3 replies
4h46m

you would not believe the amount of time i spent pretty-printing python dicts by hand last week

kragen
1 replies
4h24m

yeah, pprint is why i was doing it by hand ;)

lkbm
0 replies
3h12m

I used to pipe things through black for that. (a script that imported black, not just black on the command line.)

I also had `j2p` and `p2j` that would convert between python (formatted via black) and json (formatted via jq), and the `j2p_clip`/`p2j_clip` versions that would pipe from clipboard and back into clipboards.

It's worth taking the time to build a few simple scripts for things you do a lot. I used to open up the repl and import json to convert between json and python dicts multiple times a day, so spending a few minutes throwing together a simple script to do it was well worth the effort.

umanwizard
2 replies
2h20m

> Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

Your other points make sense, but in this case, at least for C/C++, you can generate a compile_commands.json that will let clangd interpret your code accurately.

If building with make just do `bear -- make` instead of `make`. If building with cmake pass `-DCMAKE_EXPORT_COMPILE_COMMANDS=1`.

camel-cdr
1 replies
1h28m

Does it evaluate macros? Because macros allow for arbitrary computation.

umanwizard
0 replies
12m

The macros I see in the real world seem to usually work fine. I’m sure it’s not perfect and you can construct a macro that would confuse it, but it’s a lot better than not having a compilation db at all.

joe-six-pack
2 replies
4h33m

You forgot massive codebases. Language servers really struggle with anything on the order of the Linux kernel, FreeBSD, or Chromium.

umanwizard
0 replies
2h17m

clangd works fine for me with the linux kernel. For best results build the kernel with clang by setting LLVM=1 and KERNEL_LLVM=1 in the build environment and run ./scripts/clang-tools/gen_compile_commands.py after building.

Groxx
0 replies
21m

I honestly suspect that the amount of time spent dealing with the issues monorepos cause is net-larger than the gains most get from what a monorepo offers. It's just harder to measure because it tends to degrade slowly, happen to things you didn't realize you were relying on (until you need them), and without clear ways to point fingers at the cause.

Plus it means your engs don't learn how to deal with open source code concerns, e.g. libraries, forking, dependency management. Which gradually screws over the whole ecosystem.

If you're willing to put Google-scale effort into building your tooling, sure. Every problem is solvable. Only Google does that though, everyone else is getting by with a tiny fraction of the resources and doesn't already have a solid foundation to reduce those maintenance costs.

popinman322
1 replies
9h56m

Grep is also useful when IDE indexing isn't feasible for the entire project. At past employers I worked in monorepos where the sheer size of the index caused multiple seconds of delay in intellisense and UI stuttering; our devex team's preferred approach was to better integrate our IDE experience with the build system such that only symbols in scope of the module you were working on would be loaded. This was usually fine, and it works especially well for product teams, but it's a headache when you're doing cross-cutting work (e.g. for infrastructure projects/overhauls).

We also had a livegrep instance that we could use to grep any corporate repo, regardless of where it was hosted. That was extremely useful for investigating failures in build scripts that spanned multiple repositories (e.g. building a Go sidecar that relies on a service config in the Java monorepo).

cma
0 replies
5h53m

If running into this, make sure to enable 64-bit intellisense and increase the ram limit, by default it is 4gb.

jollyllama
1 replies
6h17m

> It's your day to day project and you expect to be working in it for a long time.

Bold of everyone here to assume that everyone has a day-to-day project. If you're a consultant, or for other reasons you're switching projects on a month-to-month basis, greppability is probably the top metric, second only to UT coverage.

switchbak
0 replies
2h8m

They said the scenario in which that would be useful was IF: "It's your day to day project and you expect to be working in it for a long time". The implication being that if neither of those hold then skip to the next section.

I don't think anyone is assuming anything here. I've contracted for most of my career and this didn't seem like an outlandish statement.

Also, if you're working in a project for a month, odds are you could set up an IDE in the first few hours. Not sure how any of this rises to the level of being "bold".

lolinder
0 replies
4h49m

> It's your day to day project and you expect to be working in it for a long time.

I don't think we need to restrict the benefits quite that much—if it's a project that isn't my day-to-day but is in a language I already have set up in my IDE, I'd much prefer to open it up in my IDE and use jump to definition and friends than to try to grep and hope that the developers made it grepable.

Going further, I'd equally rather have plugins ready to go for every language my company works in and use them for exploring a foreign codebase. The navigation tools all work more or less the same, so it's not like I need to invest effort learning a new tool in order to benefit from navigation.

> Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other usecases.

Certainly don't sabotage, but some of these suggestions are bad for other reasons that aren't about grep.

For example: breaking the naming conventions of your language in order to avoid remapping is questionable at best. Operating like that binds your business logic way too tightly to the database representation, and while "just return the db object" sounds like a good optimization in theory, I've never not regretted having frontend code that assumes it's operating directly on database objects.
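To be concrete, here's a minimal sketch of the kind of explicit mapping layer I mean - the `UserRow`/`User` names are made up for illustration:

  from dataclasses import dataclass

  # Raw shape of the database row, named after the columns.
  @dataclass
  class UserRow:
      user_id: int
      first_name: str
      email_address: str

  # Domain object the rest of the app sees; it can evolve
  # independently of the schema.
  @dataclass
  class User:
      id: int
      display_name: str
      email: str

  def user_from_row(row: UserRow) -> User:
      # The one place that knows about the column names.
      return User(id=row.user_id, display_name=row.first_name, email=row.email_address)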

gpderetta
0 replies
4h34m

- you just switched branch/rebased and the index is not up to date.

- the project is large enough that the IDE can't cope.

- you want to also match comments, commented out code or in-project documentation

- you want fuzzy search and match similarly named functions

I use clangd integration in my IDE all the time, but often brute force is the right solution.

emn13
0 replies
9h41m

Further important (to me) scenarios that also argue for greppability:

- greppability does not preclude IDE or language server tooling; there are often special cases where only certain (e.g. context-dependent) usages matter, and sometimes grep is the easiest way to find those.

- projects that include multiple languages, such as for instance the fairly common setup of HTML, JS, CSS, SQL, and some server-side language.

- performance in scenarios with huge amounts of code, or where you're searching very often (e.g. in each git commit for some amount of history)

- ease of use across repositories (e.g. a client app, a spec, and a server app in separate repos).

I treat greppability as an almost universal default. I'd much rather have code in a "weird" naming style in some language but have consistent identifiers across languages, than have normal style-guide-default identifiers in each language but differing identifiers across languages. If code "looks weird", that's often actually a _benefit_ in such cases, not a downside - most serialization libraries I use for this kind of stuff tend to do a lot of automagic mapping that can break in ways that are sometimes hard to detect at compile time if somebody renames something, or sometimes even just for a casing change or type change. Having a hint of this fragility immediately at a glance, even in dynamically typed languages, is sometimes a nice side-effect. Very speculatively, I wouldn't be surprised if AI coding tools can deal with consistent names better than context-dependent ones too; greppability is likely not specifically about merely the tool grep.

And the best part is that there's almost no downside; it's not like you need to pick either a language server, IDE or grep - just use whatever is most convenient for each task.

gregjor
17 replies
9h12m

I abandoned VSCode and went back to vim + ctags + ripgrep after a year with the most popular IDE. I miss some features but it didn’t give me a 10x or even 1.5x improvement in my own work along any dimension.

I attribute that mostly to my several decades of experience with vi(m) and command line tools, not to anything inherently bad about VSCode.

What counts as “better” tools has a lot of subjectivity and circumstances implied. No one set of tools works for everyone. I very often have to work over ssh on servers that don’t allow installing anything, much less Node and npm for VSCode, so I invest my time in the tools that always work everywhere, for the work I do.

The main project I’ve worked on for the last few years has a little less than 500,000 lines of code. VSCode’s language server fairly often takes a few seconds to update its indexes. Running ctags over the same code takes about a second, and I can control when that happens. vim has no delays at all, and ripgrep can search all of the files in a second or two.

wrasee
14 replies
8h43m

Did you consider Neovim? You get the benefit of vim while also being able to mix in as much LSP tooling as you like. The tradeoff is that it takes some time to set up, although that is getting easier.

That won’t make LSP go any faster though. There’s still something interesting in the fact that a ripgrep of every line in the codebase can still be faster than a dedicated tool.

gregjor
8 replies
8h31m

Considered it and have tried repeatedly to get it to work with mixed success. As you wrote, it takes "some time" to set up. In my case it would only offer marginal improvements over plain vim, since I'm not that interested in the LSP integration (and vim has that too, through a plugin).

In the environments I often work in I can't install anything or run processes like node. I ssh into a server and have to use whatever came with the Linux distro, which means sticking with the tools I will find everywhere. I can't copy the code from the server either. If I get lucky they used version control. I know not everyone works with those constraints. I specialize in working on abandoned and legacy code.

kragen
5 replies
4h44m

can you not upload executables over ssh, say for policy reasons or disk-space reasons? how about shell scripts?

i mean, i doubt i'm going to come up with some brilliant breakthrough that makes your life easier that you've somehow overlooked, but i'd like to understand what kinds of constraints people like you often confront

i'm just glad you don't have to use teamviewer

gregjor
4 replies
3h18m

I don't have to use TeamViewer, though I very occasionally have to use Windows RDP.

You can transfer any kind of file over ssh. scp, sftp, rsync will all copy binaries. Mainly the issues come down to policy and billable time. Many of my customers simply don't allow installing anything on their servers without a tedious approval process. Even if I can install things I might spin my wheels trying to get it to work in an environment I don't have root privileges on, with no one willing to help, and I can't bill for that time. I don't work for free to get an editor installed. I use the tools I know I can find on any Linux/BSD server.

With some customers I have root privileges and manage the server for them. With others their IT dept has rules I have to follow (I freelance) if I want to keep a good relationship. Since I juggle multiple customers and environments I find it simpler not having to manage different editors and environments, so I mostly stick with the defaults. I do have a .profile and .vimrc I copy around if allowed to, that's about it.

I can't lose time/money and possibly goodwill whining about not having everything just-so for me. I recently worked on a server over ssh that didn't have tmux installed. Fortunately it did have screen, and I can use that too, no big deal. I spent less than 60 seconds figuring that out and getting to work rather than wasting hours of non-billable time annoying someone about how I needed tmux installed.

kragen
3 replies
3h7m

i see, thanks!

wrt rdp, i feel like rdp is actually better than vnc or x11-over-ssh, but for cases where regular ssh works, i'd rather use ssh

i wasn't thinking in terms of installing tmux, more like a self-contained binary that doesn't require any kind of 'installation'

gregjor
2 replies
2h56m

I used the word "install" but the usual rule says I can't install, upload, or execute any non-approved software. Usually that just gets stated as a policy, but I have seen Linux home directories on noexec partitions -- government agencies and big corporations can get very strict about that. So copying a self-contained binary up and running it would violate the policy.

I pretty much live in ssh. Remote Desktop means a lot of clicking and watching a GUI visibly repaint. Not efficient. Every so often I have customers using applications that only run on Windows, no API, no command line, so they will enable RDP to that, usually through a VPN.

kragen
1 replies
2h28m

i see! but i guess your .profile and .vimrc don't count?

gregjor
0 replies
1h59m

They aren't executables.

wrasee
1 replies
8h3m

Yes, ok. And legacy code might be a good example of where grep works well, if it's fair to assume a greater propensity for things like preprocessors, older languages and custom builds that may not play as well with semantic-level tools, let alone be written with modern tooling in mind.

gregjor
0 replies
4h47m

Lol, I'm not working with COBOL or Fortran. Legacy code in my world means the original developers have left, not that it dates from the 1970s. Mostly I work with PHP, shell scripts, various flavors of SQL, Python, sometimes Rails or other stuff. All things modern LSPs can handle.

VHRanger
4 replies
6h31m

There's also helix now, which requires next to no setup, but requires learning new motions (in helix the object comes before the verb: you select first, then act)

gregjor
3 replies
4h45m

I looked at Helix but since I dream in vim motions at this point (vi user since it came out) I'd have to see a 10x improvement to switch. VSCode didn't give me a 10X improvement, I doubt Helix would.

VHRanger
2 replies
3h51m

Helix certainly won't give you a 10x improvement. It tends to convert a lot of people moving "up" from VS Code, and a decent chunk of neovim users moving "down", though certainly fewer.

Advantages of Helix are pretty straightforward:

1. Very little configuration bullshit to deal with. There's not even a plugin system yet! You just paste your favorite config file and language/LSP config file and you're good to go. For anything else, submit a pull request.

2. Built in LSP support for basically anything an LSP exists for.

3. There's a bit of a new generation command line IDE forming itself around zellij (tmux that doesn't suck) + helix + yazi (basically nnn or mc on crack, highly recommended).

That whole zellij+helix+yazi environment is frankly a joy to work in, and might be the 2-3x improvement over neovim that makes the switch worth it.

gregjor
1 replies
3h9m

Like I wrote, I looked at Helix. Seems cool but not enough for me to switch. And I would have to install it on the machines I work on, which very often I can't do because of company policies, or can't waste the non-billable time on.

I only recently moved from screen to tmux, and I still have to fall back to screen sometimes because tmux doesn't come with every Linux distro. I expect I will retire before I think tmux (or screen, for that matter) "sucks" to the point I would look at something else. And again I very often can't install things on customer servers anyway.

VHRanger
0 replies
37m

Tmux does suck pretty bad though?

It conflicts with the clipboard and a bunch of hotkeys, and configuring it never works because they make breaking changes to how the config file works every 6 months or so.

These days I only use it to launch a long-running job over ssh, detach the session it's on, and leave.

joe-six-pack
1 replies
4h37m

VSCode is not an IDE, it's an extensible text editor. IDEs are integrated (it's in the name) and get developed as a whole. I'm 99% certain that if you were forced to spend a couple of months in a real IDE (like IDEA or Rider), you would not want to go back to vim, or any other text editor. Speaking as a long time user of both.

gregjor
0 replies
3h28m

I get your point, but VSCode does far more than text editing. The line between an advanced editor and an IDE gets blurry. If you look at the Wikipedia page about IDEs[1] you see that VSCode ticks off more boxes than not. It has integration with source code control, refactoring, a debugger, etc. With the right combination of extensions it gets really close to an IDE as strictly defined. These days advanced text editor vs. "real" IDE seems more like a distinction without much of a difference.

You may feel 99% certain, but you got it wrong. I have quite a bit of experience with IDEs, you shouldn't assume I use vim out of ignorance. I have worked as a programmer for 40+ years, with development tools (integrated or not) that I have forgotten the names of. That includes "real" IDEs like Visual Studio, Metrowerks CodeWarrior, Symantec Think C, MPW, Oracle SQL Developer, Turbo Pascal, XCode, etc. and so on. When I started programming every mainframe and minicomputer came with an IDE for the platform. Unix came along with the tools broken out after I had worked for several years. In high school I learned programming on an HP-2000 BASIC minicomputer -- an IDE.

So I have spent more than "a couple of months in real IDEs" and I still use vim day to day. If I went back to C++ or C# for Windows I would use Visual Studio, but I don't do that anymore. For the kind of work I do now vim + ctags + ripgrep (and awk, sed, bash, etc.) get my work done. At my very first real job I used PWB/Unix[2] -- PWB means Programmer's Work Bench -- an IDE of sorts. I still use the same tools (on Linux) because they work and I can always count on finding a large subset of them on any server I have to work with.

I don't dislike or mean to crap on IDEs. I have used my share of IDEs and would again if the work called for that. I get what I need from the tools I've chosen, other people make different choices, no perfect language, editor, IDE, what have you exists.

[1] https://en.wikipedia.org/wiki/Integrated_development_environ...

[2] https://en.wikipedia.org/wiki/PWB/UNIX

zarzavat
12 replies
12h20m

Go to definition and find usages only work one symbol at a time. I use both, but I still use global find/replace for groups of symbols sharing the same concept.

For example if I want to rename all “Dog” (DogModel, DogView, DogController) symbols to “Wolf”, find/replace is much better at that because it will tell me about symbols I had forgotten about.

turboponyy
5 replies
11h56m

There's no reason they have to work one symbol at a time - that's just a missing feature in your language server implementation.

Some language servers support modifying the symbols in contexts like docstrings as well.

setopt
4 replies
11h45m

I’ve never seen an LSP server that lets you rename “Dog” to “Wolf” where your actual class names are “Dog[A-Za-z]*”?

Do you have an example?

turboponyy
1 replies
9h37m

Neither have I; and no, I don't - I misinterpreted what you said.

But I don't see why LSP servers shouldn't support this, still. I'm not sure if the LSP specification currently allows for it, though.

setopt
0 replies
1h32m

I would actually love a regexp search-and-replace assisted by either TreeSitter or LSP.

Something that lets me say that I want to replace “Dog\(.*\)” with “Wolf\1”, but where each substitution is performed only within single “symbols” as identified by TS or LSP.
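A minimal Python sketch of the idea, assuming something else (TreeSitter or an LSP) has already produced the character spans of the symbols - which is of course the hard part:

  import re

  def sub_within_symbols(pattern, repl, text, symbol_spans):
      """Apply re.sub, but only inside the given (start, end) spans."""
      out, prev = [], 0
      for start, end in sorted(symbol_spans):
          out.append(text[prev:start])                        # untouched gap
          out.append(re.sub(pattern, repl, text[start:end]))  # rewrite symbol
          prev = end
      out.append(text[prev:])
      return "".join(out)

  code = 'view = DogView()  # renders a "DogView" label'
  spans = [(0, 4), (7, 14)]  # symbol positions, e.g. from a TreeSitter query
  print(sub_within_symbols(r"Dog(\w*)", r"Wolf\1", code, spans))
  # view = WolfView()  # renders a "DogView" label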

Maxion
1 replies
11h25m

IntelliJ's refactor tool?

yen223
0 replies
4h21m

IntelliJ doesn't use LSP as far as I know.

It does usually handle that kind of DogModel -> WolfModel refactoring, though.

gugagore
3 replies
12h15m

I am familiar with the situation you describe, and it's a good point.

However, it does suggest that there is an opportunity for factoring "Dog" out in the code, at least by name spacing (e.g. Dog.Model).

zarzavat
1 replies
10h21m

That gets to the core of the issue doesn’t it? There are two cultures: Do you prefer to refactor DogView into Dog.View, or do you prefer to refactor Dog.View into DogView.

Personally I value uniqueness/canonicalness over conciseness. I would rather have DogView because then there is one name for the symbol regardless of where I am in the codebase. If the same symbol is used with differently qualified names it is confusing - I want the least qualified name to be more descriptive than “View”.

The other culture is to lean heavily on namespaces and to not worry about uniqueness. In this case you have View and Dog.View that may be used interchangeably in different files. This is the dominant culture in Java and C#.

kccqzy
0 replies
5h20m

The second culture that you describe happens also to be how OCaml structures things in modules. It's quite a turnoff for me.

f1shy
0 replies
12h5m

That really depends on the context, and specific situation.

sandermvanvliet
0 replies
10h5m

JetBrains ReSharper (and Rider) is smart enough to handle these things. It’ll suggest renames across other symbols, even ones that merely have related names

f1shy
0 replies
12h3m

For that use case I think you can use treesitter[1]: you can find Dog.* but only where it is a variable name, for example, avoiding replacement inside of, say, literals.

[1] https://www.youtube.com/watch?v=MZPR_SC9LzE

jakub_g
9 replies
12h34m

Your observation does not help with the majority of the points in the article. How do you find all usages of a parameter value literal?

troupo
7 replies
12h25m

This is what the article starts with: "Even in projects exclusively written by myself, I have to search a lot: function names, error messages, class names, that kind of thing."

All of that is trivial to search for with a tool that understands the language.

nosianu
2 replies
12h13m

All of that is trivial to search for with a tool that understands the language.

Isn't string search, or grepping for patterns, even more trivial? So what is your argument? You found an alternative method, good, but how is it any better?

In my own case, I wrote a library that we used in many projects, and I often wanted to know where and how functions from my lib were used in those projects. For example, to be able to tell how much of an effort it would be for the users to refactor when I changed something. However, your method of choice at least with my IDE (Webstorm) only worked locally within the project. Only string search would let me reliably and easily search all projects.

I actually experimented creating a "meta" project of all projects, but while it worked, that led to too many problems, and the main method to find anything was still string search (the CTRL-SHIFT-F Find dialog in IDEA IDEs is string search, and it's a wonderful dialog in that IDE family). I also had to open that meta project. Instead, I created a gitignored folder with symlinks to the sources of all the other projects and created a search scope for that folder, in which the search dialog let me string-search all projects' sources at once right from within the library project while still being able to use the excellent Find dialog.

In addition, I found that sometimes the IDE would not find a usage even within the project. I only noticed because I used both methods, and string search showed me one or two places more than the method that relied on the underlying code-parsing. Unfortunately IDEs have bugs, and the method you suggest relies on much more work of the IDE in parsing and indexing compared to the much more mundane string or string pattern search.

troupo
1 replies
11h32m

Isn't string search, or grepping for patterns, even more trivial?

It's not trivial when you're looking for symbols in context.

the method you suggest relies on much more work of the IDE in parsing and indexing compared to

...compared to parsing and indexing you have to do manually because a full-text search (especially in a large codebase) will return a lot of irrelevant info?

Funnily enough I also have a personal anecdote. We had a huge PHP code base based on Symfony. We were in the middle of a huge refactoring spree. I saw my colleagues switch from vim/emacs to Idea/WebStorm after seeing how easily I found symbols in the code base, found their usages, refactored them etc., compared to the full-text search they were always stuck with.

This was 5-6 years ago, before LSP became ubiquitous.

nosianu
0 replies
32m

It's not trivial

Did you miss the comparison? The "more trivial"? The context of my response? Please read the parent comment I responded to; treating my comment as standalone and adding some new meaning makes no sense.

String search is more trivial than a search that involves an interpretation of the code structure and meaning. I have no idea why you wish to start a discussion about such trivial statement.

because a full-text search (especially in a large codebase) will return a lot of irrelevant info?

It doesn't do that for me but instead works very well. I don't know what you do with your symbol names, but I have barely any generic function names, the vast majority of them are pretty unique.

No idea how you use search, but I'm never looking for "doSomething(", it's always "doSomethingVerySpecific()", or some equally specific string constant.

I don't have the problems you tell me I should have. My use case was the subject of my comment, as should be clear, and my comment was a response to a specific point made by the parent comment.

renewiltord
1 replies
12h17m

I actually don't think there's a tool that handles usages when using PHP varvars or when using example number one there which is parametrically choosing a table name.

When you string interpolate to build the name you lose searchability.
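A quick Python illustration (all names made up):

  class Repo:
      def get_address(self):
          return "1 Main St"

  prefix, entity = "get", "address"

  # Grepping for "get_address" will never find this call site:
  # the name only exists at runtime, assembled from parts.
  handler = getattr(Repo(), prefix + "_" + entity)
  print(handler())  # 1 Main St

  # Same problem when parametrically choosing a table name:
  table = entity + "_book"  # grep for "address_book" finds nothing
  query = "SELECT * FROM " + table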

troupo
0 replies
11h30m

Yes, full-text search is a great fallback when everything else fails. But in the use cases listed at the beginning of the article it's usually not needed if you have proper tools

cma
1 replies
5h41m

All of that is trivial to search for with a tool that understands the language.

Some literal in a log message may come from the code or it might be remapped in some config file outside the language the LSP is looking at, or an environment variable etc.. I just go back and forth with grep and IDE tools, both have different tradeoffs.

troupo
0 replies
3h5m

The thing is, so many people are weirdly obsessed with never using any other tools besides full-text search. As if using useful tools somehow makes them a lesser programmer or something :)

CrimsonRain
0 replies
7h45m

By not using literals everywhere. All literals are defined somewhere (start of function, class etc) as enums or vars and used.

Just because I have 20 usage of 'shipping_address' doesn't mean I'll have this string 20 times in different places.

Grep has its place, and I often need to grep code bases which have been written without much thought towards DX. But writing code nicely allows the LSP to take over.
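E.g. something like this (a sketch, names made up):

  # Defined exactly once; grep finds the literal right here.
  SHIPPING_ADDRESS = "shipping_address"

  def read_shipping(payload: dict) -> str:
      return payload[SHIPPING_ADDRESS]

  def write_shipping(payload: dict, value: str) -> None:
      payload[SHIPPING_ADDRESS] = value

  # The other 18 usages all go through the constant, so an LSP's
  # Find Usages (or a rename) covers every one of them at once.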

aa-jv
8 replies
11h50m

On the flipside, IDE's can turn you into lazy, inefficient programmers by doing all the hand-holding for you.

If your feelings are anemic when tasked with doing a grep, it's because you have lost a very valuable skill by delegating it to a computer. There are some things the IDE is never going to be able to find - lest it become the development environment - so keeping your grep fu sharpened is wise beyond the decades.

(Disclaimer: 40 years of software development, and vim+cscope+grep/silversearcher are all I really need, next to my compiler..)

throwaway2037
2 replies
8h26m

    > lazy... programmers
Since when was that a bad thing? Since time immemorial, it has been hailed as a universal good for programmers to be lazy. I'm pretty sure Larry Wall has lots of jokes about this on Usenet.

Also, I can clearly remember switching from vim/emacs to Microsoft Visual Studio (please, don't throw your tomatoes just yet!). I was blown away by IntelliSense. Suddenly, I was focusing more on writing business logic, and less time searching for APIs.

trashtester
1 replies
7h38m

This is the wrong type of lazy.

Command line tools like grep are force multipliers for programmers. GUIs come with the risk that you never learn how to leverage this power. In the end, that often leads to more manual work.

And today, bash is a lingua franca that you can bring with you almost everywhere. Even Windows "speaks" bash these days, with WSL.

In itself, there's nothing wrong with using the built-in features of a GUI. Right-clicking a method (or using a keyboard shortcut) to find the definition in a given code base IS nice for that particular operation.

But by knowing grep/awk/find/git command line and so on, combined with bash scripting and advanced regular expressions, you open up a new world of possibilities.

All those things CAN be done using Python/C#/Java or whatever your language is. But a 1-liner in bash can be 10-100 lines of C#.

lucumo
0 replies
6h12m

Where does this stupid notion come from that using powerful tools means you can't handle the less powerful ones anymore? Did your skills with a hand screwdriver atrophy when you learned how to use a powered screwdriver? Come on.

I use grep multiple times a day. I write bash scripts quite often. I'm not speaking from a position of ignorance of these tools. They have their place as a lowest common denominator of programming tools. But settling for the lowest common denominator is not a path to productivity.

Doesn't mean you should forget your skills, but it does mean you should investigate better tools. And leverage them. A lot.

But a 1-liner in bash can be 10-100 lines of C#.

Yes. And the reverse is also true. bash is fast and easy if there's an existing tool you can leverage, and slow and hard when there's not.

HdS84
1 replies
11h16m

Huh? I have an old hand-powered drill from my Grandpa in my workshop. I used it once, for fun. For all other tasks I use a powered drill. Same for IDEs. They help you refactor and reason about code - both properties I value. Sure, I could print the code and use a highlighter, but I'm not Grandpa

trashtester
0 replies
7h34m

Knowing the bash ecosystem translates better to how you use the knife in the kitchen.

Sure you can replace most uses of a knife with power tools, but there is a reason why most top chefs still rely on that knife for most of those tasks.

A hand powered drill is more like a hand powered meatgrinder. It has the same limitation as the powered versions, and is simply a more primitive version.

winwang
0 replies
9h58m

I count the IDE and stuff like LSP as natural extensions of the compiler. For sure I grep (or equivalent) for stuff, but I highly prefer statically typed languages/ecosystems.

At the end of the day, I'm here to solve problems, and there's no end to them -- might as well get a head start.

lucumo
0 replies
8h26m

If your feelings are anemic

I'm not feeling anemic. The tool is anemic, as in, underpowered. It returns crap you don't want, and doesn't return stuff you do want.

My grep-fu is fine. It's a perfectly good tool if you have nothing better. But usually you do have something better.

Using the wrong tool to make yourself feel cool is stupid. Using the wrong tool because a good tool could make you lazy shows a lack of respect for the end result.

high_na_euv
0 replies
9h22m

Leveraging technology is a good thing

kragen
2 replies
7h18m

posts like this sound like the author routinely solves harder problems than you are, because the solutions you suggest don't work in the cases the post is about. we've had 'go to definition' since 01978 and 'find usages' since 01980, and you should definitely use them for the cases where they work

mjr00
1 replies
5h33m

From the article,

- dynamically built identifiers: 100% correct, never do this. Breaks both text search and symbol search, results in complete garbage code. I had to deal with bugs in early versions of docker-compose because of this.

- same name for things across the stack? Shouldn't matter, just use Find Usages on `getAddressById`. It's also an easy way to bait yourself, because database fields aren't 1:1 with front-end fields in anything but the simplest of CRUD webshit.

- translation example: the fundamental problem is using strings as keys when they should be symbols. Flat vs nested is irrelevant here because you should be using neither. (See the sketch below.)

- react component example: as I mentioned in another comment, trivially managed with Find Usages.

Nothing in here strikes me as "routinely solves harder problems," it's just standard web dev.
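For the translation point, a sketch of what symbol keys look like (hypothetical `Msg`/`TRANSLATIONS` names):

  from enum import Enum, unique

  @unique
  class Msg(Enum):
      CHECKOUT_TITLE = 1
      CHECKOUT_PAY = 2

  TRANSLATIONS = {
      "de": {Msg.CHECKOUT_TITLE: "Kasse", Msg.CHECKOUT_PAY: "Bezahlen"},
  }

  def t(lang: str, key: Msg) -> str:
      # Find Usages on Msg.CHECKOUT_TITLE lists every call site, and a
      # misspelled key is an AttributeError instead of a silent miss.
      return TRANSLATIONS[lang][key]

  print(t("de", Msg.CHECKOUT_TITLE))  # Kasse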

kragen
0 replies
4h49m

yes, i agree that standard web dev is full of these problems, which can't be solved with go-to-definition and find-usages. it's a mess. i wasn't claiming that these messy, hard problems where grep is more helpful than etags are exotic; they are in fact very common. they are harder than the problems lucumo is evidently accustomed to dealing with because they don't have correct, complete solutions, so we have to make do with heuristics

advice to the effect of 'you should not make a mess' is obviously correct but also, in many situations, unhelpful. sometimes i'm not smart enough to figure out how to solve a problem without making a mess, and sometimes i inherit other people's messes. in those situations that advice decays into 'you should not try to solve hard problems'

heisenbit
2 replies
12h17m

A good IDE can be so much better iff it understands the code. However, this requires the IDE to understand the project structure, dependencies etc., which can take considerable effort. In a codebase with many projects employing several different languages it becomes hard to reach, and maintain, the "IDE understands everything" state.

carlmr
0 replies
10h29m

And especially in large monorepos anything that understands the code can become quite sluggish. While ripgrep remains fast.

A kind of in-between I've found for some search and replace action is comby (https://comby.dev/). Having a matching braces feature is a godsend for doing some kind of replacements properly.

amichal
0 replies
10h37m

And an IDE would also fail to find references for most of the cases described in the article: name composition/manipulation, naming consistency across language barriers, and flat namespaces in serialization. And file/path folder naming seems to be irrelevant to the smart IDE argument. "Naming things is hard"

underdeserver
1 replies
12h3m

Unfortunately in larger codebases or dynamic languages these tools are just not good enough today. At least not those I and my employers have tried.

They're either incomplete (you don't get ALL references or you get false references) or way too slow (>10 seconds when rg takes 1-2).

Recommendations are most welcome.

jimmaswell
0 replies
12h1m

Only thing I can recommend is using C# (obviously not always possible). Never had an issue with these functions in Visual Studio proper no matter how big the project.

mjr00
1 replies
5h47m

Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.

Completely agreed. The React component example in the article is trivially solvable with any modern IDE; right-click on the class name, "Find Usages" (or use the appropriate hotkey, of course). Trying to grep for a class name when you could just do that is insane.

I mainly see this from juniors who don't know any better, but as seen in this thread and the article, there are also experienced engineers who are stubborn and refuse to use tools made after 1990 for some reason.

gpderetta
0 replies
4h28m

I worked on codebases large enough that enabling autocomplete/indexing would lock up the IDE and cause the workstation to swap hard.

db48x
1 replies
7h23m

True, but IDEs are fragile tools. Sometimes you want to fall back to simpler tools that will always work, and grep is not fragile.

cxr
0 replies
6h18m

The basis of this article (and its forebear "Too DRY - The Grep Test"[1]) is that grep is fragile. It's just fragile in a way that's different from the way that IDEs are fragile.

1. <http://jamie-wong.com/2013/07/12/grep-test/>

brooke2k
1 replies
59m

with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)

with a legacy codebase, or a fork of a dependency that had to be patched which uses an incompatible buildsystem, or any C/C++/obj-c/etc that heavily uses the preprocessor or nonstandard build practices, or codebases that mix lots of different languages over awkward FFI boundaries and so on and so forth -- there are so many situations where sometimes an IDE just can't get you 100% of the way there and you have to revert to grepping to do any real work

that being said, I don't fully support the idea of handcuffing your code in the name of greppability, but I think dismissing it as a metric under the premise that IDEs make grepping "obsolete" is a little bit hasty

lucumo
0 replies
26m

with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)

I wish, but no. I've found people will make a mess of everything. Which is why I don't trust solutions that rely on humans having more discipline, like what this article advocates.

In any situation where grep is your last saviour, you cannot rely on the greppability of the code. You'll have to check and double check everything, and still accept the risk of errors.

brain5ide
1 replies
12h30m

I think the first sentence of the author counters your comment. What you described works best in a familiar codebase where the organizing principles have been maintained well and are familiar to the reader and the tools are just the extension of those organizing principles. Even then a deviation from those rules might produce gaps in understanding of what the codebase does.

And grep cuts right through that in a pretty universal way. What the post describes are just ways to not work against grep to optimize for something ephemeral.

ricardo81
0 replies
12h16m

Agree. Not just because it's unfamiliar code, you can also get a feel for how the program/programmer(s) structured the whole thing.

sauercrowd
0 replies
7h47m

strongly disagree here. This works if:

- your IDE/language server is performant

- all the tools are fully set up

- you know how to query the specific semantic entity you're looking for (remembering shortcuts)

- you are only interested in a single specific semantic entity (mixing entities is rarely supported)

I don't map out projects in terms of semantics, I map out projects in files and code - that makes querying intuitive, and I can easily compose queries that match the specificity of what I care about (e.g. I might want to find a `Server`, but I want to show classes, interfaces and abstract classes alike).

For the specific toolchain I'm using - typescript - the symbol search is also unusable once it hits a certain project size; it's just way too slow for it to be part of my core workflow

phyrex
0 replies
9h30m

This breaks down at scale and across languages. All the FAANGs make heavy use of the equivalent of grepping in their code base

leni536
0 replies
11h2m

I can't use an IDE on my entire git history, but git can grep.

k__
0 replies
10h37m

Honestly, in my 18 years of software development, I haven't "greped" code once.

I only use grep to filter the output of CLI tools.

For code, I use my IDE or repository features.

jmmv
0 replies
36m

Sure, if you have the luxury of having a functional IDE for all of your code.

You can't imagine how much faster I was than everybody else at answering questions about a large codebase just because I knew how to use ripgrep (on Windows). "Knowing how to grep" is a superpower.

hyperpape
0 replies
7h11m

I can run rg over my project faster than I can do anything in my IDE. Both tools have their places.

citrin_ru
0 replies
10h39m

Not everything you need to look for is a language identifier. I often grep for configuration option names in the code to see what an option actually does - sometimes it is easy to grep, sometimes there are too many matches, and sometimes the name cannot be found at all because it is composed in the code from separate parts that are each too common to grep for. It's not hard to make config options greppable, but some coders just don't care about this property.

a_e_k
0 replies
11h24m

I've come to really like language servers for big personal and work projects where I already have my tools configured and tuned for efficiently working with it.

But being able to grep is really nice when trying to figure something out about a source tree that I don't yet have set up to compile, nor am I a developer of. I.e., I've downloaded the source for a tool I've been using pre-built binaries of and am now trying to trace why I might be getting a particular error.

PhilipRoman
0 replies
9h40m

IDEs are cool and all, but there is no way I'm gonna let VSCode index my 80GB yocto tmp directory. Ctags can crunch the whole thing in a few minutes, and so can grep.

Plus there are cases where grep is really what you need, for example after updating a particular command line tool whose output changed, I was able to find all scripts which grepped the output of the tool in a way that was broken.

IshKebab
0 replies
11h49m

Definitely true when you can use static typing.

Unfortunately sometimes you can't, and sometimes you can but people can't be arsed, so this is still a consideration.

EasyMark
0 replies
5h22m

It seems like the law of diminishing returns; while I'm sure in a few cases this characteristic of a code writing style is extremely useful, it cuts into other things such as readability and conciseness. Fewer lines can mean fewer bugs, within reason, if you aren't in lisp and are using more than 3 parentheses, you might want to split it up because the compiler/JIT/interpreter is going to anyway.

VoxPelli
35 replies
11h29m

I advocate for greppability as well – and in Swedish it becomes extra fun – as the equivalent phrase in Swedish becomes "grep-bar" or "grep-barhet" and those are actual words in Swedish – "greppbar" roughly means "understandable", "greppbarhet" roughly means "the possibility to understand"

sshine
13 replies
10h59m

How many other UNIX commands did the Swedes adopt into their language?

I know that they invented "curl". Do you tar xfz?

lukan
8 replies
10h53m

As far as I understood, it was part of the language before.

The german equivalent of the word would be probably "greifbar". Being able to hold something, usually used metaphorically.

kagevf
6 replies
10h49m

able to hold

Would "grasp" work?

octocop
4 replies
10h34m

It's closer to grip

trashtester
2 replies
8h14m

"zu greifen" may best translate to "to grip", but "grip" has different mental connotations in English (it refers to mental stability, not intellectual insight).

The best dual purpose translation of "zu greifen"/"gripe" (German/Scandinavian) meaning "zu begreifen"/"begripe"/"understand" would be "to grasp", which covers both physically grabbing into something and also to understand it intellectually.

All these words stem back to the Proto-Indo-European gʰrebʰ, which more or less completes the circle back to "grep".

lordgrenville
1 replies
6h37m

related to "grok"?

trashtester
0 replies
5h33m

grok /ɡrɒk/

Origin 1960s: a word invented by Robert Heinlein (1907–88), American author.

n_plus_1_acc
0 replies
9h51m

I've always related grep to grab

actionfromafar
0 replies
7h49m

Yes. "Grasping for straws."

ManuelKiessling
0 replies
10h40m

Which leads to "begreifbar", which I would explain/translate (badly) with "something is begreifbar if it can be understood".

scbrg
2 replies
10h19m

We do tar, for xfz I think you have to look to the Slavic languages :)

Anyway, to answer your question:

  $ grep -Fxf <(ls -1 /bin) /usr/share/dict/swedish 
  ack
  ar
  as
  black
  dialog
  dig
  du
  ebb
  ed
  editor
  finger
  flock
  gem
  glade
  grep
  id
  import
  last
  less
  make
  man
  montage
  pager
  pass
  pc
  plog
  red
  reset
  rev
  sed
  sort
  sorter
  split
  stat
  tar
  test
  transform
  vi
:)

[edit]: Ironically, grep in that list is not the same word as the one OP is talking about. That one is actually based on grepp, with the double p. grep means pitchfork.

pbhjpbhj
1 replies
7h45m

Pitchfork? As in something that might be used to search a haystack?? How delightful.

sshine
0 replies
3h24m

Yeah, that’s one type.

Another is for turning soil at a small scale by hand (also called a cultivator, I think).

But they all have somewhat long prongs.

tripzilch
0 replies
9h35m

I learned from bash.org that "tar -xzvf" is, in a German accent, "xtract ze vucking files".

vanschelven
6 replies
10h47m

Begreppelijk (begrijpelijk) in Dutch

Cthulhu_
5 replies
9h37m

or "Grijpbaar" (grabbable)

medstrom
4 replies
8h18m

So Dutch/German make "begreif" a verb; for Swedish it is just a noun (that means "concept").

But "begrijpelijk" has a clone: "begriplig". An adjective based on a verb in a foreign dictionary. There is no verb that goes "begreppa", it's just "greppa".

trashtester
1 replies
8h7m

"Jag kan inte begripa svenska."

medstrom
0 replies
7h58m

Oh, you're right.

jeroenhd
0 replies
5h10m

Dutch also has a noun ("begrip") meaning "notion" or "understanding".

fedder
0 replies
5h42m

The term concept itself suggests grasping or holding/taking hold of, see the latin verb concipio or adjective conceptus.

elygre
6 replies
11h20m

Could I suggest that greppbarhet is more precisely translated as “the ability of being understood”?

(Norwegian here. Our languages are similar, but we lack this one.)

medstrom
3 replies
8h1m

Norwegian still translates grep as "grip"/"grab". I always thought of grepping as reaching in with a hand into the text and grabbing lines. That association is close at hand (insert lame chuckle) for German and English speakers too.

pbhjpbhj
2 replies
7h47m

In English that association is going to depend a lot on one's accent; until now I've never associated grep-ing with anything other than using grep! (But, equally, that might just be a me thing.)

medstrom
0 replies
6h2m

What about groping? Groping around for text.

bee_rider
0 replies
6h49m

It doesn’t sound anything like grip in my accent but for some reason the association has always been there for me. Grabbing or ripping parts from the file.

psychoslave
1 replies
10h2m

So, at the extreme opposite of the esoteric "general regular expression print" that grep stands for, with few ever knowing it?

johncoltrane
0 replies
8h48m

s/general/global

octocop
2 replies
10h31m

And we also have "begrepp", which is also a spin on grasping content and understanding it.

majewsky
1 replies
8h41m

Oh, that's like German "begreifen", no? (Which means "to grok".)

medstrom
0 replies
8h34m

Grok is right! I'd translate Swedish "greppbar" directly as "grokkable"; "att greppa" as "to grok".

TeMPOraL
2 replies
10h15m

Which is ironic, given that the article is about making it easier to use grep in order to avoid having to understand anything.

bob88jg
1 replies
9h2m

Nah, you've got it backwards. The article isn't about dodging understanding - it's about making it way easier to spot patterns in your code. And that's exactly how you start to really get what's going on under the hood. Better searching = faster learning. It's like having a good map when you're exploring a new city

TeMPOraL
0 replies
4h35m

The article advocates making code harder to understand for the sake of better search. It's like forcing a city to conform to a nice, clean, readable map: it'll make exploring easier for you, at the cost of making the city stop working.

layer8
0 replies
7h54m

Graspability. ;)

More customarily: intelligibility.

skrebbel
20 replies
9h10m

The second point here made me realize that it'd be super useful for a grep tool to have a "super case insensitive" mode which expands a search for, say, "FooBar|first_name" to something like /foo[-_]?bar|first[-_]?name/i, so that any camel/snake/pascal/kebab/etc case will match. In fact, I struggle to come up with situations where that wouldn't be a great default.

hnben
5 replies
6h36m

"super case insensitive"

Let's say someone made a plugin for their favorite IDE for this kind of search. What would the details look like?

To keep it simple, let's assume we just do the super-case-insensitivity, without the other regex condition. Let's say the user searches for "first_name" and wants to find "FirstName".

One simple solution would be to have a convention for where a word starts or ends, e.g. a " ". So the user would enter "first name" into the plugin's search field. The plugin turns it into "/first[-_]?name/i" and hands this regexp to the normal search of the IDE.
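A rough Python sketch of that first solution, generalized a bit to also split on camelCase humps (hypothetical, not tied to any real IDE API):

  import re

  def super_case_insensitive(query: str) -> re.Pattern:
      # Split the query on spaces, "_", "-", or a lower-to-upper hump...
      words = re.split(r"[ _-]|(?<=[a-z0-9])(?=[A-Z])", query)
      # ...then allow an optional "-" or "_" between the words.
      return re.compile("[-_]?".join(map(re.escape, words)), re.IGNORECASE)

  pat = super_case_insensitive("first name")
  for s in ["first_name", "firstName", "FirstName", "first-name", "FIRST_NAME"]:
      assert pat.search(s)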

Another simple solution would be to ignore all word boundaries. So when the user enters "first name", the regexp would become "/f[-_]?i[-_]?r[-_]?s[-_]?t[-_]?n[-_]?a[-_]?m[-_]?e[-_]?/i". Then the search would not only be super-case-insensitive, but super-duper-case-insensitive. I guess the biggest downside would be that this could get very slow.

I think implementing a plugin like this would be trivial for most IDEs that support plugins.

Am I missing something?

marcosdumay
1 replies
3h43m

The best way would be to make an escape code that matches zero or one punctuation characters.

So you'd search for "/first\_name/i".

Izkata
0 replies
1h16m

That already exists as "?" and was used in their example:

  /first[-_]?name/i
Or to use your example, just checking for underscores and not also dashes:

  /first_?name/i
Backslash is already used to change special characters like "?" from these meanings into just "use this character without interpreting it" (or the reverse, in some dialects).

skrebbel
0 replies
5h26m

Hm I'd go even simpler than that. Notably, I'd not do this:

So the user would enter "first name" into the plugin's search field.

Why wouldn't the user just enter "first_name" or "firstName" or something like that? I'm thinking about situations like, you're looking at backend code that's snake_cased, but you also want it to catch frontend code that's camelCased. So when you search for "first_name" you automagically also match "firstName" (and "FirstName" and "first-name" and so on). I wouldn't personally introduce some convention that adds spaces into the mix, I'd simply convert anything that looks snake/kebab/pascal/camel-cased into a regex that matches all 4 forms.

Could even be as stupid as converting "first_name" or "firstName", or "FirstName" etc into "first_name|firstname|first-name", no character classes needed. That catches pretty much every naming convention right? (assuming it's searched for with case insensitivity)

inanutshellus
0 replies
6h22m

IIUC, you're not missing anything though your interpretation is off from mine*. He wasn't saying it'd be hard, he was saying it should be done.

* my understanding was simply that the regex would (A) recognize `[a-z][A-Z]` and inject optional _'s and -'s between... and (B) notice mid-word hyphens or underscores and switch them to search for both.

__MatrixMan__
0 replies
6h6m

Shame on me for jumping past the simple solutions, but...

If you're going that far, and you're in a context which probably has a parser for the underlying language ready at hand, you might as well just convert all tokens to a common format and do the same with the queries. So searches for foo-bar find strings like FooBar because they both normalize to foo_bar.

Then you can index by more than just line number. For instance you might find "foo" and "bar" even when "foo = 6" shows up in a file called "bar.py" or when they show up on separate lines but still in the same function.
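A sketch of such a normalizer - the indexer would run it over every token, and again over the query:

  import re

  def normalize(token: str) -> str:
      # FooBar, foo-bar and FOO_BAR all collapse to foo_bar.
      token = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", token)
      return token.replace("-", "_").lower()

  assert normalize("FooBar") == normalize("foo-bar") == normalize("FOO_BAR") == "foo_bar"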

adammarples
3 replies
8h7m

Fzf?

setopt
2 replies
6h51m

Fuzzy search is not the same. For instance, it might by default match not only “FooBar” and “foo_bar” but also e.g. “FooQux(BarQuux)”, which in a large code base might mean hundreds of false positives.

mgkimsal
1 replies
6h30m

Ideally there'd be some sort of ranking or scoring that would happen to sort by. FooQux(BarQuux) would seemingly rank much lower than FooBar when searching for FooBar or "Foo Bar", but might still be useful in results if ranked and displayed lower.

setopt
0 replies
5h10m

Indeed, that's a good solution – and I believe e.g. fzf does some sort of ranking by default. The devil is however in the details:

One minor inconvenience is that the scoring should ideally be different per filetype. For instance, Python would count "foo-bar" as two symbols ("foo minus bar") whereas Lisp would count it as one symbol, and that should ideally result in different scores when searching for "foobar" in both. Similarly, foo(bar) should ideally have a lower score than "foo_bar" for symbol search, even though the keywords are separated by the same number of characters.

I think this can be accommodated by keeping a per-language list of symbols and associated "penalties", which can be used to calculate "how far" keywords are from each other in the search results, weighted by language semantics :)

WizardClickBoy
3 replies
8h10m

This reminds me of the substitution mode of Tim Pope's amazing vim plugin [abolish](https://github.com/tpope/vim-abolish?tab=readme-ov-file#subs...)

Basically in vim to substitute text you'd usually do something with :substitute (or :s), like:

:%s/textToSubstitute/replacementText/g

...and have to add a pattern for each differently-cased version of the text.

With the :Subvert command (or :S) you can do all three at once, while maintaining the casing for each replacement. So this:

textToSubstitute

TextToSubstitute

texttosubstitute

:%S/textToSubstitute/replacementText/g

...results in:

replacementText

ReplacementText

replacementtext

tambourine_man
0 replies
5h44m

Of course it does. Or it wouldn’t be Emacs

WizardClickBoy
0 replies
6h18m

Also just realised while looking at the docs it works for search as well as replacement, with:

:S/textToFind

matching all of textToFind TextToFind texttofind TEXTTOFIND

But not TeXttOfFiND.

Golly!

dominicrose
1 replies
6h7m

Let's say you have a FilterModal component and you're using it like this: x-filter-modal

Improving the IDE to find one or the other by searching for either is missing the point of the article: that consistency is important.

I'd rather have a simple IDE and a good codebase than the opposite. In the example that I gave, the worst thing is that it's the framework which forces you to use these two names for the same thing.

skrebbel
0 replies
5h25m

My point is that if grep tools were more powerful we wouldn't need this very particular kind of consistency, which gives us the very big benefit of being allowed to keep every part of the codebase in its idiomatic naming convention.

I didn't miss the point, I disagreed with the point because I think it's a tool problem, not a code problem. I agree with most other points in the article.

boxed
1 replies
5h48m

I think Nim has this?

archargelod
0 replies
2h46m

Nim comes bundled with a `nimgrep` tool [0], that is essentially grep on steroids. It has `-y` flag for style insensitive matching, so "fooBar", "foo_bar" and even "Foo__Ba_R" can be matched with a simple "foobar" pattern.

The other killer feature of nimgrep is that instead of regex, you can use PEG grammar [1]

  [0] - https://nim-lang.github.io/Nim/nimgrep.html
  [1] - https://nim-lang.org/docs/pegs.html

msmolkin
0 replies
3m

Hey, I just created a new tool called Super Grep that does exactly what you described.

I implemented a format-agnostic search that can match patterns across various naming conventions like camelCase, snake_case, PascalCase, kebab-case. If needed, I'll add support for space-separated words as well.

I've just published the tool to PyPI, so you can easily install it using pip (`pip install super-grep`), and then you just run it from the command line with `super-grep`. You can let me know if you think there's a smarter name for it.

Source: https://www.github.com/msmolkin/super-grep

Groxx
0 replies
25m

fwiw I pretty frequently use `first.?name` - the odds of it matching something like "FirstSname" are low enough that it's not an issue, and it finds all cases and all common separators in one shot.

abc-1
12 replies
14h41m

A lot of this reads like code search tools could and should be a lot better. They probably will be with AI finding its way into everything. In the old days, people would Hungarian prefix types, but now the IDE mitigates that with color codes.

klodolph
5 replies
14h38m

Do you have some ideas for how to make code search better?

Right now, code search is basically just text search. If you think code search tools “could and should” be a lot better, what kind of improvements are you thinking about? How would those improvements work?

Terr_
2 replies
14h27m

Not OP, but we wouldn't need to worry so much about picking out distinct greppable names if (big if) there were tools that parsed the code to draw out concepts for us, ex:

1. The popular "Find Usages" which varies widely in accuracy and reliability by language, IDE, and codebase meta-quirks.

2. Tools that show Callee/Caller trees, and sometimes possible data-flows between variables.

3. DSLs to search hierarchies, like how XPath lets you find XML elements based on nesting, rather than relying on a distinctly greppable single tag-name for the leaf you're interested in. (e.g. `<Product><Name>` vs `<ProductName>`)

When things go well, the actual variable name no longer needs to restate certain aspects and relationships that can instead be found through metadata.

For example, `GiftCard.purchaser_customer_uuid` is nicely greppable, but you could relax that to `GiftCard.purchaser` if it had a static type of `UUID<Customer>`. Or perhaps you could go to the `Customer.uuid` definition and say "Show me all variables that can populate or be-populated-by this one, up to X steps out, and excluding ones that are function scoped."
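(A Python sketch of that last idea - `Ref` here is a hypothetical marker, nothing standard:)

  from dataclasses import dataclass
  from typing import Annotated
  from uuid import UUID

  @dataclass(frozen=True)
  class Ref:
      target: str  # which entity this id points at

  @dataclass
  class GiftCard:
      # The relationship lives in queryable metadata rather than being
      # restated in the field name; tooling could then answer queries
      # like "show every field annotated Ref('Customer')".
      purchaser: Annotated[UUID, Ref("Customer")]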

That said, I do advocate for "greppability" as a general practice, since I seldom trust that languages, tools, or institutions will come together in a way that makes it unnecessary.

klodolph
0 replies
13h5m

I guess I wasn’t thinking of “find usages”, but as the article points out, it’s hard to find usages if the usages are dynamic.

The solution—to write code which is less dynamic—helps code search and features like find usages.

alexpovel
0 replies
12h10m

Regarding your third point, I put together a tool capable of that to some degree.

It allows you to grep inside source code, but limit the search to e.g. “only docstrings inside class definitions”, among other things. That is, it allows nesting and is syntax aware. That example is for Python, but the tool speaks more languages (thanks to treesitter).

https://github.com/alexpovel/srgn/blob/main/README.md#multip...

dragonwriter
0 replies
13h56m

Right now, code search is basically just text search.

We have lots of code search that is much more syntax-aware than just text search, but it tends to sit behind very limited UI: we have all the tech to do much better code search, but no one has come up with a generally-usable UI for it, so we just have very specific instances -- like "go to definition", "find references", etc.

That takes all the same technological bits that would be needed for, say, "find all definitions of functions visible in the current scope whose name starts with 'ban'" or "find all definitions of int8 constants visible in the current scope"...but what's the UI that makes that kind of searching usable, outside of the special cases now behind their own IDE menu items?

abc-1
0 replies
14h26m

Vector embeddings.

ddfs123
5 replies
14h35m

Unless you have syntax-aware grep support, I don't see how searching nested-key JSON could be better. But grep is installed by default. Not to mention ad-hoc languages that do not have any IDE support.

hoherd
0 replies
5h56m

`gron` is so underrated. Usually when I try to show people how useful it is, they don't seem to grasp how powerful it is. One common use is showing how to customize only one part of a helm chart by checking the values of an already-installed chart:

    $ helm get values -n $NS $DEPLOYMENT -o json | gron | grep resources | gron -u | json-to-yaml.py
    elasticsearch:
      client:
        resources:
          limits:
            cpu: 3
            memory: 4Gi
          requests:
            cpu: 1
            memory: 2Gi
      data:
        resources:
          limits:
            cpu: 6
            memory: 6Gi
          requests:
            cpu: 200m
            memory: 2Gi
    fluentd:
      resources:
        limits:
          memory: 768Mi
        requests:
          memory: 384Mi
That snip could be provided to another team or a customer as a yaml file that could be included with `helm upgrade -f whatever.yaml`. This is soooo much easier than digging that limited set of data out of the much more detailed data.

abc-1
1 replies
14h27m

If you put a lot of arbitrary constraints to not allow it to be better, sure. Enjoy.

medstrom
0 replies
7h18m

There is no conflict between improving tools and learning how to express your code in such a way that as many tools as possible work better OOTB.

NavinF
0 replies
14h3m

ad-hoc languages

This is self-inflicted.

JoshTriplett
11 replies
14h46m

This is the reason many coding styles and tools (including the Linux kernel coding style and the default Rust style as implemented in rustfmt) do not break string constants across lines even if they're longer than the desired line length: you might see the string in the program's output, and want to search for the same string in the code to find where it gets shown.
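For example (made-up message, Python for illustration):

  # Greppable: the message seen in the logs matches the source verbatim,
  # even though the line runs past the usual length limit.
  msg = "failed to reticulate splines: the frobnicator on this device has not been calibrated"

  # Not greppable: searching the code for the logged message finds
  # nothing, because the string was split to satisfy the line length.
  msg = ("failed to reticulate splines: "
         "the frobnicator on this device has not been calibrated")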

knodi123
10 replies
14h30m

My team drives me bonkers with this. They hear the general principle "really long lines of code are bad", but extrapolate it to "no characters shall pass the soft gutter no matter what".

Even if you have, say, 5 sequential related structs that are all virtually identical, each written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file, and while they're at it, "fixes" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines. Now when you see that list of structs, you wonder "why is this one different?" and you have to read carefully to determine that, nope, it just contained one longer string. Or god forbid they reformat all the structs to match, turning a 1-page file into 3 pages, and making it so you have to read and understand each element of each struct just to see what's going on.
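
To make the complaint concrete, here's a hedged TypeScript sketch (all names invented) of the pattern:

    // Near-identical records, one per line: similarities and
    // differences are obvious at a glance.
    const SHIPPING = { table: 'shipping_addresses', cache: true, ttl: 60 }
    const BILLING  = { table: 'billing_addresses',  cache: true, ttl: 60 }
    const CONTACT  = { table: 'contact_addresses',  cache: true, ttl: 60 }
    const RETURNS  = { table: 'returns_addresses',  cache: true, ttl: 60 }
    // After an autoformat "fix", the fifth entry no longer scans like
    // the rest, and now it looks meaningfully different when it isn't:
    const PICKUP = {
      table: 'pickup_warehouse_addresses', // ran 2 characters past the gutter
      cache: true,
      ttl: 60,
    }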

If I could have written the rule of thumb, I would have said "No logic or control shall happen after the end of the gutter." But if there's a paragraph-long string on one line- who cares?? We all have a single keystroke that can toggle soft-wrap, and the odds that you're going to need to know anything about that string other than "it's a long string" are virtually nil.

Sorry. I got triggered. :-)

edflsafoiewq
2 replies
11h18m

This is the world autoformatters have wrought. The central dogma of the autoformatter is that "formatting" is decided by dumb syntactic rules with no inflow of imprecise human judgement.

scrollaway
1 replies
11h6m

Most autoformatters do not reformat string constants as GP has said, and even if they did, this is something that can be much more accurately and correctly specified with an AF than with a human.

Autoformatting collectively saves probably close to millions of work hours per year in our industry, and that’s at the current adoption. Do you think it’s productive to manually space things out, clean up missing trailing commas and what not? Machines do it better.

edflsafoiewq
0 replies
10h54m

Even if you have, say, 5 sequential related structs, that are all virtually identical, all written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file, and while they're at it, "fix" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines.

Autoformatters absolutely do this. They do not understand considerations like symmetry.

I am doubtful as to the costs of "somewhere in the codebase there is a missing trailing comma".

arp242
2 replies
13h50m

This is why autoformatters that frob with line endings are just terrible and fundamentally broken.

I'm fairly firmly in the "wrap at 80" camp by the way; but sometimes a tad longer just makes sense. Or shorter for that matter: forced removal of line breaks is just as bad.

jimmaswell
1 replies
11h55m

80 feels really impractically narrow. A project I work on uses 110 because it's approximately the widest you can comfortably compare two revisions on the same monitor, or was for some person at some time, and I can live with it, but any less would just feel so cramped. A few indentation levels deep and I'd be writing newspaper columns.

NotMichaelBay
0 replies
10h42m

There is usually a way to restructure the code so that it doesn't have multiple levels of nested indentation, which is a good practice IMO because it makes the code easier to read.

BigJono
1 replies
14h15m

Yep, this triggers the fuck out of me too. It drives me absolutely insane when I take the time and effort to write good test cases with inline per-test data, formatted so it's nice and readable for the next person. Then the next person comes along, spends 30 seconds writing some 2-line rubbish to hit a code coverage metric, and spends another 60 seconds adding a linter rule that blows all the test data out to 400 lines of unreadable dogshit that uses only the left 15% of screen real estate.

port19
0 replies
9h40m

I routinely spot 3-line prints with the string on its own line in our code. Even for cases where the string + print don't even reach the 80 character "limit"

yas_hmaheshwari
0 replies
14h23m

My team also had a similar thing in place. I am saving this article in my pocket saves, so that I can give "proofs" of why this is better

From the Zen of Python:

    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
https://peps.python.org/pep-0020/

EasyMark
0 replies
5h15m

I have been places where we allow long strings but not much else past the limit, with 80 to 100 char limits otherwise. I like 100 for C++/Java and 80 for C. If a line gets much longer than that (strings aside) then it's time for a rethink in most cases: the grouping/scoping symbols are getting too deep. I'm sure other languages may or may not have that as a reasonable argument. It is just a rule of thumb though.

jackphilson
6 replies
14h11m

I wonder - why isn't this talked about more? We have had tens of thousands of software companies, each with probably a dozen people focused on hyperoptimizing everything. Why hasn't this point been discussed enough on the internet that it's obvious today? And it's not just about this specific point; it applies more generally. Do people just learn this on their own and not say anything? Or is the discussion of this topic buried in some old forum somewhere?

mrkeen
5 replies
12h16m

It's talked about, just in the opposite direction.

I've left hardcoded strings (think Kafka event type names) in my source for this very reason, but after a round of code review they get squirreled away as constants in separate files because string repetition is bad or something.

jimmaswell
4 replies
11h48m

Without constants, it's too easy to let a typo sneak in, or to run into trouble later when replacing one "event" but not an unrelated "event". I'll only inline the string if it's used two times at most, but usually I'll make a constant the first time and it doesn't feel like any loss.

mrkeen
3 replies
11h35m

Yes, this is exactly what I was fighting against.

If I have three classes that interact with "MyTable", then I can grep for places that interact with "MyTable" and I get back three classes.

After refactoring, the class which now knows about "MyTable" is Constants.java, which has no business knowing about "MyTable". Grepping it now turns up a false-positive and finds 0 of the actual usage sites (3 false-negatives).
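
A minimal TypeScript sketch of the before/after (the `db` client and the file split are hypothetical):

    declare const db: { query(sql: string, params: unknown[]): void }
    declare const id: number

    // Before review: grep "MyTable" lands on the real usage site.
    db.query('SELECT * FROM MyTable WHERE id = ?', [id])

    // After review: grep "MyTable" only hits the constants file, and the
    // real usage sites now match only on the constant's name.
    export const MY_TABLE = 'MyTable'
    db.query(`SELECT * FROM ${MY_TABLE} WHERE id = ?`, [id])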

philipwhiuk
0 replies
7h48m

`Constants.java` is a massive code-smell (which I have in many projects, but it's still a smell).

The file name is awful.

At worst it should be 'DbConstants' but probably they should be defined elsewhere.

NotMichaelBay
0 replies
10h5m

It's not exactly a false positive. It's just a level of indirection, 1 more search by the constant name to find usages. What you sacrifice there you gain by having the compiler help find typos and the IDE help with autocompletion.

GeneralMayhem
0 replies
10h44m

Sure, but now you have the string constant as a symbol, which you can either grep for (in which case you're delayed by one search, not the end of the world if you were going to unwind callstacks anyway) or, if you have an LSP, you can jump directly from it to users...

dwh452
5 replies
5h24m

This sounds like the advice to prefer the variable name 'ii' over 'i' because you can easily search for it. I loathe such advice because it causes the code to become ugly. Similarly, there are 'Yoda conditions', which make code hard to comprehend to solve an insignificant error that is easily caught with tooling. The problem with advice like this is that you will encounter deranged developers who become obsessive about such things and make the code base ugly trying to implement dozens of style rules. Code should look good. Making a piece of text look good for other humans to comprehend I consider to be job #1 or #2 for a good developer.

moolcool
0 replies
5h18m

The problem with advice like these is you will encounter deranged developers that become obsessive about such things and make the code base ugly trying to implement dozens of style rules

That's more of a "deranged developer" problem than a problem with the guidelines themselves. E.g. I think his `getTableName` example is quite sensible, but also one which some dogmatic engineers would flag and code-golf down to the one-liner.

marcosdumay
0 replies
3h32m

Those things only make the codebase "ugly" until you learn how to read it.

inetknght
0 replies
4h59m

This sounds like the advice to prefer the variable name 'ii' over 'i' because you can easily search for it

I've never heard of that advice. I honestly like algebraic names (singular digits) as long as they're well documented in a comment or aliasing another longer-name.

there are 'YODA Conditions' which make code hard to comprehend which solves an insignificant error that is easily caught with tooling

Yoda conditions [0] are a useful defensive programming technique and do not reduce readability except to someone new to them. I argue they improve readability, particularly for myself.

As for tooling... it doesn't catch every case for every language.

I loathe such advice because it causes the code to become ugly.

Beauty is in the eye of the beholder. While I appreciate your opinion, I also reject it out of hand for professional developers. Instead of deciding whether code is "ugly" perhaps you should decide whether the code is useful. Feel free to keep your pretty code in your personal projects (and show them off so you can highlight how your style really comes together for that one really cool thing you're doing).

you will encounter deranged developers that become obsessive about such things

I don't like being called deranged, but I am definitely obsessive about eliminating whole classes of bugs just by having the coding design and style not allow them to happen. If safe code is "ugly" to you... well, then I consider myself to be a better developer than you. I'd rather have ugly code that's easily testable than pretty code that's difficult to test in isolation, which is what most developers end up writing.

Code should look good. Making a piece of text look good for other humans to comprehend I consider to be job #1 or #2 for a good developer.

It depends on the project. Just remember that what looks good to you isn't what looks good to me. So if it's your personal project, then make it look good! If it's something we're both working on... then expect to defend your stylistic choices with numbers and logic instead of arguments about "pretty".

Then, from the article:

Flat is better than nested

If I'm searching for something in JSON I'm going to use jq [1] instead of grep. Use the right tools for the right job after all. I definitely prefer much richer structured data instead of a flat list of key-value pairs.

[0] https://en.wikipedia.org/wiki/Yoda_conditions

[1] https://en.wikipedia.org/wiki/Jq_(programming_language)

antifa
0 replies
3h46m

the advice to prefer the variable name 'ii' over 'i' because you can easily search for it

\bi\b is the easy way to search for i.

ajuc
0 replies
5h11m

'ii' over 'i'

You don't need to search for local variables, nobody names global variables "i" - so the "ii" advice is pointless.

You often do need to search for places where global stuff is referenced, and while IDEs can help with that - the same things that break greppability often break "find references" in IDEs. For example if you dynamically construct function names to call, play with reflection, the preprocessor, macros, etc.

So it's a good advice to avoid these things.

you will encounter deranged developers that become obsessive about such things and make the code base ugly

You can abuse any rule, including

Code should look good.

and I'd argue the more general a rule is - the more likely it is to be abused. So I prefer specific rules like "don't construct identifiers dynamically" to general "be good" rules.

dblotsky
5 replies
14h35m

Hard agree with the idea of greppability, but hard disagree about keeping names the same across boundaries.

I think the benefit of having one symbol exist in only one domain (e.g. “user_request” only showing up in the database-handling code, where it’s used 3 times, and not in the UI code, where it might’ve been used 30 times) reduces more cognitive load than is added by searching for 2 symbols instead of 1 common one.

Noumenon72
1 replies
13h47m

Not to mention the readability hit from identifiers like foo.user_request in JavaScript, which triggers both linters and my own sense of language convention.

emn13
0 replies
9h32m

Both of those are easy to fix. You'll adapt quickly if you pick a different convention.

Additionally, I find that in practice such "unusual" code is actually beneficial - it often makes it easy to see at a glance that the code is somehow in sync with some external spec. Especially when it comes to implicit usages such as in (de)serialization, noticing that quickly is quite valuable.

I'd much rather trash every language's coding conventions than use subtly different names for objects serialized and shared across languages. It's just a pain.

runevault
0 replies
10h8m

Probably depends on how your system is structured. If you know you only want to look in the DB code, hopefully it is either all in one place or there is a folder naming pattern you can use to limit where you search.

The upside of doing it this way is that it makes your grepping more flexible: you can either search one part of the codebase to see, say, only the DB code, or search everywhere to see all the DB and UI code using the concept.

plorkyeran
0 replies
13h43m

I’ve also found that I sometimes really like when I grep for a symbol and hit some mapping code. Just knowing that some value goes through a specific mapping layer and then is never mentioned again until the spot where it’s read often answers the question I had by itself, while without the mapping code there’d just be no occurrences of the symbol in the current code base and I’d have no clue which external source it’s coming from.

gregjor
0 replies
8h54m

I have mixed thoughts on this too. Fortunately grep (rg in my case) easily handles it:

rg -i 'foo.?bar' finds all of foo_bar, fooBar, and FooBar.

amingilani
5 replies
14h8m

I agree that code searchability is a good thing but I disagree with those examples. They intentionally increase the chance of errors.

Maybe there’s an alternative way to achieve what the author set out but increasing searchability at the cost of increasing brittleness isn’t it for me.

In this example:

    const getTableName = (addressType: 'shipping' | 'billing') => {
      return `${addressType}_addresses`
    }

The input string and output are coupled. If you add string conditionals as the author did, you introduce the chance of a mismatch between the input and output.

    const getTableName = (addressType: 'shipping' | 'billing') => {
      if (addressType === 'shipping') {
        return 'shipping_addresses'
      }
      if (addressType === 'billing') {
        return 'billing_addresses'
      }
      throw new TypeError('addressType must be billing or shipping')
    }

Similarly, flattening dictionaries for readability introduces the chance of a random typo making our lives hell. A single typo in the repetitions below will be awful.

{ "auth.login.title": "Login", "auth.login.emailLabel": "Email", "auth.login.passwordLabel": "Password", "auth.register.title": "Login", "auth.register.emailLabel": "Email", "auth.register.passwordLabel": "Password", }

Typos aren’t unlikely. In a codebase I work with, we have a perpetually open ticket about how ARTISTS is mistyped as ATRISTS in a similarly flat enum.

The issue can’t be solved easily because the enum is now copied across several codebases. But the ticket has a counter for the number of developers that independently discovered the bug and it’s in the mid two digits.

usrusr
0 replies
13h10m

Entrenched typos like ATRISTS are actually a greppability goldmine. Chances are there are more occurrences of pluralized people who are making art in the codebase, but only ATRISTS is the one from that enum.

I certainly would not suggest deliberately mistyping, but there are places where the benefit is approaching the cost. Certain log messages can absolutely benefit from subtle letter garbling that retains readability while adding uniqueness.

peeters
0 replies
10h49m

The input string and output are coupled. If you add string conditionals as the author did, you introduce the chance of a mismatch between the input and output.

I think it depends on whether the repetition is accidental or intrinsic. Does the table name happen to contain the address type as a prefix, or does it intrinsically have to? Greppability aside, when things are incidentally related, it's often better to repeat yourself to not give the wrong impression that they're intrinsically related. Conversely, if they are intrinsically related (i.e. it's an invariant of the system that the table name starts with the address type as a prefix) then it's better for the code to align with that.
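
A hedged TypeScript sketch of the distinction, reusing the article's example:

    // Intrinsic: the invariant "table name = address type + '_addresses'"
    // is the point, so encoding it is honest (at a greppability cost).
    const getTableName = (addressType: 'shipping' | 'billing') =>
      `${addressType}_addresses`

    // Incidental: spell both names out so they can diverge safely,
    // and so each table name is greppable.
    const TABLE_BY_ADDRESS_TYPE = {
      shipping: 'shipping_addresses',
      billing: 'billing_addresses',
    } as const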

kaelwd
0 replies
13h9m

REFERER moment.

ctxc
0 replies
13h44m

Agree with you.

What happens when translation files get too big and you want to split and send only relevant parts? Like send only auth keys when user is unauthenticated?

`return translations[auth][login]` is no longer possible.

Or just imagine you want to iterate through `auth` keys. _shudders_

Noumenon72
0 replies
13h53m

Typos are find-and-fix-once, while unsearchability is a maintenance burden forever.

I don't think coupling variable names by making sure they contain the same strings is the best way to show they're related, compared to an actual map from address type to table name. There might be a lot of things called 'shipping' in my app, only some of which are coupled to `shipping_addresses`.

Shouldn't a linter be able to catch that there is no enum member called MyEnum.ATRISTS, or is it not an actual enum?

adpirz
5 replies
14h38m

I've seen some pretty wild conditional string interpolation where there were like 3-4 separate phrases that each had a number of different options, something akin to `${a ? 'You' : 'we'} ${b ? 'did' : 'will do'} ${c ? 'thing' : 'things'}`.

When I was first onboarding to this project, I was tasked with updating a component and simply tried to find three of the words I saw in the UI, and this was before we implemented a straightforward path-based routing system. It took me far too long just to find what I was going to be working on, and that's the day I distinctly remember learning this lesson. I was pretty junior, but I later returned to this code and threw it all away for a number of easily greppable strings.
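
For illustration, a rough TypeScript sketch of that kind of rewrite (flags and phrases invented):

    declare const a: boolean, b: boolean, c: boolean

    // Before: no sentence the user actually sees exists verbatim in the source.
    const before = `${a ? 'You' : 'We'} ${b ? 'did' : 'will do'} ${c ? 'things' : 'a thing'}`

    // After: every visible phrase is a literal you can grep for.
    const after =
      a ? (b ? (c ? 'You did things' : 'You did a thing')
             : (c ? 'You will do things' : 'You will do a thing'))
        : (b ? (c ? 'We did things' : 'We did a thing')
             : (c ? 'We will do things' : 'We will do a thing'))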

ctxc
4 replies
13h40m

Tangential: I love it when UIs say "1 object" and "2 objects". Shows attention to detail.

As opposed to "1 objects" or "1 object(s)". A UI filled with "(s)", ughh

petepete
0 replies
12h47m

More so when it's not tripped up by "1 sheeps" or "1 diagnoses".

gnuvince
0 replies
7h49m

I like the more robotic "Objects: 1" or "Objects: 2", since it avoids the pluralization problems entirely (e.g., in French 0 is singular, but in English it's plural; some words have special forms when pluralized, such as child -> children or attorney general -> attorneys general). And related to this article, it's more greppable/awkable, e.g. `awk '/^Objects:/ && $2 > 10'`.

ajuc
0 replies
4h53m

Fun fact - I had to localize this kind of logic to my language (Polish). I realized quickly it's fucked up.

This is roughly the logic:

    function strFromNumOfObjects(n) {
      if (n === 1) {
          return "obiekt";
      }
      let last_digit = (n%10);
      let penultimate_digit = Math.trunc((n%100)/10);
      if ((penultimate_digit == 0 || penultimate_digit >= 2) && last_digit > 1 && last_digit <= 4) {
          return "obiekty";
      }
      return "obiektów";
    }
Basically pluralizing words in Polish is a fizz-buzz problem :) In other Slavic languages it should be similar BTW
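
Worth noting: JavaScript's built-in Intl.PluralRules encodes these CLDR plural categories, so the fizz-buzz can be delegated. A sketch, not a drop-in replacement for the original:

    const rules = new Intl.PluralRules('pl')
    const forms: Record<string, string> = {
      one:   'obiekt',   // 1
      few:   'obiekty',  // 2-4, 22-24, ... but not 12-14
      many:  'obiektów', // everything else
      other: 'obiektów', // fractions
    }
    const strFromNumOfObjects = (n: number) => forms[rules.select(n)]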

WalterBright
5 replies
14h17m

That's why D has a cast keyword:

    ubyte c = cast(ubyte)i;
instead of:

    unsigned char c = (unsigned char)i;
Casts are a blunt instrument that subvert the type system, and so they need to be greppable.

Having the cast keyword also removes the grammatical ambiguities in the expression syntax.

jenadine
4 replies
12h59m

Do you often grep for casts? I never do that.

aa-jv
0 replies
11h46m

Try to think about why you might want to do that. It makes a lot of sense, but if you're not doing it, that might be enlightening...

WalterBright
0 replies
11h33m

I regard every cast as a bug in my own code and try to refactor it so there aren't any. I can't get rid of all of them, but they're always worth a second look.

I don't normally grep for them, but others have told me they did.

P.S. one thing about D is you can do things like this:

    ubyte b = i;            // error, losing bits
    ubyte b = cast(ubyte)i; // ugly cast
    ubyte b = i & 0xFF;     // no cast, no error!
It's just one of the nice little details that make programming in D a pleasure.

SnowflakeOnIce
0 replies
5h53m

When doing appsec review in C or C++, yes!

EasyMark
0 replies
5h10m

Honestly I know I don’t do that either. I mean if there was some special case where I remembered “oh yeah I had to cast that variable in this special case”. In general I avoid casting as much as I can in C/C++, but especially in C.

traxys
4 replies
11h16m

I read parts of the Linux kernel source code pretty often, and getting to the definition of a function is often pretty involved:

- I don't always know the return type, since the calling code assigns the result to a field whose definition I'd also have to find

- I don't know if it's a C function or a preprocessor macro

This often results in me searching for the exact function name and combing through the uses in the drivers. You then need to restart all of that recursively to understand the function you just read.

I could use clangd for that, but I don't have the resources on my laptop to compile a kernel.

dvh
1 replies
11h13m

Why not simply hold Ctrl and click on the name of the function?

GeneralMayhem
0 replies
10h48m

I don't have the resources on my laptop to compile a kernel

gregjor
0 replies
8h52m

ctags?

semiinfinitely
3 replies
13h55m

why two 'p's - grep only has one

mckn1ght
0 replies
13h13m

That's usually how it's done with words that end in one consonant when adding a suffix that starts with a vowel, so as not to change the pronunciation of the short vowel in the root word, per English's rules around long and short vowels. See also map->mapped, bat->batted, tap->tappable, etc.

leetrout
3 replies
8h19m

I encourage my teams to write logs / output with interpolation with the variables at the end for searchability

For example, prefer:

  Added users (%d)
over:

  Added %d users
Then it is much easier to track down where things come from, without needing wildcards in the search or caring too much about what might be dynamic in cases where it's not obvious.

davemp
2 replies
8h11m

I’ve basically landed on the following form: ‘Short description. [foo={}, bar={}]’

Which will give you greppability and, in theory, parsability, so you can automatically bisect for a value change or something along those lines.
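
A tiny TypeScript sketch of a helper emitting that shape (entirely hypothetical; any structured-logging library does this better):

    // Produces e.g. "Cache miss. [key=user:42, attempts=3]"
    const logf = (description: string, fields: Record<string, unknown>) => {
      const kv = Object.entries(fields)
        .map(([k, v]) => `${k}=${v}`)
        .join(', ')
      console.log(`${description} [${kv}]`)
    }

    logf('Cache miss.', { key: 'user:42', attempts: 3 })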

medstrom
0 replies
7h23m

I like it, but it may be painful in languages where long variable names are common.

leetrout
0 replies
7h35m

Indeed. That is basically what the logging library from charm bracelet does.

alkonaut
3 replies
10h57m

There are of course cases of dynamic data in every language (The table name is an apt example) but usually when I look in code I just expect to be able to follow definitions. If the language doesn't reliably allow me to find "usages of this type" without risking finding another type with the exact same name then I'm already starting up my static type system compiler for the rewrite.

There are exceptions of course: when searching git logs, comments, etc., the language or IDE can't help.

And when searching for an unknown symbol (type, function, variable) you don't know the name of, but you know _should_ look like "DogOrder" or "OrderDog" is a common task too. In this case I'd probably search for " Dog.Order\(" or " Order.Dog\(" if I'm looking for a function. The language trait that enabled it is that method names are Pascal Case and always have an opening ( at the end. But my IDE at least lets me search for members (variables, functions) separate from type names. There should be an index in the IDE though that lets you query this data. E.g. looking for types starting with foo could be done with search t:Foo, instead of having to grep for "(struct|class) Foo" or similar. Tooling is the key.

berkes
2 replies
10h47m

The author uses JavaScript and Python as examples. So I presume they have (most?) experience with dynamic languages.

In static languages, greppability is hardly as much of a factor. Especially with the availability of LSPs and other such tools nowadays.

When I write rust, or Java, I hardly grep, I "go to usages" or "go to definition", "rename symbol" and so on. Similar, but not to that extent, with typescript. But when coding in Javascript, Ruby or Python, no matter how fancy or language-focused an IDE is, I'll be grepping a lot. Decades of Ruby and Rails "black magic" taught me to grep for partial patterns like the author shows, too. Or to just run the code-path entirely (through tests) because the table-definition of the database will change the available methods and behaviour of the code. Yes. I know.

An LSP (or linter, or checker) can only do so much when the available code, methods, classes, behaviour can be changed or added at runtime.

alkonaut
1 replies
10h39m

I'm happy to use dynamic languages occasionally too (Bash, Javascript, Python, ..) but I have a rule of thumb that says if I can't see the entire codebase on one screen, then it's too large for dynamic.

pistoleer
0 replies
7h23m

Would be great if the wider industry shared that view

trey-jones
2 replies
5h3m

As someone who almost exclusively uses grep for finding what I need in codebases that are new to me and old to me, you can make whatever arbitrary rules you want, as long as you're consistent, I'll be pretty happy with it. If syntax is loose in some area (single vs double quotes, parens or braces or none), just do the same thing every time. Whitespace consistency isn't crucial, but it can't hurt (between function name and parens, for example).

necrotic_comp
0 replies
4h45m

Agreed. So long as the code hits performance and business goals, there doesn't need to be an emphasis put on "newness" or any other sort of vanity metric - make the code obvious, searchable, and understandable so that in a time crunch or during an outage it's easy to search and find the culprit.

causal
0 replies
4h9m

I'm also thinking long-context LLMs are going to make this advice seem pretty archaic in a few years. They're so good at reading code and extremely useful for asking questions of a code base.

That said, I completely agree with the author on not using clever string tricks to compose identifiers. That makes code both harder to search and to read.

creesch
2 replies
11h37m

I fully understand the point the author is making. However, I am not going to sacrifice good JSON and make it flat just so someone can search through it more easily. With the example they give, it is still readable because it is a simple data structure. But with more complex data, the flat structure does not make it easier to parse for me, and it makes mistakes more likely as well.

smartmic
1 replies
11h18m

It's often a matter of having the right tool for the job. In your case, https://github.com/tomnomnom/gron might be useful.

creesch
0 replies
10h42m

Well, I'd say that in the author's case it might be more useful. ;) I never really had the inclination to grep for data like the author does.

I generally work from an IDE anyway, where it is clear that I am working with a value that is part of a JSON object and I can follow it back to the proper structure anyway. In fact, the more I think about it, the more I feel like the article is written for a very specific use case and perspective. Almost to the point where the saying "if all you have is a hammer, everything looks like a nail" is applicable. Where if it doesn't look enough like a nail it should be adjusted to look more like one instead of expanding your toolbox a bit.

arendtio
2 replies
9h35m

I am firmly against the suggested changes. I love grepping through code too (often using -A -B -C), but I also like browsing the code, with tools where you can just click on a function and see its definition.

However, changing how the code should be written so that grepping becomes easier is optimizing for the wrong target. It is much more important that the code is easily readable and maintainable.

In addition, some tools are designed explicitly for grepping through code (off the top of my head, ack is an example). If grep doesn't work, one should try a more sophisticated tool instead of adopting a different coding style.

lucideer
0 replies
9h8m

Greppability is really a proxy metric here - these changes all have other benefits even if you never grep (mostly readability tbh).

    const getTableName = (addressType: 'shipping' | 'billing') => {
        return `${addressType}_addresses`
    }
This is a simplified example but in a longer function, readability of the `return` lines would be improved as the reader wouldn't have to reference the union type (which may or may not be defined in the signature). The rewrite is also safer as it errors out if a runtime `addressType` value doesn't match the union type (above code would not throw an error, just return an indeterminate value which would cause undefined behaviour).

"Flat is better than nested" also greatly improves readability in both examples: either reading the i18n line, or reading the classname at definition / call will be more readable when the name contains full context of function.

gregjor
0 replies
9h30m

Nothing the author wrote would necessarily make code harder to read or maintain. Consistent naming of the same thing throughout, not constructing variables or table names dynamically, etc. benefit both readers/maintainers and searching.

I understood “grepping” to mean ripgrep (rg) or ack, not just plain grep. I think programmers who use command line tools or vim know about those. VSCode uses rg.

RodgerTheGreat
2 replies
14h36m

one of the strangest and most grep-hostile approaches to identifiers that I have ever observed is Nim ignoring both case and underscores in an effort to allow everyone to write code in their preferred style:

https://nim-lang.org/docs/manual.html#partial-caseminusinsen...

planetis
0 replies
4h55m

And it works pretty well, speaking from 6+ years of experience. It's not that strange if you consider case-insensitive filesystems and email addresses. But on the internet you only hear the opinion of the loudest minority.

x3n0ph3n3
1 replies
13h19m

No AI tooling was used in the creation of this article.

That was refreshing.

jimmaswell
0 replies
11h45m

It just doesn't have that genuine artisan smell to it when someone uses ̶a̶ ̶p̶r̶i̶n̶t̶i̶n̶g̶ ̶p̶r̶e̶s̶s̶ a̶ ̶c̶o̶m̶p̶u̶t̶e̶r̶ ̶w̶i̶t̶h̶ ̶a̶u̶t̶o̶m̶a̶t̶i̶c̶ ̶t̶y̶p̶e̶s̶e̶t̶t̶i̶n̶g̶ s̶p̶e̶l̶l̶c̶h̶e̶c̶k̶ W̶i̶k̶i̶p̶e̶d̶i̶a̶ AI to help write their article.

vijucat
1 replies
10h21m

One other thing I'd like to add is greppable comments! In the same vein as TODO and FIXME, I use hashtags in comments to drop hints to future me reading the code. #learning is a universal one:

// #learning: transparent color using color.new(color.white, 100). This is GREAT for hiding plot() lines during inapplicable periods (such as when no trade is on)

But project-specific hashtags are quite useful, too.

// #60within600: bunch API calls to not hit the 60 calls within 10 minutes limit

// This memoizes fn call results to prevent #60within600

The hashtagging was inspired long ago by del.icio.us, if you remember that. https://en.wikipedia.org/wiki/Delicious_(website)

philipwhiuk
0 replies
7h51m

Seems like you're trying to implement a ticketing system in your code to me.

If you need to prevent 60 within 600, write a test.

shahzaibmushtaq
1 replies
11h35m

This reminds me of the good practices and guidelines in coding when I was learning to code, which also includes "proper naming" so you can easily find what you are looking for throughout the codebase.

berkes
0 replies
10h56m

Me too.

But that's also what makes me uncomfortable when reading this article. Proper Naming is truly an "art" of balancing trade-offs.

It takes domain expertise (Ubiquitous Language), an understanding of the users of the code (other devs, not end-users), and a lifetime of coding f*ups where naming something wrong turned out painful, to balance these.

The author gives a nice example of dynamic table naming. But their refactoring didn't keep the behaviour the same (the else/catch). So it's hard to argue the first is better. And in this case, even without the else/catch, I'd say the latter is better. But there will be cases where greppability has to be balanced with readability, testability or refactorability. And in those cases, for me, greppability comes last.

poikroequ
1 replies
5h11m

Grep is nice, but I would much prefer better tools for searching through code. Something that knows how to parse multiple languages and can infer the types of things. Not to mention indexing, for large code bases, grep'ing through possibly millions of lines of code can be awfully slow.

IDEs do a decent job but are typically lacking compared to the raw power of grep.

packetlost
0 replies
5h8m

I mean, I prefer faster symbol, type, etc. based navigation too, but it doesn't work in all scenarios so grep is an extremely handy fallback.

larsrc
1 replies
11h3m

As an avid grepper, I disagree with most of these specific recommendations. Use a tool that actually understands references. Don't make the code harder to read for humans just to please grep.

As for identifiers, use 'foo.?bar' case-insensitively.

medstrom
0 replies
7h15m

Which of the examples are harder to read for humans?

jayd16
1 replies
3h46m

I'm a big proponent of visual scripting (where it makes sense) but you really do miss the text-based tooling like grep.

One trade-off you can make is using text-based serialization so you're at least able to grep the yaml or JSON or whatever and get to the right file at least. This of course costs you some editor load time.

On the flip side, you're basically always using an IDE to edit the visual script. In theory semantic search should be possible and built in, although reality usually falls short.

Someone in a previous HN thread mentioned the idea of a standard graph syntax. Something that game engines and tools could store their graph-based assets in. If there were a standard syntax then standard tools could be made, and we could end up seeing something like a graph grep. One could imagine a Visual Studio-style graph editor app with plug-in support. Even a standard merge tool would be a huge step up for non-text-based code assets.

A man can dream!

wizzwizz4
0 replies
3h40m

There is a standard graph syntax: Graphviz DOT notation.

eterevsky
1 replies
9h14m

This also applies to dependency injection. While it has significant benefits, it hurts clarity of the code. It becomes more difficult to see where each object is coming from.

mrkeen
0 replies
3h14m

Magical dependency injection frameworks, that is.

Plain old putting-dependencies-in-the-constructor-instead-of-newing-them is great.

If you 'wire' it yourself, you see the top-level structure of the project in main, e.g.

  cache          <- createCache "./cache"
  workQueue      <- createWorkQueue parallelism
  projectFinder  <- createProjectFinder basePath
  gradleBuilder  <- createGradleBuilder cache
  normaliser     <- createNormaliser
  gradleParser   <- createGradleParser normaliser
  relationFinder <- createRelationFinder cache normaliser
At a glance I can see what uses normaliser, and what is used by normaliser.

Timwi
1 replies
5h8m

All of the apostrophes in this article are wrong. The correct character is ’, but this article uses ‘ (open single quote) throughout.

riz_
0 replies
3h13m

Thanks, fixed.

wrsh07
0 replies
7h55m

Nice article. Two notes:

First, some of these suggestions will make it harder to introduce bugs when updating the code. That's good! Particularly tricky is when somebody splits up identifiers or function names. These types of things often occur at boundaries (calls between servers or to the db) which can make them tricky to test. Even if all your identifier combining is initially done in a single file, it's easy for someone to see the final shape of the identifier and accidentally hard code it somewhere else.

Second, In the spirit of Titus Winters' "software engineering is programming over time", a codebase should be greppable over time.

That means that if you rename a function, you might consider saving the old name of the function in a comment.

whirlwin
0 replies
12h3m

Code grepping at build time can be useful.

Grepping at runtime, if you can call it that, is also very powerful. If you have a binary, either your company's or a third-party one, and you don't have the source code easily available, the `strings` program from GNU binutils shows the tokens embedded in the binary, e.g. hardcoded URLs, credentials and so on. It can also be useful for analyzing certain things in memory.

welder
0 replies
3h6m

Also import traceability

trilbyglens
0 replies
6h31m

Imo this is another big selling point of using Tailwind CSS. Those long stacks of classes become almost like UUIDs for markup, discoverable from dev tools.

tlb
0 replies
1h41m

An editor feature I'd like is that as I'm typing an identifier, a hover popup shows me how many other instances appear in my codebase. It should be easy to build a map of identifier->count for instant lookups. I generally know if I want 0 (a new unique identifier), 1, a small number, or a large number. For a few pixels, this would prevent a lot of dumb mistakes and ambiguous names.
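
A rough sketch of the index such a feature would need, in TypeScript (regex tokenization, so only an approximation of a real parser):

    declare const sourceFiles: string[] // assumed: file contents already loaded

    // Build identifier -> occurrence count once; lookups are then O(1).
    const counts = new Map<string, number>()
    for (const source of sourceFiles) {
      for (const id of source.match(/[A-Za-z_$][A-Za-z0-9_$]*/g) ?? []) {
        counts.set(id, (counts.get(id) ?? 0) + 1)
      }
    }
    const occurrences = (id: string) => counts.get(id) ?? 0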

t43562
0 replies
8h39m

astgrep is a very useful tool when grep fails: https://ast-grep.github.io/

It's not as easy to use as grep but I think one can script it to be nearly so. It has huge power but without learning it all one can do searches that grep finds difficult. e.g. finding all the locations where a method is called and showing the parameters even if they are on multiple lines. You can then use rewrite rules to do CLI code refactoring.
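
(If I remember the syntax right, something like `ast-grep --pattern 'doTheThing($$$ARGS)' --lang ts src/` matches every call to a hypothetical doTheThing however its arguments are wrapped across lines; `$$$ARGS` is a multi-node metavariable.)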

I think it also has potential in a build toolchain e.g. to look for patterns you want to discourage as a pre-commit hook.

ultragrep - https://github.com/zendesk/ultragrep - I don't love this quite as much but it does have a way to build indexes so you can do fast greps across a big codebase. It also has a text mode UI if you want it and I find that almost worthwhile.

I use ripgrep most of the time but while I like it, there is a limit to how many grep tools I can remember and I should probably cut down to using ultragrep and astgrep.

plain gnu grep itself is something one has to know when one is on an unfamiliar machine.

svennidal
0 replies
5h7m

This approach along with ack, instead of grep, has been a godsend to me.

settsu
0 replies
3h22m

I'm ardently in favor of making code human readable as practically as possible. Personally it follows on my personal Rule #1: Be kind (i.e., in this case, to others and your future self.)

However, searchable =/= greppable.

Flat is better than nested

Context matters but, generally speaking, I would say that flatter is anti-grep.

rezaprima
0 replies
13h5m

I have been bitten by Ruby's metaprogramming on this.

recursivecaveat
0 replies
14h31m

A small but underappreciated benefit of grammar changes like going from the form `mytype myfun()` to `keyword myfun() sigil mytype` is that function definitions become trivially greppable.

ralusek
0 replies
14h40m

Especially in untyped languages, working with an old or unfamiliar codebase, sometimes the only way to know "was anything else using this code" is just to search for the name of a function or whatever.

r34
0 replies
7h14m

Good point. I would refer to another (similar) metric, which could be called "IDE-search-ability": it extends greppability by adding some more conventions which work well with your (or your company's) IDE.

qwertox
0 replies
9h48m

I use greppable strings explicitly, like

  requests.get(f'http://a.b.c.d/wol?device={wol_computer}&grep-id=wake-on-lan', timeout=3)
This way I find `grep-id` in the server logs as a reminder of what to grep for, then `grep-id=wake-on-lan` in the entire codebase to find the actual source of the call.

Or I add comments with a greppable token to the code.

pooriar
0 replies
5h6m

I just shared this in the work Slack and everyone resoundingly agreed with the sentiment. Definitely going to pay more attention to this now, thanks for sharing!

peanut-walrus
0 replies
9h55m

These are all extremely good suggestions. Especially the flattening bit - yes, it's verbose as hell, but it just makes so much sense whenever you have to deal with the code any time after writing it. Helm charts, please take note, the docs even say that "In most cases, flat should be favored over nested.", yet almost every time I have to deal with a Helm chart, it's a mess of nested structures.

nsonha
0 replies
11h18m

is there something like a universal "semantic grep" for code? I think rating code based on (the limitations of) some tool might not be the best way.

noufalibrahim
0 replies
9h56m

I don't know how to validate this but this seems to be a specific case of "avoiding magic" where there's a lot of dynamically generated variables and things. Having the static text of the program more or less show its intent helps readability and searchability quite a bit.

I suppose the other extreme is to have a program generator with an input spec and you being left to read through the generated code without access to the input spec.

nottorp
0 replies
3h44m

Funny, I started work on a legacy code base a couple months ago and yes, it has all the problems described in the article and that hinders our understanding of it.

nickjj
0 replies
6h56m

Absolutely.

It's what I like about Rails when it comes to file names too. Having controllers/users_controller.rb as a path might sound wasteful because "you're already in the controllers directory, you don't need _controller in the path".

But when you want to fuzzy find that file, it's really nice to type "users con" and get that file instead of also picking up views, models and other user related files with just a "users" search.

mrb
0 replies
12h3m

I will always remember my professor explaining that greppability is the reason C++ casting operators use a long keyword: static_cast<...> const_cast<...>, etc as you can easily grep for "_cast" or the whole keyword.

moomin
0 replies
9h55m

If you really do want your code to be searchable, here’s a couple of practices I’ve adopted:

1) Eliminate spelling mistakes. Eliminate alternative spellings. UK vs US English? Pick a side and stick to it.

2) Eliminate contractions. Or keep a very short list of allowable ones (We permit “info” for instance.)

The point of this is to increase the predictability of the names you use. If you've got "tradeable" and "tradable" in your code base, searching for it is going to be a pain. You can supplement these rules with common coding standards like "We call these things providers." but just getting the spelling consistent is huge.

mashlol
0 replies
14h19m

Greppable commit messages and descriptions are also important, for a similar reason. If you want to learn where a feature exists in the codebase, searching the commits for where it was added is often easier than trying to grep through the codebase to find it. Once you've found the commit or even a nearby commit, it's much easier to find the rest.

mannycalavera42
0 replies
4h56m

if only there was a language where code is data so that... hold on a sec! #LISP-languages

loeg
0 replies
13h26m

Yeah, this is also a benefit of e.g. C identifiers vs C++, where namespace, class, and method/variable can all be listed in separate places, breaking the ability to locate non-unique method/variable names with grep.

ljsprague
0 replies
10h44m

In other words: don't try to be clever?

linuxdude314
0 replies
1h7m

The examples are pretty silly, especially the first one.

If you know that you need either a shipping or billing address and the user has specified which one they need, just query based on that.

There’s no need to introduce a function (getTableName) to detemplate a string or match on a case.

Instead just create a function that gets the item you want from the DB and has the table name as input.

On your UI, make sure that when the user specifies a billing or shipping address, the correct parameter is passed to the API.

kmarc
0 replies
10h28m

Some good recommendations in the article.

Greppability is also helpful when you start scripting your editor. Vim has `includeexpr` and co. to implement some "intelligence" when trying to find declarations etc. This enabled me to write a couple-line snippet that could immediately resolve Bazel Starlark symbols even in "imported" (`load()`) files. At one point I realized I had better code navigation than any of my colleagues using IDEs.

This, and tools like ripgrep, really help a lot. This is something the VS Code developers also realized when they included ripgrep as the "backend" of their search-in-files feature.

klysm
0 replies
5h39m

A good IDE makes up for this if the syntax of the language doesn’t lend itself to easy greppage. I lean heavily on JetBrains search with their editors

jongjong
0 replies
13h8m

This is a great point. One of my pet peeves is seeing an error in the logs which I cannot find in the code for various reasons. Sometimes the error message is constructed in a complicated way with variables concatenated together or the error message is extremely generic and I get matches in 100 different places.

I'm an advocate for the idea that any aspect of a system which communicates either with end users or with sysadmins should be given high exposure in the code base. Typically, this means constructing abstractions in such a way that higher-level business logic and log messages are easily traceable from a single file. I make it so that the business layer sits above all other layers, as close to the program's entry point as possible.

jgrahamc
0 replies
8h8m

In my very first job I wrote a spell checker/corrector for code comments. This was specifically to make greppability possible because some of my colleagues were appalling at spelling and it meant that the incredibly detailed comments we used to write were hard to search for key details.

j45
0 replies
3h28m

Code that is not written for others in the future may have a limited future

indymike
0 replies
7h13m

Greppability and debuggability are two things that I look for in code reviews. If you ask, "How would you debug that?" and the answer starts with, "I'd rewrite it to...", then maybe, just maybe, you should write it that way.

hugodan
0 replies
7h54m

Try to keep code-reference indirections to a tasteful minimum. If a split is needed, that's one more indirection for whoever maintains it in the future. That cost needs to be on the table.

Keeping things referentially transparent helps a lot here.

hilux
0 replies
3h0m

Digital marketers have known this for a long time.

hcfman
0 replies
11h3m

.* is your friend :)

guhcampos
0 replies
4h43m

One situation that comes to mind is configuration of applications on containers using environment variables.

It's extremely valuable to be able to just `grep -r PREFIX_` on a codebase and be able to visualize all possible configuration values for that application.

This is encouraged by some frameworks like Django, where you are expected to list all the configuration values in a `settings` module, but is not standard for `viper`, `click` and `pydantic-settings`, which try to be too smart and auto-generate the variable names for you. It's one of these cases where "modern" frameworks and applications try to save a minuscule amount of work by automating some task, but end up reducing the maintainability of the code over time.
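
A sketch of the explicit style in TypeScript (the prefix and the settings themselves are hypothetical):

    // `grep -r MYAPP_` now surfaces every knob the application reads.
    export const config = {
      databaseUrl: process.env.MYAPP_DATABASE_URL,
      logLevel: process.env.MYAPP_LOG_LEVEL ?? 'info',
      port: Number(process.env.MYAPP_PORT ?? 8080),
    }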

frabjoused
0 replies
5h30m

This is why I’ve always fought against BEM in CSS. Tends to drive greppability to zero.

emblaegh
0 replies
10h23m

Python people should think twice before implementing a `__call__` method if they want to improve greppability.

elijahbenizzy
0 replies
12h59m

Heh, this was very much the design philosophy behind Hamilton (github.com/dagworks-inc/hamilton).

The basic idea was that if you have a data artifact (columns for dataframes initially), you should be able to ctrl-f and find it in your codebase. 1:1 mapping of data -> function.

People take a long time to figure out that the readability gains from greppability are worth whatever verbosity comes with it, largely because they think of code too much as a craft (make it as small/neat as possible) and not as documentation for a live process...

eitland
0 replies
7h34m

Working in Spring, which accepts configuration in I-don't-know-how-many formats, from ENV_VARS to yaml, this very much resonates with me, because as a general rule, if an option can be used, someone will use it.

Also the reason why I try to avoid Gradle when possible:

The possibilities are endless. At one place I think I found 21 wildly different Gradle configs out of 24 that I checked.

(For anyone that wonders, it was combinations of:

- placeholders vs straightforward dependency declarations (this is a thing in maven too)

- for loops doing things based on lists or maps instead of just calmly declaring them one after another, maybe to save some characters

- helper functions so you could declare dependencies like azure(<something>(<version>))

- order of declarations

- Kotlin vs Groovy syntax

I have probably forgotten a couple more but this is thankfully already a few years ago.)

ceritium
0 replies
6h3m

I built the command line tool flatito just for Rails i18n translation keys.

I am unsure if I like the author's approach because there are other cons, but it's a good point.

* https://github.com/ceritium/flatito

breck
0 replies
3h18m

There's a new kind of language where the practice is to use whitespace, and only whitespace, as your syntax. Newlines separate blocks and spaces separate words.

One of the unexpected extremely powerful things this allows is finding function usage extremely easy in any text editor that supports regex. You just search for ^[functionName] . Since you know that function pretty much will only be used at the beginning of lines. You can thus make edits against the AST with regexes and without parsing the AST at all.

It's pretty amazing, and leads to quite faster development, and allows one to tackle bigger and more complex problems.

binary132
0 replies
14h22m

This can even be as simple as not using multi-line error strings, or expanding variables in them.

atoav
0 replies
10h53m

One very simple way to make code less greppable is to use only single-letter variables, or other short names that are very likely to be contained in a ton of other words.

assanineass
0 replies
8h4m

Sounds a little erotic…

anordal
0 replies
11h28m

Setting a variable via a split identifier is surprisingly common in CMake (because functions can't return a value):

    set(${VAR}_VERSION ${VERSION})
Grepping for the resulting variable name, say FOO_VERSION, turns up nothing, because that name never appears in the source. This is the main reason I don't like CMake.

ajayvk
0 replies
14h34m

Just spent an hour trying to figure out how a Hugo theme was picking up a shortcode definition. Grep did not help.

Turned out the shortcode name is based on the file name rather than file contents.

advael
0 replies
38m

For my purposes, this is among the most important metrics.

Even a major refactor is relatively easy if you can find stuff in your codebase. Even a small bugfix can get complicated if there's a ton of ambiguity

TeMPOraL
0 replies
10h17m

Grep is indeed a critical tool for navigating and understanding an unfamiliar codebase, but greppability should not be a goal unto itself. The article seems to be making that mistake - it's basically advocating improving greppability at the cost of making the codebase even larger, messier, and harder to read: i.e. reinforcing the problem that makes you reach for grep in the first place. It's a false economy. It's asking you to optimize your code for one specific scenario - trying to figure out where an unfamiliar string comes from; but that isn't the most important or most frequent thing people need to do with code anyway.

(If it is for you, congratulations, you're the janitor in the codebase. It sucks, but that's what you're being paid for. Maintenance is a means, not an end.)

In particular, one of the most important and frequent things you do with code is read it in order to understand it (locally, at the abstraction level of interest), and the advice from this article compromises it badly - almost as if hoping that, on a greppable enough codebase, you could use grep to avoid reading or thinking entirely.

1. Don't split up identifiers

Don't split them up for the sake of splitting, sure. That's not helping anything. But in the example given, there's likely a good reason for it - for example, it codifies the intended coupling between tables. `billing_addresses` isn't an independent term in this code, nor are the other `_addresses` table names. There's a naming pattern there, encoded directly in the initial example. The proposed refactor obscures it, triples the amount of code in the process (all of which is low-value noise), and introduces the possibility of errors (typos, copy-paste) of the kind that compilers don't pick up (hope you have good tests!).

FWIW, the author's refactor may be eventually required - if and when the naming pattern in the original code no longer holds. But not before then.

2. Use the same names for things across the stack

Excessive data repackaging is bad, but that tends to be a symptom of having too many layers. A good layer has specific semantics that distinguish it from the layers above and below it. This may necessitate renaming some things; in that case, even if the renaming is as trivial as in the example, it should be spelled out explicitly. You can't just return a Layer 1 Address object instead of a Layer 3 Address object if the two layers mean something different by "Address"; the triviality of the mapping is incidental and may not hold over time. If it really feels trivial, chances are one of the layers is not necessary in the first place, so go fix that.

3. Flat is better than nested

Now that's just screwing with people, especially wrt. nesting namespaces. It's asking to reintroduce the visual noise that the person reading the code will then have to filter out again mentally.

The way I see it, if you grep for some log message or unrolled identifier and can't find it, you're supposed to keep grepping for parts of the string, until you hit a match. You then go look, and it's usually apparent that you're dealing with a compound identifier or an interpolated string - congratulations, you just learned something important about that part of the legacy codebase, which is the real job you're supposed to be doing.

Shorel
0 replies
4h42m

I use grep and git grep all the time.

This post is very welcome, it sums up my own ideas about grep in a better way.

Mikhail_Edoshin
0 replies
9h26m

Conceptually it is akin to having file names that sort well.

Grep is a simple tool, not too different from a simple string sort. It is better than no tool, but is it better than a tool that understands the notation? A strong side of grep is that it is universal and is not tied to a particular notation. Yet if you could easily define a specific notation and have a tool to immediately understand it, would you still prefer grep?

We tend to organize the code according to the tools we have. E.g. if a tool gives us a list of entities in alphabetic order, we will try to name the entities so that they form “logical” groups. This may pass as a local organizational principle and may be useful but it is always intimately coupled with the underlying tool.

KronisLV
0 replies
40m

I work on a project where people decided to refer to translations by doing the equivalent of:

  :label="$translate(getProductSectionLabel('title'))"
where the logic is a bit like:

  const getProductSectionLabel = (code) => `myapp.sales.sections.products.${code}`
and then the actual values are in a nested structure, like:

  myapp: {
    sales: {
      sections: {
       products: {
         title: "Products"
         ...
       }
       ...
     }
     ...
    }
    ...
  }
People seem to have gone for that because writing that first part is simpler within the component, but I couldn't get across that this makes the codebase harder to navigate.

Meanwhile, my personal codebases are more like:

  :label"$translate('myapp-sales-products-title')"
and the translation file also has the equivalent of:

  myapp-sales-products-title: "Products"
which is way simpler at the expense of some more duplication (easily mitigated by compressing the translations).

IshKebab
0 replies
11h50m

This is why I always recommend avoiding kebab-case as much as possible. You'll eventually need to convert it to snake_case and now you have broken grep. (Nobody is going to remember to use a regex every time.)

HeavyStorm
0 replies
2h27m

Only read the title, but if it's what I guess it is, then I finally met someone who'll understand why I always declare functions in JS using the function keyword.

Groxx
0 replies
27m

Along similar lines: I highly recommend making every metric and log in your system spelled out completely somewhere.

    Don't:
      base      = "abc"
      something = base + ".some.suffix"
    Do:
      something = "abc.some.suffix"
I've also had some luck with hard-coded UUIDs at call sites, e.g.:

    log.Info("something", "callsite", "DECAFBAD-000...")
because it makes it absolutely trivial to find a log, and unlike caller-lines (which are great! use them too!) it doesn't change when you refactor code.

0x69420
0 replies
7h2m

sure. the reason i put a line break between return type and function name in c-likes is `grep ^fname`. but i seriously wish greppability wasn't important. the extensive line-orientedness of unix tools really puts a damper on the whole hose-of-bytes concept, and it's no wonder by the time of plan 9, there was a strong desire to do away with it—cf. "structural regular expressions", as deployed in sam(1), which, of all the places to put them, certainly has historical irony, as sam's (decidedly not line-oriented) editing language nonetheless descends from ed, the definitive line editor, and gave us such hits as "stream ed" and "simulate typing `g/regex/p` into ed".

just the other week i noticed a change in recommended formatting style in a project i contribute to regularly, and the result was source files got about 20% taller, 20% more of a pain in the ass to edit without some sort of syntax folding. the rationale? diff. making you reach for a syntax-aware editor to compensate for a deficiency in the syntax-awareness of a version control frontend is certainly a choice.

the business end of git as seen by most programmers is in fact diff city, sure, but deep down git is a bunch of snapshots. even deltas behave nothing like diffs. pull up the spec for the pack format and look for the word "line". you will not find it.

things could be so much better, but for now we live in a world where the headline is true.