
Greppability is an underrated code metric

db48x
152 replies
14h33m

Rust and Javascript and Lisp all get extra points because they put a keyword in front of every function definition. Searching for “fn doTheThing” or “defun do-the-thing” ensures that you find the actual definition. Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in. Some C coding conventions have you split the definition into two lines, first the return type on a line followed by a second line that starts with the function name. It looks ugly, but at least you can search for “^doTheThing” to find just the definition(s).

CGamesPlay
70 replies
14h13m

Not JavaScript. Cool kids never write “function” any more, it’s all arrow functions. You can search for const, which will typically work, but not always (it could be a let, a var, or a multi-const initializer).
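
For example (hypothetical name), each of these binds a callable doTheThing, but only the first turns up when you search for “function doTheThing”:

    function doTheThing() { /* found by "function doTheThing" */ }
    // const doTheThing = () => {};        // found by "const doTheThing"
    // let doTheThing = () => {};          // missed by the const search
    // var doTheThing = function () {};    // missed as well
    // const a = 1, doTheThing = () => {}; // multi-const initializer, missed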

lispisok
26 replies
14h11m

Am I the only one who hates arrow functions?

crabmusket
8 replies
11h45m

I don't like using them everywhere, but they're very handy for inline anonymous functions.

But it really pains me when I see

export const foo = () => {}

instead of

export function foo() {}

jappgar
2 replies
7h20m

These aren't equivalent as function foo will be hoisted but const foo will not be.

cxr
0 replies
6h51m

Sure the food that this restaurant serves is pricey, but you have to remember that it also tastes terrible.

crabmusket
0 replies
4h16m

Yep, and that usually doesn't matter at the top level.

berkes
1 replies
11h6m

I wish javascript had a built-in or at least (de facto) default linter, like gofmt or rustfmt. Or clippy even.

One that could enforce these styles. Because not only is the export const foo = () => {}

painful in itself, it will quite certainly get intermixed with the

function foo() {}

and then in the next library a

const foo = function() {}

and so on. I'd rather have a consistently irritating style than this willy-nilly yolo style that the JS community seems to embrace.

NohatCoder
1 replies
6h5m

But do they make much of a difference? You have always been able to write:

    myArray.sort(function(a,b){return a-b})
People for some reason treat this syntactic sugar like it gives them some new fundamental ability.

marcosdumay
0 replies
3h48m

Oh Javascript would be much better if it could only be syntactic sugar...

`function(a,b){return a-b;}` is different from `(a,b) => a - b`

And `function diff(a,b) {return a-b;}` is different from `const diff = (a,b) => a - b;`.
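
A minimal sketch of the `this` difference (illustrative object, not from the snippets above):

    const counter = {
      n: 41,
      // classic function: `this` is the object the method is called on
      classic: function () { return this.n + 1; },
      // arrow function: `this` is captured from the enclosing scope,
      // so it never sees counter.n
      arrow: () => this?.n + 1,
    };
    console.log(counter.classic()); // 42
    console.log(counter.arrow());   // NaN (this?.n is undefined here)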

creesch
0 replies
11h40m

Thank you, that's something I also never have understood myself. For inline anonymous functions like callbacks they make perfect sense. As long as you don't need `this`.

But everywhere else they reduce readability of the code with no tangible benefit I am aware of.

spartanatreyu
7 replies
13h48m

I did, until I used them enough where I saw where they were useful.

The bad examples of arrow functions I saw initially were of:

1. Devs trying to mix them in with OOP code as a band-aid over OOP headaches (e.g. bind/this) instead of just not using OOP in the first place.

2. Devs trying to stick functional programming everywhere because they had seen a trivial example where a `.map()` made more semantic sense than a for/for-in/for-of loop, despite the fact that for/for-in/for-of loops were easier to read for anything non-trivial and also had better performance because you had access to the `break`, `continue` and `return` keywords.

mewpmewp2
4 replies
10h59m

Another benefit of using for instead of array fns is that it is easy to add the await keyword should the fn become async.

But many teams will have it as a rule to always use array fns.
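
For instance (a sketch, assuming an `items` array and an async `process` function):

    // inside an async function: the loop takes `await` without restructuring
    for (const item of items) {
      await process(item); // serial, one at a time
    }
    // the array-fn version has to change shape instead:
    await Promise.all(items.map((item) => process(item)));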

jappgar
3 replies
7h22m

That gives you the option of making it serially async but not parallel, which can be achieved easily using Promise.all in either scenario.

adregan
2 replies
5h22m

As an aside: It’s way less ergonomic, but you likely want `Promise.allSettled` rather than `Promise.all` as the first promise that throws aborts the rest.

wruza
1 replies
3h57m

It doesn’t really abort the rest, it just prioritizes the selection of a first catch-path as a current continuation. The rest is still thenable, and there’s no “abort promise” operation in general. There are abort signals, but it’s up to an async process to accept a signal and decide when/whether to check it.
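
A small sketch of that behaviour (illustrative, using top-level await):

    const slow = new Promise((resolve) => setTimeout(() => resolve("done"), 100));
    try {
      await Promise.all([slow, Promise.reject(new Error("boom"))]);
    } catch (err) {
      // we land here immediately, but `slow` keeps running...
    }
    console.log(await slow); // ...and is still thenable: "done"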

adregan
0 replies
1h12m

Admittedly, I was being a bit hand-wavy and describing a bit more of how it feels rather than the way it is (I'm perpetually annoyed that promises can't be cancelled), but I was thinking of the code I've seen many times across many code bases:

    let results;
    try {
      results = await Promise.all(vals.map(someAsyncOp))
    } catch (err) {
      console.error(err)
    }
While you could pull that promise mapping into a variable and keep it thenable, 99% of the time I see the above instead. Promises have some rough edges because they are stateful, so I think it might be easier to recommend swapping that Promise.all for a Promise.allSettled, and using a shared utility for parsing the promise result.

I consider this issue akin to the relationship between `sort`, `reverse`, `splice` (the mutating operation APIs) and their non-mutating counterparts `toSorted`, `toReversed`, `toSpliced`. Promise.all is kind of the mutating version of allSettled.
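
A sketch of that swap, reusing the names from the snippet above (the helper shape is illustrative):

    const settled = await Promise.allSettled(vals.map(someAsyncOp));
    // shared utility: split fulfilled values from rejection reasons
    const values = settled
      .filter((r) => r.status === "fulfilled")
      .map((r) => r.value);
    const errors = settled
      .filter((r) => r.status === "rejected")
      .map((r) => r.reason);
    errors.forEach((err) => console.error(err));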

throwaway2037
1 replies
8h59m

    > also had better performance because you had access to the `break`, `continue` and `return` keywords.
This is a great point.

One more: Debugging `.map()` is also much harder than a for loop.

medstrom
0 replies
7h36m

I feel there are a few ways to invoke .map() in a readable way and many ways that make the code flow needlessly indirect.

It should be a judgment call, and the author needs to be comfortable with both looping and mapping constructs, so that they are unafraid of the bit of extra typing needed for the loop.

turboponyy
4 replies
11h55m

I like them because they reinforce the idea that functions are just values like any other; having a separate keyword feels inconsistent.

mewpmewp2
2 replies
10h57m

Why do you want to reinforce that idea?

To me arrow functions mostly just decrease readability and make functions blend in too much, when what is a function and what is not should be an important distinction.

turboponyy
0 replies
9h35m

Not to be dismissive, but because I like it - it just sits right with me.

benrutter
0 replies
16m

I'm not a javascript programmer, but I really like the arrow pattern from a distance exactly because it enforces that idea.

My experience is that newcomers are often thrown off and confused by higher-order functions. I think that's partly because, let's be honest, they just are more confusing than normal functions, but also because languages often bind functions differently from everything else.

`const cool = () => 5`

Makes it obvious and transparent that `cool` is just a variable, whereas:

`function cool() {return 5}`

looks very different from other variable bindings.

0xfffafaCrash
0 replies
11h23m

Moreover the binding and lexical scope aspects supported by classic functions are amongst the worst aspects of the language.

Arrow functions are also far more concise and ergonomic when working with higher order functions or simple expressions

The main thing to be wary of with arrow functions is when they are used anonymously inline without it being clear what the function is doing at a glance. That, and Error stack traces, but the latter is exacerbated by there being no actual standard regarding Error.prototype.stack.

nsonha
0 replies
11h16m

why the need to pronounce arbitrary preferences, who cares?

nosianu
0 replies
11h59m

A simple heuristic I use is to use arrow functions for inline function arguments, and named "function" functions for all others.

One reason is exactly the subject of discussion here: it's easier to string-search with that keyword in front of the name. But I don't need that for trivial inline functions (whenever I do, I make it an actual function that I declare normally, not inline).

Then there's the different handling of "this", depending on how you write your code this may be an important reason to use an arrow function in some places.
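
In code, the heuristic looks something like this (hypothetical names):

    const form = document.querySelector("form");
    // named and greppable: "function handleSubmit" finds the definition
    function handleSubmit(event) {
      event.preventDefault();
      // ...
    }
    // trivial inline arrow as a function argument; nothing to grep for
    form.addEventListener("submit", (event) => handleSubmit(event));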

ndnxncjdj
0 replies
13h12m

I very much prefer the way scoping is handled in arrow functions.

ajuc
0 replies
7h15m

I'm of the opinion that giving a global name to an anonymous function should result in a compilation error.

supriyo-biswas
19 replies
14h10m

You can still search for `<keyword> = \(.*\) => `, albeit a bit cumbersome.

troupo
16 replies
12h23m

All you need is a tool that actually understands the language.

It's 2024 and HN still suggests using regular expressions to search through a code base.

lukan
9 replies
10h46m

Regex is a universal tool.

Your special tool might not work on platform X, fails for some edge case - and you generally don't know how it works. With regex or simple string search I am in control, and can understand why results show up, or investigate when they don't but should.

troupo
8 replies
10h15m

> Your special tool might not work on platform X

As always, people come out with the weirdest of excuses not to use actual tools in the 99.9999% of cases when they are available and work.

When that tool doesn't work, or isn't sufficient, use another one like fuzzy text search or regexps.

> and you generally don't know how it works.

Do you know how your stove works? Or do you truly understand how the device you're typing this comment on works?

Only in programming do I see people deliberately avoid useful tools because of <some fringe edge case that comes up once in a millennium in their daily work>

lukan
3 replies
9h20m

When you specialize in one thing only, do what you want.

But I prefer tools that I can use wherever I go, to not be dependent on and chained to that environment.

"Do you know how your stove works? Or do you truly understand what the device you're typing this comment on truly works?"

Also yes, I do.

" people deliberately avoid useful tools because <some fringe edge case that comes up once in a millenium in their daily work>"

Well, I already changed tools often enough to be fed up with it; I'd rather invest in tech that does not lose its value in the next iteration of the innovation cycle.

troupo
2 replies
7h31m

> When you specialize in one thing only, do what you want.

I specialize in one thing only: programming

> But I prefer tools that I can use wherever I go.

Do you always walk everywhere, or do you use a tool available at the time, like cars, planes, bicycles, public transport?

> rather invest in tech that does not lose its value in the next iteration of the innovation cycle.

Things like "find symbol", "find usages", "find implementation" have been available in actual tools for close to two decades now.

lukan
1 replies
6h46m

I did not say I do not use what is available, but this debate is in general about having your code in a shape where simply searching for strings works.

troupo
0 replies
3h8m

Simply searching for strings rarely works well as the codebase grows larger. Because besides knowing where all things named X are, you want to actually see where X is used, or where it's called from, or where it is defined.

With search you end up grepping the code twice:

- first grepping for the name

We're literally in a thread where people invent regexes for how to search the same thing (a function) defined in two different ways (as a function or as a const)

- secondly, manually grepping through search results deducing if it's relevant to what you're looking for

It becomes significantly worse if you want to include third-party libs in your search.

There are countless times when I would just Cmd+B/Cmd+Click a symbol in IDEA and continue my exploration down to Java's own libraries. There are next to zero cases when IDEA would fail to recognise a function and find its usages if it was defined as a const, not as a function. Why would I willingly deny myself these tools as so many in this thread do?

wruza
2 replies
3h15m

It’s you who sees it as excuses. If I have a screwdriver multitool, I don’t need another one which is for d10 only. It simply creates unnecessary clutter in a toolbox. The difference between definition and mention search for a function is:

  gr<bs><bs>ion name<cr>
  vs
  grname<cr>
or for the current identifier, simply

  gr<m-w><cr>
I could even make my own useful tools like “\[fvm]gr” for function, variable or field search and brag about it watching miserable ide guys from the high balcony, but that's unnecessary as well.

troupo
1 replies
3h12m

> It simply creates unnecessary clutter in a toolbox.

And then you proceed to... invent several pale imitations of a symbol/usages search.

More here: https://news.ycombinator.com/item?id=41435862 so as not to repeat myself

wruza
0 replies
2h54m

Doesn’t really apply, ignores things just said.

kragen
0 replies
7h28m

if you think anything works in 99.9999% of cases, you’ve never programmed a computer

wruza
1 replies
3h33m

It's the current year and IDEs still can't remember how I just transformed a snippet of code and suggest transforming the rest of the files in the same way. All they can do in the "refactor" menu is "rename" and then some extract/etc nonsense which no one uses irl.

By using regexps I have an experience that opens many doors, and the fact that they aren’t automatic could make me sad, if only these doors weren’t completely shut without that experience.

troupo
0 replies
3h0m

No one is stopping you from using regexps in IDEs.

And you somehow manage to undersell the rename functionality in an IDE. And I've used move/extract functionality multiple times.

I do however agree that applicable transformations (like upgrading to new syntaxes, or ways of doing stuff as languages evolve) could be applied wholesale to large chunks of code.

throwaway2037
1 replies
8h56m

Not to move the goal posts too much, but when I am searching a huge Python or Java code base from IntelliJ, I use a mixture of symbol and text search. One good thing about text search: you get hits from comments.

troupo
0 replies
7h30m

Yup. I do, too.

I'm mostly ranting against this weird "we will never use great tools because full-text search" obsession

renox
1 replies
7h47m

The thing is: in a large codebase the tool may become slow or crash, and in a new language you may not have such a tool. Grep is far more robust!

troupo
0 replies
3h6m

When tools don't work or are unsuitable, you use different tools.

And yet people are obsessed with never using useful tools in the first place because they can invent scenarios where the tool doesn't work, even if those scenarios might never actually come up in their daily work.

post-it
1 replies
13h51m

All you need is `<keyword> =`

Really, all you need is `<keyword>` and if the first result is a call to that function, just jump to its definition.

spartanatreyu
0 replies
13h45m

Exactly.

Just search the definition.

Any time that a function doesn't have a definition, it's never the target of a search anyway.

pjerem
17 replies
11h18m

Yes, but that's an anti-pattern. Arrow functions aren't there to look cool; they're how you define lambdas / anonymous functions.

Other than that, functions should be defined by the keyword.

wiseowise
16 replies
11h5m

How is that an anti-pattern?

> Other than that, functions should be defined by the keyword.

Says who?

hansworst
12 replies
10h4m

Anonymous functions don't have names. This makes it much harder to do things like profiling (just try to find that one specific arrow function in your performance profile flame graph) and tracing. Tools like Sentry that automatically log stack traces when errors occur become much less useful if every function is anonymous.

medstrom
10 replies
8h25m

    const foo = () => {}
This function is not anonymous, it's called foo.

croes
4 replies
7h19m

But to call foo in bar you must define foo before bar.

function foo(){} is also callable if bar is defined before foo.

sestep
3 replies
6h0m

Not true at the top-level.

wruza
2 replies
4h6m

Not sure what you find not true about it. All named “function”s get hoisted just like “var”s, I use post-definitions of utility functions all the time in file scopes, function scopes, after return statements, everywhere. You’re probably thinking about

  const foo = function (){}
without its own name before (). These behave like expressions and cannot be hoisted.
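
A quick sketch of the difference:

    console.log(hoisted()); // works: function declarations are hoisted
    function hoisted() { return 1; }

    console.log(expr()); // ReferenceError: cannot access 'expr' before initialization
    const expr = function () { return 2; };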

MetaWhirledPeas
1 replies
3h37m

> I use post-definitions of utility functions all the time in file scopes, function scopes, after return statements, everywhere

I haven't figured out if people consider this a best practice, but I love doing it. To me the list of called functions is a high-level explanation of the code, and listing all the definitions first just buries the high-level logic "below the fold". Immediately diving into function contents outside of their broader context is confusing to me.

wruza
0 replies
3h3m

I don’t monitor “best” practices, so beware. But in languages like C and Pascal I also had a habit of simply declaring all interfaces at the top and then grouping implementations reasonably. It also created a nice “index” of what’s in the file.

Hoisting also enables cross-imports without helper unit extraction headaches. Many hate js/ts at the “kids hate == and null” level but in reality these languages have a very practical design that wins so many rounds irl.

mrighele
2 replies
6h1m

Interesting, it seems that the javascript runtime is smart enough to detect this pattern and actually create a named function (I tried Chrome and Node.js):

    const foo = () => {}
    console.log( foo.name );
actually outputs 'foo', and not the empty string that I was expecting.

   const test = () => ( () => {} );
   const foo = test();
   console.log( foo.name );
outputs the empty string.

Is this behavior required by the standard?

Izkata
0 replies
1h28m

You're probably remembering how it used to work. This is the example I remember from way back that we shouldn't use because (aside from being unnecessary and weird) this function wouldn't have a name in stack traces:

  var foo = function() {};
Except nowadays it too does have the name "foo".

mapcars
0 replies
5h1m

Not really, it's an anonymous function stored in a variable foo

BlarfMcFlarf
0 replies
7h23m

Does the function know it’s called foo for tracing/error logging/etc?

mostlylikeable
0 replies
6h5m

To me, arrow functions behave more like I would expect functions to behave. They don’t include all the magic bindings that the function keyword imparts. Feels more “pure” to me. Anonymous functions can be either function () {} or () => {}

lukan
1 replies
10h50m

All the wise ones. Well, except for you maybe.

Serious arguments would be:

- readability

- greppability

lukan
0 replies
1h23m

(It wasn't an insult, but a joke on the username)

tylerhou
0 replies
10h37m

As of a few years ago (not sure about now) the backtrace frame info for anonymous functions was far worse than for ones defined via the function keyword with a name.

spartanatreyu
2 replies
14h3m

Yes JavaScript.

You can search for both "function" and "=>" to find all function expressions and arrow function expressions.

All named functions are easily searchable.

All anonymous functions are throwaway functions that are only called in one place, so you don't need to search for them in the first place.

As soon as an anonymous function becomes important enough to receive a label (i.e. assigning it to a variable, being assigned to a parameter, converting to function expression), it has also become searchable by that label too.
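
E.g. (hypothetical):

    // anonymous and throwaway: never the target of a search
    [1, 2, 3].forEach((n) => console.log(n));
    // important enough to get a label: now greppable as "isEmpty"
    const isEmpty = (s) => s.trim().length === 0;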

CGamesPlay
1 replies
13h0m

The => is after the param spec, so you’re searching for foo.*=> or something more complex, but then still missing multiline signatures. This is very easy to get caught by in TypeScript, and also happens when dealing with higher-order functions (quite common in React).
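
For example (a sketch), a multiline higher-order signature that a single-line `foo.*=>` search misses:

    const foo = (
      onChange, // params on their own lines push the `=>` off the `foo` line
      onError,
    ) => (value) => {
      // higher-order: returns another function
      if (value == null) return onError(value);
      return onChange(value);
    };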

spartanatreyu
0 replies
12h9m

Why are you searching for foo.*=>

Are you searching through every function, or functions that have a very specific parameter?

And whatever you picked, why?

---------------------------------------------------------------

- If you're searching for every function, then there's no need to search for foo.*=>; you only need to search for function and =>.

- If you're searching for a specific parameter, then just search for the parameter. Searching for functions is redundant.

---------------------------------------------------------------

Arrow function expressions and function expressions can both be named or anonymous.

Introducing arrow functions didn't suddenly make JavaScript unsearchable.

JavaScript supported anonymous functions before arrow function expressions were introduced.

Anonymous functions can only ever be:

- run on the spot

- thrown away

- or passed around after they've been given a label

Which means, whenever you actually want to search for something, it's going to be labelled.

So search for the label.

albedoa
0 replies
10h49m

I want to talk to the developer who considers greppability when deciding whether to use the "function" keyword but requires his definitions to be greppable by distancing them from their call locations. I just have a few questions for him.

Pxtl
0 replies
2h58m

You can still look for `(funcname)\s*=`, can't you? I mean, it's not like functions get re-declared a lot.

koito17
30 replies
13h34m

Golang has a similar property as a side-effect of the following design decision.

  ... the language has been designed to be easy to analyze and can be parsed without a symbol table
Taken from https://go.dev/doc/faq

The "top-level declarations" in source files are exactly: package, import, const, var, type, func. Nothing else. If you're searching for a function, it's always going to start with "func", even if it's an anonymous function. Searching for methods implemented by a struct similarly only needs one to know the "func" keyword and the name of the struct.

Coming from a background of mostly Clojure, Common Lisp, and TypeScript, the "greppability" of Go code is by far the best I have seen.

Of course, in any language, Go included, it's always better to rely on static analysis tools (like the IDE or LSP server) to find references, definitions, etc. But when searching the code of some open source library, I always resort to ripgrep rather than setting up a development environment, unless I find something that I want to patch (in which case I set up the development environment and rely on LSP instead of grep to discover definitions and references).

eptcyka
18 replies
11h9m

Golang gets zero points from me because function receivers are declared between func and the name of the function. God ai hate this design choice and boy am I glad I can use golsp.

medstrom
11 replies
8h28m

Is it just hard to get used to, or does it fundamentally make something more difficult?

kragen
5 replies
7h32m

this thread is about using `grep` to find things, and this subthread is specifically about how the `func` keyword in golang makes it easy to distinguish the definition of a function from its uses, so yes, because `grep 'func lart('` will not find definitions of `lart` as a method. you might end up with something like `grep 'func .*) *lart('` which is both imprecise and enough noise that you will not want to type it; you'll have to can it in a script, with the associated losses of flexibility and transparency

medstrom
4 replies
7h10m

That's fair, I see many examples in this thread where people pass an exact string directly to grep, as you do. I'm an avid grepper, but my grep tool [1] translates spaces to ".*?", so I would just type "func lart(" in that example and it would work.

An incremental grep tool with just this one transformation rule gets you a lot more mileage out of grep.

[1] https://github.com/minad/consult/blob/screenshots/consult-li...

EDIT: Better demo https://jumpshare.com/s/zMENBSr2LwwauJVjo1wS

kragen
3 replies
6h59m

that's going to find all the functions that take an argument named lart or of a lart type too, but it also sounds like a thing i really want to try

vitus
2 replies
6h42m

Also, anything that contains "func" and "lart" as a substring, e.g. foobar(function), blart(baz).

It's not far off from my manually-constructed patterns when I want to make sure I find a function definition (and am willing to tolerate some false positives), but I personally prefer fine-grained control over when it's in use.

medstrom
1 replies
6h27m

Mmh, I type "func\ lart(" when I need the literal string. But it's less often, so it's fair that it's slightly more to type.

kragen
0 replies
6h19m

yeah!

eptcyka
3 replies
7h2m

I have to always add wildcards between func and the function name, because I can never know how the other developer has decided to specify the name of the receiver. This will always be a problem as far as grepping with primitive tools that don't parse the language.

medstrom
2 replies
6h14m

FYI, many people use thin wrappers like this, it's still a primitive tool that doesn't parse the language, but it can handle that problem: https://jumpshare.com/s/zMENBSr2LwwauJVjo1wS (GIF)

eptcyka
1 replies
5h59m

On machines where I control the tooling, this is not an issue. But I can’t take my config to my colleagues machine.

lanstin
0 replies
3h42m

If only AFS had succeeded. What would a modern version of this look like?

ljm
0 replies
8h4m

Can’t say I’ve ever had an issue with it, but it does get a bit wild when you have a function signature that takes a function and returns one, unless you clear it up with some types.

  func (s *Recv) foo(fn func(x any) err) func bar(y any) (*Recv, err)
As an exaggerated example. Easy to parse but not always easy to read at a glance.

kazinator
3 replies
3h30m

Receivers are utterly idiotic. Like how could anyone with two working brain cells sign off on something like that?

If you don't want OOP in the language, but want people to be able to write thing.function(arg), you just make function(thing, arg) and thing.function(arg) equivalent syntax.

Pxtl
2 replies
3h1m

C# did this for extension methods and it Just Works. You just add the "this" keyword to a function in a pure-static class and you get method-like calling on the first param of that function.

kazinator
1 replies
2h54m

If the function has to be modified in any way in order to grant permission to be used that way, then it is not quite "did this".

Equivalent means that there is no difference at the AST level between o.f(a) and f(o, a), like there is no difference in C among (a + i), a[i], i[a] and (i + a).

However, a this keyword is way better than making the programmers fraction off a parameter and move it to the other side of the function name.

sethammons
0 replies
6h44m

I search ") myFunc" to find member functions. It would be nice to search "c myFunc", but a parentheses works

executesorder66
0 replies
4h15m

How many God AI's have expressed their hate for this design? /s

madeofpalk
8 replies
9h50m

The culture of single letter variables in golang, at least in the codebases I've seen, undoes this.

lelanthran
5 replies
9h5m

> The culture of single letter variables in golang, at least in the codebases I've seen, undoes this.

The convention, not just in Go, is that the smaller the scope, the shorter the variable name.

So, sure, you're going to see single-letter variables in short functions, inside short block scopes, etc, but that is true of almost any language.

I haven't seen single-letter variables in Go that are in a scope that isn't short.

Of course, this could just mean that I haven't seen enough of other people's Go source.

kazinator
1 replies
3h9m

E.g. food and art are very important in Japan, so stomach is i and a drawing/painting is e.

BrandoElFollito
0 replies
1h10m

Food is very very important in France so we call it nourriture :)

marcosdumay
0 replies
3h55m

> that is true of almost any language

You'd be surprised how often language-local cultures break that rule on either side. And a few times it's even an improvement.

lanstin
0 replies
3h45m

Zipf's law, right - these rules are a formalization of our brain's functionality with language.

Of course, with enough code, someone does everything.

iudqnolq
0 replies
8h3m

I like using l for logger and db for database client/pool/handle even if there's a wider scope. And if the bulk of a file is interacting with a single client I might call that c.

alienchow
0 replies
7h47m

Single letter variables in Golang are to be used in small, local contexts. Akin to the throwaway i var in for loops. You only grep the struct methods, the same way no one greps 'this' or 'self'.

The code bases you've been reading, and even some of the native libraries, don't do it properly. Probably due to legacy reasons that wouldn't pass readability approvals nowadays.

VonGallifrey
0 replies
9h30m

The way I have seen this is that single letter variables are mostly used when declaration and (all) usages are very close together.

If I see a loop with i or k, v, then I can be fairly confident that those are an index or a key/value pair. Also I probably don't need to grep them since everything interacting with these variables is probably already on my screen.

Everything that has a wider scope or which would be unclear with a single letter is named with a more descriptive name.

Of course this is highly dependent on the people you work with, but this is the way it works on projects I have worked on.

vitus
1 replies
6h50m

I'm not so sure about greppability in the context of Go. At least at Google (where Go originates, and whose style guide presumably has strong influence on other organizations' use of the language), we discourage "stuttering":

> A piece of Go source code should avoid unnecessary repetition. One common source of this is repetitive names, which often include unnecessary words or repeat their context or type. Code itself can also be unnecessarily repetitive if the same or a similar code segment appears multiple times in close proximity.

https://google.github.io/styleguide/go/decisions#repetitive-...

(see also https://google.github.io/styleguide/go/best-practices#avoid-...)

This is the style rule that motivates the sibling comment about method names being split between method and receiver, for what it's worth.

I don't think this use case has received much attention internally, since it's fairly rare at Google to use grep directly to navigate code. As you suggest, it's much more common to either use your IDE with LSP integration, or Code Search (which you can get a sense of via Chromium's public repository, e.g. https://source.chromium.org/search?q=v8&sq=&ss=chromium%2Fch...).

klodolph
0 replies
30m

The thing about stuttering is that the first part of the name is fixed anyway, MOST of the time.

If you want to search for `url.Parse`, you can find most of the usages just by searching for `url.Parse`, because the package will generally be imported as `url` (and you won’t import Parse into your namespace).

It’s not as good as find references via LSP but it is like 99% accurate and works with just grep.

zarzavat
9 replies
11h39m

C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {. In order to get a complete list of symbols you need to preprocess it, and that requires knowing what the compiler flags are. Nobody really knows what their compiler flags are because they are hidden behind multiple levels of indirection and a variety of build systems.

skissane
6 replies
8h8m

> C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {.

That’s not really C; that’s a C-based DSL. The same problem exists with Lisp, except even worse, since its preprocessor is much more powerful and hence encourages DSL-creation much more than C does. But in fact it can happen with any language: even if a language lacks any built-in preprocessor or macro facility, you can always build a custom one, or use a general-purpose macro processor such as M4.

If you are creating a DSL, you need to create custom tooling to go along with it. In the ideal scenario, your tools are so customisable that supporting a DSL is more about configuration than coding something from scratch.

kragen
4 replies
7h30m

the issue is that the c preprocessor is always available and usually used

skissane
3 replies
6h54m

Other languages have preprocessors or macro facilities too.

C's is very weak. Languages with more powerful preprocessors/macros than C's include many Lisp dialects, Rust, and PL/I. If you think everyone using a weak preprocessor is bad, wait until you see what people will do when you give them a powerful one.

Micro Focus COBOL has an API for writing custom COBOL preprocessors in COBOL (the Integrated Preprocessor Interface). (Or some other language, if you insist.) I bet there are some bizarre abominations hidden in the bowels of various enterprises based on that ("our business doesn't just run on COBOL, it runs on our own custom dialect of COBOL!")

kragen
2 replies
6h31m

c's macro system is weak on purpose, based on, i suspect, bad experiences with m6 and m4. i think they thought it was easier to debug things like ratfor, tmg, lex, and (much later) protoc, which generate code in a more imperative paradigm for which their existing debugging approaches worked

i can't say i think they were wholly wrong; paging through compiler error messages is not my favorite part of c++ templates. but i have a certain amount of affection for what used to be called gasp, the gas macro system, which i've programmed for example to compute jump offsets for compiling a custom bytecode. and i think m4 is really a pathological case; most hairy macro systems aren't even 10% as bad as m4, due to a combination of several tempting but wrong design decisions. lots of trauma resulted

so when they got a do-over they eliminated the preprocessor entirely in golang, and compensated with reflection, which makes debugging easier rather than harder

probably old hat to you, but i just learned last month how to use x-macros in the c preprocessor to automatically generate serialization and deserialization code for record types (speaking of cobol): http://canonical.org/~kragen/sw/dev3/binmsg_cpp.c (aha, i see you're linking to a page that documents it)

skissane
1 replies
6h26m

C's is weak yet not weak – you can do various advanced things (like conditional expansion or iteration), but using esoteric voodoo with extreme performance cost. Whereas other preprocessors let you do that using builtins which are fast and easy to grok.

See for example https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,...

Poor C preprocessor performance has a negative real world impact, for example recently with the Linux kernel – https://lwn.net/Articles/983965/ – a more powerful preprocessor would enable people to do those things they are doing anyway much more cheaply

lanstin
0 replies
1h49m

I've always suspected the powerful macro facilities in Lisp are why it's never been very common - the ability to do proper macros means all the very smart programmers create code that has to be read like a maths paper. It's too bespoke to the problem domain and too tempting to make it short rather than understandable.

I like Rust (tho I have not yet programmed in it) but I think if people get too into macro generated code, there is a risk there to its uptake.

It's hard for smart programmers to really believe this, but the old "if you write your code as cleverly as possible, you will not be able to debug it" is a useful warning.

kazinator
0 replies
3h0m

If your Lisp macro starts with a symbol whose name begins with def, and the next symbol is a name, then good old Exuberant Ctags will index it, and you get jump to definition.

Not so with DEFINE_FUNCTION(foo) {, I think.

  $ cat > foo.lisp
  (define-musical-scale g)
  $ ctags foo.lisp
  $ grep scale tags
  g       foo.lisp        /^(define-musical-scale g)$/;"  f
Exuberant Ctags is not even a tool from the Lisp culture. I suspect it is mostly shunned by Lisp programmers. Except maybe for the Emacs one, which is different. (Same ctags command name, completely different software and tag file format.)

db48x
1 replies
7h25m

Yes, the usefulness of macros always has to be balanced against their cost. I know of only one codebase that does this particular thing though, Emacs. It is used to define Lisp functions that are implemented in C.

shadowgovt
0 replies
3h55m

It's a common pattern for just about any binding of C-implementation to a higher-level language. Python has a similar pattern, and I once had to re-invent it from scratch (not knowing any of this) for a game engine.

jampekka
4 replies
8h24m

Rust though does lose some of those points by more or less forcing[1] snake_case. It's really annoying to navigate bindings which are converted from camelCase.

I don't care which case is used. It's a trivial superficial thing, and tribal zealotry about such doesn't reflect well on the language and community.

[1] The warnings can be turned off, but in some cases it requires ugly hacks, and the community seems to be actively hostile to making it easier.

kibwen
3 replies
5h35m

The Rust community is no more zealous about naming conventions than any other language community with naming conventions. Perhaps you're arguing against the concept of naming conventions in general, but that's not a Rust thing; every language of the past 20 years suggests naming conventions, if for no other reason than that every language provides a standard library which needs to follow some sort of naming conventions itself. Turning off the warnings emitted by the Rust compiler takes two lines of code, either at the root of the crate or in the crate manifest.

jampekka
2 replies
5h12m

I've yet to encounter another compiler that warns about naming conventions, by default at least. So it's at least the most enforced zealotry I've encountered.

Yes, it can be turned off. But for e.g. bindgen-generated code it was not trivial to find out how.

kibwen
1 replies
2h58m

The Rust compiler doesn't produce warnings out of zealotry, but rather as a consequence of pre-1.0 historical decisions. Note that Rust doesn't use any syntax in pattern matching contexts to distinguish between bindings and enum variants. In pre-1.0 versions of Rust, this created footguns where an author might think they were matching on an enum, but the compiler was actually parsing it as a catch-all binding that would cause any following match arms to never be executed. This was exacerbated by the prevailing naming conventions of the time (which you can see in this 2012 blog post: https://pcwalton.github.io/_posts/2012-06-03-maximally-minim... (note the lower-cased enum variants)). So at some point the naming conventions were changed in an attempt to prevent this footgun, and the lint was implemented to nudge people over to the new conventions. However, as time went on the footgun was otherwise fixed by instead causing the compiler to prioritize parsing enum variants rather than bindings, in conjunction with other errors and warnings about non-exhaustive patterns and dead code (which are all desirable in their own right). At this point it's mostly just vestigial, and I highly doubt that anybody really cares about it beyond "our users are accustomed to this warning-by-default, so they might be surprised if we stopped doing this".

jampekka
0 replies
1h5m

Ah, thanks for the info! I do think this default has some ramifications, especially in that binding casings are typically changed because of it even for "non-native" wrappers, which I find materially makes things more difficult.

sva_
3 replies
9h36m

People don't use LSP?

gregjor
1 replies
9h19m

That’s right, not everyone uses an LSP. Nothing wrong with LSPs, very useful tools. I use ripgrep, or plain grep if I have to, far more often than an LSP.

Working with legacy code — the scenario the author describes — I often can’t install anything on the server.

menaerus
0 replies
3h48m

LSP doesn't always work without issues with large C and C++ codebases, which is why one needs to fall back to grep techniques.

semiinfinitely
2 replies
13h54m

python also!

jsjohnst
1 replies
6h47m

Python is the only one mentioned that "actually works" without endless exceptions to the rule in the normal case. The others mentioned (Rust/Javascript/Lisp/Go) all have specific syntax that is commonly enough used that it makes them harder to search. Possible, absolutely, but still harder.

zbentley
0 replies
5h17m

I'd say Python works well at greppability because community conventions generally discourage concealing certain kinds of definitions (e.g. function definitions are usually "def whatever").

However, that's just convention. Lots of modules do metaprogramming tricks that obscure greppability, which can be a pain. This is particularly acute when searching for code that is "import-time polymorphic"--that is, code which picks one of several implementations for a piece of functionality at import time at the module scope. That frequently ends up with some hanky-panky a la "exported_function_name = _implementation1 if platform_supported else _implementation2" at the module scope.

While sometimes annoying, that type of thing is usually done for understandable reasons (picking an optimized/platform-supported implementation of an interface--think select or selectors in the stdlib, or any pypi implementation of filesystem monitoring using fsnotify/fanotify/kqueue/fsevents/ReadDirectoryChangesW). Additionally, good type annotations help with greppability, though they can't fully mitigate this issue.

Much less defensible in Python is code that abuses locals/globals to indirect symbol access, or code that abuses star imports to provide interfaces/implementation switching.

Those, fortunately, are rare, but the elephant in the "no greppability ever" room is not: getattr bullshit in OO code is so often utterly obscure, unnecessary and terrible. And it's distressingly common on PyPi. At first I thought this was Ruby's encouragement of method_missing in the bad old days bleeding into the Python community, but the number of programmers for whom getattr magic is catnip seems to be disproportionate to the number of folks with Ruby experience, and, more concerningly, seems to me to be growing over time.

dan-robertson
2 replies
12h7m

Not sure this is very true for Common Lisp. A classic example is accessor functions, where the generic function is created by whichever class is defined first and the method where the class is defined. Other macros will construct new symbols for function names (or take them from the macro arguments).

f1shy
0 replies
12h2m

Still, you could extend the concept without a lot of work, couldn't you?

db48x
0 replies
7h29m

That’s true, but I regard it as fairly minor. Accessor functions don't have any logic in them, so in practice you don’t have to grep for them. But it can be confusing for new players, since they don't know ahead of time which ones are accessors and which are not.

wpollock
1 replies
2h24m

In the bygone days of ctags, C function definitions included a space before opening parenthesis, while function calls never had that space. I have a hard time remembering that modern coding styles never have that space and my IDE complains about it. (AFAIK, the modern gtags doesn't rely on that space to determine definitions.) Even without *tags, the convention made it easy to grep for definitions.

mzs
0 replies
1h37m

space after builtin was recommended instead:

  if (x == 0) { ...
  sizeof (buf);
  return (-1);
  exit(0);

hgomersall
1 replies
11h44m

Though glob imports in rust can hide a source, so those should be avoided.

andersa
1 replies
3h54m

Do people really use text search for this rather than an IDE that parses all of the code and knows exactly where each declaration is, able to instantly jump to them from a key press on any usage...? Wild.

iamwil
0 replies
3h53m

Yes. Not everyone uses or likes an IDE. Also, when you lean on an IDE for navigation, there is a tendency to write more complicated code: since it feels easy to navigate, you don't feel the pain.

wruza
0 replies
7h30m

> Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in

That’s why in my personal projects I follow classic “type\nname” and grep with “^name\>”.

> looks ugly

Single line definitions with long, irregular type names and unaligned function names look ugly. Col 1 names are not only greppable but skimmable. I can speedscroll through code and still see where I am.

veltas
0 replies
13h3m

For most functions ^\S.*name( will find declarations and definitions.

Most of us use exuberant ctags to allow jumping to definitions.

throwawayffffas
0 replies
8h41m

> Meanwhile C lacks any such keyword

It's a hassle. But not the end of the world.

I usually search for "doTheThing\(.+?\) \{" first.

If I don't get a hit, or too many hits I move to "doTheThing\([^\)]*?\) \{" and so on.

suprjami
0 replies
6h6m

> Meanwhile C lacks any such keyword, so the best you can do is...

...use source code tagging or LSP.

skywal_l
0 replies
12h52m

Yet you reply to an article that defines functions as variables, which I've seen a lot of developers do, usually for no good reason at all.

To me, that's a much more common and worse practice with regards to greppability than splitting identifiers using strings, which I haven't seen much in the wild.

mav3ri3k
0 replies
9h4m

Although in Rust, function-like macros make it super hard to trace code. I like them when I am writing the code and hate them when I have to read others' macros.

marcosdumay
0 replies
3h58m

Those also make your language easier to parse, and to read.

Many people insist that IDEs make the entire point moot, but that's the kind of thing that makes IDEs easier to write and debug, so I disagree.

leogout
0 replies
48m

Javascript is a bit trickier nowadays, I think, with the fat arrow notation: const myFunc = () => console.log("can't find me :p");

kazinator
0 replies
12h14m

C has "classical" tooling like Cscope and Exuberant Ctags. The stuff works very well, except on the odd weird code that does idiotic things that should not be done with preprocessing.

Even for Lisp, you don't want to be grepping, or at least not all the time for basic things.

For TXR Lisp, I provide a program that will scan code and build (or add to) your tags file (either a Vim or Emacs compatible one).

Given

  (defstruct point ()
    x
    y)
it will let your editor jump to the definition of point, x and y.

johannes1234321
0 replies
3h57m

One thing which works for C is to search for something like `[a-z] foo\(.+\) \{`, assuming the spacing matches the coding style. Often the shorter form `[a-z] foo\(` works well, which tries to ensure there is a type before the name rather than an assignment or something. Then there are only a handful of false positives.

gregjor
0 replies
9h28m

ctags.

fsckboy
0 replies
3h40m

C, starting with K&R, has all declarations and definitions on lines at the left margin, and little else. This is easy to grep for.

eddieh
0 replies
14h10m

I used to define functions as `funcname (arglist)`

And always call the function as `funcname(args)`

So definitions have a space between the name and arg parentheses, while calls do not. Seemed to work well, even in languages with extraneous keywords before definitions since space + paren is shorter than most keywords.

Nowadays I don't bother since it really isn't that useful, especially with tags or LSP.

I still put the return type on a line of its own, not for search/grep, but because it is cleaner and looks nice to me; overly long lines are the ugliest part of coding IMO. Well, that and excessive nesting.

drewg123
0 replies
1h34m

In terms of C, that's one reason I prefer the BSD coding style:

    int
    foo(void) { }

vs the Linux coding style:

    int foo(void) { }

The BSD style allows me to find function definitions using git grep ^foo.

darepublic
0 replies
32m

There is arrow syntax with js

bryanrasmussen
0 replies
13h9m

JavaScript has multiple ways to define a function so you sort of lose that getting the actual definition benefit.

on edit: I see someone discussed that you can grep for both arrow functions and named function at the same time and I suppose you can also construct a query that handles a function constructor as well - but this does not really handle curried functions or similar patterns - I guess at that point one is letting the perfect become the enemy of the good.

Most people grepping know the code base and the patterns in use, so they probably only need to grep for one type of function declaration.

bionsystem
0 replies
14h13m

Doesn't cscope fit this usecase ?

akritid
0 replies
10h36m

Looks fine (subjective) and there is also ctags

akira2501
0 replies
7h28m

> so the best you can do is search for the name

This is why in C projects libs go in "lib/" and sources go in "src/". If your header files have the same directory structure as libs, then "include/" is also a decent way to find definitions.

lucumo
96 replies
12h40m

Grepping for symbols like function names and class names feels so anemic compared to using a tool that has a syntactic understanding of the code. Just "go to definition" and "find usages" alone reduce the need for text search enormously.

For the past decade-plus I have mostly only searched for user facing strings. Those have the advantage of being longer, so are more easily searched.

Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.

laserbeam
18 replies
10h11m

Scenarios where an IDE with full syntactic understanding is better:

- It's your day to day project and you expect to be working in it for a long time.

Scenarios where grepping is more useful:

- Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

- You just opened the project for the first time.

- It's in a language you don't daily drive (you write backend but have to delve in frontend code, it's a 3rd party library, it's configuration files, random json/xml files or data)

- You're editing or searching through documentation.

- You haven't even downloaded the project and are checking things out in github (or some similar site for your project).

- You're providing remote assistance to someone and you are not at your main development machine.

- You're remoting via SSH and have access to code there (say it's a python server).

Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other usecases.

cxr
4 replies
6h24m

- You're fully aware that it would be better to be able to use tooling for $THING, but tooling doesn't exist yet or is immature.

kragen
3 replies
4h46m

you would not believe the amount of time i spent pretty-printing python dicts by hand last week

kragen
1 replies
4h24m

yeah, pprint is why i was doing it by hand ;)

lkbm
0 replies
3h12m

I used to pipe things through black for that. (a script that imported black, not just black on the command line.)

I also had `j2p` and `p2j` that would convert between python (formatted via black) and json (formatted via jq), and the `j2p_clip`/`p2j_clip` versions that would pipe from clipboard and back into clipboards.

It's worth taking the time to build a few simple scripts for things you do a lot. I used to open up the repl and import json to convert between json and python dicts multiple times a day, so spending a few minutes throwing together a simple script to do it was well worth the effort.

umanwizard
2 replies
2h20m

> Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

Your other points make sense, but in this case, at least for C/C++, you can generate a compile_commands.json that will let clangd interpret your code accurately.

If building with make just do `bear -- make` instead of `make`. If building with cmake pass `-DCMAKE_EXPORT_COMPILE_COMMANDS=1`.

camel-cdr
1 replies
1h28m

Does it evaluate macros? Because macros allow for arbitrary computation.

umanwizard
0 replies
12m

The macros I see in the real world seem to usually work fine. I’m sure it’s not perfect and you can construct a macro that would confuse it, but it’s a lot better than not having a compilation db at all.

joe-six-pack
2 replies
4h33m

You forgot massive codebases. Language servers really struggle with anything on the order of the Linux kernel, FreeBSD, or Chromium.

umanwizard
0 replies
2h17m

clangd works fine for me with the linux kernel. For best results build the kernel with clang by setting LLVM=1 and KERNEL_LLVM=1 in the build environment and run ./scripts/clang-tools/gen_compile_commands.py after building.

Groxx
0 replies
21m

I honestly suspect that the amount of time spent dealing with the issues monorepos cause is net-larger than the gains most get from what a monorepo offers. It's just harder to measure because it tends to degrade slowly, happen to things you didn't realize you were relying on (until you need them), and without clear ways to point fingers at the cause.

Plus it means your engs don't learn how to deal with open source code concerns, e.g. libraries, forking, dependency management. Which gradually screws over the whole ecosystem.

If you're willing to put Google-scale effort into building your tooling, sure. Every problem is solvable. Only Google does that though, everyone else is getting by with a tiny fraction of the resources and doesn't already have a solid foundation to reduce those maintenance costs.

popinman322
1 replies
9h56m

Grep is also useful when IDE indexing isn't feasible for the entire project. At past employers I worked in monorepos where the sheer size of the index caused multiple seconds of delay in intellisense and UI stuttering; our devex team's preferred approach was to better integrate our IDE experience with the build system such that only symbols in scope of the module you were working on would be loaded. This was usually fine, and it works especially well for product teams, but it's a headache when you're doing cross-cutting work (e.g. for infrastructure projects/overhauls).

We also had a livegrep instance that we could use to grep any corporate repo, regardless of where it was hosted. That was extremely useful for investigating failures in build scripts that spanned multiple repositories (e.g. building a Go sidecar that relies on a service config in the Java monorepo).

cma
0 replies
5h53m

If running into this, make sure to enable 64-bit intellisense and increase the ram limit, by default it is 4gb.

jollyllama
1 replies
6h17m

> It's your day to day project and you expect to be working in it for a long time.

Bold of everyone here to assume that everyone has a day-to-day project. If you're a consultant, or for other reasons you're switching projects on a month-to-month basis, greppability is probably the top metric, second only to UT coverage.

switchbak
0 replies
2h8m

They said the scenario in which that would be useful was IF: "It's your day to day project and you expect to be working in it for a long time". The implication being that if neither of those hold then skip to the next section.

I don't think anyone is assuming anything here. I've contracted for most of my career and this didn't seem like an outlandish statement.

Also, if you're working in a project for a month, odds are you could set up an IDE in the first few hours. Not sure how any of this rises to the level of being "bold".

lolinder
0 replies
4h49m

> It's your day to day project and you expect to be working in it for a long time.

I don't think we need to restrict the benefits quite that much—if it's a project that isn't my day-to-day but is in a language I already have set up in my IDE, I'd much prefer to open it up in my IDE and use jump to definition and friends than to try to grep and hope that the developers made it grepable.

Going further, I'd equally rather have plugins ready to go for every language my company works in and use them for exploring a foreign codebase. The navigation tools all work more or less the same, so it's not like I need to invest effort learning a new tool in order to benefit from navigation.

> Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other usecases.

Certainly don't sabotage, but some of these suggestions are bad for other reasons that aren't about grep.

For example: breaking the naming conventions of your language in order to avoid remapping is questionable at best. Operating like that binds your business logic way too tightly to the database representation, and while "just return the db object" sounds like a good optimization in theory, I've never not regretted having frontend code that assumes it's operating directly on database objects.
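To be concrete, here's a minimal sketch of the kind of explicit mapping layer I mean - the `UserRow`/`User` names are made up for illustration:

  from dataclasses import dataclass

  # Raw shape of the database row, named after the columns.
  @dataclass
  class UserRow:
      user_id: int
      first_name: str
      email_address: str

  # Domain object the rest of the app sees; it can evolve
  # independently of the schema.
  @dataclass
  class User:
      id: int
      display_name: str
      email: str

  def user_from_row(row: UserRow) -> User:
      # The one place that knows about the column names.
      return User(id=row.user_id, display_name=row.first_name, email=row.email_address)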

gpderetta
0 replies
4h34m

- you just switched branch/rebased and the index is not up to date.

- the project is large enough that the IDE can't cope.

- you want to also match comments, commented out code or in-project documentation

- you want fuzzy search and match similarly named functions

I use clangd integration in my IDE all the time, but often brute force is the right solution.

emn13
0 replies
9h41m

Further important (to me) scenarios that also argue for greppability:

- greppability does not preclude IDE or language server tooling; there are often special cases where only certain (e.g. context-dependent) usages matter, and sometimes grep is the easiest way to find those.

- projects that include multiple languages, such as for instance the fairly common setup of HTML, JS, CSS, SQL, and some server-side language.

- performance in scenarios with huge amounts of code, or where you're searching very often (e.g. in each git commit for some amount of history)

- ease of use across repositories (e.g. a client app, a spec, and a server app in separate repos).

I treat greppability as an almost universal default. I'd much rather have code in a "weird" naming style in some language but have consistent identifiers across languages, than have normal style-guide-default identifiers in each language but differing identifiers across languages. If code "looks weird", that's often actually a _benefit_ in such cases, not a downside - most serialization libraries I use for this kind of stuff tend to do a lot of automagic mapping that can break in ways that are sometimes hard to detect at compile time if somebody renames something, or sometimes even just for a casing change or type change. Having a hint of this fragility immediately at a glance, even in dynamically typed languages, is sometimes a nice side-effect. Very speculatively, I wouldn't be surprised if AI coding tools can deal with consistent names better than context-dependent ones too; greppability is likely not specifically about merely the tool grep.

And the best part is that there's almost no downside; it's not like you need to pick either a language server, IDE or grep - just use whatever is most convenient for each task.

gregjor
17 replies
9h12m

I abandoned VSCode and went back to vim + ctags + ripgrep after a year with the most popular IDE. I miss some features but it didn’t give me a 10x or even 1.5x improvement in my own work along any dimension.

I attribute that mostly to my several decades of experience with vi(m) and command line tools, not to anything inherently bad about VSCode.

What counts as “better” tools has a lot of subjectivity and circumstances implied. No one set of tools works for everyone. I very often have to work over ssh on servers that don’t allow installing anything, much less Node and npm for VSCode, so I invest my time in the tools that always work everywhere, for the work I do.

The main project I’ve worked on for the last few years has a little less than 500,000 lines of code. VSCode’s language server fairly often takes a few seconds to update its indexes. Running ctags over the same code takes about a second, and I can control when that happens. vim has no delays at all, and ripgrep can search all of the files in a second or two.

wrasee
14 replies
8h43m

Did you consider Neovim? You get the benefit of vim while also being able to mix in as much LSP tooling as you like. The tradeoff is that it takes some time to set up, although that is getting easier.

That won’t make LSP go any faster though. There’s still something interesting in the fact that a ripgrep of every line in the codebase can still be faster than a dedicated tool.

gregjor
8 replies
8h31m

Considered it and have tried repeatedly to get it to work with mixed success. As you wrote, it takes "some time" to set up. In my case it would only offer marginal improvements over plain vim, since I'm not that interested in the LSP integration (and vim has that too, through a plugin).

In the environments I often work in I can't install anything or run processes like node. I ssh into a server and have to use whatever came with the Linux distro, which means sticking with the tools I will find everywhere. I can't copy the code from the server either. If I get lucky they used version control. I know not everyone works with those constraints. I specialize in working on abandoned and legacy code.

kragen
5 replies
4h44m

can you not upload executables over ssh, say for policy reasons or disk-space reasons? how about shell scripts?

i mean, i doubt i'm going to come up with some brilliant breakthrough that makes your life easier that you've somehow overlooked, but i'd like to understand what kinds of constraints people like you often confront

i'm just glad you don't have to use teamviewer

gregjor
4 replies
3h18m

I don't have to use TeamViewer, though I very occasionally have to use Windows RDP.

You can transfer any kind of file over ssh. scp, sftp, rsync will all copy binaries. Mainly the issues come down to policy and billable time. Many of my customers simply don't allow installing anything on their servers without a tedious approval process. Even if I can install things I might spin my wheels trying to get it to work in an environment I don't have root privileges on, with no one willing to help, and I can't bill for that time. I don't work for free to get an editor installed. I use the tools I know I can find on any Linux/BSD server.

With some customers I have root privileges and manage the server for them. With others their IT dept has rules I have to follow (I freelance) if I want to keep a good relationship. Since I juggle multiple customers and environments I find it simpler not having to manage different editors and environments, so I mostly stick with the defaults. I do have a .profile and .vimrc I copy around if allowed to, that's about it.

I can't lose time/money and possibly goodwill whining about not having everything just-so for me. I recently worked on a server over ssh that didn't have tmux installed. Fortunately it did have screen, and I can use that too, no big deal. I spent less than 60 seconds figuring that out and getting to work rather than wasting hours of non-billable time annoying someone about how I needed tmux installed.

kragen
3 replies
3h7m

i see, thanks!

wrt rdp, i feel like rdp is actually better than vnc or x11-over-ssh, but for cases where regular ssh works, i'd rather use ssh

i wasn't thinking in terms of installing tmux, more like a self-contained binary that doesn't require any kind of 'installation'

gregjor
2 replies
2h56m

I used the word "install" but the usual rule says I can't install, upload, or execute any non-approved software. Usually that just gets stated as a policy, but I have seen Linux home directories on noexec partitions -- government agencies and big corporations can get very strict about that. So copying a self-contained binary up and running it would violate the policy.

I pretty much live in ssh. Remote Desktop means a lot of clicking and watching a GUI visibly repaint. Not efficient. Every so often I have customers using applications that only run on Windows, no API, no command line, so they will enable RDP to that, usually through a VPN.

kragen
1 replies
2h28m

i see! but i guess your .profile and .vimrc don't count?

gregjor
0 replies
1h59m

They aren't executables.

wrasee
1 replies
8h3m

Yes, ok. And legacy code might be a good example of where grep works well, if it's fair to assume a greater propensity for things like preprocessors, older languages and custom builds that may not play as well with semantic-level tools, let alone be written with modern tooling in mind.

gregjor
0 replies
4h47m

Lol, I'm not working with COBOL or Fortran. Legacy code in my world means the original developers have left, not that it dates from the 1970s. Mostly I work with PHP, shell scripts, various flavors of SQL, Python, sometimes Rails or other stuff. All things modern LSPs can handle.

VHRanger
4 replies
6h31m

There's also helix now, which requires next to no setup, but requires learning new motions (in helix the object comes before the verb: you select first, then act)

gregjor
3 replies
4h45m

I looked at Helix but since I dream in vim motions at this point (vi user since it came out) I'd have to see a 10x improvement to switch. VSCode didn't give me a 10X improvement, I doubt Helix would.

VHRanger
2 replies
3h51m

Helix certainly won't give you a 10x improvement. It tends to convert a lot of people moving "up" from VS Code, and a decent chunk of neovim users moving "down", though certainly fewer.

Advantages of Helix are pretty straightforward:

1. Very little configuration bullshit to deal with. There's not even a plugin system yet! You just paste your favorite config file and language/LSP config file and you're good to go. For anything else, submit a pull request.

2. Built in LSP support for basically anything an LSP exists for.

3. There's a bit of a new generation command line IDE forming itself around zellij (tmux that doesn't suck) + helix + yazi (basically nnn or mc on crack, highly recommended).

That whole zellij+helix+yazi environment is frankly a joy to work in, and might be the 2-3x improvement over neovim that makes the switch worth it.

gregjor
1 replies
3h9m

Like I wrote, I looked at Helix. Seems cool but not enough for me to switch. And I would have to install it on the machines I work on, which very often I can't do because of company policies, or can't waste the non-billable time on.

I only recently moved from screen to tmux, and I still have to fall back to screen sometimes because tmux doesn't come with every Linux distro. I expect I will retire before I think tmux (or screen, for that matter) "sucks" to the point I would look at something else. And again I very often can't install things on customer servers anyway.

VHRanger
0 replies
37m

Tmux does suck pretty bad though?

It conflicts with the clipboard and a bunch of hotkeys, and configuring it never works because they make breaking changes to how the config file works every 6 months or so.

These days I only use it to launch a long-running job over ssh, detach the session it's on, and leave.

joe-six-pack
1 replies
4h37m

VSCode is not an IDE, it's an extensible text editor. IDEs are integrated (it's in the name) and get developed as a whole. I'm 99% certain that if you were forced to spend a couple of months in a real IDE (like IDEA or Rider), you would not want to go back to vim, or any other text editor. Speaking as a long time user of both.

gregjor
0 replies
3h28m

I get your point, but VSCode does far more than text editing. The line between an advanced editor and an IDE gets blurry. If you look at the Wikipedia page about IDEs[1] you see that VSCode ticks off more boxes than not. It has integration with source code control, refactoring, a debugger, etc. With the right combination of extensions it gets really close to an IDE as strictly defined. These days advanced text editor vs. "real" IDE seems more like a distinction without much of a difference.

You may feel 99% certain, but you got it wrong. I have quite a bit of experience with IDEs, you shouldn't assume I use vim out of ignorance. I have worked as a programmer for 40+ years, with development tools (integrated or not) that I have forgotten the names of. That includes "real" IDEs like Visual Studio, Metrowerks CodeWarrior, Symantec Think C, MPW, Oracle SQL Developer, Turbo Pascal, XCode, etc. and so on. When I started programming every mainframe and minicomputer came with an IDE for the platform. Unix came along with the tools broken out after I had worked for several years. In high school I learned programming on an HP-2000 BASIC minicomputer -- an IDE.

So I have spent more than "a couple of months in real IDEs" and I still use vim day to day. If I went back to C++ or C# for Windows I would use Visual Studio, but I don't do that anymore. For the kind of work I do now vim + ctags + ripgrep (and awk, sed, bash, etc.) get my work done. At my very first real job I used PWB/Unix[2] -- PWB means Programmer's Work Bench -- an IDE of sorts. I still use the same tools (on Linux) because they work and I can always count on finding a large subset of them on any server I have to work with.

I don't dislike or mean to crap on IDEs. I have used my share of IDEs and would again if the work called for that. I get what I need from the tools I've chosen, other people make different choices, no perfect language, editor, IDE, what have you exists.

[1] https://en.wikipedia.org/wiki/Integrated_development_environ...

[2] https://en.wikipedia.org/wiki/PWB/UNIX

zarzavat
12 replies
12h20m

Go to definition and find usages only work one symbol at a time. I use both, but I still use global find/replace for groups of symbols sharing the same concept.

For example if I want to rename all “Dog” (DogModel, DogView, DogController) symbols to “Wolf”, find/replace is much better at that because it will tell me about symbols I had forgotten about.

turboponyy
5 replies
11h56m

There's no reason they have to work one symbol at a time - that's just a missing feature in your language server implementation.

Some language servers support modifying the symbols in contexts like docstrings as well.

setopt
4 replies
11h45m

I’ve never seen an LSP server that lets you rename “Dog” to “Wolf” where your actual class names are “Dog[A-Za-z]*”?

Do you have an example?

turboponyy
1 replies
9h37m

Neither have I; and no, I don't - I misinterpreted what you said.

But I don't see why LSP servers shouldn't support this, still. I'm not sure if the LSP specification currently allows for it, though.

setopt
0 replies
1h32m

I would actually love a regexp search-and-replace assisted by either TreeSitter or LSP.

Something that lets me say that I want to replace “Dog\(.*\)” with “Wolf\1”, but where each substitution is performed only within single “symbols” as identified by TS or LSP.
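A minimal Python sketch of the idea, assuming something else (TreeSitter or an LSP) has already produced the character spans of the symbols - which is of course the hard part:

  import re

  def sub_within_symbols(pattern, repl, text, symbol_spans):
      """Apply re.sub, but only inside the given (start, end) spans."""
      out, prev = [], 0
      for start, end in sorted(symbol_spans):
          out.append(text[prev:start])                        # untouched gap
          out.append(re.sub(pattern, repl, text[start:end]))  # rewrite symbol
          prev = end
      out.append(text[prev:])
      return "".join(out)

  code = 'view = DogView()  # renders a "DogView" label'
  spans = [(0, 4), (7, 14)]  # symbol positions, e.g. from a TreeSitter query
  print(sub_within_symbols(r"Dog(\w*)", r"Wolf\1", code, spans))
  # view = WolfView()  # renders a "DogView" label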

Maxion
1 replies
11h25m

IntelliJ's refactor tool?

yen223
0 replies
4h21m

IntelliJ doesn't use LSP as far as I know.

It does usually handle that kind of DogModel -> WolfModel refactoring, though.

gugagore
3 replies
12h15m

I am familiar with the situation you describe, and it's a good point.

However, it does suggest that there is an opportunity for factoring "Dog" out in the code, at least by name spacing (e.g. Dog.Model).

zarzavat
1 replies
10h21m

That gets to the core of the issue doesn’t it? There are two cultures: Do you prefer to refactor DogView into Dog.View, or do you prefer to refactor Dog.View into DogView.

Personally I value uniqueness/canonicalness over conciseness. I would rather have DogView because then there is one name for the symbol regardless of where I am in the codebase. If the same symbol is used with differently qualified names it is confusing - I want the least qualified name to be more descriptive than “View”.

The other culture is to lean heavily on namespaces and to not worry about uniqueness. In this case you have View and Dog.View that may be used interchangeably in different files. This is the dominant culture in Java and C#.

kccqzy
0 replies
5h20m

The second culture that you describe happens also to be how OCaml structures things in modules. It's quite a turnoff for me.

f1shy
0 replies
12h5m

That really depends on the context, and specific situation.

sandermvanvliet
0 replies
10h5m

JetBrains ReSharper (and Rider) is smart enough to handle these things. It’ll suggest renames across other symbols, even ones that merely have related names

f1shy
0 replies
12h3m

For that use case I think you can use treesitter[1]: you can find Dog.* but only where it is a variable name, for example, avoiding replacement inside of, say, literals.

[1] https://www.youtube.com/watch?v=MZPR_SC9LzE

jakub_g
9 replies
12h34m

Your observation does not help with the majority of the points in the article. How do you find all usages of a parameter value literal?

troupo
7 replies
12h25m

This is what the article starts with: "Even in projects exclusively written by myself, I have to search a lot: function names, error messages, class names, that kind of thing."

All of that is trivial to search for with a tool that understands the language.

nosianu
2 replies
12h13m

All of that is trivial to search for with a tool that understands the language.

Isn't string search, or grepping for patterns, even more trivial? So what is your argument? You found an alternative method, good, but how is it any better?

In my own case, I wrote a library that we used in many projects, and I often wanted to know where and how functions from my lib were used in those projects. For example, to be able to tell how much of an effort it would be for the users to refactor when I changed something. However, your method of choice at least with my IDE (Webstorm) only worked locally within the project. Only string search would let me reliably and easily search all projects.

I actually experimented creating a "meta" project of all projects, but while it worked, that led to too many problems, and the main method to find anything was still string search (the CTRL-SHIFT-F Find dialog in IDEA IDEs is string search, and it's a wonderful dialog in that IDE family). I also had to open that meta project. Instead, I created a gitignored folder with symlinks to the sources of all the other projects and created a search scope for that folder, in which the search dialog let me string-search all projects' sources at once right from within the library project while still being able to use the excellent Find dialog.

In addition, I found that sometimes the IDE would not find a usage even within the project. I only noticed because I used both methods, and string search showed me one or two places more than the method that relied on the underlying code-parsing. Unfortunately IDEs have bugs, and the method you suggest relies on much more work of the IDE in parsing and indexing compared to the much more mundane string or string pattern search.

troupo
1 replies
11h32m

Isn't string search, or grepping for patterns, even more trivial?

It's not trivial when you're looking for symbols in context.

the method you suggest relies on much more work of the IDE in parsing and indexing compared to

...compared to parsing and indexing you have to do manually because a full-text search (especially in a large codebase) will return a lot of irrelevant info?

Funnily enough I also have a personal anecdote. We had a huge PHP code base based on Symfony. We were in the middle of a huge refactoring spree. I saw my colleagues switch from vim/emacs to Idea/WebStorm after seeing how easily I found symbols in the code base, found their usages, refactored them etc., compared to the full-text search they were always stuck with.

This was 5-6 years ago, before LSP became ubiquitous.

nosianu
0 replies
32m

It's not trivial

Did you miss the comparison? The "more trivial"? The context of my response? Please read the parent comment I responded to; treating my comment as standalone and adding some new meaning makes no sense.

String search is more trivial than a search that involves an interpretation of the code structure and meaning. I have no idea why you wish to start a discussion about such trivial statement.

because a full-text search (especially in a large codebase) will return a lot of irrelevant info?

It doesn't do that for me but instead works very well. I don't know what you do with your symbol names, but I have barely any generic function names, the vast majority of them are pretty unique.

No idea how you use search, but I'm never looking for "doSomething(", it's always "doSomethingVerySpecific()", or some equally specific string constant.

I don't have the problems you tell me I should have. My use case was the subject of my comment, as should be clear, and my comment was a response to a specific point made by the parent comment.

renewiltord
1 replies
12h17m

I actually don't think there's a tool that handles usages when using PHP varvars or when using example number one there which is parametrically choosing a table name.

When you string interpolate to build the name you lose searchability.
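A quick Python illustration (all names made up):

  class Repo:
      def get_address(self):
          return "1 Main St"

  prefix, entity = "get", "address"

  # Grepping for "get_address" will never find this call site:
  # the name only exists at runtime, assembled from parts.
  handler = getattr(Repo(), prefix + "_" + entity)
  print(handler())  # 1 Main St

  # Same problem when parametrically choosing a table name:
  table = entity + "_book"  # grep for "address_book" finds nothing
  query = "SELECT * FROM " + table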

troupo
0 replies
11h30m

Yes, full-text search is a great fallback when everything else fails. But in the use cases listed at the beginning of the article it's usually not needed if you have proper tools

cma
1 replies
5h41m

All of that is trivial to search for with a tool that understands the language.

Some literal in a log message may come from the code or it might be remapped in some config file outside the language the LSP is looking at, or an environment variable etc.. I just go back and forth with grep and IDE tools, both have different tradeoffs.

troupo
0 replies
3h5m

The thing is, so many people are weirdly obsessed with never using any other tools besides full-text search. As if using useful tools somehow makes them a lesser programmer or something :)

CrimsonRain
0 replies
7h45m

By not using literals everywhere. All literals are defined somewhere (start of function, class etc) as enums or vars and used.

Just because I have 20 usage of 'shipping_address' doesn't mean I'll have this string 20 times in different places.

Grep has its place, and I often need to grep code bases which have been written without much thought towards DX. But writing code nicely allows the LSP to take over.
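E.g. something like this (a sketch, names made up):

  # Defined exactly once; grep finds the literal right here.
  SHIPPING_ADDRESS = "shipping_address"

  def read_shipping(payload: dict) -> str:
      return payload[SHIPPING_ADDRESS]

  def write_shipping(payload: dict, value: str) -> None:
      payload[SHIPPING_ADDRESS] = value

  # The other 18 usages all go through the constant, so an LSP's
  # Find Usages (or a rename) covers every one of them at once.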

aa-jv
8 replies
11h50m

On the flipside, IDE's can turn you into lazy, inefficient programmers by doing all the hand-holding for you.

If your feelings are anemic when tasked with doing a grep, it's because you have lost a very valuable skill by delegating it to a computer. There are some things the IDE is never going to be able to find - lest it become the development environment - so keeping your grep fu sharpened is wise beyond the decades.

(Disclaimer: 40 years of software development, and vim+cscope+grep/silversearcher are all I really need, next to my compiler..)

throwaway2037
2 replies
8h26m

    > lazy... programmers
Since when was that a bad thing? Since time immemorial, it has been hailed as a universal good for programmers to be lazy. I'm pretty sure Larry Wall has lots of jokes about this on Usenet.

Also, I can clearly remember switching from vim/emacs to Microsoft Visual Studio (please, don't throw your tomatoes just yet!). I was blown away by IntelliSense. Suddenly, I was focusing more on writing business logic, and less time searching for APIs.

trashtester
1 replies
7h38m

This is the wrong type of lazy.

Command line tools like grep are force multipliers for programmers. GUIs come with the risk that you never learn how to leverage this power. In the end, that often leads to more manual work.

And today, bash is a lingua franca that you can bring with you almost everywhere. Even Windows "speaks" bash these days, with WSL.

In itself, there's nothing wrong with using the built-in features of a GUI. Right-clicking a method (or using a keyboard shortcut) to find the definition in a given code base IS nice for that particular operation.

But by knowing grep/awk/find/git command line and so on, combined with bash scripting and advanced regular expressions, you open up a new world of possibilities.

All those things CAN be done using Python/C#/Java or whatever your language is. But a 1-liner in bash can be 10-100 lines of C#.

lucumo
0 replies
6h12m

Where does this stupid notion come from that using powerful tools means you can't handle the less powerful ones anymore? Did your skills with a hand screwdriver atrophy when you learned how to use a powered screwdriver? Come on.

I use grep multiple times a day. I write bash scripts quite often. I'm not speaking from a position of ignorance of these tools. They have their place as a lowest common denominator of programming tools. But settling for the lowest common denominator is not a path to productivity.

Doesn't mean you should forget your skills, but it does mean you should investigate better tools. And leverage them. A lot.

But a 1-liner in bash can be 10-100 lines of C#.

Yes. And the reverse is also true. bash is fast and easy if there's an existing tool you can leverage, and slow and hard when there's not.

HdS84
1 replies
11h16m

Huh? I have an old hand-powered drill from my Grandpa in my workshop. I used it once, for fun. For all other tasks I use a powered drill. Same for IDEs. They help you refactor and reason about code - both properties I value. Sure, I could print the code and use a highlighter, but I'm not Grandpa

trashtester
0 replies
7h34m

Knowing the bash ecosystem translates better to how you use the knife in the kitchen.

Sure you can replace most uses of a knife with power tools, but there is a reason why most top chefs still rely on that knife for most of those tasks.

A hand powered drill is more like a hand powered meatgrinder. It has the same limitation as the powered versions, and is simply a more primitive version.

winwang
0 replies
9h58m

I count the IDE and stuff like LSP as natural extensions of the compiler. For sure I grep (or equivalent) for stuff, but I highly prefer statically typed languages/ecosystems.

At the end of the day, I'm here to solve problems, and there's no end to them -- might as well get a head start.

lucumo
0 replies
8h26m

If your feelings are anemic

I'm not feeling anemic. The tool is anemic, as in, underpowered. It returns crap you don't want, and doesn't return stuff you do want.

My grep-fu is fine. It's a perfectly good tool if you have nothing better. But usually you do have something better.

Using the wrong tool to make yourself feel cool is stupid. Using the wrong tool because a good tool could make you lazy shows a lack of respect for the end result.

high_na_euv
0 replies
9h22m

Leveraging technology is a good thing

kragen
2 replies
7h18m

posts like this sound like the author routinely solves harder problems than you are, because the solutions you suggest don't work in the cases the post is about. we've had 'go to definition' since 01978 and 'find usages' since 01980, and you should definitely use them for the cases where they work

mjr00
1 replies
5h33m

From the article,

- dynamically built identifiers: 100% correct, never do this. Breaks both text search and symbol search, results in complete garbage code. I had to deal with bugs in early versions of docker-compose because of this.

- same name for things across the stack? Shouldn't matter, just use Find Usages on `getAddressById`. It's also an easy way to bait yourself, because database fields aren't 1:1 with front-end fields in anything but the simplest of CRUD webshit.

- translation example: the fundamental problem is using strings as keys when they should be symbols. Flat vs nested is irrelevant here because you should be using neither. (See the sketch below.)

- react component example: as I mentioned in another comment, trivially managed with Find Usages.

Nothing in here strikes me as "routinely solves harder problems," it's just standard web dev.
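For the translation point, a sketch of what symbol keys look like (hypothetical `Msg`/`TRANSLATIONS` names):

  from enum import Enum, unique

  @unique
  class Msg(Enum):
      CHECKOUT_TITLE = 1
      CHECKOUT_PAY = 2

  TRANSLATIONS = {
      "de": {Msg.CHECKOUT_TITLE: "Kasse", Msg.CHECKOUT_PAY: "Bezahlen"},
  }

  def t(lang: str, key: Msg) -> str:
      # Find Usages on Msg.CHECKOUT_TITLE lists every call site, and a
      # misspelled key is an AttributeError instead of a silent miss.
      return TRANSLATIONS[lang][key]

  print(t("de", Msg.CHECKOUT_TITLE))  # Kasse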

kragen
0 replies
4h49m

yes, i agree that standard web dev is full of these problems, which can't be solved with go-to-definition and find-usages. it's a mess. i wasn't claiming that these messy, hard problems where grep is more helpful than etags are exotic; they are in fact very common. they are harder than the problems lucumo is evidently accustomed to dealing with because they don't have correct, complete solutions, so we have to make do with heuristics

advice to the effect of 'you should not make a mess' is obviously correct but also, in many situations, unhelpful. sometimes i'm not smart enough to figure out how to solve a problem without making a mess, and sometimes i inherit other people's messes. in those situations that advice decays into 'you should not try to solve hard problems'

heisenbit
2 replies
12h17m

A good IDE can be so much better iff it understands the code. However, this requires the IDE to understand the project structure, dependencies etc., which can take considerable effort. In a codebase with many projects employing several different languages it becomes hard to reach, and maintain, the "IDE understands everything" state.

carlmr
0 replies
10h29m

And especially in large monorepos anything that understands the code can become quite sluggish. While ripgrep remains fast.

A kind of in-between I've found for some search and replace action is comby (https://comby.dev/). Having a matching braces feature is a godsend for doing some kind of replacements properly.

amichal
0 replies
10h37m

And an IDE would also fail to find references for most of the cases described in the article: name composition/manipulation, naming consistency across language barriers, and flat namespaces in serialization. And file/path folder naming seems to be irrelevant to the smart IDE argument. "Naming things is hard"

underdeserver
1 replies
12h3m

Unfortunately in larger codebases or dynamic languages these tools are just not good enough today. At least not those I and my employers have tried.

They're either incomplete (you don't get ALL references or you get false references) or way too slow (>10 seconds when rg takes 1-2).

Recommendations are most welcome.

jimmaswell
0 replies
12h1m

Only thing I can recommend is using C# (obviously not always possible). Never had an issue with these functions in Visual Studio proper no matter how big the project.

mjr00
1 replies
5h47m

Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.

Completely agreed. The React component example in the article is trivially solvable with any modern IDE; right-click on the class name, "Find Usages" (or use the appropriate hotkey, of course). Trying to grep for a class name when you could just do that is insane.

I mainly see this from juniors who don't know any better, but as seen in this thread and the article, there are also experienced engineers who are stubborn and refuse to use tools made after 1990 for some reason.

gpderetta
0 replies
4h28m

I worked on codebases large enough that enabling autocomplete/indexing would lock up the IDE and cause the workstation to swap hard.

db48x
1 replies
7h23m

True, but IDEs are fragile tools. Sometimes you want to fall back to simpler tools that will always work, and grep is not fragile.

cxr
0 replies
6h18m

The basis of this article (and its forebear "Too DRY - The Grep Test"[1]) is that grep is fragile. It's just fragile in a way that's different from the way that IDEs are fragile.

1. <http://jamie-wong.com/2013/07/12/grep-test/>

brooke2k
1 replies
59m

with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)

with a legacy codebase, or a fork of a dependency that had to be patched which uses an incompatible buildsystem, or any C/C++/obj-c/etc that heavily uses the preprocessor or nonstandard build practices, or codebases that mix lots of different languages over awkward FFI boundaries and so on and so forth -- there are so many situations where sometimes an IDE just can't get you 100% of the way there and you have to revert to grepping to do any real work

that being said, I don't fully support the idea of handcuffing your code in the name of greppability, but I think dismissing it as a metric under the premise that IDEs make grepping "obsolete" is a little bit hasty

lucumo
0 replies
26m

with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)

I wish, but no. I've found people will make a mess of everything. Which is why I don't trust solutions that rely on humans having more discipline, like what this article advocates.

In any situation where grep is your last saviour, you cannot rely on the greppability of the code. You'll have to check and double check everything, and still accept the risk of errors.

brain5ide
1 replies
12h30m

I think the first sentence of the author counters your comment. What you described works best in a familiar codebase where the organizing principles have been maintained well and are familiar to the reader and the tools are just the extension of those organizing principles. Even then a deviation from those rules might produce gaps in understanding of what the codebase does.

And grep cuts right through that in a pretty universal way. What the post describes are just ways to not work against grep to optimize for something ephemeral.

ricardo81
0 replies
12h16m

Agree. Not just because it's unfamiliar code, you can also get a feel for how the program/programmer(s) structured the whole thing.

sauercrowd
0 replies
7h47m

strongly disagree here. This works if:

- your IDE/language server is performant

- all the tools are fully set up

- you know how to query the specific semantic entity you're looking for (remembering shortcuts)

- you are only interested in a single specific semantic entity (mixing entities is rarely supported)

I don't map out projects in terms of semantics, I map out projects in files and code - that makes querying intuitive, and I can easily compose queries that match the specificity of what I care about (e.g. I might want to find a `Server`, but I want to show classes, interfaces and abstract classes alike).

For the specific toolchain I'm using - typescript - the symbol search is also unusable once it hits a certain project size; it's just way too slow for it to be part of my core workflow

phyrex
0 replies
9h30m

This breaks down at scale and across languages. All the FAANGs make heavy use of the equivalent of grepping in their code base

leni536
0 replies
11h2m

I can't use an IDE on my entire git history, but git can grep.

k__
0 replies
10h37m

Honestly, in my 18 years of software development, I haven't "greped" code once.

I only use grep to filter the output of CLI tools.

For code, I use my IDE or repository features.

jmmv
0 replies
36m

Sure, if you have the luxury of having a functional IDE for all of your code.

You can't imagine how much faster I was than everybody else at answering questions about a large codebase just because I knew how to use ripgrep (on Windows). "Knowing how to grep" is a superpower.

hyperpape
0 replies
7h11m

I can run rg over my project faster than I can do anything in my IDE. Both tools have their places.

citrin_ru
0 replies
10h39m

Not everything you need to look for is a language identifier. I often grep for configuration option names in the code to see what an option actually does - sometimes it is easy to grep, sometimes there are too many matches, and sometimes the name cannot be found at all because it is composed in the code from separate parts that are each too common to grep for. It's not hard to make config options greppable, but some coders just don't care about this property.

a_e_k
0 replies
11h24m

I've come to really like language servers for big personal and work projects where I already have my tools configured and tuned for efficiently working with it.

But being able to grep is really nice when trying to figure something out about a source tree that I don't yet have set up to compile, nor am I a developer of. I.e., I've downloaded the source for a tool I've been using pre-built binaries of and am now trying to trace why I might be getting a particular error.

PhilipRoman
0 replies
9h40m

IDEs are cool and all, but there is no way I'm gonna let VSCode index my 80GB yocto tmp directory. Ctags can crunch the whole thing in a few minutes, and so can grep.

Plus there are cases where grep is really what you need, for example after updating a particular command line tool whose output changed, I was able to find all scripts which grepped the output of the tool in a way that was broken.

IshKebab
0 replies
11h49m

Definitely true when you can use static typing.

Unfortunately sometimes you can't, and sometimes you can but people can't be arsed, so this is still a consideration.

EasyMark
0 replies
5h22m

It seems like the law of diminishing returns; while I'm sure in a few cases this characteristic of a code writing style is extremely useful, it cuts into other things such as readability and conciseness. Fewer lines can mean fewer bugs, within reason, if you aren't in lisp and are using more than 3 parentheses, you might want to split it up because the compiler/JIT/interpreter is going to anyway.

VoxPelli
35 replies
11h29m

I advocate for greppability as well – and in Swedish it becomes extra fun – as the equivalent phrase in Swedish becomes "grep-bar" or "grep-barhet" and those are actual words in Swedish – "greppbar" roughly means "understandable", "greppbarhet" roughly means "the possibility to understand"

sshine
13 replies
10h59m

How many other UNIX commands did the Swedes adopt into their language?

I know that they invented "curl". Do you tar xfz?

lukan
8 replies
10h53m

As far as I understood, it was part of the language before.

The german equivalent of the word would be probably "greifbar". Being able to hold something, usually used metaphorically.

kagevf
6 replies
10h49m

able to hold

Would "grasp" work?

octocop
4 replies
10h34m

It's closer to grip

trashtester
2 replies
8h14m

"zu greifen" may best translate to "to grip", but "grip" has different mental connotations in English (it refers to mental stability, not intellectual insight).

The best dual purpose translation of "zu greifen"/"gripe" (German/Scandinavian) meaning "zu begreifen"/"begripe"/"understand" would be "to grasp", which covers both physically grabbing into something and also to understand it intellectually.

All these words stem back to the Proto-Indo-European gʰrebʰ, which more or less completes the circle back to "grep".

lordgrenville
1 replies
6h37m

related to "grok"?

trashtester
0 replies
5h33m

grok /ɡrɒk/

Origin 1960s: a word invented by Robert Heinlein (1907–88), American author.

n_plus_1_acc
0 replies
9h51m

I've always related grep to grab

actionfromafar
0 replies
7h49m

Yes. "Grasping for straws."

ManuelKiessling
0 replies
10h40m

Which leads to "begreifbar", which I would explain/translate (badly) with "something is begreifbar if it can be understood".

scbrg
2 replies
10h19m

We do tar, for xfz I think you have to look to the Slavic languages :)

Anyway, to answer your question:

  $ grep -Fxf <(ls -1 /bin) /usr/share/dict/swedish 
  ack
  ar
  as
  black
  dialog
  dig
  du
  ebb
  ed
  editor
  finger
  flock
  gem
  glade
  grep
  id
  import
  last
  less
  make
  man
  montage
  pager
  pass
  pc
  plog
  red
  reset
  rev
  sed
  sort
  sorter
  split
  stat
  tar
  test
  transform
  vi
:)

[edit]: Ironically, grep in that list is not the same word as the one OP is talking about. That one is actually based on grepp, with the double p. grep means pitchfork.

pbhjpbhj
1 replies
7h45m

Pitchfork? As in something that might be used to search a haystack?? How delightful.

sshine
0 replies
3h24m

Yeah, that’s one type.

Another is for turning soil at a small scale by hand (also called a cultivator, I think).

But they all have somewhat long prongs.

tripzilch
0 replies
9h35m

I learned from bash.org that "tar -xzvf" is, in a German accent, "xtract ze vucking files".

vanschelven
6 replies
10h47m

Begreppelijk (begrijpelijk) in Dutch

Cthulhu_
5 replies
9h37m

or "Grijpbaar" (grabbable)

medstrom
4 replies
8h18m

So Dutch/German make "begreif" a verb; for Swedish it is just a noun (that means "concept").

But "begrijpelijk" has a clone: "begriplig". An adjective based on a verb in a foreign dictionary. There is no verb that goes "begreppa", it's just "greppa".

trashtester
1 replies
8h7m

"Jag kan inte begripa svenska."

medstrom
0 replies
7h58m

Oh, you're right.

jeroenhd
0 replies
5h10m

Dutch also has a noun ("begrip") meaning "notion" or "understanding".

fedder
0 replies
5h42m

The term concept itself suggests grasping or holding/taking hold of, see the latin verb concipio or adjective conceptus.

elygre
6 replies
11h20m

Could I suggest that greppbarhet is more precisely translated as “the ability of being understood”?

(Norwegian here. Our languages are similar, but we lack this one.)

medstrom
3 replies
8h1m

Norwegian still translates grep as "grip"/"grab". I always thought of grepping as reaching in with a hand into the text and grabbing lines. That association is close at hand (insert lame chuckle) for German and English speakers too.

pbhjpbhj
2 replies
7h47m

In English that association is going to depend a lot on one's accent; until now I've never associated grep-ing with anything other than using grep! (But, equally, that might just be a me thing.)

medstrom
0 replies
6h2m

What about groping? Groping around for text.

bee_rider
0 replies
6h49m

It doesn’t sound anything like grip in my accent but for some reason the association has always been there for me. Grabbing or ripping parts from the file.

psychoslave
1 replies
10h2m

So, at the extreme opposite of the esoteric "general regular expression print" that grep stands for, with few ever knowing it?

johncoltrane
0 replies
8h48m

s/general/global

octocop
2 replies
10h31m

And we also have "begrepp", which is also a spin on grasping content and understanding it.

majewsky
1 replies
8h41m

Oh, that's like German "begreifen", no? (Which means "to grok".)

medstrom
0 replies
8h34m

Grok is right! I'd translate Swedish "greppbar" directly as "grokkable"; "att greppa" as "to grok".

TeMPOraL
2 replies
10h15m

Which is ironic, given that the article is about making it easier to use grep in order to avoid having to understand anything.

bob88jg
1 replies
9h2m

Nah, you've got it backwards. The article isn't about dodging understanding - it's about making it way easier to spot patterns in your code. And that's exactly how you start to really get what's going on under the hood. Better searching = faster learning. It's like having a good map when you're exploring a new city

TeMPOraL
0 replies
4h35m

The article advocates making code harder to understand for the sake of better search. It's like forcing a city to conform to a nice, clean, readable map: it'll make exploring easier for you, at the cost of making the city stop working.

layer8
0 replies
7h54m

Graspability. ;)

More customarily: intelligibility.

skrebbel
20 replies
9h10m

The second point here made me realize that it'd be super useful for a grep tool to have a "super case insensitive" mode which expands a search for, say, "FooBar|first_name" to something like /foo[-_]?bar|first[-_]?name/i, so that any camel/snake/pascal/kebab/etc case will match. In fact, I struggle to come up with situations where that wouldn't be a great default.

hnben
5 replies
6h36m

"super case insensitive"

Let's say someone made a plugin for their favorite IDE for this kind of search. What would the details look like?

To keep it simple, let's assume we just do the super-case-insensitivity, without the other regex condition. Let's say the user searches for "first_name" and wants to find "FirstName".

One simple solution would be to have a convention for where a word starts or ends, e.g. a " ". So the user would enter "first name" into the plugin's search field. The plugin turns it into "/first[-_]?name/i" and hands this regexp to the normal search of the IDE.
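A rough Python sketch of that first solution, generalized a bit to also split on camelCase humps (hypothetical, not tied to any real IDE API):

  import re

  def super_case_insensitive(query: str) -> re.Pattern:
      # Split the query on spaces, "_", "-", or a lower-to-upper hump...
      words = re.split(r"[ _-]|(?<=[a-z0-9])(?=[A-Z])", query)
      # ...then allow an optional "-" or "_" between the words.
      return re.compile("[-_]?".join(map(re.escape, words)), re.IGNORECASE)

  pat = super_case_insensitive("first name")
  for s in ["first_name", "firstName", "FirstName", "first-name", "FIRST_NAME"]:
      assert pat.search(s)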

Another simple solution would be to ignore all word boundaries. So when the user enters "first name", the regexp would become "/f[-_]?i[-_]?r[-_]?s[-_]?t[-_]?n[-_]?a[-_]?m[-_]?e[-_]?/i". Then the search would not only be super-case-insensitive, but super-duper-case-insensitive. I guess the biggest downside would be that this could get very slow.

I think implementing a plugin like this would be trivial for most IDEs that support plugins.

Am I missing something?

marcosdumay
1 replies
3h43m

The best way would be to make an escape code that matches zero or one punctuation characters.

So you'd search for "/first\_name/i".

Izkata
0 replies
1h16m

That already exists as "?" and was used in their example:

  /first[-_]?name/i
Or to use your example, just checking for underscores and not also dashes:

  /first_?name/i
Backslash is already used to change special characters like "?" from these meanings into just "use this character without interpreting it" (or the reverse, in some dialects).

skrebbel
0 replies
5h26m

Hm I'd go even simpler than that. Notably, I'd not do this:

So the user would enter "first name" into the plugin's search field.

Why wouldn't the user just enter "first_name" or "firstName" or something like that? I'm thinking about situations like, you're looking at backend code that's snake_cased, but you also want it to catch frontend code that's camelCased. So when you search for "first_name" you automagically also match "firstName" (and "FirstName" and "first-name" and so on). I wouldn't personally introduce some convention that adds spaces into the mix, I'd simply convert anything that looks snake/kebab/pascal/camel-cased into a regex that matches all 4 forms.

Could even be as stupid as converting "first_name" or "firstName", or "FirstName" etc into "first_name|firstname|first-name", no character classes needed. That catches pretty much every naming convention right? (assuming it's searched for with case insensitivity)

inanutshellus
0 replies
6h22m

IIUC, you're not missing anything though your interpretation is off from mine*. He wasn't saying it'd be hard, he was saying it should be done.

* my understanding was simply that the regex would (A) recognize `[a-z][A-Z]` and inject optional _'s and -'s between... and (B) notice mid-word hyphens or underscores and switch them to search for both.

__MatrixMan__
0 replies
6h6m

Shame on me for jumping past the simple solutions, but...

If you're going that far, and you're in a context which probably has a parser for the underlying language ready at hand, you might as well just convert all tokens to a common format and do the same with the queries. So searches for foo-bar find strings like FooBar because they both normalize to foo_bar.

Then you can index by more than just line number. For instance you might find "foo" and "bar" even when "foo = 6" shows up in a file called "bar.py" or when they show up on separate lines but still in the same function.
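A sketch of such a normalizer - the indexer would run it over every token, and again over the query:

  import re

  def normalize(token: str) -> str:
      # FooBar, foo-bar and FOO_BAR all collapse to foo_bar.
      token = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", token)
      return token.replace("-", "_").lower()

  assert normalize("FooBar") == normalize("foo-bar") == normalize("FOO_BAR") == "foo_bar"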

adammarples
3 replies
8h7m

Fzf?

setopt
2 replies
6h51m

Fuzzy search is not the same. For instance, it might by default match not only “FooBar” and “foo_bar” but also e.g. “FooQux(BarQuux)”, which in a large code base might mean hundreds of false positives.

mgkimsal
1 replies
6h30m

Ideally there'd be some sort of ranking or scoring that would happen to sort by. FooQux(BarQuux) would seemingly rank much lower than FooBar when searching for FooBar or "Foo Bar", but might still be useful in results if ranked and displayed lower.

setopt
0 replies
5h10m

Indeed, that's a good solution – and I believe e.g. fzf does some sort of ranking by default. The devil is however in the details:

One minor inconvenience is that the scoring should ideally be different per filetype. For instance, Python would count "foo-bar" as two symbols ("foo minus bar") whereas Lisp would count it as one symbol, and that should ideally result in different scores when searching for "foobar" in both. Similarly, foo(bar) should ideally have a lower score than "foo_bar" for symbol search, even though the keywords are separated by the same number of characters.

I think this can be accommodated by keeping a per-language list of symbols and associated "penalties", which can be used to calculate "how far" keywords are from each other in the search results, weighted by language semantics :)

WizardClickBoy
3 replies
8h10m

This reminds me of the substitution mode of Tim Pope's amazing vim plugin [abolish](https://github.com/tpope/vim-abolish?tab=readme-ov-file#subs...)

Basically in vim to substitute text you'd usually do something with :substitute (or :s), like:

:%s/textToSubstitute/replacementText/g

...and have to add a pattern for each differently-cased version of the text.

With the :Subvert command (or :S) you can do all three at once, while maintaining the casing for each replacement. So this:

textToSubstitute

TextToSubstitute

texttosubstitute

:%S/textToSubstitute/replacementText/g

...results in:

replacementText

ReplacementText

replacementtext

tambourine_man
0 replies
5h44m

Of course it does. Or it wouldn’t be Emacs

WizardClickBoy
0 replies
6h18m

Also just realised while looking at the docs it works for search as well as replacement, with:

:S/textToFind

matching all of textToFind TextToFind texttofind TEXTTOFIND

But not TeXttOfFiND.

Golly!

dominicrose
1 replies
6h7m

Let's say you have a FilterModal component and you're using it like this: x-filter-modal

Improving the IDE to find one or the other by searching for either is missing the point of the article: that consistency is important.

I'd rather have a simple IDE and a good codebase than the opposite. In the example that I gave, the worst thing is that it's the framework which forces you to use these two names for the same thing.

skrebbel
0 replies
5h25m

My point is that if grep tools were more powerful we wouldn't need this very particular kind of consistency, which gives us the very big benefit of being allowed to keep every part of the codebase in its idiomatic naming convention.

I didn't miss the point, I disagreed with the point because I think it's a tool problem, not a code problem. I agree with most other points in the article.

boxed
1 replies
5h48m

I think Nim has this?

archargelod
0 replies
2h46m

Nim comes bundled with a `nimgrep` tool [0], that is essentially grep on steroids. It has `-y` flag for style insensitive matching, so "fooBar", "foo_bar" and even "Foo__Ba_R" can be matched with a simple "foobar" pattern.

The other killer feature of nimgrep is that instead of regex, you can use PEG grammar [1]

  [0] - https://nim-lang.github.io/Nim/nimgrep.html
  [1] - https://nim-lang.org/docs/pegs.html

msmolkin
0 replies
3m

Hey, I just created a new tool called Super Grep that does exactly what you described.

I implemented a format-agnostic search that can match patterns across various naming conventions like camelCase, snake_case, PascalCase, kebab-case. If needed, I'll add support for space-separated words as well.

I've just published the tool to PyPI, so you can easily install it using pip (`pip install super-grep`), and then you just run it from the command line with `super-grep`. You can let me know if you think there's a smarter name for it.

Source: https://www.github.com/msmolkin/super-grep

Groxx
0 replies
25m

fwiw I pretty frequently use `first.?name` - the odds of it matching something like "FirstSname" are low enough that it's not an issue, and it finds all cases and all common separators in one shot.

abc-1
12 replies
14h41m

A lot of this reads like code search tools could and should be a lot better. They probably will be with AI finding its way into everything. In the old days, people would Hungarian prefix types, but now the IDE mitigates that with color codes.

klodolph
5 replies
14h38m

Do you have some ideas for how to make code search better?

Right now, code search is basically just text search. If you think code search tools “could and should” be a lot better, what kind of improvements are you thinking about? How would those improvements work?

Terr_
2 replies
14h27m

Not OP, but we wouldn't need to worry so much about picking out distinct greppable names if (big if) there were tools that parsed the code to draw out concepts for us, ex:

1. The popular "Find Usages" which varies widely in accuracy and reliability by language, IDE, and codebase meta-quirks.

2. Tools that show Callee/Caller trees, and sometimes possible data-flows between variables.

3. DSLs to search hierarchies, like how XPath lets you find XML elements based on nesting, rather than relying on a distinctly greppable single tag-name for the leaf you're interested in. (e.g. `<Product><Name>` vs `<ProductName>`)

When things go well, the actual variable name no longer needs to restate certain aspects and relationships that can instead be found through metadata.

For example, `GiftCard.purchaser_customer_uuid` is nicely greppable, but you could relax that to `GiftCard.purchaser` if it had a static type of `UUID<Customer>`. Or perhaps you could go to the `Customer.uuid` definition and say "Show me all variables that can populate or be-populated-by this one, up to X steps out, and excluding ones that are function scoped."
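(A Python sketch of that last idea - `Ref` here is a hypothetical marker, nothing standard:)

  from dataclasses import dataclass
  from typing import Annotated
  from uuid import UUID

  @dataclass(frozen=True)
  class Ref:
      target: str  # which entity this id points at

  @dataclass
  class GiftCard:
      # The relationship lives in queryable metadata rather than being
      # restated in the field name; tooling could then answer queries
      # like "show every field annotated Ref('Customer')".
      purchaser: Annotated[UUID, Ref("Customer")]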

That said, I do advocate for "greppability" as a general practice, since I seldom trust that languages, tools, or institutions will come together in a way that makes it unnecessary.

klodolph
0 replies
13h5m

I guess I wasn’t thinking of “find usages”, but as the article points out, it’s hard to find usages if the usages are dynamic.

The solution—to write code which is less dynamic—helps code search and features like find usages.

alexpovel
0 replies
12h10m

Regarding your third point, I put together a tool capable of that to some degree.

It allows you to grep inside source code, but limit the search to e.g. “only docstrings inside class definitions”, among other things. That is, it allows nesting and is syntax aware. That example is for Python, but the tool speaks more languages (thanks to treesitter).

https://github.com/alexpovel/srgn/blob/main/README.md#multip...

dragonwriter
0 replies
13h56m

Right now, code search is basically just text search.

We have lots of code search that is much more syntax-aware than just text search, but it tends to sit behind very limited UI: we have all the tech to do much better code search, but no one has come up with a generally-usable UI for it, so we just have very specific instances -- like "go to definition", "find references", etc.

That takes all the same technological bits that would be needed for, say, "find all definitions of functions visible in the current scope whose name starts with 'ban'" or "find all definitions of int8 constants visible in the current scope"...but what's the UI that makes that kind of searching usable, outside of the special cases now behind their own IDE menu items?

abc-1
0 replies
14h26m

Vector embeddings.

ddfs123
5 replies
14h35m

Unless you have syntax-aware grep support, I don't see how searching nested-key JSON could be better. But grep is installed by default. Not to mention ad-hoc languages that do not have any IDE support.

hoherd
0 replies
5h56m

`gron` is so underrated. Usually when I try to show people how useful it is, they don't seem to grasp how powerful it is. One common use is showing how to customize only one part of a helm chart by checking the values of an already-installed chart:

    $ helm get values -n $NS $DEPLOYMENT -o json | gron | grep resources | gron -u | json-to-yaml.py
    elasticsearch:
      client:
        resources:
          limits:
            cpu: 3
            memory: 4Gi
          requests:
            cpu: 1
            memory: 2Gi
      data:
        resources:
          limits:
            cpu: 6
            memory: 6Gi
          requests:
            cpu: 200m
            memory: 2Gi
    fluentd:
      resources:
        limits:
          memory: 768Mi
        requests:
          memory: 384Mi
That snip could be provided to another team or a customer as a yaml file that could be included with `helm upgrade -f whatever.yaml`. This is soooo much easier than digging that limited set of data out of the much more detailed data.

abc-1
1 replies
14h27m

If you put a lot of arbitrary constraints to not allow it to be better, sure. Enjoy.

medstrom
0 replies
7h18m

There is no conflict between improving tools and learning how to express your code in such a way that as many tools as possible work better OOTB.

NavinF
0 replies
14h3m

ad-hoc languages

This is self-inflicted.

JoshTriplett
11 replies
14h46m

This is the reason many coding styles and tools (including the Linux kernel coding style and the default Rust style as implemented in rustfmt) do not break string constants across lines even if they're longer than the desired line length: you might see the string in the program's output, and want to search for the same string in the code to find where it gets shown.
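For example (made-up message, Python for illustration):

  # Greppable: the message seen in the logs matches the source verbatim,
  # even though the line runs past the usual length limit.
  msg = "failed to reticulate splines: the frobnicator on this device has not been calibrated"

  # Not greppable: searching the code for the logged message finds
  # nothing, because the string was split to satisfy the line length.
  msg = ("failed to reticulate splines: "
         "the frobnicator on this device has not been calibrated")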

knodi123
10 replies
14h30m

My team drives me bonkers with this. They hear the general principle "really long lines of code are bad", but extrapolate it to "no characters shall pass the soft gutter no matter what".

Even if you have, say, 5 sequential related structs that are all virtually identical, each written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file, and while they're at it, "fixes" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines. Now when you see that list of structs, you wonder "why is this one different?" and you have to read carefully to determine that, nope, it just contained one longer string. Or god forbid they reformat all the structs to match, turning a 1-page file into 3 pages, and making it so you have to read and understand each element of each struct just to see what's going on.
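
To make the complaint concrete, here's a hedged TypeScript sketch (all names invented) of the pattern:

    // Near-identical records, one per line: similarities and
    // differences are obvious at a glance.
    const SHIPPING = { table: 'shipping_addresses', cache: true, ttl: 60 }
    const BILLING  = { table: 'billing_addresses',  cache: true, ttl: 60 }
    const CONTACT  = { table: 'contact_addresses',  cache: true, ttl: 60 }
    const RETURNS  = { table: 'returns_addresses',  cache: true, ttl: 60 }
    // After an autoformat "fix", the fifth entry no longer scans like
    // the rest, and now it looks meaningfully different when it isn't:
    const PICKUP = {
      table: 'pickup_warehouse_addresses', // ran 2 characters past the gutter
      cache: true,
      ttl: 60,
    }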

If I could have written the rule of thumb, I would have said "No logic or control shall happen after the end of the gutter." But if there's a paragraph-long string on one line- who cares?? We all have a single keystroke that can toggle soft-wrap, and the odds that you're going to need to know anything about that string other than "it's a long string" are virtually nil.

Sorry. I got triggered. :-)

edflsafoiewq
2 replies
11h18m

This is the world autoformatters have wrought. The central dogma of the autoformatter is that "formatting" is decided by dumb syntactic rules with no inflow of imprecise human judgement.

scrollaway
1 replies
11h6m

Most autoformatters do not reformat string constants as GP has said, and even if they did, this is something that can be much more accurately and correctly specified with an AF than with a human.

Autoformatting collectively saves probably close to millions of work hours per year in our industry, and that’s at the current adoption. Do you think it’s productive to manually space things out, clean up missing trailing commas and what not? Machines do it better.

edflsafoiewq
0 replies
10h54m

Even if you have, say, 5 sequential related structs, that are all virtually identical, all written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file, and while they're at it, "fix" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines.

Autoformatters absolutely do this. They do not understand considerations like symmetry.

I am doubtful as to the costs of "somewhere in the codebase there is a missing trailing comma".

arp242
2 replies
13h50m

This is why autoformatters that frob with line endings are just terrible and fundamentally broken.

I'm fairly firmly in the "wrap at 80" camp by the way; but sometimes a tad longer just makes sense. Or shorter for that matter: forced removal of line breaks is just as bad.

jimmaswell
1 replies
11h55m

80 feels really impractically narrow. A project I work on uses 110 because it's approximately the widest you can comfortably compare two revisions on the same monitor, or was for some person at some time, and I can live with it, but any less would just feel so cramped. A few indentation levels deep and I'd be writing newspaper columns.

NotMichaelBay
0 replies
10h42m

There is usually a way to restructure the code so that it doesn't have multiple levels of nested indentation, which is a good practice IMO because it makes the code easier to read.

BigJono
1 replies
14h15m

Yep, this triggers the fuck out of me too. It drives me absolutely insane when I take the time and effort to write good test cases with inline per-test data, formatted so it's nice and readable for the next person. Then the next person comes along, spends 30 seconds writing some 2-line rubbish to hit a code coverage metric, and spends another 60 seconds adding a linter rule that blows all the test data out to 400 lines of unreadable dogshit that uses only the left 15% of screen real estate.

port19
0 replies
9h40m

I routinely spot 3-line prints with the string on its own line in our code. Even for cases where the string + print don't even reach the 80 character "limit"

yas_hmaheshwari
0 replies
14h23m

My team also had a similar thing in place. I am saving this article in my pocket saves, so that I can give "proofs" of why this is better

From the Zen of Python:

    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
https://peps.python.org/pep-0020/

EasyMark
0 replies
5h15m

I have been places where we allow long strings but not much else past the limit, with 80 to 100 char limits otherwise. I like 100 for C++/Java and 80 for C. If a line gets much longer than that (strings aside) then it's time for a rethink in most cases: the grouping/scoping symbols are getting too deep. I'm sure other languages may or may not have that as a reasonable argument. It is just a rule of thumb though.

jackphilson
6 replies
14h11m

I wonder - why isn't this talked about more? We have had tens of thousands of software companies, each with probably a dozen people focused on hyperoptimizing everything. Why hasn't this point been discussed enough on the internet that it's obvious today? And it's not just about this specific point; it applies more generally. Do people just learn this on their own and not say anything? Or is the discussion of this topic buried in some old forum somewhere?

mrkeen
5 replies
12h16m

It's talked about, just in the opposite direction.

I've left hardcoded strings (think Kafka event type names) in my source for this very reason, but after a round of code review they get squirreled away as constants in separate files because string repetition is bad or something.

jimmaswell
4 replies
11h48m

Without constants, it's too easy to let a typo sneak in, or to run into trouble later when replacing one "event" but not an unrelated "event". I'll only inline the string if it's used two times at most, but usually I'll make a constant the first time and it doesn't feel like any loss.

mrkeen
3 replies
11h35m

Yes, this is exactly what I was fighting against.

If I have three classes that interact with "MyTable", then I can grep for places that interact with "MyTable" and I get back three classes.

After refactoring, the class which now knows about "MyTable" is Constants.java, which has no business knowing about "MyTable". Grepping it now turns up a false-positive and finds 0 of the actual usage sites (3 false-negatives).
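
A minimal TypeScript sketch of the before/after (the `db` client and the file split are hypothetical):

    declare const db: { query(sql: string, params: unknown[]): void }
    declare const id: number

    // Before review: grep "MyTable" lands on the real usage site.
    db.query('SELECT * FROM MyTable WHERE id = ?', [id])

    // After review: grep "MyTable" only hits the constants file, and the
    // real usage sites now match only on the constant's name.
    export const MY_TABLE = 'MyTable'
    db.query(`SELECT * FROM ${MY_TABLE} WHERE id = ?`, [id])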

philipwhiuk
0 replies
7h48m

`Constants.java` is a massive code-smell (which I have in many projects, but it's still a smell).

The file name is awful.

At worst it should be 'DbConstants' but probably they should be defined elsewhere.

NotMichaelBay
0 replies
10h5m

It's not exactly a false positive. It's just a level of indirection, 1 more search by the constant name to find usages. What you sacrifice there you gain by having the compiler help find typos and the IDE help with autocompletion.

GeneralMayhem
0 replies
10h44m

Sure, but now you have the string constant as a symbol, which you can either grep for (in which case you're delayed by one search, not the end of the world if you were going to unwind callstacks anyway) or, if you have an LSP, you can jump directly from it to users...

dwh452
5 replies
5h24m

This sounds like the advice to prefer the variable name 'ii' over 'i' because you can easily search for it. I loathe such advice because it causes the code to become ugly. Similarly, there are 'Yoda conditions', which make code hard to comprehend to solve an insignificant error that is easily caught with tooling. The problem with advice like this is that you will encounter deranged developers who become obsessive about such things and make the code base ugly trying to implement dozens of style rules. Code should look good. Making a piece of text look good for other humans to comprehend I consider to be job #1 or #2 for a good developer.

moolcool
0 replies
5h18m

The problem with advice like these is you will encounter deranged developers that become obsessive about such things and make the code base ugly trying to implement dozens of style rules

That's more of a "deranged developer" problem than a problem with the guidelines themselves. E.g. I think his `getTableName` example is quite sensible, but also one which some dogmatic engineers would flag and code-golf down to the one-liner.

marcosdumay
0 replies
3h32m

Those things only make the codebase "ugly" until you learn how to read it.

inetknght
0 replies
4h59m

This sounds like the advice to prefer the variable name 'ii' over 'i' because you can easily search for it

I've never heard of that advice. I honestly like algebraic names (singular digits) as long as they're well documented in a comment or aliasing another longer-name.

there are 'YODA Conditions' which make code hard to comprehend which solves an insignificant error that is easily caught with tooling

Yoda conditions [0] are a useful defensive programming technique and do not reduce readability except to someone new to them. I argue they improve readability, particularly for myself.

As for tooling... it doesn't catch every case for every language.

I loathe such advice because it causes the code to become ugly.

Beauty is in the eye of the beholder. While I appreciate your opinion, I also reject it out of hand for professional developers. Instead of deciding whether code is "ugly" perhaps you should decide whether the code is useful. Feel free to keep your pretty code in your personal projects (and show them off so you can highlight how your style really comes together for that one really cool thing you're doing).

you will encounter deranged developers that become obsessive about such things

I don't like being called deranged, but I am definitely obsessive about eliminating whole classes of bugs just by having the coding design and style not allow them to happen. If safe code is "ugly" to you... well, then I consider myself to be a better developer than you. I'd rather have ugly code that's easily testable than pretty code that's difficult to test in isolation, which is what most developers end up writing.

Code should look good. Making a piece of text look good for other humans to comprehend I consider to be job #1 or #2 for a good developer.

It depends on the project. Just remember that what looks good to you isn't what looks good to me. So if it's your personal project, then make it look good! If it's something we're both working on... then expect to defend your stylistic choices with numbers and logic instead of arguments about "pretty".

Then, from the article:

Flat is better than nested

If I'm searching for something in JSON I'm going to use jq [1] instead of grep. Use the right tools for the right job after all. I definitely prefer much richer structured data instead of a flat list of key-value pairs.

[0] https://en.wikipedia.org/wiki/Yoda_conditions

[1] https://en.wikipedia.org/wiki/Jq_(programming_language)

antifa
0 replies
3h46m

the advice to prefer the variable name 'ii' over 'i' because you can easily search for it

\bi\b is the easy way to search for i.

ajuc
0 replies
5h11m

'ii' over 'i'

You don't need to search for local variables, nobody names global variables "i" - so the "ii" advice is pointless.

You often do need to search for places where global stuff is referenced, and while IDEs can help with that - the same things that break greppability often break "find references" in IDEs. For example if you dynamically construct function names to call, play with reflection, the preprocessor, macros, etc.

So it's a good advice to avoid these things.

you will encounter deranged developers that become obsessive about such things and make the code base ugly

You can abuse any rule, including

Code should look good.

and I'd argue the more general a rule is - the more likely it is to be abused. So I prefer specific rules like "don't construct identifiers dynamically" to general "be good" rules.

dblotsky
5 replies
14h35m

Hard agree with the idea of greppability, but hard disagree about keeping names the same across boundaries.

I think the benefit of having one symbol exist in only one domain (e.g. “user_request” only showing up in the database-handling code, where it’s used 3 times, and not in the UI code, where it might’ve been used 30 times) reduces more cognitive load than is added by searching for 2 symbols instead of 1 common one.

Noumenon72
1 replies
13h47m

Not to mention the readability hit from identifiers like foo.user_request in JavaScript, which triggers both linters and my own sense of language convention.

emn13
0 replies
9h32m

Both of those are easy to fix. You'll adapt quickly if you pick a different convention.

Additionally, I find that in practice such "unusual" code is actually beneficial - it often makes it easy to see at a glance that the code is somehow in sync with some external spec. Especially when it comes to implicit usages such as in (de)serialization, noticing that quickly is quite valuable.

I'd much rather trash every language's coding conventions than use subtly different names for objects serialized and shared across languages. It's just a pain.

runevault
0 replies
10h8m

Probably depends on how your system is structured. If you know you only want to look in the DB code, hopefully it is either all in one place or there is a folder naming pattern you can use to limit where you search.

The upside of doing it this way is that it makes your grepping more flexible: you can either search one part of the codebase to see, say, only the DB code, or search everywhere to see all the DB and UI code using the concept.

plorkyeran
0 replies
13h43m

I’ve also found that I sometimes really like when I grep for a symbol and hit some mapping code. Just knowing that some value goes through a specific mapping layer and then is never mentioned again until the spot where it’s read often answers the question I had by itself, while without the mapping code there’d just be no occurrences of the symbol in the current code base and I’d have no clue which external source it’s coming from.

gregjor
0 replies
8h54m

I have mixed thoughts on this too. Fortunately grep (rg in my case) easily handles it:

rg -i 'foo.?bar' finds all of foo_bar, fooBar, and FooBar.

amingilani
5 replies
14h8m

I agree that code searchability is a good thing but I disagree with those examples. They intentionally increase the chance of errors.

Maybe there’s an alternative way to achieve what the author set out but increasing searchability at the cost of increasing brittleness isn’t it for me.

In this example:

    const getTableName = (addressType: 'shipping' | 'billing') => {
      return `${addressType}_addresses`
    }

The input string and output are coupled. If you add string conditionals as the author did, you introduce the chance of a mismatch between the input and output.

    const getTableName = (addressType: 'shipping' | 'billing') => {
      if (addressType === 'shipping') {
        return 'shipping_addresses'
      }
      if (addressType === 'billing') {
        return 'billing_addresses'
      }
      throw new TypeError('addressType must be billing or shipping')
    }

Similarly, flattening dictionaries for readability introduces the chance of a random typo making our lives hell. A single typo in the repetitions below will be awful.

{ "auth.login.title": "Login", "auth.login.emailLabel": "Email", "auth.login.passwordLabel": "Password", "auth.register.title": "Login", "auth.register.emailLabel": "Email", "auth.register.passwordLabel": "Password", }

Typos aren’t unlikely. In a codebase I work with, we have a perpetually open ticket about how ARTISTS is mistyped as ATRISTS in a similarly flat enum.

The issue can’t be solved easily because the enum is now copied across several codebases. But the ticket has a counter for the number of developers that independently discovered the bug and it’s in the mid two digits.

usrusr
0 replies
13h10m

Entrenched typos like ATRISTS are actually a greppability goldmine. Chances are there are more occurrences of pluralized people who are making art in the codebase, but only ATRISTS is the one from that enum.

I certainly would not suggest deliberately mistyping, but there are places where the benefit is approaching the cost. Certain log messages can absolutely benefit from subtle letter garbling that retains readability while adding uniqueness.

peeters
0 replies
10h49m

The input string and output are coupled. If you add string conditionals as the author did, you introduce the chance of a mismatch between the input and output.

I think it depends on whether the repetition is accidental or intrinsic. Does the table name happen to contain the address type as a prefix, or does it intrinsically have to? Greppability aside, when things are incidentally related, it's often better to repeat yourself to not give the wrong impression that they're intrinsically related. Conversely, if they are intrinsically related (i.e. it's an invariant of the system that the table name starts with the address type as a prefix) then it's better for the code to align with that.
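
A hedged TypeScript sketch of the distinction, reusing the article's example:

    // Intrinsic: the invariant "table name = address type + '_addresses'"
    // is the point, so encoding it is honest (at a greppability cost).
    const getTableName = (addressType: 'shipping' | 'billing') =>
      `${addressType}_addresses`

    // Incidental: spell both names out so they can diverge safely,
    // and so each table name is greppable.
    const TABLE_BY_ADDRESS_TYPE = {
      shipping: 'shipping_addresses',
      billing: 'billing_addresses',
    } as const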

kaelwd
0 replies
13h9m

REFERER moment.

ctxc
0 replies
13h44m

Agree with you.

What happens when translation files get too big and you want to split and send only relevant parts? Like send only auth keys when user is unauthenticated?

`return translations[auth][login]` is no longer possible.

Or just imagine you want to iterate through `auth` keys. _shudders_

Noumenon72
0 replies
13h53m

Typos are find-and-fix-once, while unsearchability is a maintenance burden forever.

I don't think coupling variable names by making sure they contain the same strings is the best way to show they're related, compared to an actual map from address type to table name. There might be a lot of things called 'shipping' in my app, only some of which are coupled to `shipping_addresses`.

Shouldn't a linter be able to catch that there is no enum member called MyEnum.ATRISTS, or is it not an actual enum?

adpirz
5 replies
14h38m

I've seen some pretty wild conditional string interpolation where there were like 3-4 separate phrases that each had a number of different options, something akin to `${a ? 'You' : 'we'} ${b ? 'did' : 'will do'} ${c ? 'thing' : 'things'}`.

When I was first onboarding to this project, I was tasked with updating a component and simply tried to find three of the words I saw in the UI, and this was before we implemented a straightforward path-based routing system. It took me far too long just to find what I was going to be working on, and that's the day I distinctly remember learning this lesson. I was pretty junior, but I later returned to this code and threw it all away for a number of easily greppable strings.
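
For illustration, a rough TypeScript sketch of that kind of rewrite (flags and phrases invented):

    declare const a: boolean, b: boolean, c: boolean

    // Before: no sentence the user actually sees exists verbatim in the source.
    const before = `${a ? 'You' : 'We'} ${b ? 'did' : 'will do'} ${c ? 'things' : 'a thing'}`

    // After: every visible phrase is a literal you can grep for.
    const after =
      a ? (b ? (c ? 'You did things' : 'You did a thing')
             : (c ? 'You will do things' : 'You will do a thing'))
        : (b ? (c ? 'We did things' : 'We did a thing')
             : (c ? 'We will do things' : 'We will do a thing'))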

ctxc
4 replies
13h40m

Tangential: I love it when UIs say "1 object" and "2 objects". Shows attention to detail.

As opposed to "1 objects" or "1 object(s)". A UI filled with "(s)", ughh

petepete
0 replies
12h47m

More so when it's not tripped up by "1 sheeps" or "1 diagnoses".

gnuvince
0 replies
7h49m

I like the more robotic "Objects: 1" or "Objects: 2", since it avoids the pluralization problems entirely (e.g., in French 0 is singular, but in English it's plural; some words have special forms when pluralized, such as child -> children or attorney general -> attorneys general). And related to this article, it's more greppable/awkable, e.g. `awk '/^Objects:/ && $2 > 10'`.

ajuc
0 replies
4h53m

Fun fact - I had to localize this kind of logic to my language (Polish). I realized quickly it's fucked up.

This is roughly the logic:

    function strFromNumOfObjects(n) {
      if (n === 1) {
          return "obiekt";
      }
      let last_digit = (n%10);
      let penultimate_digit = Math.trunc((n%100)/10);
      if ((penultimate_digit == 0 || penultimate_digit >= 2) && last_digit > 1 && last_digit <= 4) {
          return "obiekty";
      }
      return "obiektów";
    }
Basically pluralizing words in Polish is a fizz-buzz problem :) In other Slavic languages it should be similar BTW
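
Worth noting: JavaScript's built-in Intl.PluralRules encodes these CLDR plural categories, so the fizz-buzz can be delegated. A sketch, not a drop-in replacement for the original:

    const rules = new Intl.PluralRules('pl')
    const forms: Record<string, string> = {
      one:   'obiekt',   // 1
      few:   'obiekty',  // 2-4, 22-24, ... but not 12-14
      many:  'obiektów', // everything else
      other: 'obiektów', // fractions
    }
    const strFromNumOfObjects = (n: number) => forms[rules.select(n)]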

WalterBright
5 replies
14h17m

That's why D has a cast keyword:

    ubyte c = cast(ubyte)i;
instead of:

    unsigned char c = (unsigned char)i;
Casts are a blunt instrument that subvert the type system, and so they need to be greppable.

Having the cast keyword also removes the grammatical ambiguities in the expression syntax.

jenadine
4 replies
12h59m

Do you often grep for casts? I never do that.

aa-jv
0 replies
11h46m

Try to think about why you might want to do that. It makes a lot of sense, but if you're not doing it, that might be enlightening...

WalterBright
0 replies
11h33m

I regard every cast as a bug in my own code and try to refactor it so there aren't any. I can't get rid of all of them, but they're always worth a second look.

I don't normally grep for them, but others have told me they did.

P.S. one thing about D is you can do things like this:

    ubyte b = i;            // error, losing bits
    ubyte b = cast(ubyte)i; // ugly cast
    ubyte b = i & 0xFF;     // no cast, no error!
It's just one of the nice little details that make programming in D a pleasure.

SnowflakeOnIce
0 replies
5h53m

When doing appsec review in C or C++, yes!

EasyMark
0 replies
5h10m

Honestly I know I don’t do that either. I mean if there was some special case where I remembered “oh yeah I had to cast that variable in this special case”. In general I avoid casting as much as I can in C/C++, but especially in C.

traxys
4 replies
11h16m

I read parts of the Linux kernel source code pretty often, and getting to the definition of a function is often pretty involved:

- I don't always know the return type, since the calling code assigns the result to a field whose definition I'd also have to find

- I don't know if it's a C function or a preprocessor macro

This often results in me searching for the exact function name and combing through the uses in the drivers. You then need to restart all of that recursively to understand the function you just read.

I could use clangd for that, but I don't have the resources on my laptop to compile a kernel.

dvh
1 replies
11h13m

Why not simply hold Ctrl and click on the name of the function?

GeneralMayhem
0 replies
10h48m

I don't have the resources on my laptop to compile a kernel

gregjor
0 replies
8h52m

ctags?

semiinfinitely
3 replies
13h55m

why two 'p's - grep only has one

mckn1ght
0 replies
13h13m

That's usually how it's done with words that end in one consonant when adding a suffix that starts with a vowel, so as not to change the pronunciation of the short vowel in the root word, per English's rules around long and short vowels. See also map->mapped, bat->batted, tap->tappable, etc.

leetrout
3 replies
8h19m

I encourage my teams to write logs / output with interpolation with the variables at the end for searchability

For example, prefer:

  Added users (%d)
over:

  Added %d users
Then it is much easier to track down where things come from, without needing wildcards in the search or caring too much about what might be dynamic in cases where it's not obvious.

davemp
2 replies
8h11m

I’ve basically landed on the following form: ‘Short description. [foo={}, bar={}]’

Which will give you greppability and, in theory, parsability, so you can automatically bisect for a value change or something along those lines.
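
A tiny TypeScript sketch of a helper emitting that shape (entirely hypothetical; any structured-logging library does this better):

    // Produces e.g. "Cache miss. [key=user:42, attempts=3]"
    const logf = (description: string, fields: Record<string, unknown>) => {
      const kv = Object.entries(fields)
        .map(([k, v]) => `${k}=${v}`)
        .join(', ')
      console.log(`${description} [${kv}]`)
    }

    logf('Cache miss.', { key: 'user:42', attempts: 3 })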

medstrom
0 replies
7h23m

I like it, but it may be painful in languages where long variable names are common.

leetrout
0 replies
7h35m

Indeed. That is basically what the logging library from charm bracelet does.

alkonaut
3 replies
10h57m

There are of course cases of dynamic data in every language (The table name is an apt example) but usually when I look in code I just expect to be able to follow definitions. If the language doesn't reliably allow me to find "usages of this type" without risking finding another type with the exact same name then I'm already starting up my static type system compiler for the rewrite.

There are exceptions of course: when searching git logs, comments, etc., the language or IDE can't help.

And when searching for an unknown symbol (type, function, variable) you don't know the name of, but you know _should_ look like "DogOrder" or "OrderDog" is a common task too. In this case I'd probably search for " Dog.Order\(" or " Order.Dog\(" if I'm looking for a function. The language trait that enabled it is that method names are Pascal Case and always have an opening ( at the end. But my IDE at least lets me search for members (variables, functions) separate from type names. There should be an index in the IDE though that lets you query this data. E.g. looking for types starting with foo could be done with search t:Foo, instead of having to grep for "(struct|class) Foo" or similar. Tooling is the key.

berkes
2 replies
10h47m

The author uses JavaScript and Python as examples. So I presume they have (most?) experience with dynamic languages.

In static languages, greppability is hardly as much of a factor. Especially with the availability of LSPs and other such tools nowadays.

When I write rust, or Java, I hardly grep, I "go to usages" or "go to definition", "rename symbol" and so on. Similar, but not to that extent, with typescript. But when coding in Javascript, Ruby or Python, no matter how fancy or language-focused an IDE is, I'll be grepping a lot. Decades of Ruby and Rails "black magic" taught me to grep for partial patterns like the author shows, too. Or to just run the code-path entirely (through tests) because the table-definition of the database will change the available methods and behaviour of the code. Yes. I know.

An LSP (or linter, or checker) can only do so much when the available code, methods, classes, behaviour can be changed or added at runtime.

alkonaut
1 replies
10h39m

I'm happy to use dynamic languages occasionally too (Bash, Javascript, Python, ..) but I have a rule of thumb that says if I can't see the entire codebase on one screen, then it's too large for dynamic.

pistoleer
0 replies
7h23m

Would be great if the wider industry shared that view

trey-jones
2 replies
5h3m

As someone who almost exclusively uses grep for finding what I need in codebases that are new to me and old to me, you can make whatever arbitrary rules you want, as long as you're consistent, I'll be pretty happy with it. If syntax is loose in some area (single vs double quotes, parens or braces or none), just do the same thing every time. Whitespace consistency isn't crucial, but it can't hurt (between function name and parens, for example).

necrotic_comp
0 replies
4h45m

Agreed. So long as the code hits performance and business goals, there doesn't need to be an emphasis put on "newness" or any other sort of vanity metric - make the code obvious, searchable, and understandable so that in a time crunch or during an outage it's easy to search and find the culprit.

causal
0 replies
4h9m

I'm also thinking long-context LLMs are going to make this advice seem pretty archaic in a few years. They're so good at reading code and extremely useful for asking questions of a code base.

That said, I completely agree with the author on not using clever string tricks to compose identifiers. That makes code both harder to search and to read.

creesch
2 replies
11h37m

I fully understand the point the author is making. However, I am not going to sacrifice good JSON and make it flat just so someone can search through it more easily. With the example they give, it is still readable because it is a simple data structure. But with more complex data, the flat structure does not make it easier to parse for me, and it makes mistakes more likely as well.

smartmic
1 replies
11h18m

It's often a matter of having the right tool for the job. In your case, https://github.com/tomnomnom/gron might be useful.

creesch
0 replies
10h42m

Well, I'd say that in the author's case it might be more useful. ;) I never really had the inclination to grep for data like the author does.

I generally work from an IDE anyway, where it is clear that I am working with a value that is part of a JSON object and I can follow it back to the proper structure anyway. In fact, the more I think about it, the more I feel like the article is written for a very specific use case and perspective. Almost to the point where the saying "if all you have is a hammer, everything looks like a nail" is applicable. Where if it doesn't look enough like a nail it should be adjusted to look more like one instead of expanding your toolbox a bit.

arendtio
2 replies
9h35m

I am firmly against the suggested changes. I love grepping through code too (often using -A -B -C), but I also like browsing the code, with tools where you can just click on a function and see its definition.

However, changing how the code should be written so that grepping becomes easier is optimizing for the wrong target. It is much more important that the code is easily readable and maintainable.

In addition, some tools are designed explicitly for grepping through code (off the top of my head, ack is an example). If grep doesn't work, one should try a more sophisticated tool instead of adopting a different coding style.

lucideer
0 replies
9h8m

Greppability is really a proxy metric here - these changes all have other benefits even if you never grep (mostly readability tbh).

    const getTableName = (addressType: 'shipping' | 'billing') => {
        return `${addressType}_addresses`
    }
This is a simplified example but in a longer function, readability of the `return` lines would be improved as the reader wouldn't have to reference the union type (which may or may not be defined in the signature). The rewrite is also safer as it errors out if a runtime `addressType` value doesn't match the union type (above code would not throw an error, just return an indeterminate value which would cause undefined behaviour).

"Flat is better than nested" also greatly improves readability in both examples: either reading the i18n line, or reading the classname at definition / call will be more readable when the name contains full context of function.

gregjor
0 replies
9h30m

Nothing the author wrote would necessarily make code harder to read or maintain. Consistent naming of the same thing throughout, not constructing variables or table names dynamically, etc. benefit both readers/maintainers and searching.

I understood “grepping” to mean ripgrep (rg) or ack, not just plain grep. I think programmers who use command line tools or vim know about those. VSCode uses rg.

RodgerTheGreat
2 replies
14h36m

one of the strangest and most grep-hostile approaches to identifiers that I have ever observed is Nim ignoring both case and underscores in an effort to allow everyone to write code in their preferred style:

https://nim-lang.org/docs/manual.html#partial-caseminusinsen...

planetis
0 replies
4h55m

And it works pretty well, speaking from 6+ years of experience. It's not that strange if you consider case-insensitive filesystems and email addresses. But on the internet you only hear the opinion of the loudest minority.

x3n0ph3n3
1 replies
13h19m

No AI tooling was used in the creation of this article.

That was refreshing.

jimmaswell
0 replies
11h45m

It just doesn't have that genuine artisan smell to it when someone uses ̶a̶ ̶p̶r̶i̶n̶t̶i̶n̶g̶ ̶p̶r̶e̶s̶s̶ a̶ ̶c̶o̶m̶p̶u̶t̶e̶r̶ ̶w̶i̶t̶h̶ ̶a̶u̶t̶o̶m̶a̶t̶i̶c̶ ̶t̶y̶p̶e̶s̶e̶t̶t̶i̶n̶g̶ s̶p̶e̶l̶l̶c̶h̶e̶c̶k̶ W̶i̶k̶i̶p̶e̶d̶i̶a̶ AI to help write their article.

vijucat
1 replies
10h21m

One other thing I'd like to add is greppable comments! In the same vein as TODO and FIXME, I use hashtags in comments to drop hints to future me reading the code. #learning is a universal one:

// #learning: transparent color using color.new(color.white, 100). This is GREAT for hiding plot() lines during inapplicable periods (such as when no trade is on)

But project-specific hashtags are quite useful, too.

// #60within600: bunch API calls to not hit the 60 calls within 10 minutes limit

// This memoizes fn call results to prevent #60within600

The hashtagging was inspired long ago by del.icio.us, if you remember that. https://en.wikipedia.org/wiki/Delicious_(website)

philipwhiuk
0 replies
7h51m

Seems like you're trying to implement a ticketing system in your code to me.

If you need to prevent 60 within 600, write a test.

shahzaibmushtaq
1 replies
11h35m

This reminds me of the good practices and guidelines in coding when I was learning to code, which also includes "proper naming" so you can easily find what you are looking for throughout the codebase.

berkes
0 replies
10h56m

Me too.

But that's also what makes me uncomfortable when reading this article. Proper Naming is truly an "art" of balancing trade-offs.

It takes domain expertise (Ubiquitous Language), an understanding of the users of the code (other devs, not end-users), and a lifetime of coding f*ups where naming something wrong turned out painful, to balance these.

The author gives a nice example of dynamic table naming. But their refactoring didn't keep the behaviour the same (the else/catch). So it's hard to argue the first is better. And in this case, even without the else/catch, I'd say the latter is better. But there will be cases where greppability has to be balanced with readability, testability or refactorability. And in those cases, for me, greppability comes last.

poikroequ
1 replies
5h11m

Grep is nice, but I would much prefer better tools for searching through code. Something that knows how to parse multiple languages and can infer the types of things. Not to mention indexing, for large code bases, grep'ing through possibly millions of lines of code can be awfully slow.

IDEs do a decent job but are typically lacking compared to the raw power of grep.

packetlost
0 replies
5h8m

I mean, I prefer faster symbol, type, etc. based navigation too, but it doesn't work in all scenarios so grep is an extremely handy fallback.

larsrc
1 replies
11h3m

As an avid grepper, I disagree with most of these specific recommendations. Use a tool that actually understands references. Don't make the code harder to read for humans just to please grep.

As for identifiers, use 'foo.?bar' case-insensitively.

medstrom
0 replies
7h15m

Which of the examples are harder to read for humans?

jayd16
1 replies
3h46m

I'm a big proponent of visual scripting (where it makes sense) but you really do miss the text-based tooling like grep.

One trade-off you can make is using text-based serialization so you're at least able to grep the yaml or JSON or whatever and get to the right file at least. This of course costs you some editor load time.

On the flip side, you're basically always using an IDE to edit the visual script. In theory semantic search should be possible and built in, although reality usually falls short.

Someone in a previous HN thread mentioned the idea of a standard graph syntax. Something that game engines and tools could store their graph-based assets in. If there were a standard syntax then standard tools could be made, and we could end up seeing something like a graph grep. One could imagine a Visual Studio-style graph editor app with plug-in support. Even a standard merge tool would be a huge step up for non-text-based code assets.

A man can dream!

wizzwizz4
0 replies
3h40m

There is a standard graph syntax: Graphviz DOT notation.

eterevsky
1 replies
9h14m

This also applies to dependency injection. While it has significant benefits, it hurts clarity of the code. It becomes more difficult to see where each object is coming from.

mrkeen
0 replies
3h14m

Magical dependency injection frameworks, that is.

Plain old putting-dependencies-in-the-constructor-instead-of-newing-them is great.

If you 'wire' it yourself, you see the top-level structure of the project in main, e.g.

  cache          <- createCache "./cache"
  workQueue      <- createWorkQueue parallelism
  projectFinder  <- createProjectFinder basePath
  gradleBuilder  <- createGradleBuilder cache
  normaliser     <- createNormaliser
  gradleParser   <- createGradleParser normaliser
  relationFinder <- createRelationFinder cache normaliser
At a glance I can see what uses normaliser, and what is used by normaliser.

Timwi
1 replies
5h8m

All of the apostrophes in this article are wrong. The correct character is ’, but this article uses ‘ (open single quote) throughout.

riz_
0 replies
3h13m

Thanks, fixed.

wrsh07
0 replies
7h55m

Nice article. Two notes:

First, some of these suggestions will make it harder to introduce bugs when updating the code. That's good! Particularly tricky is when somebody splits up identifiers or function names. These types of things often occur at boundaries (calls between servers or to the db) which can make them tricky to test. Even if all your identifier combining is initially done in a single file, it's easy for someone to see the final shape of the identifier and accidentally hard code it somewhere else.

Second, In the spirit of Titus Winters' "software engineering is programming over time", a codebase should be greppable over time.

That means that if you rename a function, you might consider saving the old name of the function in a comment.

whirlwin
0 replies
12h3m

Code grepping at build time can be useful.

Grepping at runtime, if you can call it that, is also very powerful. If you have a binary, either your company's or a third-party one, and you don't have the source code easily available, the `strings` program from GNU binutils shows the tokens embedded in the binary, e.g. hardcoded URLs, credentials and so on. It can also be useful for analyzing certain things in memory.

welder
0 replies
3h6m

Also import traceability

trilbyglens
0 replies
6h31m

Imo this is another big selling point of using Tailwind CSS. Those long stacks of classes become almost like UUIDs for markup, discoverable from dev tools.

tlb
0 replies
1h41m

An editor feature I'd like is that as I'm typing an identifier, a hover popup shows me how many other instances appear in my codebase. It should be easy to build a map of identifier->count for instant lookups. I generally know if I want 0 (a new unique identifier), 1, a small number, or a large number. For a few pixels, this would prevent a lot of dumb mistakes and ambiguous names.
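
A rough sketch of the index such a feature would need, in TypeScript (regex tokenization, so only an approximation of a real parser):

    declare const sourceFiles: string[] // assumed: file contents already loaded

    // Build identifier -> occurrence count once; lookups are then O(1).
    const counts = new Map<string, number>()
    for (const source of sourceFiles) {
      for (const id of source.match(/[A-Za-z_$][A-Za-z0-9_$]*/g) ?? []) {
        counts.set(id, (counts.get(id) ?? 0) + 1)
      }
    }
    const occurrences = (id: string) => counts.get(id) ?? 0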

t43562
0 replies
8h39m

astgrep is a very useful tool when grep fails: https://ast-grep.github.io/

It's not as easy to use as grep but I think one can script it to be nearly so. It has huge power but without learning it all one can do searches that grep finds difficult. e.g. finding all the locations where a method is called and showing the parameters even if they are on multiple lines. You can then use rewrite rules to do CLI code refactoring.
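
(If I remember the syntax right, something like `ast-grep --pattern 'doTheThing($$$ARGS)' --lang ts src/` matches every call to a hypothetical doTheThing however its arguments are wrapped across lines; `$$$ARGS` is a multi-node metavariable.)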

I think it also has potential in a build toolchain e.g. to look for patterns you want to discourage as a pre-commit hook.

ultragrep - https://github.com/zendesk/ultragrep - I don't love this quite as much but it does have a way to build indexes so you can do fast greps across a big codebase. It also has a text mode UI if you want it and I find that almost worthwhile.

I use ripgrep most of the time but while I like it, there is a limit to how many grep tools I can remember and I should probably cut down to using ultragrep and astgrep.

plain gnu grep itself is something one has to know when one is on an unfamiliar machine.

svennidal
0 replies
5h7m

This approach along with ack, instead of grep, has been a godsend to me.

settsu
0 replies
3h22m

I'm ardently in favor of making code human readable as practically as possible. Personally it follows on my personal Rule #1: Be kind (i.e., in this case, to others and your future self.)

However, searchable =/= greppable.

Flat is better than nested

Context matters but, generally speaking, I would say that flatter is anti-grep.

rezaprima
0 replies
13h5m

I have been bitten by Ruby's metaprogramming on this.

recursivecaveat
0 replies
14h31m

A small but underappreciated benefit of grammar changes like going from the form `mytype myfun()` to `keyword myfun() sigil mytype` is that function definitions become trivially greppable.

ralusek
0 replies
14h40m

Especially in untyped languages, working with an old or unfamiliar codebase, sometimes the only way to know "was anything else using this code" is just to search for the name of a function or whatever.

r34
0 replies
7h14m

Good point. I would refer to another (similar) metric, which could be called "IDE-search-ability": it extends greppability by adding some more conventions which work well with your (or your company's) IDE.

qwertox
0 replies
9h48m

I use greppable strings explicitly, like

  requests.get(f'http://a.b.c.d/wol?device={wol_computer}&grep-id=wake-on-lan', timeout=3)
This way I find `grep-id` in the server logs as a reminder of what to grep for, then `grep-id=wake-on-lan` in the entire codebase to find the actual source of the call.

Or I add comments with a greppable token to the code.

pooriar
0 replies
5h6m

I just shared this in the work Slack and everyone resoundingly agreed with the sentiment. Definitely going to pay more attention to this now, thanks for sharing!

peanut-walrus
0 replies
9h55m

These are all extremely good suggestions. Especially the flattening bit - yes, it's verbose as hell, but it just makes so much sense whenever you have to deal with the code any time after writing it. Helm charts, please take note, the docs even say that "In most cases, flat should be favored over nested.", yet almost every time I have to deal with a Helm chart, it's a mess of nested structures.

nsonha
0 replies
11h18m

is there something like a universal "semantic grep" for code? I think rating code based on (the limitations of) some tool might not be the best way.

noufalibrahim
0 replies
9h56m

I don't know how to validate this but this seems to be a specific case of "avoiding magic" where there's a lot of dynamically generated variables and things. Having the static text of the program more or less show its intent helps readability and searchability quite a bit.

I suppose the other extreme is to have a program generator with an input spec and you being left to read through the generated code without access to the input spec.

nottorp
0 replies
3h44m

Funny, I started work on a legacy code base a couple months ago and yes, it has all the problems described in the article and that hinders our understanding of it.

nickjj
0 replies
6h56m

Absolutely.

It's what I like about Rails when it comes to file names too. Having controllers/users_controller.rb as a path might sound wasteful because "you're already in the controllers directory, you don't need _controller in the path".

But when you want to fuzzy find that file, it's really nice to type "users con" and get that file instead of also picking up views, models and other user related files with just a "users" search.

mrb
0 replies
12h3m

I will always remember my professor explaining that greppability is the reason C++ casting operators use a long keyword: static_cast<...> const_cast<...>, etc as you can easily grep for "_cast" or the whole keyword.

moomin
0 replies
9h55m

If you really do want your code to be searchable, here’s a couple of practices I’ve adopted:

1) Eliminate spelling mistakes. Eliminate alternative spellings. UK vs US English? Pick a side and stick to it.

2) Eliminate contractions. Or keep a very short list of allowable ones (We permit “info” for instance.)

The point of this is to increase the predictability of the names you use. If you've got "tradeable" and "tradable" in your code base, searching for it is going to be a pain. You can supplement these rules with common coding standards like "We call these things providers." but just getting the spelling consistent is huge.

mashlol
0 replies
14h19m

Greppable commit messages and descriptions are also important, for a similar reason. If you want to learn where a feature exists in the codebase, searching the commits for where it was added is often easier than trying to grep through the codebase to find it. Once you've found the commit or even a nearby commit, it's much easier to find the rest.

mannycalavera42
0 replies
4h56m

if only there was a language where code is data so that... hold on a sec! #LISP-languages

loeg
0 replies
13h26m

Yeah, this is also a benefit of e.g. C identifiers vs C++, where namespace, class, and method/variable can all be listed in separate places, breaking the ability to locate non-unique method/variable names with grep.

ljsprague
0 replies
10h44m

In other words: don't try to be clever?

linuxdude314
0 replies
1h7m

The examples are pretty silly, especially the first one.

If you know that you need either a shipping or billing address and the user has specified which one they need, just query based on that.

There’s no need to introduce a function (getTableName) to detemplate a string or match on a case.

Instead just create a function that gets the item you want from the DB and has the table name as input.

On your UI, make sure that when the user specifies a billing or shipping address, the correct parameter is passed to the API.

kmarc
0 replies
10h28m

Some good recommendations in the article.

Greppability is also helpful when you start scripting your editor. Vim has `includeexpr` and co. to implement some "intelligence" when trying to find declarations etc. This enabled me to write a couple-line snippet that could immediately resolve Bazel Starlark symbols even in "imported" (`load()`) files. At one point I realized I had better code navigation than any of my colleagues using IDEs.

This, and tools like ripgrep, really help a lot. This is something the VS Code developers also realized when they included ripgrep as the "backend" of their search-in-files feature.

klysm
0 replies
5h39m

A good IDE makes up for this if the syntax of the language doesn’t lend itself to easy greppage. I lean heavily on JetBrains search with their editors

jongjong
0 replies
13h8m

This is a great point. One of my pet peeves is seeing an error in the logs which I cannot find in the code for various reasons. Sometimes the error message is constructed in a complicated way with variables concatenated together or the error message is extremely generic and I get matches in 100 different places.

I'm an advocate for the idea that any aspect of a system which communicates either with end users or with sysadmins should be given high exposure in the code base. Typically, this means constructing abstractions in such a way that higher-level business logic and log messages are easily traceable from a single file. I make it so that the business layer sits above all other layers, as close to the program's entry point as possible.

jgrahamc
0 replies
8h8m

In my very first job I wrote a spell checker/corrector for code comments. This was specifically to make greppability possible because some of my colleagues were appalling at spelling and it meant that the incredibly detailed comments we used to write were hard to search for key details.

j45
0 replies
3h28m

Code that is not written for others in the future may have a limited future

indymike
0 replies
7h13m

Greppability and debuggability are two things that I look for in code reviews. If you ask, "How would you debug that?" and the answer starts with, "I'd rewrite it to...", then maybe, just maybe, you should write it that way.

hugodan
0 replies
7h54m

Try to keep code-reference indirections to a tasteful minimum. If a split is needed, that's one more indirection for whoever maintains it in the future. That cost needs to be on the table.

Keeping things referentially transparent helps a lot here.

hilux
0 replies
3h0m

Digital marketers have known this for a long time.

hcfman
0 replies
11h3m

.* is your friend :)

guhcampos
0 replies
4h43m

One situation that comes to mind is configuration of applications on containers using environment variables.

It's extremely valuable to be able to just `grep -r PREFIX_` on a codebase and be able to visualize all possible configuration values for that application.

This is encouraged by some frameworks like Django, where you are expected to list all the configuration values in a `settings` module, but is not standard for `viper`, `click` and `pydantic-settings`, which try to be too smart and auto-generate the variable names for you. It's one of these cases where "modern" frameworks and applications try to save a minuscule amount of work by automating some task, but end up reducing the maintainability of the code over time.
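
A sketch of the explicit style in TypeScript (the prefix and the settings themselves are hypothetical):

    // `grep -r MYAPP_` now surfaces every knob the application reads.
    export const config = {
      databaseUrl: process.env.MYAPP_DATABASE_URL,
      logLevel: process.env.MYAPP_LOG_LEVEL ?? 'info',
      port: Number(process.env.MYAPP_PORT ?? 8080),
    }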

frabjoused
0 replies
5h30m

This is why I’ve always fought against BEM in CSS. Tends to drive greppability to zero.

emblaegh
0 replies
10h23m

Python people should think twice before implementing a `__call__` method if they want to improve greppability.

elijahbenizzy
0 replies
12h59m

Heh, this was very much the design philosophy behind Hamilton (github.com/dagworks-inc/hamilton).

The basic idea was that if you have a data artifact (columns for dataframes initially), you should be able to ctrl-f and find it in your codebase. 1:1 mapping of data -> function.

People take a long time to figure out that the readability gains from greppability are worth whatever verbosity comes with it, largely because they think of code too much as a craft (make it as small/neat as possible) and not as documentation for a live process...

eitland
0 replies
7h34m

Working in Spring, which accepts configuration in I-don't-know-how-many formats, from ENV_VARS to yaml, this very much resonates with me, because as a general rule, if an option can be used, someone will use it.

Also the reason why I try to avoid Gradle when possible:

The possibilities are endless. At one place I think I found 21 wildly different Gradle configs out of 24 that I checked.

(For anyone that wonders, it was combinations of:

- placeholders vs straightforward dependency declarations (this is a thing in maven too)

- for loops doing things based on lists or maps instead of just calmly declaring them one after another, maybe to save some characters

- helper functions so you could declare dependencies like azure(<something>(<version>))

- order of declarations

- Kotlin vs Groovy syntax

I have probably forgotten a couple more but this is thankfully already a few years ago.)

ceritium
0 replies
6h3m

I built the command line tool flatito just for Rails i18n translation keys.

I am unsure if I like the author's approach because there are other cons, but it's a good point.

* https://github.com/ceritium/flatito

breck
0 replies
3h18m

There's a new kind of language where the practice is to use whitespace, and only whitespace, as your syntax. Newlines separate blocks and spaces separate words.

One of the unexpected extremely powerful things this allows is finding function usage extremely easy in any text editor that supports regex. You just search for ^[functionName] . Since you know that function pretty much will only be used at the beginning of lines. You can thus make edits against the AST with regexes and without parsing the AST at all.

It's pretty amazing, and leads to quite faster development, and allows one to tackle bigger and more complex problems.

binary132
0 replies
14h22m

This can even be as simple as not using multi-line error strings, or expanding variables in them.

atoav
0 replies
10h53m

One very simple way to make code less greppable is to use only single-letter variables, or other short names that are very likely to be contained in a ton of other words.

assanineass
0 replies
8h4m

Sounds a little erotic…

anordal
0 replies
11h28m

Setting a variable via a split identifier is surprisingly common in CMake (because functions can't return a value):

    set(${VAR}_VERSION ${VERSION})
Grepping for the resulting variable name, say FOO_VERSION, turns up nothing, because that name never appears in the source. This is the main reason I don't like CMake.

ajayvk
0 replies
14h34m

Just spent an hour trying to figure out how a Hugo theme was picking up a shortcode definition. Grep did not help.

Turned out the shortcode name is based on the file name rather than file contents.

advael
0 replies
38m

For my purposes, this is among the most important metrics.

Even a major refactor is relatively easy if you can find stuff in your codebase. Even a small bugfix can get complicated if there's a ton of ambiguity

TeMPOraL
0 replies
10h17m

Grep is indeed a critical tool for navigating and understanding an unfamiliar codebase, but greppability should not be a goal unto itself. The article seems to be making that mistake - it's basically advocating improving greppability at the cost of making the codebase even larger, messier, and harder to read: i.e. reinforcing the problem that makes you reach for grep in the first place. It's a false economy. It's asking you to optimize your code for one specific scenario - trying to figure out where an unfamiliar string comes from; but that isn't the most important or most frequent thing people need to do with code anyway.

(If it is for you, congratulations, you're the janitor in the codebase. It sucks, but that's what you're being paid for. Maintenance is a means, not an end.)

In particular, one of the most important and frequent things you do with code is read it in order to understand it (locally, at the abstraction level of interest), and the advice from this article compromises it badly - almost as if hoping that, on a greppable enough codebase, you could use grep to avoid reading or thinking entirely.

1. Don't split up identifiers

Don't split them up for the sake of splitting, sure. That's not helping anything. But in the example given, there's likely a good reason for it - for example, it codifies the intended coupling between tables. `billing_addresses` isn't an independent term in this code, nor are the other `_addresses` table names. There's a naming pattern there, encoded directly in the initial example. The proposed refactor obscures it, triples the amount of code in the process (all of which is low-value noise), and introduces the possibility of errors (typos, copy-paste) of the kind that compilers don't pick up (hope you have good tests!).

FWIW, the author's refactor may be eventually required - if and when the naming pattern in the original code no longer holds. But not before then.

2. Use the same names for things across the stack

Excessive data repackaging is bad, but that tends to be a symptom of having too many layers. A good layer has specific semantics that distinguish it from the layers above and below it. This may necessitate renaming some things; in that case, even if the renaming is as trivial as in the example, it should be spelled out explicitly. You can't just return a Layer 1 Address object instead of a Layer 3 Address object if the two layers mean something different by "Address"; the triviality of the mapping is incidental and may not hold over time. If it really feels trivial, chances are one of the layers is not necessary in the first place, so go fix that.

3. Flat is better than nested

Now that's just screwing with people, especially wrt. nesting namespaces. It's asking to reintroduce the visual noise that the person reading the code will then have to filter out again mentally.

The way I see it, if you grep for some log message or unrolled identifier and can't find it, you're supposed to keep grepping for parts of the string, until you hit a match. You then go look, and it's usually apparent that you're dealing with a compound identifier or an interpolated string - congratulations, you just learned something important about that part of the legacy codebase, which is the real job you're supposed to be doing.

Shorel
0 replies
4h42m

I use grep and git grep all the time.

This post is very welcome, it sums up my own ideas about grep in a better way.

Mikhail_Edoshin
0 replies
9h26m

Conceptually it is akin to having file names that sort well.

Grep is a simple tool, not too different from a simple string sort. It is better than no tool, but is it better than a tool that understands the notation? A strong side of grep is that it is universal and is not tied to a particular notation. Yet if you could easily define a specific notation and have a tool to immediately understand it, would you still prefer grep?

We tend to organize the code according to the tools we have. E.g. if a tool gives us a list of entities in alphabetic order, we will try to name the entities so that they form “logical” groups. This may pass as a local organizational principle and may be useful but it is always intimately coupled with the underlying tool.

KronisLV
0 replies
40m

I work on a project where people decided to refer to translations by doing the equivalent of:

  :label="$translate(getProductSectionLabel('title'))"
where the logic is a bit like:

  const getProductSectionLabel = (code) => `myapp.sales.sections.products.${code}`
and then the actual values are in a nested structure, like:

  myapp: {
    sales: {
      sections: {
       products: {
         title: "Products"
         ...
       }
       ...
     }
     ...
    }
    ...
  }
People seem to have gone for that because writing that first part is simpler within the component, but I couldn't get across that this makes the codebase harder to navigate.

Meanwhile, my personal codebases are more like:

  :label"$translate('myapp-sales-products-title')"
and the translation file also has the equivalent of:

  myapp-sales-products-title: "Products"
which is way simpler at the expense of some more duplication (easily mitigated by compressing the translations).

IshKebab
0 replies
11h50m

This is why I always recommend avoiding kebab-case as much as possible. You'll eventually need to convert it to snake_case and now you have broken grep. (Nobody is going to remember to use a regex every time.)

HeavyStorm
0 replies
2h27m

Only read the title, but if it's what I guess it is, then I finally met someone who'll understand why I always declare functions in JS using the function keyword.

Groxx
0 replies
27m

Along similar lines: I highly recommend making every metric and log in your system spelled out completely somewhere.

    Don't:
      base      = "abc"
      something = base + ".some.suffix"
    Do:
      something = "abc.some.suffix"
I've also had some luck with hard-coded UUIDs at call sites, e.g.:

    log.Info("something", "callsite", "DECAFBAD-000...")
because it makes it absolutely trivial to find a log, and unlike caller-lines (which are great! use them too!) it doesn't change when you refactor code.

0x69420
0 replies
7h2m

sure. the reason i put a line break between return type and function name in c-likes is `grep ^fname`. but i seriously wish greppability wasn't important. the extensive line-orientedness of unix tools really puts a damper on the whole hose-of-bytes concept, and it's no wonder by the time of plan 9, there was a strong desire to do away with it—cf. "structural regular expressions", as deployed in sam(1), which, of all the places to put them, certainly has historical irony, as sam's (decidedly not line-oriented) editing language nonetheless descends from ed, the definitive line editor, and gave us such hits as "stream ed" and "simulate typing `g/regex/p` into ed".

just the other week i noticed a change in recommended formatting style in a project i contribute to regularly, and the result was source files got about 20% taller, 20% more of a pain in the ass to edit without some sort of syntax folding. the rationale? diff. making you reach for a syntax-aware editor to compensate for a deficiency in the syntax-awareness of a version control frontend is certainly a choice.

the business end of git as seen by most programmers is in fact diff city, sure, but deep down git is a bunch of snapshots. even deltas behave nothing like diffs. pull up the spec for the pack format and look for the word "line". you will not find it.

things could be so much better, but for now we live in a world where the headline is true.