The `die()` trick is good, but bash has an annoying quirk: if you `exit` while inside a subshell, only the subshell exits and the rest of the script continues. Example:
#!/bin/bash
die() { echo "$1" >&2; exit 1; }
cat myfile | while read line; do
    if [[ "$line" =~ "information" ]]; then
        die "Found match"
    fi
done
echo "I don't want this line"
..."I don't want this line" will be printed. You can often avoid subshells (and in this specific example, shellcheck is absolutely right to complain about the useless use of cat (UUOC), and fixing that will also fix the die-from-a-subshell problem).
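The UUOC fix might look like this (a sketch reusing the example above, with made-up sample data so it runs standalone): redirecting the file into the loop keeps the loop in the current shell, so `die`'s `exit` actually terminates the script.

```shell
#!/bin/bash
# Sketch: redirect the file into the loop instead of piping, so the while
# loop runs in the current shell and die()'s exit is final.
die() { echo "$1" >&2; exit 1; }

# sample input for the demo (no matching line, so the script runs to the end)
printf 'first line\nsecond line\n' > myfile

while read -r line; do
    if [[ "$line" =~ "information" ]]; then
        die "Found match"
    fi
done < myfile

echo "Only reached when no match was found"
```

With a matching line in the input, `die` now really does stop the whole script at that point.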
But, sometimes you can't, or avoiding a subshell really complicates the script. For those occasions, you can grab the script's PID at the top of the script and then use that to kill it dead:
#!/bin/bash
MYPID=$$
die() { echo "$1" >&2; kill -9 $MYPID; exit 1; }
cat myfile | while read line; do
    if [[ "$line" =~ "information" ]]; then
        die "Found match"
    fi
done
echo "I don't want this line"
...but, of course, there are tradeoffs here too; killing it this way is a little bit brutal, and I've found that (for reasons I don't understand) it's not entirely reliable either.
Could killing the PID like that create zombies?
Perhaps, but in any case I would never write code like this.
First of all, sending `SIGKILL` is literally overkill and perpetuates a bad practice. Send `TERM` instead. If that doesn't work, figure out why.
Secondly, subshells should be made as clear as possible and not hidden in pipes. Thirdly, and relatedly: looping over `read` is essentially never the right thing to do. If you really need to do that, don't use pipes; use heredocs or herestrings.
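For example (a sketch with made-up data), a herestring keeps the loop in the current shell, so variables set inside it survive, and `exit` would behave as expected:

```shell
#!/bin/bash
# Sketch: feed the loop with a herestring instead of a pipe; the loop then
# runs in the current shell, so assignments inside it persist.
matches=0
data=$'alpha\nbeta information\ngamma'   # hypothetical input

while read -r line; do
    if [[ "$line" =~ "information" ]]; then
        matches=$((matches + 1))
    fi
done <<< "$data"

echo "matches: $matches"   # 1 here; with a pipe, the increment would be lost in the subshell
```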
Fourth, if you cannot avoid subshells and you want to terminate the full script on some condition, exit with a specific exit code from the subshell, check for it outside and terminate appropriately.
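That approach might look like the following (a sketch reusing the earlier example; 42 is an arbitrary sentinel code, and the demo input is made up so the script runs to completion):

```shell
#!/bin/bash
# Sketch: the subshell exits with a distinctive code; the parent inspects the
# pipeline's status and terminates properly.
die() { echo "$1" >&2; exit 1; }

printf 'one\ntwo\nthree\n' > myfile   # demo input with no matching line

cat myfile | while read -r line; do
    if [[ "$line" =~ "information" ]]; then
        exit 42        # only exits the subshell...
    fi
done
if [[ $? -eq 42 ]]; then
    die "Found match"  # ...but the parent sees the code and exits for real
fi

echo "Reached only when no match was found"
```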
Do enlighten me on why it is a bad idea to use loops over read; it's perhaps one of my favourite patterns in bash, and combined with pipes, appears to me one of the cleanest ways to correctly and concisely utilise parallelism in software.
Stéphane Chazelas has already explained this extensively, so I will link to two of his most information-dense posts:
https://unix.stackexchange.com/questions/169716/why-is-using...
https://unix.stackexchange.com/questions/209123/understandin...
Both of these posts must be read carefully if you really wish to write robust scripts.
The provided points don't seem to be reasons for generally avoiding the subshell loop pattern.
Reasons of 1) performance, 2) readability, and 3) security are provided as points against the pattern, and the post itself acknowledges that the pattern is a great way to call external programs.
I'd think that the fact that one is using shell to begin with would almost certainly mean that one is using the subshell loop pattern for calling external programs, which is the use case that your post approves of.

In this case, subshells taking the piped input as stdin allow the easy passing of data streamed over a file descriptor, probably one of the most trivially performant ways of moving data. The pattern is composable, and certainly easier to remember, modify, and extend than the provided xargs alternative, without potential problems such as exceeding the maximum argument length. Having independent subshells also allows for non-interference between separate loops when run in parallel, offering something resembling a proper closure. In these respects, subshell loops provide benefits rather than pitfalls in performance and readability.

Certainly `read` has some quirks that one needs to be aware of, but they aren't much of an issue when operating on inputs of a known shape, which is likely the case if one is about to provide them as arguments to another command.
Regarding "security", the need to quote applies to anything in shell, and has nothing specifically to do with the pattern.
> looping over `read` is essentially never the right thing to do
Why? I do it quite often, though admittedly usually in one-time scripts.
See my reply to qazxcvbnm and study the links carefully if you want to do robust shell scripting.
Just adding `set -e` also exits the script when a subshell exits with a non-zero exit code. I'm not sure why I would leave `set -e` out of any shell script.
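That interaction can be seen in a sketch like this (made-up data; running the demo in a child bash is just so we can observe the early exit without taking down the outer script):

```shell
#!/bin/bash
# Sketch: under set -e, the subshell's non-zero exit fails the pipeline,
# and the script stops right there, so the final echo never runs.
out=$(bash -c '
    set -e
    printf "a\nsome information\n" | while read -r line; do
        if [[ "$line" =~ "information" ]]; then
            exit 1     # exits the subshell; set -e then aborts the script
        fi
    done
    echo "I do not want this line"
' 2>&1) || true
echo "child printed: [$out]"   # empty: the unwanted echo never ran
```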
grep has a bad interaction with set -e, since it (infuriatingly) exits with 1 if no lines are matched.
You can `set +e` before and `set -e` after every such command. I indent those commands to make it look like a block and to make sure setting errexit again isn’t forgotten.
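A sketch of that style (file and pattern are made up):

```shell
#!/bin/bash
set -e

printf 'alpha\nbeta\n' > input.txt

set +e
    # indentation marks the region where errexit is off
    grep -q "zeta" input.txt
    found=$?
set -e

echo "grep exit status: $found"   # 1 here: no match, but the script carries on
```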
But you probably still want an error if the input file does not exist. Handling grep correctly in a robust manner requires many lines of boilerplate: set +e, storing $?, set -e, and distinguishing exit values 0, 1, and 2.
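Spelled out, that boilerplate might look like this (a sketch; file and pattern are made up):

```shell
#!/bin/bash
# Sketch: distinguish grep's "no match" (1) from real errors (2 or higher,
# e.g. a missing file) while keeping errexit on for the rest of the script.
set -e

printf 'alpha\nbeta\n' > input.txt

set +e
grep -q "zeta" input.txt
status=$?
set -e

case $status in
    0) echo "match" ;;
    1) echo "no match" ;;                                  # not an error here
    *) echo "grep failed: $status" >&2; exit "$status" ;;  # e.g. missing file
esac
```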
Well, otherwise `if grep "line" "$file"; then` wouldn't work, which in my opinion is the primary use case for grep in scripts.
I'd still prefer a `grep <args> || true` over not having `set -e` for the whole file.
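For example (a sketch with a made-up pattern and file):

```shell
#!/bin/bash
set -e

printf 'alpha\nbeta\n' > input.txt

# mask only grep's "no match" failure so errexit stays on for everything else
count=$(grep -c "zeta" input.txt || true)
echo "matches: $count"   # 0 here; without || true, set -e would abort on grep's exit 1
```

Note the tradeoff: `|| true` also masks real grep failures (exit status 2, e.g. a missing file), so it trades a little robustness for brevity.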
I use `set -e` but it has its own quirks. A couple:
An arithmetic expression that evaluates to zero will cause the script to exit. E.g. this will exit:
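A sketch of that quirk (the child bash is just so we can observe the early exit from outside):

```shell
#!/bin/bash
# Sketch: under set -e, an arithmetic command whose value is 0 returns
# status 1, which aborts the script.
out=$(bash -c '
    set -e
    i=0
    (( i++ ))       # post-increment evaluates to 0, so its status is 1
    echo "never reached"
' 2>&1) || true
echo "child printed: [$out]"   # empty: the script died at the arithmetic line
```

This is one reason `i=$((i + 1))` is often preferred over `(( i++ ))` in scripts using `set -e`.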
Calling a function from a conditional prevents `set -e` from exiting; the following prints "hello\nworld\n".

Practically speaking, this means you need to explicitly check the return value of every command you run that you care about, and guard against `set -e` in places where you don't want the script to exit. So the value of `set -e` is limited.

More at https://mywiki.wooledge.org/BashFAQ/105
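The conditional quirk can be seen in a sketch like this (function and contents are made up):

```shell
#!/bin/bash
# Sketch: set -e is suspended inside a function called as a condition,
# so `false` does not stop f and both lines print.
set -e

f() {
    echo "hello"
    false          # would abort the script if f were called on its own line
    echo "world"
}

if f; then
    :              # condition context: errexit is ignored inside f
fi
```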