
Bash Debugging

thaumaturgy
13 replies
1d13h

The `die()` trick is good, but bash has an annoying quirk: if you try to `exit` while you're inside a subshell, then the subshell exits but the rest of the script continues. Example:

    #!/bin/bash
    
    die() { echo "$1" >&2; exit 1; }
    
    # the pipe runs the while loop in a subshell,
    # so die()'s exit only leaves that subshell
    cat myfile | while read line; do
        if [[ "$line" =~ "information" ]]; then
            die "Found match"
        fi
    done
    echo "I don't want this line"
..."I don't want this line" will be printed.

You can often avoid subshells (and in this specific example, shellcheck is absolutely right to complain about UUOC, and fixing that will also fix the die-from-a-subshell problem).
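
For reference, the fixed loop looks like this; feeding the loop with a redirection instead of a pipe keeps the `while` in the current shell, so `die`'s `exit` now terminates the whole script:

    while read line; do
        if [[ "$line" =~ "information" ]]; then
            die "Found match"
        fi
    done < myfile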

But, sometimes you can't, or avoiding a subshell really complicates the script. For those occasions, you can grab the script's PID at the top of the script and then use that to kill it dead:

    #!/bin/bash
    
    MYPID=$$   # the top-level script's PID, captured before any subshell starts
    
    die() { echo "$1" >&2; kill -9 $MYPID; exit 1; }
    
    cat myfile | while read line; do
        if [[ "$line" =~ "information" ]]; then
            die "Found match"
        fi
    done
    echo "I don't want this line"
...but, of course, there are tradeoffs here too; killing it this way is a little bit brutal, and I've found that (for reasons I don't understand) it's not entirely reliable either.

whatindaheck
6 replies
1d12h

Could killing the PID like that create zombies?

ykonstant
5 replies
1d11h

Perhaps, but in any case I would never write code like this.

First of all, sending SIGKILL is literally overkill and perpetuates a bad practice. Send `TERM`. If it doesn't work, figure out why.

Secondly, subshells should be made as clear as possible and not hidden in pipes. Relatedly, looping over `read` is essentially never the right thing to do. If you really need to do that, don't use pipes; use heredocs or herestrings.

Third, if you cannot avoid subshells and you want to terminate the full script on some condition, exit with a specific exit code from the subshell, check for it outside, and terminate appropriately.
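
For that last point, a minimal sketch (42 is an arbitrary marker code, and the pipe is kept deliberately to force a subshell):

    #!/bin/bash

    grep -v '^#' myfile | while read -r line; do
        if [[ "$line" == *information* ]]; then
            exit 42   # leaves only the subshell
        fi
    done
    # without pipefail, $? is the status of the last pipeline element,
    # i.e. the while-loop subshell
    if [[ $? -eq 42 ]]; then
        echo "Found match" >&2
        exit 1
    fi
    echo "reached only when no match was found"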

qazxcvbnm
2 replies
1d6h

Do enlighten me on why it is a bad idea to use loops over read; it's perhaps one of my favourite patterns in bash, and combined with pipes, appears to me one of the cleanest ways to correctly and concisely utilise parallelism in software.

qazxcvbnm
0 replies
1d1h

The provided points don't seem to be reasons for generally avoiding the subshell loop pattern.

Reasons of 1) performance, 2) readability, and 3) security are provided as points against the pattern, and the post itself acknowledges that the pattern is a great way to call external programs.

I'd think that the fact that one is using shell to begin with almost certainly means one is using the subshell loop pattern to call external programs, which is the use case your post approves of.

In that case, subshells taking the piped input as stdin allow data to be streamed over a file descriptor, probably one of the most trivially performant ways of moving data. The pattern is also composable, and certainly easier to remember, modify, and extend than the provided xargs alternative, without potential problems such as exceeding the maximum argument length. Having independent subshells also allows separate loops to run in parallel without interference, offering something resembling a proper closure.

In these respects, subshell loops provide benefits rather than pitfalls in performance and readability. Certainly `read` has some quirks one needs to be aware of, but they aren't much of an issue when operating on inputs of a known shape, which is likely the case if one is about to provide them as arguments to another command.

Regarding "security", the need to quote applies to anything in shell, and has nothing specifically to do with the pattern.

gpvos
1 replies
1d6h

> looping over `read` is essentially never the right thing to do

Why? I do it quite often, though admittedly usually in one-time scripts.

ykonstant
0 replies
1d3h

See my reply to qazxcvbnm and study the links carefully if you want to do robust shell scripting.

wolletd
5 replies
1d8h

Just adding `set -e` also exits the script when a subshell exits with a non-zero exit code. I'm not sure why I would leave `set -e` out of any shell script.
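
With thaumaturgy's example above, a sketch of how that plays out:

    #!/bin/bash
    set -e

    grep . myfile | while read -r line; do
        if [[ "$line" == *information* ]]; then
            exit 1   # the subshell exits 1...
        fi
    done
    # ...the pipeline then fails, and errexit stops the whole script here
    echo "not reached when a match is found"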

porlw
3 replies
1d7h

grep has a bad interaction with set -e, since it (infuriatingly) exits with 1 if no lines are matched.

dieulot
1 replies
1d3h

You can `set +e` before and `set -e` after every such command. I indent those commands to make it look like a block and to make sure setting errexit again isn’t forgotten.
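
For example (the indentation is purely cosmetic):

    set +e
        grep -q 'needle' haystack.txt
    set -e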

usr1106
0 replies
4h38m

But you probably still want an error if the input file does not exist. Handling grep correctly in a robust manner requires many lines of boilerplate: `set +e`, storing `$?`, `set -e`, and distinguishing exit values 0, 1, and 2.
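
Something like this sketch (GNU grep uses 0 for a match, 1 for no match, and 2 for a real error such as a missing file; `$file` is a stand-in):

    set +e
    grep -q 'pattern' "$file"
    status=$?
    set -e

    case "$status" in
        0) : ;;                            # match found
        1) echo "no match (fine here)" ;;  # not an error for this script
        *) echo "grep failed: $status" >&2
           exit "$status" ;;
    esac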

wolletd
0 replies
1d7h

Well, otherwise `if grep "line" $file; then` wouldn't work, which in my opinion is the primary use case for grep in scripts.

I'd still prefer a `grep <args> || true` over not having `set -e` for the whole file.

js2
0 replies
1d1h

I use `set -e` but it has its own quirks. A couple:

An arithmetic expression that evaluates to zero will cause the script to exit, e.g. this will exit:

  set -e
  i=0
  (( i++ )) # i++ evaluates to 0 (the old value of i), returns status 1, and exits

Calling a function from a conditional prevents `set -e` from exiting. The following prints "hello\nworld\n":

  set -e
  main() {
     false  # returns 1, but neither returns from main nor exits the script
     echo hello
  }
  if main; then echo world; fi

Practically speaking, this means you need to explicitly check the return value of every command you run that you care about, and guard against `set -e` in places where you don't want the script to exit. So the value of `set -e` is limited.

More at https://mywiki.wooledge.org/BashFAQ/105

colordrops
12 replies
1d14h

Is there a reason bash is still the de facto shell scripting language other than sheer momentum of legacy? I'm able to get what I need done in it, but it's clunky and the syntax is horrid. I guess it forces you to move to a proper language once scripts grow to a certain size/complexity, so perhaps it's by design?

jimkoen
6 replies
1d14h

Are you sure it's bash? Most scripts on FreeBSD are written for sh, which I feel is much more widely supported due to being part of the POSIX standard. Bash is just popular, I think.

xp84
2 replies
1d11h

I suppose it’s an ambiguous designation.

I feel like when I see a shell script in my work (which is not in operating-systems development, of course), people are targeting bash. I agree many things are careful to target sh for certain reasons (e.g. a script that runs in a container where the base image doesn't have bash installed), but I still think GP's question is interesting because it's not common to see, say, a zsh shell script, while seeing #!/bin/bash is super common.

ykonstant
0 replies
1d11h

I have done some delightful stuff in `zsh`, but I always lament how slow its numerical array traversal is. Frustratingly, experts told me it really doesn't have to be slow; the devs just don't seem to bother revamping the underlying data structure because they are focusing more on associative arrays.

chasil
0 replies
1d2h

If you are on Ubuntu and you must target #!/bin/sh, then bash is not an option.

Android and Apple are in similar situations.

HankB99
1 replies
1d12h

Bash is pretty much expected to be installed on any Linux distro. On FreeBSD (and likely other BSDs) it is an optional install. If you want a script to run on either, use sh. If strictly Linux, bash is probably safe.

Bash/sh is good for when you need to combine some commands and what needs to be done can be accomplished mostly by CLI commands with a little glue to tie them together. Sometimes it is surprising what can be accomplished. I wrote a program to import pictures from an SD card on Windows using C#, copying pictures to C:\Pictures\YYYY\MM\DD according to the EXIF data or, failing that, the file timestamp. I tried to port it to Linux but ran into problems trying to connect to the EXIF library. After struggling with that, I rewrote it using sh, an EXIF tool, and various file utilities. It took 31 lines, about half of which were actual commands and the rest comments or whitespace.

A much bigger project is a script to install Debian with root on ZFS. It's mostly a series of CLI commands with some variable substitution and conditionals depending on stuff like encrypted or not.

xk_id
0 replies
1d6h

> Bash/sh is good for when you need to combine some commands and what needs to be done can be accomplished mostly by CLI commands with a little glue to tie them together.

Once I’ve learned bash, I realised how much more problems i could solve, in addition to a majority of old ones. It’s an entirely new level of “computer literacy”; and a more genuine one.

joveian
0 replies
19h31m

FreeBSD's /bin/sh is based on ash, like NetBSD's, although I'm not sure how much they have in common these days. dash was forked from NetBSD's version of ash and then simplified considerably and fixed up to be fully (? or at least mostly) POSIX compliant. A while after that NetBSD's shell also had a bunch of POSIX fixes. I'm not sure how FreeBSD's shell is in terms of strict POSIX compliance.

In my opinion, bash has two things (at least vs NetBSD's shell, possibly a few more vs POSIX) that make the average shell script (that I write) much easier. The first is &>, which makes it easy to redirect both stdout and stderr to a file for logging. The standard 2>&1 can work, but it needs to be placed correctly or it doesn't work. That place isn't always the obvious place like it is with &>, and running bash seems much preferable to me to figuring that one out.
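
A sketch of the ordering pitfall (`mycmd` is a stand-in):

    mycmd &> build.log       # bash: stdout and stderr both go to the file
    mycmd > build.log 2>&1   # POSIX equivalent: redirect stdout first,
                             #   then point stderr at the same place
    mycmd 2>&1 > build.log   # wrong order: stderr is duplicated onto the
                             #   old stdout (the terminal); only stdout
                             #   lands in the file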

The second is ${var@Q} which prints var quoted for the shell, which is nice to use all over the place to make sure any printed file names can be copied and pasted.
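
For example (bash 4.4+):

    file="my photo (1).jpg"
    echo "wrote ${file@Q}"   # prints: wrote 'my photo (1).jpg'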

My sense is that targeting POSIX is usually done for maximum portability or for use on systems that don't have bash installed by default. However, bash is quite widely available even if not by default and very widely used so I wouldn't say it is unreasonable to look at bash as the de facto standard and POSIX and other shells as being used in more limited circumstances.

theonemind
0 replies
1d12h

Bourne shell scripting is good enough, which makes it nearly impossible to replace. Plan 9's rc is a bit cleaner, and no one is going to switch for "more of the same, but cleaner". You haven't switched to something similar but better even though you could literally do it right now (https://pkgsrc.se/shells), and it wouldn't run any differently for anyone else.

It usually takes something several times better in some crucial aspect to replace an entrenched technology. For example, Plan 9 is better than UNIX-like systems, but not good enough to replace them. I don't think it's possible to make something good enough to replace Bourne shell scripting in its niche, because before you have something several times better, good enough to actually replace it, you're in a different ecological niche or problem domain: that of real scripting languages like Perl, Python, and Ruby. It's a local-maximum solution that sucks the air out of the room for potential competition closer to the theoretical global maximum for the narrow problem domain.

ndsipa_pomu
0 replies
1d5h

Its legacy usage is certainly a big part of its popularity. You can generally rely on having a newish version of it on any modern distro and you don't have to worry about the version unless you want to do stuff with arrays etc.

What I find compelling about bash is its position in relation to other languages and tools. It's ideal for tying together other tools and is close enough to the operating system to make that easy, whilst not requiring extra libraries to be installed (cf. Python).

I often hear the opinion that more complex scripting should be moved to a language such as python, but that adds a layer of complexity that is probably not helpful in the long-run. I can take a bash script that I wrote twenty years ago and it'll still work fine, but a python programme from twenty years ago may well have issues with versions.

hyperadvanced
0 replies
1d13h

It really is just legacy and momentum. Recent additions build on sh/bash really well but in the end shell scripting is a means to an end that need to evolve much slower than standard programming languages.

I think bash/sh's key feature is that they are anti-entropy: there's no development or evolution, so there's no chance you need to mess with dependencies or new features; the stuff that worked 20 years ago will continue to be the "bread and butter". By design, this results in a system that's averse to change and incentivizes people to reach outside of its limits when they are met.

bregma
0 replies
1d3h

Perfection is the enemy of Good Enough.

AtlasBarfed
0 replies
1d10h

It's ubiquitous.

But bash is so bad I wrote a ton of namespace-shortened utils for using Groovy scripts.

Soooo much better: you can use IDEs for dev, you get a sane library system, and Groovy smooths out almost all of Java's annoyances.

mmmpetrichor
9 replies
1d13h

If I ever have to debug anything in bash, I stop using bash, hah.

ykonstant
5 replies
1d10h

I find such reasoning backwards. Indeed, shell scripting is not friendly to debugging. But ensuring correctness of shell scripts is essential: usually, they touch part of your "$HOME" or system folders and do tons of I/O, some of it destructive. I find it baffling to see people write careless scripts: sometimes using `rm` for cleanup with unquoted parameters or, much worse, dangerous uses of `mv`.
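
The classic instance of the unquoted-`rm` footgun (with a hypothetical `$builddir`):

    # if $builddir is unset or empty, this expands to `rm -rf /*`
    rm -rf $builddir/*

    # quoting plus ${var:?} makes the script abort instead
    rm -rf "${builddir:?builddir is not set}"/*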

bigstrat2003
4 replies
1d9h

I believe OP's point was that any shell script complex enough to require debugging should not be a shell script any more.

Brian_K_White
1 replies
1d7h

I believe the parent's point was that it doesn't matter how simple or complex a script (or anything else) is: everything requires debugging. And/or that you have to be ready to debug bash whether you like it or not, regardless of what you choose to write your own stuff in.

OP's comment is not unfunny, and not 100% untrue either. But it's not 100% true either: even a single-word script still needs to be debugged.

BeetleB
0 replies
1d

> And/or that you have to be ready to debug bash whether you like it or not, regardless of what you choose to write your own stuff in.

Over a decade into my career, and I've successfully managed to avoid debugging Bash scripts.

There is hope for people who don't want to.

t-3
0 replies
1d8h

While that's certainly true for people trying to do very complex things in "pure" shell, it's not very useful advice when the tools you're using are possibly buggy. Sometimes you have to debug to figure out where the problem is occurring, and then you can do the much simpler work of replacing that one part, rather than writing a bespoke program to replace both the working programs and all the boring functionality the shell hides away.

jon-wood
0 replies
19h58m

I hear this argument occasionally and it’s very contextual. While it’s certainly possible to rewrite any given shell script in Python, Rust, or whatever language you prefer there are some things which are just clearer in Bash.

I wouldn’t want to write an entire application in Bash, but equally I wouldn’t want to write a script which does relatively simple file operations in Python. Bash is a language which has been honed over many decades for precisely that sort of thing, and so can communicate what’s happening far clearer than Python does in my view.

sureglymop
2 replies
1d12h

That doesn't make sense. What if you get a script someone else wrote? Printing every command and confirming it before it runs is a great idea.

And, unfortunately, shell has become the norm in CI/CD environments, pipelines, etc. It can be convenient at times, but it can also be inconvenient and confusing, as these scripts don't run in interactive shells.

bigstrat2003
1 replies
1d9h

> And, unfortunately, shell has become the norm in CI/CD environments, pipelines, etc.

A pipeline which relies on shell is not worth using, tbh. That's how much shell sucks.

ndsipa_pomu
0 replies
1d5h

What language would you base a pipeline on?

I agree that bash sucks, but have yet to find anything to replace it that doesn't increase complexity and version problems.

memco
5 replies
1d15h

Good stuff! I use `set -x` frequently and have used something similar to `die` (but Julia's version is nicer). I'll consider using the debugger thing, but stepping through a bash script line by line sounds a bit tedious. Perhaps less so than having to reread a log and rerun the script a bunch.

halostatue
3 replies
1d15h

I often add a fail-unless function:

    fail-unless() {
      local result
      "$@"             # run the wrapped command exactly as given
      result=$?

      if ((result != 0)); then
        echo >&2 "Failed ${result} with command '$*'."
        exit "${result}"
      fi
    }
That way, I know exactly what failed in the script.

sureglymop
1 replies
1d12h

You do e.g. `fail-unless somecommand`. The result (exit/return code) is captured in the function and based on that, the function logs and exits or not.

ykonstant
0 replies
1d11h

You probably meant to reply to BeefySwain, right?

BeefySwain
0 replies
1d13h

How does this work exactly? What calls that function and when?

chatmasta
0 replies
1d15h

The line-by-line debugging would probably only be useful for a particular section of your script that you're trying to fix. In that case, you can remove the trap at the end of it with `trap - DEBUG`.
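
Something like this sketch, wrapped around just the suspect section:

    trap 'read -rp "next: $BASH_COMMAND (enter to run) "' DEBUG
    # ...the commands you want to step through...
    trap - DEBUG   # back to full speed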

klysm
5 replies
1d15h

I always put

    set -euxo pipefail

at the top of my bash scripts. It makes some conditional testing more difficult, but it has paid for itself many times over just because of pipefail.

klysm
0 replies
17h21m

Thanks for the reference! Seems like a really good resource. I disagree with the reasoning about pipefail, though. If I expect a command to return a non-zero exit code, I'd rather be explicit about it.

mellutussa
1 replies
1d4h

This is a lifesaver. Although I save the -x until I really need to see all the garbage debug output.

klysm
0 replies
1d3h

I find it’s usually worth the noise to catch weirdness

ddlsmurf
0 replies
1d15h

You can also set it for a bunch of lines then deactivate it with `set +x`. It gets rather tedious otherwise...

jph
3 replies
1d13h

Good info. You can improve your debugging by using exit codes like this:

    # die: print error message to stderr, then exit with error code.
    # example: die 69 "Service unavailable."
    die() {
        n="$1"; shift; >&2 printf '%s\n' "$*"; exit "$n"
    }
Many more shell script exit codes and helper functions:

https://github.com/SixArm/unix-shell-script-kit/blob/main/un...

ykonstant
2 replies
1d10h

That's a nice list; I guess every experienced user has their helper functions. However, I have a small criticism of the philosophy of that `die`: `die` functions should by default pass along the exit code of the failed command, and not silence its error output. If, for instance, I want to give my own meaning to a command failure in a large script, I will use a different, more specialized `die`. My own die is roughly as follows:

    __errex() {
     printf 'Fatal error [%s] on line %s in '"'"'%s'"'"': %s\n' \
            "${1:-"?"}"                                         \
            "${2:-"?"}"                                         \
            "${3:-"unknown script"}"                            \
            "${4:-"unknown error"}" >&2                         ;
     exit "${1:-1}"
    }
    alias die='__errex "$?" "${LINENO}" "$0"'
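
It's used as `some_command || die "could not read config"`. One caveat: a plain `#!/bin/bash` script needs `shopt -s expand_aliases` before the alias will expand; POSIX shells such as dash expand aliases in scripts by default.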

jph
1 replies
1d2h

Thanks! Your way is better than what I have. Want to do a pull request? Or may I copy/paste your code?

ykonstant
0 replies
23h44m

Just copy paste!

asicsp
3 replies
1d15h

See also:

Why doesn't set -e (or set -o errexit, or trap ERR) do what I expected? https://mywiki.wooledge.org/BashFAQ/105

What are the advantages and disadvantages of using set -u (or set -o nounset)? https://mywiki.wooledge.org/BashFAQ/112

Safe ways to do things in bash https://github.com/anordal/shellharden/blob/master/how_to_do...

Better Bash Scripting in 15 Minutes https://robertmuth.blogspot.com/2012/08/better-bash-scriptin...

Writing Robust Bash Shell Scripts https://www.davidpashley.com/articles/writing-robust-shell-s...

ndsipa_pomu
2 replies
1d6h

You missed the most important tip - use ShellCheck on every script you write: https://www.shellcheck.net/

Personally, I'm a big fan of BASH3 boilerplate: https://github.com/kvz/bash3boilerplate

It's fine for BASH versions above v3 and provides decent logging, though I typically extend the script so that I can pipe long-running commands into its logging framework. It also ensures that you specify the "help" options correctly, as it parses the usage information to process the command-line arguments, with support for short and long options.

asicsp
1 replies
1d4h

Oh yeah, shellcheck is a must have tool.

ndsipa_pomu
0 replies
1d

I think of it as a gateway to writing better scripts. When you first run it and it highlights what it considers to be a problem, you end up reading why it considers it to be a problem and that clues you in on some of the many footguns that Bash has.

schneems
2 replies
21h15m

I recommend shellcheck as well. It might not catch your problem, but it will point out possible problems.

Also I recommend: rewriting scripts in another language. At work we are converting bash scripts to rust and while it’s a high ramp-up time, the resulting code is much easier to maintain and I have a much higher level of confidence in them. Bash is still good for quick scripts but once you hit 100 lines or so you really deserve a language with stronger guarantees.

riperoni
1 replies
21h8m

I agree, but we gotta tell that to CI/CD engineering and YAML pipelines.

drizzleword
0 replies
1d13h

Another stack trace implementation [1] that allows you to write:

  some-command || fail "message"
to produce a stack trace and exit the shell in case of non-zero exit status from some-command, or write

  some-command || softfail "message" || return $?
in case you want to produce a stack trace and return from the function.

[1]: https://github.com/runag/runag/blob/main/lib/fail.sh

graton
1 replies
19h22m

Very useful is:

    PS4='+ ${BASH_SOURCE:-}:${FUNCNAME[0]:-}:L${LINENO:-}:   '
When using `set -x`, this makes it show the filename, function name, and line number, which can be quite handy when debugging larger Bash scripts.
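
On a toy script (call it `demo.sh`), the trace then looks roughly like this:

    #!/bin/bash
    PS4='+ ${BASH_SOURCE:-}:${FUNCNAME[0]:-}:L${LINENO:-}:   '
    set -x
    greet() { echo "hello $1"; }
    greet world

    # + ./demo.sh::L5:   greet world
    # + ./demo.sh:greet:L4:   echo hello world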

e40
0 replies
19h18m

WOW! Thanks! Power user of BASH for a long time and I did not know this.

ykonstant
0 replies
1d10h

The `trap DEBUG` thing is pretty interesting; I almost always write POSIX code, so I don't get to play with such tricks. Does anybody know of some wizardry that could mimic this in arbitrary POSIX compliant shells?

ketanmaheshwari
0 replies
1d4h

Plug and slightly related. I once created a bash pipeline debugger that preserves the intermediate outputs. Has a few limitations but maybe generally useful: https://github.com/ketancmaheshwari/pd

hn_acker
0 replies
21h38m

Off-topic: The alt text for the image says

Image of a comic. To read the full HTML alt text, click "read the transcript".

but I can't find any button relating to a transcript.

chlorion
0 replies
23h30m

Gentoo has a script you can source at /lib/gentoo/functions.sh that provides various helper functions, mostly for printing messages, and it gives you nice little green and red stars to indicate whether something has succeeded or failed.

I use functions.sh in all of my scripts that are known to run only on Gentoo; it makes them feel Gentoo-y and is useful in general.
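
A typical use, assuming a Gentoo system (the `ebegin`/`eend` helpers come from that file):

    #!/bin/bash
    . /lib/gentoo/functions.sh

    ebegin "Syncing the portage tree"
    emerge --sync
    eend $?   # prints the green/red status marker based on the exit code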

bubblebeard
0 replies
1d9h

This is neat; I never considered that one could step-debug a bash script. Kudos for sharing!

E39M5S62
0 replies
1d13h

We use a couple nice home-grown functions in ZFSBootMenu to help debug things. We have a zdebug logging function that's peppered liberally throughout the code base - https://github.com/zbm-dev/zfsbootmenu/blob/master/zfsbootme...

Hitting ctrl-t on our main menu will, when booting with debug logging enabled, show a screen like this: https://i.imgur.com/Ge75zkP.png

We also have a flamegraph profiling mechanism that can be enabled with https://github.com/zbm-dev/zfsbootmenu/blob/master/zfsbootme... . That will dump data to a serial port which, when re-assembled, can be used to produce a graph like https://raw.githubusercontent.com/zbm-dev/zfsbootmenu/master...

Bash is surprisingly flexible.

Brian_K_White
0 replies
1d15h

I have almost that same die() in every script, except I call it abrt(). Maybe I'll switch to die() since it's shorter. Mine also prepends $0, and sometimes I use printf or echo -e so I can pass larger, more complex messages with linefeeds and escape codes etc.
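
A rough sketch of that variant (`%b` honours backslash escapes the way `echo -e` does; `$config` is a stand-in):

    die() { printf '%s: %b\n' "$0" "$*" >&2; exit 1; }

    [ -r "$config" ] || die "cannot read config\ntry: chmod +r $config"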