10 underused shell scripting techniques

By dkl9, written 2023-267, revised 2023-267 (0 revisions)

I saw someone who claims to be very experienced in Linux/Unix shell (/bin/sh, roughly equivalent to bash) write and invoke an external Perl script to strip suffixes from filenames. To me, that appears patently unnecessary: you can do that easily with shell builtins. That such a shell feature went unnoticed by someone with "thirty-five years" of experience makes me wonder what other shell features people neglect.

So here are 10 techniques I collected from my shell scripts that feel obscure or underused. You may already know some of these. Hopefully you learn something.

§ Stripping the end: % and %%

This is how you strip a suffix from a filename: given variable FNAME, access ${FNAME%.*}.

You can access a variable with $VAR, but equally well with ${VAR}, and the latter form invites extra features. ${VAR%pattern} and ${VAR%%pattern} both give VAR, but with pattern removed from the end. The pattern can be a literal string, but can also contain * for any substring — hence, the .* in ${FNAME%.*} matches anything starting with . — as well as some other globbing features. If pattern doesn't match anything at the end of VAR, you get VAR unchanged, as in $VAR.

Using just % forces the pattern to match the shortest string possible, while %% forces the longest match. If $FNAME contained multiple dots, ${FNAME%.*} just removes the final suffix, while ${FNAME%%.*} removes all dot-suffixes, leaving only the part before the first dot.

This doesn't change the original VAR, just the value you get when you access it with this syntax.
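
A minimal sketch of the difference between % and %%, assuming a made-up filename with two dot-suffixes (the name archive.tar.gz and the variables short and long are just for illustration):

```sh
#!/bin/sh
# FNAME is a hypothetical example value with two dot-suffixes.
FNAME="archive.tar.gz"

short="${FNAME%.*}"    # shortest match of .* at the end: strips ".gz"
long="${FNAME%%.*}"    # longest match of .* at the end: strips ".tar.gz"

echo "$short"   # archive.tar
echo "$long"    # archive
echo "$FNAME"   # archive.tar.gz (the variable itself is untouched)
```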

§ Stripping the start: # and ##

To get the suffix, access ${FNAME##*.}.

${VAR#pattern} and ${VAR##pattern} work essentially the same as % and %%, except that they match and remove matches from the start of VAR rather than the end. As with %, # looks for the shortest match, while ## looks for the longest match.

This method leads to a way to check if a variable string starts with a fixed string. For example, to check if VAR starts with "y", run the comparison [ "${VAR#y}" != "$VAR" ]. That comparison asks "is VAR, with 'y' removed from the start if it was present, different from VAR?" — which is equivalent to "does VAR start with 'y'?"
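
Here is a small sketch of both uses, again with a made-up filename. The function name starts_with is hypothetical, chosen for this example; quoting "$2" inside the expansion makes the prefix match literally rather than as a glob, and the check assumes the prefix is non-empty.

```sh
#!/bin/sh
# FNAME is a hypothetical example value.
FNAME="archive.tar.gz"

ext="${FNAME##*.}"    # longest match of *. at the start: leaves "gz"
rest="${FNAME#*.}"    # shortest match of *. at the start: leaves "tar.gz"

# starts_with STRING PREFIX: succeeds if STRING begins with PREFIX.
# (Hypothetical helper; assumes PREFIX is non-empty.)
starts_with() {
    [ "${1#"$2"}" != "$1" ]
}

if starts_with "yes please" "y"; then
    answer="starts with y"
else
    answer="does not start with y"
fi
echo "$ext" "$rest" "$answer"
```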

§ Null and non-null string tests: -z and -n

At the start of your script, you might want to check if the requisite arguments are missing, in which case, you might use [ -z "$1" ] || [ -z "$2" ].

test, the same command used in expressions like [ "$A" -lt "$B" ] or [ "$X" = y ], also supports unary operators such as -z — which checks if its operand is of zero length — and -n — which checks if its operand is nonempty, i.e. the logical negation of -z.
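
As a sketch of how such a check might sit at the top of a script: the function name check_args, the error text, and the VERBOSE variable below are all assumptions made for illustration, not taken from any particular script.

```sh
#!/bin/sh
# check_args ARG1 ARG2: succeeds only if both arguments are non-empty.
# (Hypothetical helper for this example.)
check_args() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "error: two non-empty arguments are required" >&2
        return 1
    fi
    return 0
}

# -n is the opposite test: true when the string has at least one character.
VERBOSE="yes"
if [ -n "$VERBOSE" ]; then
    mode="verbose"
else
    mode="quiet"
fi
echo "$mode"
```

In a real script you would probably call check_args "$1" "$2" || exit 1 right after defining it.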

§ Default variable values: :-

If the user of bulk_webdl doesn't give a waiting interval, the script should use a default of one second, so instead of sleep "$2", I used another ${...} feature: sleep "${2:-1s}".

In general, ${VAR:-default} produces the value of VAR, unless VAR is unset or empty, in which case it produces default instead (but without modifying VAR).
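
A minimal sketch of the behaviour, using a hypothetical function wait_interval that stands in for the sleep call (it echoes the interval instead of sleeping, so the result is easy to see). The argument urls.txt and the variable DELAY are likewise made up.

```sh
#!/bin/sh
# wait_interval FILE [INTERVAL]: print the interval that would be used.
# (Hypothetical stand-in for: sleep "${2:-1s}")
wait_interval() {
    echo "${2:-1s}"
}

a=$(wait_interval urls.txt)        # second argument missing: default applies
b=$(wait_interval urls.txt 5s)     # second argument given: it wins
c=$(wait_interval urls.txt "")     # second argument empty: default applies

unset DELAY
shown="${DELAY:-1s}"               # yields the default; DELAY stays unset

echo "$a $b $c $shown"
```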

§ Word boundaries: grep '\b'

Looking for a word like "ls" and want to avoid longer words that contain it (like "else" or "intervals")? grep '\bls\b' file. I find this more useful in interactive shell use than scripts.

In general, \b matches a "word boundary", which roughly means what you'd intuitively want. '\bls\b' is more powerful than ' ls ', since \b also handles start- and end-of-line without spaces.
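
A quick sketch with made-up input lines. Note that \b is a GNU extension rather than part of POSIX grep, so this assumes GNU grep (or another grep that supports it).

```sh
#!/bin/sh
# Five hypothetical lines; only those containing "ls" as a whole word should survive.
matches=$(printf '%s\n' 'ls -l' 'else' 'intervals' 'run ls' 'a;ls;b' | grep '\bls\b')

echo "$matches"
# Lines kept: "ls -l", "run ls", "a;ls;b"
# Lines dropped: "else", "intervals"
```

The third match shows that punctuation counts as a boundary too, which a pattern like ' ls ' would miss.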

§ Extracting strings: grep -o

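ip route gives information about your networking configuration. Usually, one line says "default" and contains your gateway IP address; you could get that line with ip route | grep default. If you want just the IP address: ip route | grep default | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'. The -E is needed so that + means "one or more" (in grep's default basic syntax, a bare + is a literal plus sign), and \. matches a literal dot rather than any character. If the default line lists more than one address (for example, a src address), each is printed on its own line.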

grep, by default, copies to output any line that contains a match to the pattern. grep -o copies to output just the matching parts themselves, each match on its own line.
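
To try this without depending on your actual network setup, here is a sketch that feeds grep a made-up line in the shape ip route typically prints (the addresses and interface name are invented). It uses grep -E so that + means "one or more", and escapes the dots so they match literally.

```sh
#!/bin/sh
# A hypothetical "default" line; real output varies by system.
route_line='default via 192.168.1.1 dev eth0 proto dhcp src 192.168.1.23 metric 100'

# Every dotted-quad on the line, one per output line.
all_ips=$(echo "$route_line" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+')

# Only the first one, which on a line shaped like this is the gateway.
gateway=$(echo "$route_line" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)

echo "$all_ips"
echo "gateway: $gateway"
```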

§ Merging lines: paste -s

Need to sum a newline-separated list of numbers? paste -sd '+' listfile | bc -l.

paste -s outputs the lines from its input joined with tabs. If you add -d 'CHAR', the lines will be joined with that character instead.
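
A small sketch with three made-up numbers read from standard input (the - operand). To keep the example free of external dependencies it evaluates the joined expression with shell arithmetic, which only handles integers; for decimals you would pipe to bc -l as above.

```sh
#!/bin/sh
# Join the lines with "+" to build an arithmetic expression.
expr_str=$(printf '%s\n' 3 4 5 | paste -sd '+' -)

# Evaluate it with shell arithmetic (integers only).
total=$(( $(printf '%s\n' 3 4 5 | paste -sd '+' -) ))

echo "$expr_str = $total"   # 3+4+5 = 12
```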

§ Parsing filenames: basename and dirname

For a "realistic" error message from your script, echo the error starting with $(basename "$0").

basename returns the filename at the end of its argument, after the final /. dirname returns the directory containing its argument.
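
A sketch using a made-up path in place of "$0" (the install location and the error text are assumptions for illustration). It also shows basename's optional second argument, which strips a given suffix.

```sh
#!/bin/sh
# Hypothetical path standing in for "$0".
script_path="/usr/local/bin/bulk_webdl"

name=$(basename "$script_path")   # bulk_webdl
dir=$(dirname "$script_path")     # /usr/local/bin

# An error message prefixed with the script's own name.
msg="$name: cannot read input file"

# basename can also drop a known suffix, given as a second argument.
stem=$(basename "/tmp/report.txt" .txt)   # report

echo "$msg"
echo "$dir $stem"
```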

§ Stripping with sed

sed 's/^pattern//' is a more powerful version of ${VAR##pattern}, and likewise sed 's/pattern$//' to ${VAR%%pattern}. In either case, a pattern — with the full power of regular expressions, not just some shell globbing — aligned to the start or end of a line of input is replaced with the empty string, i.e. removed.
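
To make the difference concrete, here is a sketch with a made-up string where globbing and regular expressions disagree. In a glob, [0-9]* means "a digit, then anything"; in a regular expression, [0-9]*$ means "a run of digits at the end".

```sh
#!/bin/sh
# Hypothetical value with a digit in the middle and digits at the end.
v="mp3player07"

# Glob: longest suffix that starts with a digit, i.e. "3player07".
glob_result="${v%%[0-9]*}"                            # mp

# Regex: only the trailing run of digits, i.e. "07".
sed_result=$(echo "$v" | sed 's/[0-9]*$//')          # mp3player

# The same idea at the start of a line: strip leading whitespace.
trimmed=$(echo "   hello" | sed 's/^[[:space:]]*//')  # hello

echo "$glob_result $sed_result $trimmed"
```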

§ Sectioning with sed

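The part of a file after the first blank line is sed '1,/^$/ d' file; the part before the first blank line is sed '/^$/,$ d' file. (The first command assumes line 1 is not itself blank: sed starts looking for the end of the range on the line after the one that opened it.)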

d is a sed command which deletes selected lines, and sed selectors can be not just regular expressions like /^$/, but also ranges of lines. The start or end of a range can be a regular expression, a line number, or the last-line indicator, $ (i.e. right here!).
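
A sketch of both commands on a made-up message with header lines, one blank line, and a body (the content is invented for illustration):

```sh
#!/bin/sh
# Hypothetical input: two header lines, a blank line, two body lines.
msg=$(printf '%s\n' 'From: alice' 'Subject: hi' '' 'first body line' 'second body line')

# Delete from line 1 through the first blank line: leaves the body.
body=$(printf '%s\n' "$msg" | sed '1,/^$/ d')

# Delete from the first blank line through the last line: leaves the header.
header=$(printf '%s\n' "$msg" | sed '/^$/,$ d')

echo "$header"
echo "--"
echo "$body"
```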