White shell - whitespace in bash for C programmers


I will try to explain how whitespace characters and quoting work in shells like bash. I program in Python, C, Java, Perl and other normal computer languages, and many times I have found myself in big trouble just doing simple things in the shell. Why? I hope it is not because I'm a bad programmer :)

Rule no. 1

The basic things we programmers need to know about shell languages can be very different from the languages we use on a daily basis. The most important one is this:

In shell everything is a string!

This is unlike C, for example, where a string is only the text inside "" (double quotes) or the data behind a char* variable. In the shell everything is a string.

Let's check this with a variable assignment example:

  X="hello"

Variable X now stores a string, but the line below does exactly the same:

  X=hello

Frankly speaking, X itself is also just a string. You can try:

  echo X

Only when you add $ (dollar sign) does the shell fetch the string stored in memory earlier:

  echo $X

  X=hello
  hello=X  
  echo $X $hello
:)

So what do we need "" quoting for? In this case, for nothing. But typing

  X="hello world"

is different from

  X=hello world

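The shell needs to know where every command argument starts (the main duty of a shell is invoking commands), so by default whitespace separates the command name and its arguments. In the second form the shell therefore sees two words: a temporary assignment X=hello and a command named world, which it then tries to run. In conclusion, we need quoting to suppress this splitting.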
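
A small sketch of that splitting, assuming a default IFS. The helper count_args is a made-up name for this example, not a standard command; it only reports how many arguments it received.

```shell
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

unquoted=$(count_args hello world)     # whitespace separates two arguments
quoted=$(count_args "hello world")     # quoting keeps them together as one

echo "unquoted: $unquoted, quoted: $quoted"
```

The first call should report 2 and the second 1, although the text between the function name and the end of the line differs only by the quotes.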

Another proof that everything is a string is here:

  X="hello"" ""world"

Because everything is a string, string concatenation is as basic as the letters themselves (in a word you put letter after letter, so you concatenate them, don't you?). In our example three quoted strings are joined together.

But something unacceptable in normal programming languages is legal here:

  X=hello" ""world"
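
To convince yourself that these spellings are the same thing, here is a quick sketch. The variable names A to D are just picked for the example; each line glues quoted and unquoted pieces together in a different way.

```shell
A="hello"" ""world"     # three quoted pieces
B=hello" ""world"       # an unquoted piece followed by two quoted ones
C=hello' 'world         # single quotes work as well
D=hello\ world          # and so does escaping the space

# Print each value in brackets so the spaces are visible.
for v in "$A" "$B" "$C" "$D"; do
  printf '[%s]\n' "$v"
done
```

All four lines of output should read [hello world]: adjacent pieces are joined into one word, and the quotes themselves disappear.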

BTW, try also this funny looking command:

  "hello world"

Yes, this line is an ordinary command: the shell tries to run a program named hello world (with the space in its name). Another proof that everything is a string.

Furthermore, the most common source of problems is misunderstanding how quotes are treated.

When we write:

  X="hello world"

the string hello world is copied into memory under the name X. The quotes themselves are not stored in memory!

So command

  echo $X

works OK, but not the way people think it works. The variable is expanded to one string, which is then split again at the spaces. So the command line has three words, the command name and two arguments:

  1. echo
  2. hello
  3. world

To test that try this:

  X="hello      world"
  echo $X

There is no difference in the output: a single space still separates the words, because that space is added by the echo command; it is not taken from the string.

So how do we keep those extra spaces? Quotes are not stored, so we need to use them again when we fetch the variable from memory:

  X="hello      world"
  echo "$X"

I know, that invocation looks silly.
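
Silly or not, the difference is easy to make visible. The sketch below assumes a default IFS; show_args is a hypothetical helper written for this example that prints each argument it receives in brackets.

```shell
X="hello      world"

# show_args is a hypothetical helper: prints each argument in brackets.
show_args() { printf '[%s]' "$@"; echo; }

split=$(show_args $X)      # unquoted: the value is split into two arguments
whole=$(show_args "$X")    # quoted: one argument, spaces intact

echo "$split"
echo "$whole"
```

The first line should come out as [hello][world] and the second as [hello      world], so the extra spaces survive only in the quoted call.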

Actually, forgetting that extra quoting is the most common problem when working with file names containing spaces. For example, try executing this command in a directory that contains files with spaces in their names:

  for f in *; do ls "$f"; done

And now try without double quotes.

  for f in *; do ls $f; done

Many gurus say that we should always double-quote variables

  "$variable"

because we never know whether there are spaces inside. Sounds logical. It's just those two extra characters to type...
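
If you don't have such files at hand, the experiment can be sketched in a throwaway directory. The file names are invented for the example, and the sketch assumes that the path returned by mktemp contains no spaces itself.

```shell
# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d)
touch "$dir/my song.txt" "$dir/notes.txt"

quoted=0
unquoted=0
for f in "$dir"/*; do
  # Quoted: the name arrives as one word, so the file is found.
  if [ -e "$f" ]; then quoted=$((quoted + 1)); fi
  # Unquoted: the name is split again; count the words that name a real file.
  for word in $f; do
    if [ -e "$word" ]; then unquoted=$((unquoted + 1)); fi
  done
done

echo "found with quotes: $quoted, without: $unquoted"
rm -r "$dir"
```

With quotes both files should be found. Without them only notes.txt survives, because my song.txt falls apart into two words and neither of them names an existing file.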

Quoting

Everything in the shell is a string, as we showed earlier. So the quoting mechanism has to be different than in normal languages.

  • First we have single quotes

      echo 'I "have" 10$!'
    

    Everything inside is allowed and treated as an ordinary character (the only thing you cannot put inside is a single quote itself).

  • The escape character \, which works pretty much like in other languages:
      echo I\ \"have\"\ 10\$!
    

  • Double quotes
      echo "I 'have' 10$!"
    

    Spaces are preserved, but variables such as the special $! are expanded. Single quotes inside double quotes are ordinary characters too.

      echo "I \"have\" 10\$!"
    

    As you can see, the escape character \ works here much like outside quotes, although inside double quotes it is only special before $, `, ", \ and a newline.
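
The three mechanisms can be checked against each other. This is only a sketch meant to be run as a script (where history expansion of ! is off), and the variable names are chosen for the example.

```shell
single='I "have" 10$!'            # nothing is special inside single quotes
escaped=I\ \"have\"\ 10\$!        # every special character escaped by hand
double="I \"have\" 10\$!"         # inside double quotes only " and $ need escaping here

printf '%s\n' "$single" "$escaped" "$double"
```

All three variables should hold exactly the same text, I "have" 10$!, so the choice between the styles is mostly a matter of which one needs the fewest backslashes.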

Can you repeat/eval, please?

The shell execution procedure is pretty naive. Not only is everything a string, but these strings are also read by the shell just once.

For example, storing the name of one variable in another and then reaching for its value is too much for the shell. This case is like pointers in C or references in other languages.
  USER_A=tom
  USER_B=geoff
  USER_C=anna
  CURRENT='$USER_B' # without the single quotes this would just copy the value
  echo $CURRENT

We can see that variable expansion occurred only once: the output is the text $USER_B, not geoff. How about telling the shell to interpret the code one more time:
  eval echo $CURRENT

This is equivalent to two steps. First:

  cmd = shell_reparse_line("echo $CURRENT");   /* cmd is now "echo $USER_B" */

Then the content of cmd is executed as a normal command:

  echo $USER_B

Exactly what we need.
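
Since eval runs whatever text it is given, it is worth checking the "pointer" before dereferencing it. The following is a sketch of one way to do that, not the only one: it stores the bare variable name (without the $) and uses a case pattern of my own choosing to reject anything that does not look like a plain variable name.

```shell
USER_A=tom
USER_B=geoff
USER_C=anna
CURRENT=USER_B            # store the variable's name, without the $

# Refuse anything that is not a plain variable name before handing it to eval.
case $CURRENT in
  ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) echo "bad variable name: $CURRENT" >&2; exit 1 ;;
esac

# The first pass turns the string into: value=$USER_B ; eval then runs that line.
eval "value=\$$CURRENT"
echo "$value"
```

This should print geoff. In bash specifically, ${!CURRENT} performs the same lookup without eval.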

A more useful example:

  SILENT_MODE='> /dev/null 2>&1'
  eval ls $SILENT_MODE
which is equivalent to:
  ls > /dev/null 2>&1

Without eval we get errors, because ls receives >, /dev/null and 2>&1 as ordinary file name arguments. Special characters, like variable expansions, are parsed only once.

  SILENT_MODE='> /dev/null 2>&1'
  ls $SILENT_MODE
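
If the goal is only to silence a command, a function can carry the redirection without any second parsing pass. This is a sketch of an alternative, not a replacement for the eval explanation above; silently is a hypothetical name, and /no/such/dir is assumed not to exist.

```shell
# silently is a hypothetical helper: runs its arguments as a command,
# discarding both stdout and stderr.
silently() { "$@" > /dev/null 2>&1; }

out=$(silently ls /)          # nothing is captured: output went to /dev/null

# The error message is discarded too, but the exit status still comes through.
if silently ls /no/such/dir; then failed=no; else failed=yes; fi

echo "captured: [$out], second ls failed: $failed"
```

Here the redirection characters are written directly in the function body, so the shell parses them as redirections when the function is defined, and "$@" passes the command words through untouched.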

Very nasty and very common is the case which needs both techniques: proper quoting of strings with spaces and late evaluation.

For example:

  DIR="My Music"
  COMMAND="ls $DIR"
  sudo $COMMAND

This doesn't work because we missed quoting the content of $DIR. Let's try again:

  DIR="My Music"
  COMMAND="ls '$DIR'"
  sudo $COMMAND

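Do you see the slight difference in the outputs? The first time ls looks for two files, My and Music; this time it looks for 'My and Music', with the quote characters as part of the names. Quoting is parsed just once, at the beginning, so quotes stored in a variable don't work. How should it look?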

  DIR="My Music"
  COMMAND="ls '$DIR'"
  eval sudo $COMMAND

In conclusion, we need eval to parse again:

  • variables
  • shell control characters
  • the quotes themselves (but they are a kind of control character)
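
When the only problem is an argument with spaces, the second parsing pass can be avoided by never flattening the command into one string. The sketch below keeps the words separate in the positional parameters. The directory and file names are invented for the example, plain ls stands in for sudo ls so that it runs anywhere, and note that set -- overwrites the script's own arguments.

```shell
base=$(mktemp -d)
DIR="$base/My Music"
mkdir "$DIR"
touch "$DIR/track one.mp3"

# Store the command as separate words instead of one flat string.
set -- ls "$DIR"

# "$@" expands to exactly those words, so no second parsing pass is needed.
listing=$("$@")
echo "$listing"

rm -r "$base"
```

This should list track one.mp3. With sudo the last step would become sudo "$@", and in bash an array (cmd=(ls "$DIR") followed by "${cmd[@]}") does the same job without touching the positional parameters.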

Conclusions

The first requirement for a shell was to interpret human commands/needs and communicate with the system (OS) kernel. As systems became more complex, shell functionality needed to grow. Today most shells are fully functional programming languages... but they still need to interpret human commands/behavior. So maybe this is the reason why such a simple platform (I mean the shell) is so complicated in the nuances of its execution. Not to mention that every shell is slightly different from the others!

I hope we've cleared something up... a little... Unfortunately, many aspects of how the shell interprets code will still not be easy to understand. This is because every shell is a little different from the others, and the rules of interactive mode differ from those of shell scripts. And of course the documentation is too long and too complicated... But I believe we've discussed the most misleading and most fundamental things.