Splitting Pipelines

2018-07-13 - Rob

The problem

Some days ago I was looking at some configuration files that were presented in this form:

#
# This is a section
#
VALUE_ENABLED=y
# VALUE_DISABLED 

The problem in reading this stuff is that both comments and non-comments are relevant and none should be ignored since a disabled feature could cause as much trouble as an enabled one.

I would have liked to see comments in a different file than the rest, so I thought: “Well, this is an easy job for grep”.

The approach

Since I hate to waste processes while I use bash, regardless the significance in term of performance, my first idea was to write something like this:

information flow

The problem in this approach is that with tee it is not easy to just split a pipeline unless you know the >(command) trick (aka process substitution).

This is the bash version of the above graph:

#Take away empty lines and lines only containing white space
grep -Ev "^(|[[:space:]])$" configfile | 
#In a subshell Grep for lines beginning with '#'
tee >(grep -E "^#" >                    
	#Write result to file and resume the main shell 
	comments.conf) |                    
	#Grep for non-comment lines
	grep -Ev "^#" >                     
	#Write them to file
	non-comments.conf                   

In bash >(command) creates a file descriptor to which other commands can write and <(command) takes the output of a command and creates a file descriptor from which other commands can read.

What is being written to the file descriptor is instead redirected to stdin of command as if it came from a pipeline.

Since tee accepts files as parameters and not commands, this trick is needed if you want to avoid useless creation of FIFOs.

Complication

This solution is mostly fine, it does what it is supposed to do, but doesn’t do something that I realized later being useful: keeping sections titles even in the non-comment file.

This:

#
# This is a section
#
VALUE_ENABLED=y
# VALUE_DISABLED 

Should’ve become this:

# This is a section
VALUE_ENABLED=y

Since all variables are composed only by upper-case text I changed the solution to be:

grep -Ev "^(|[[:space:]])$" configfile | 
tee >(grep -E "^#" > comments.conf) | 
grep -E "(^[^#]|^# [A-Z][a-z])" > non-comments.conf 

Conclusion (TL;DR)

If you want to split a pipeline or generally write to the stdin of a command as if it was a file use >(command) i.e.

# all 3 commands receive the output of somecommand
somecommand | tee >(command1) >(command2) | command3

Even the reverse can be useful:

# Obtain differences between the outputs of two commands
diff <(command1) <(command2)

This process is called process substitution and it is almost never useful, but when it is it can save up a great amount of time.

If you are intrestes in some advanced features of bash redirections you can go to the REDIRECTION section of man bash.