Pipes: Named and Unnamed
Pipes are cool. We all use them, but have you ever considered what's happening behind the scenes? Additionally, did you know there's a way to persist them to act as simple queues, facilitating interprocess communication? I'll be delving into pipes today. Let's go!
Pipe
A Unix pipe is a form of redirection that allows data to flow from one command to another, connecting the output of one command to the input of another command without using an intermediate file. Pipes are a powerful feature of Unix-like operating systems and can be used to create complex command pipelines for achieving higher-level tasks.
I'm certain many of you have used pipes extensively. Consider a common example where you want to navigate JSON in a human-readable format:
curl https://dummyjson.com/products | jq . | less
For some pipe appreciation, consider what this might look like without a |.
curl https://dummyjson.com/products -o products.json &&\
jq . products.json > pretty.json &&\
less pretty.json &&\
rm products.json pretty.json
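When chaining many commands, pipes become essential to our mental health. Under
the hood, the shell handles each | by making a pipe() syscall. The kernel sets
up a buffer and hands back a pair of file descriptors: a write end and a read
end. The shell then connects the standard output of the first command to the
write end and the standard input of the next command to the read end, so data
flows between the two processes without an intermediate file.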
Visually, the flow looks like this:
curl (stdout) -> pipe write end -> kernel buffer -> pipe read end -> jq (stdin)
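To make the two ends of a pipe concrete, here's a minimal Go sketch that pushes a string through an anonymous pipe inside a single process. The throughPipe helper is a name I made up for illustration; a real shell would give each end to a separate child process instead of a goroutine.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// throughPipe (hypothetical helper) sends msg through an anonymous
// kernel pipe and returns whatever arrives at the read end.
func throughPipe(msg string) (string, error) {
	// os.Pipe wraps the pipe() syscall: it returns two connected
	// files, a read end and a write end, backed by a kernel buffer.
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	// Play the left-hand command: write, then close the write end
	// so the reader sees EOF.
	go func() {
		defer w.Close()
		io.WriteString(w, msg)
	}()

	// Play the right-hand command: read until EOF.
	data, err := io.ReadAll(r)
	return string(data), err
}

func main() {
	out, err := throughPipe("hello from the write end\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Closing the write end is what lets the reader finish: io.ReadAll only returns once every writer has closed its descriptor.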
Assuming the next process reads standard input, it will take the data and
operate on it. Some scripts or tools don't read from standard input, in which
case there are other tricks we could use, such as xargs. When you're writing
scripts or command-line tools, I highly recommend supporting standard input
since it makes your tool interoperable with the broader ecosystem.
Let's demonstrate this with a simple tool, jsonchk, built in Go, that
determines whether JSON is valid or not. As an argument, it expects a file but
also supports being piped into. The following code achieves this, with comments
explaining some of the standard library uses:
package main
import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"time"
)
const (
	invalidJSONMsg = "invalid JSON"
	validJSONMsg   = "valid JSON"
)
// WARNING: code simplified and errors not properly
// considered for brevity.
func main() {
	var jsonData []byte
	// read from standard input when it is piped or redirected
	stat, _ := os.Stdin.Stat()
	// os.ModeCharDevice is set when standard input is a
	// terminal (a Unix character device). When the bit is
	// clear, input is arriving from a pipe or a redirect.
	if (stat.Mode() & os.ModeCharDevice) == 0 {
		jsonData, _ = io.ReadAll(os.Stdin)
	} else {
		// when standard input is a terminal:
		// expect argument 1 to be a file (or named pipe)
		if len(os.Args) < 2 {
			fmt.Fprintln(os.Stderr, "usage: jsonchk <file>")
			os.Exit(2)
		}
		f, err := os.Open(os.Args[1])
		if err != nil {
			panic(err)
		}
		// read until EOF, keeping newlines intact; for a named
		// pipe, EOF arrives once every writer has closed its end
		jsonData, _ = io.ReadAll(f)
		// close explicitly: os.Exit below skips deferred calls
		f.Close()
	}
	// check whether JSON is valid
	if json.Valid(jsonData) {
		fmt.Printf("[%s] received at %s\n", validJSONMsg, time.Now())
		os.Exit(0)
	}
	fmt.Printf("[%s] received at %s\n", invalidJSONMsg, time.Now())
	os.Exit(1)
}
To build the above:
go build -o jsonchk .
Now we can test a few pipe use cases:
curl -s https://dummyjson.com/products | ./jsonchk
[valid JSON] received at 2023-03-20 09:44:33.580404 -0600 MDT m=+0.256167251
echo "{{ seems Wr0nG}" | ./jsonchk
[invalid JSON] received at 2023-03-20 09:44:57.091382 -0600 MDT m=+0.000460459
This demonstrates the interoperability of our new command with curl and echo.
However, our usage of pipe is clearly ephemeral. What if we want to keep a pipe open over time, perhaps like a channel?
Named Pipes
Named pipes are an extension of this pipe model, where a buffer is created and
presented as a file to enable reading and writing from processes. They act as
first in, first out (FIFO) queues and can be created using mkfifo. This command
is available on most *nix environments. Another cool aspect is that we can
largely treat these as files we’re reading from. The difference is that data is
consumed as it is read, and opening one end blocks until another process opens
the other end.
Let’s create a named pipe that processes can write JSON to, so that jsonchk can
report what it found over time.
mkfifo /tmp/jsonBuffer
With the pipe file existing, let’s attach jsonchk to it in a continuous loop.
while true
do ./jsonchk /tmp/jsonBuffer
done
Now from curl and echo, let’s test the same idea, but redirect output to the
named pipe:
curl -s https://dummyjson.com/products > /tmp/jsonBuffer
echo "{{ seems Wr0nG}" > /tmp/jsonBuffer
After running these two commands, we can return to the jsonchk loop and view the output:
[valid JSON] received at 2023-03-20 09:49:57.739516 -0600 MDT m=+137.128599542
[invalid JSON] received at 2023-03-20 09:49:57.766027 -0600 MDT m=+0.008085168
Along with these examples, you could also pass a regular file, such as
testData.json, to ./jsonchk, meaning it’ll treat files and named pipes similarly!
Conclusion
Pipes are rad; we all know this. Hopefully you learned something new in this post or, at least, grew your appreciation for this Unix primitive we often take for granted 🙂. Lastly, next time you’re writing a command-line tool or script, consider accepting piped input!