joshrosso

Pipes: Named and Unnamed

Pipes are cool. We all use them, but have you ever considered what's happening behind the scenes? Additionally, did you know there's a way to persist them to act as simple queues, facilitating interprocess communication? I'll be delving into pipes today. Let's go!

Pipe

A Unix pipe is a form of redirection that allows data to flow from one command to another, connecting the output of one command to the input of another command without using an intermediate file. Pipes are a powerful feature of Unix-like operating systems and can be used to create complex command pipelines for achieving higher-level tasks.

I'm certain many of you have used pipes extensively. Consider a common example where you want to navigate JSON in a human-readable format:

curl https://dummyjson.com/products | jq . | less

For some pipe appreciation, consider what this might look like without a |.

curl https://dummyjson.com/products -o products.json &&\
  jq . products.json > products-pretty.json &&\
  less products-pretty.json &&\
  rm products.json products-pretty.json

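When chaining many commands, pipes become essential to our mental health. Under the hood, for each |, the shell makes a pipe() syscall, which asks the kernel to create a buffer and return two file descriptors: one for writing into it and one for reading from it. The shell then connects the standard out of the left-hand command to the write end and the standard in of the right-hand command to the read end. Visually this looks like: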

[Diagram: curl standard out → buffer (created by kernel) → jq standard in; jq standard out → buffer (created by kernel) → less standard in]
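To make the two file descriptors concrete, here is a minimal Go sketch using os.Pipe, which wraps the pipe system call (pipe2 on Linux). It writes into one end and reads from the other inside a single process; a shell does the same thing across two processes by attaching these ends to standard out and standard in. The function name roundTrip is hypothetical, chosen for illustration.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// roundTrip pushes msg through a kernel pipe buffer and reads it back.
func roundTrip(msg string) (string, error) {
	// r is the read end and w is the write end; both are file descriptors.
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	// Write from a separate goroutine, standing in for the left-hand
	// command. Closing w is what signals EOF to the reader.
	go func() {
		defer w.Close()
		io.WriteString(w, msg)
	}()

	// Read until EOF, standing in for the right-hand command.
	data, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	out, err := roundTrip(`{"id": 1}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"id": 1}
}
```

Writing from a goroutine matters: the kernel buffer is finite, so a large write with nobody reading would block, just as a fast producer stalls behind a slow consumer in a shell pipeline.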

Assuming the next process reads from standard in, it will consume that data and operate on it. Some scripts and tools don't read from standard in, in which case there are other tricks we could use, such as xargs, which turns standard input into command-line arguments. When you're writing scripts or command-line tools, I highly recommend supporting standard in, since it makes your tool interoperable with the broader ecosystem.

Let's demonstrate supporting standard in with a simple tool, jsonchk, built in Go, that determines whether JSON is valid. It expects a file as an argument but also supports being piped into. The following code achieves this, with comments explaining some of the standard library uses:

package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"time"
)

const (
	invalidJSONMsg = "invalid JSON"
	validJSONMsg   = "valid JSON"
)

// WARNING: code simplified and errors not properly
// considered for brevity.
func main() {
	var jsonData []byte

	// read piped input via standard in when present
	stat, _ := os.Stdin.Stat()
	// An interactive terminal is a Unix character device.
	// If the ModeCharDevice bit is NOT set, standard in is
	// a pipe (or a redirected file), so read all of it.
	if (stat.Mode() & os.ModeCharDevice) == 0 {
		jsonData, _ = io.ReadAll(os.Stdin)
	} else {
		// when no piped input exists:
		// expect argument 1 to be a file (or named pipe)
		if len(os.Args) < 2 {
			fmt.Fprintln(os.Stderr, "usage: jsonchk <file>")
			os.Exit(2)
		}
		f, err := os.Open(os.Args[1])
		if err != nil {
			panic(err)
		}
		// read until EOF; for a named pipe, Open blocks until a
		// writer connects and ReadAll returns once it closes
		jsonData, _ = io.ReadAll(f)
		f.Close()
	}

	// check whether JSON is valid
	if json.Valid(jsonData) {
		fmt.Printf("[%s] received at %s\n", validJSONMsg, time.Now())
		os.Exit(0)
	}
	fmt.Printf("[%s] received at %s\n", invalidJSONMsg, time.Now())
	os.Exit(1)
}

To build the above:

go build -o jsonchk .

Now we can test a few pipe use cases:

curl -s https://dummyjson.com/products | ./jsonchk

[valid JSON] received at 2023-03-20 09:44:33.580404 -0600 MDT m=+0.256167251

echo "{{ seems Wr0nG}" | ./jsonchk

[invalid JSON] received at 2023-03-20 09:44:57.091382 -0600 MDT m=+0.000460459

This demonstrates the interoperability of our new command with curl and echo.

However, our usage of pipes here is clearly ephemeral: each pipe lives only as long as the commands it connects. What if we want to keep a pipe open over time, perhaps like a channel?

Named Pipes

Named pipes are an extension of this pipe model, where a buffer is created and presented as a file so that processes can read from and write to it. They act as first in, first out (FIFO) queues and can be created using mkfifo, which is available on most *nix environments. Another cool aspect is that we can largely treat them as files we’re reading from; they just happen to be emptied when read, since the data passes through a kernel buffer instead of being stored on disk. Opening one for reading blocks until a writer opens it, and vice versa.

Let’s create a named pipe that processes can write JSON to, so that jsonchk can report what it finds over time.

mkfifo /tmp/jsonBuffer

With the pipe file existing, let’s attach jsonchk to it in a continuous loop.

while true; do
  ./jsonchk /tmp/jsonBuffer
done

Now, from curl and echo, let’s test the same idea, but redirect output to the named pipe:

curl -s https://dummyjson.com/products > /tmp/jsonBuffer
echo "{{ seems Wr0nG}" > /tmp/jsonBuffer

After running these 2 commands, we can return to the jsonchk loop and view the output:

[valid JSON] received at 2023-03-20 09:49:57.739516 -0600 MDT m=+137.128599542
[invalid JSON] received at 2023-03-20 09:49:57.766027 -0600 MDT m=+0.008085168

Along with these examples, you could also pass a regular file, such as testData.json, to ./jsonchk, meaning it treats files and named pipes the same way!

Conclusion

Pipes are rad; we all know this. Hopefully you learned something new in this post or, at least, grew your appreciation for this Unix primitive we often take for granted 🙂. Lastly, next time you’re writing a command-line tool or script, consider accepting piped input!
