Pipe (computer)

A pipe is a means of connecting the output of one program to the input of another program. That makes it a form of interprocess communication.

Ken Thompson, the author of the very influential Unix kernel, first introduced this feature into early versions of Unix. When he wrote the first Unix shell, he made it trivial for ordinary users to connect programs on the command line.

Pipelines

A shell user can request the invocation of a series of programs, where each program's output becomes the input of the next program. The following example tells the shell to invoke three common programs, awk, sort and uniq, in a pipeline, with sort used twice. In this example, assume ccr is a file where each line describes a credit card transaction, and the first and second fields of each line are the credit card number and the client's name. The vertical bar symbol | tells the shell to link the preceding program to the next program with a pipe. The resulting chain of programs is called a pipeline.

It can be written all on one line, like this:

awk ' { print $1, $2 } ' ccr | sort | uniq -c | sort -r

`awk ' { print $1, $2 } ' ccr |`	`awk` prints the first two fields of every input line
`sort |`	the `sort` program takes its input and sorts it, by default in standard alphanumeric order
`uniq -c |`	the `uniq` program takes its input and looks for consecutive duplicate lines, printing each distinct line once; with `-c`, each line is preceded by the number of times it occurred
`sort -r`	this `sort` lists the customers in order of how many purchases each made; the `-r` says to list them in reverse order, most frequent purchasers first (`sort -rn` would compare the counts strictly as numbers)

So, in this example, we saw someone summarize a list of credit card transactions, reporting on who used their credit cards most often, without having to write a specific program to do so.
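To make the stages concrete, here is a rough Python analogue of what each program in the pipeline does to the data. It is only a sketch: the sample transactions, the extra date and amount fields, and the function names are all invented for illustration, and the real programs run as separate processes rather than as functions in one program.

```python
from itertools import groupby

# Hypothetical sample data standing in for the ccr file.
SAMPLE_CCR = """\
4111-0001 Alice 2024-01-03 19.99
4111-0002 Bob 2024-01-03 5.00
4111-0001 Alice 2024-01-04 42.50
4111-0003 Carol 2024-01-05 7.25
4111-0001 Alice 2024-01-06 3.10
4111-0002 Bob 2024-01-07 12.00
"""


def first_two_fields(lines):
    """Roughly what awk '{ print $1, $2 }' does: keep the first two fields."""
    for line in lines:
        yield " ".join(line.split()[:2])


def count_consecutive(lines):
    """Roughly what uniq -c does: collapse runs of identical adjacent lines."""
    for line, run in groupby(lines):
        yield (sum(1 for _ in run), line)


def summarize(text):
    """Chain the stages in the same order as the shell pipeline."""
    stage1 = first_two_fields(text.splitlines())   # awk
    stage2 = sorted(stage1)                         # sort
    stage3 = count_consecutive(stage2)              # uniq -c
    return sorted(stage3, reverse=True)             # sort -r, by count


if __name__ == "__main__":
    for count, customer in summarize(SAMPLE_CCR):
        print(count, customer)
```

The first sort matters because the counting stage only merges lines that are next to each other, just as uniq does.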

Pipe is a system call

In computer science it is desirable to insulate programs from the details of interacting with the computer's devices. Programmers should be able to open a printer and print something without knowing the details of how the printer operates, so that the program will still work if the printer is replaced, or if the program is run on another computer system with a different kind of printer. Programmers do this through system calls. Typically their programs make what looks like an invocation of a subroutine, but which actually turns control of the computer over to the operating system. The operating system then tells the hardware to do whatever the program needs it to do. When the program gets its turn to run again, the system call appears to it to have behaved just like a normal library subroutine. Four of the most commonly used system calls for files and devices are open, read, write and close.
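As an illustration of those four calls, the following minimal sketch uses Python's os module, whose open, write, read and close functions are thin wrappers around the system calls of the same names. The function name roundtrip and the scratch file demo.txt are assumptions made up for this example; an ordinary file stands in for a device.

```python
import os
import tempfile


def roundtrip(message: bytes) -> bytes:
    """Write message to a scratch file, then read it back."""
    # The scratch directory and file name are placeholders for this sketch.
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "demo.txt")

        # open for writing, write, close
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, message)
        finally:
            os.close(fd)

        # open for reading, read, close
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.read(fd, len(message))
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(roundtrip(b"hello, device\n"))
```

The program only ever handles a small integer, the file descriptor; everything about the underlying hardware stays inside the operating system.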

The system calls needed to set up a pipeline are pipe, fork and exec. They should be called in that order. The pipe call creates a buffer that can be shared between two programs, and returns two ends: one for reading and one for writing. The fork call tells the computer to make a second running copy of the current program. The two copies are identical, except that each knows whether it is the original or the copy. Normally, the final step in setting up the pipeline is for both copies to replace themselves with another program. That is where the exec system call comes in. But after the fork and before the exec, each copy of the program has to do something to the pipe. The copy that is going to write to the pipe has to close the read end, and the copy that is going to read from the pipe has to close the write end. Otherwise the reader would never see the end of the input, because a write end would still be open somewhere.
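The sequence can be sketched with Python's os module, which exposes pipe, fork and exec fairly directly on Unix-like systems. This is a simplified illustration, not how any particular shell is written: only the writing copy calls exec, while the original program stays on as the reader so that it can return what it read. The function name read_from_command is invented for the example, and the dup2 call, which attaches the pipe to the child's standard output, is an extra step not covered in the text above.

```python
import os


def read_from_command(argv):
    """Run argv with its standard output connected to a pipe; return the output."""
    read_end, write_end = os.pipe()          # 1. pipe

    pid = os.fork()                          # 2. fork
    if pid == 0:
        # The copy: it will write to the pipe, so it closes the read end.
        try:
            os.close(read_end)
            os.dup2(write_end, 1)            # make the pipe its standard output
            os.close(write_end)
            os.execvp(argv[0], argv)         # 3. exec (does not return on success)
        finally:
            os._exit(127)                    # reached only if something failed

    # The original: it will read from the pipe, so it closes the write end.
    os.close(write_end)
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:                        # empty read: every write end is closed
            break
        chunks.append(chunk)
    os.close(read_end)
    os.waitpid(pid, 0)                       # collect the finished child
    return b"".join(chunks)


if __name__ == "__main__":
    print(read_from_command(["echo", "hello from the child"]))
```

If the original forgot to close its own write end, the read loop would never see an empty read and would wait forever.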

The pipe's buffer

When relatively fast programs are connected to slow devices, like line printers, hard drives or card readers, they rely on the operating system to take their output, put it into a buffer, and write it to the device later, when the device is ready. The program does not have to wait for the device unless the output buffer is full. Similarly, when a file or device is opened for reading, the operating system attempts to fill an input buffer in advance, so the program does not have to wait on the device unless the buffer has been exhausted.

When the operating system sets up two programs that are connected by a pipe, it sets up a buffer they can both access. The program writing to the pipe's buffer still has to pause when the buffer is full, and the program reading from the pipe's buffer has to pause when the buffer is empty.

In most operating systems prior to Unix, if a second program was going to operate on the output of an earlier program, the first program had to store its output in a file on the computer's hard drive. The second program then had to open that temporary file and read it in. Pipelines have two main advantages over this. First, reading from and writing to disk is very time consuming. Without pipes, almost all the work might be reading and writing those temporary files. With a pipeline, all the delays of using temporary files are skipped. Second, if the series of programs is operating on large files, the job might fail because the temporary files completely fill up the hard drive.
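The pausing behaviour of the pipe's buffer described in this section can be observed from a single program. The sketch below, again using Python's os module on a Unix-like system, switches a pipe into non-blocking mode so that the operating system reports an error at exactly the points where it would otherwise make the program wait. The function names are invented for the example, and the buffer size it reports depends on the operating system.

```python
import os


def pipe_capacity(chunk_size=1024):
    """Count how many bytes fit in a pipe before a writer would have to pause."""
    read_end, write_end = os.pipe()
    try:
        os.set_blocking(write_end, False)    # report "would block" instead of waiting
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_end, chunk)
            except BlockingIOError:          # buffer is full: a writer would pause here
                return total
    finally:
        os.close(read_end)
        os.close(write_end)


def empty_read_would_block():
    """Return True if reading from an empty pipe would make the reader pause."""
    read_end, write_end = os.pipe()
    try:
        os.set_blocking(read_end, False)
        try:
            os.read(read_end, 1)
        except BlockingIOError:              # buffer is empty: a reader would pause here
            return True
        return False
    finally:
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    print("bytes written before the pipe filled:", pipe_capacity())
    print("empty read would block:", empty_read_would_block())
```

In normal blocking mode neither program sees these errors; the operating system simply suspends the writer or the reader until the other side catches up.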