subprocess: From Python to Go
One of the things that makes the UNIX toolbox so powerful is the ability
to run a command and capture its output. If you now think, “why, of course
you can!”, realize that this seemingly trivial operation requires complex
features in the very core of the operating system.
On top of that, the application layer needs to offer this functionality,
preferably wrapped in a pleasant-to-use interface.
While Golang has a pretty decent API for running external commands,
I particularly love Python’s subprocess module; it’s incredibly
versatile, and the interface is easy to remember (especially after having
used it countless times).
Really, capturing command output in Go is as simple as:
output, err := exec.Command(cmd).Output()
And yet, I don’t want to code it this way. It’s too rigid, too set in
stone, for my taste. What I want is a wrapper function that offers some
flexibility in how to capture output, e.g. capture stdout while redirecting
stderr to /dev/null; much like how Python’s subprocess.run() works.
When you study the Python documentation, you’ll find that subprocess.run()
takes over a dozen arguments (I’m not exaggerating…!) to tweak its
behavior. Python is flexible in that functions may have default
arguments, which can be omitted when calling the function.
We don’t have that luxury in Go, so we won’t port each and every
feature in detail. That said, we will end up with a very capable Go
function that mimics the Python API closely enough.
package subprocess
func Run(argv []string, stdout, stderr DupFd) (Result, error)
The Run() function takes an argv slice as the command to run.
The stdout/stderr DupFd are special constants that define how we wish
to redirect stdout and stderr. The Result is a structure that contains
the captured output and exit code. Per Go idiom, an error is returned
when something goes wrong.
type DupFd int
const (
    _ DupFd = -iota
    DevNull
    Pipe
    Stdout
    Stderr
)
The DupFd constants are just integers. Giving them an explicit type makes
the code more robust. Note the use of negative iota; the idea is that you
can still pass a (type-converted) open file descriptor, redirecting output
to a file. (Not shown in this post, for brevity.)
type Result struct {
    Stdout   []byte
    Stderr   []byte
    ExitCode int
}
The returned Result contains captured output and the exit code.
You may prefer having strings rather than bytes. A handy way of converting
bytes to lines of text is:
// Note: strings.Lines keeps the trailing newline on each line.
func Lines(b []byte) []string {
    return slices.Collect(strings.Lines(string(b)))
}
Now let’s get to the meat of implementing Run().
It’s possible to write this function as one long stretch of code, but
for neatness (and re-usability) we break it down into three steps:
// pseudo code
func Run(argv, stdout, stderr) {
    proc := Popen(argv, stdout, stderr)
    output := proc.Communicate()
    exitcode := proc.Wait()
    return Result{output, exitcode}
}
By now it should be clear that the trivial task of capturing command output is actually quite involved.
The name “Popen” comes from the C library function popen(), which
creates a pipe connected to a newly spawned child process.
We do not call the C library function; rather, our version imitates
the Python subprocess.Popen() function.
func Popen(argv []string, stdout, stderr DupFd) (Process, error)
The returned Process structure holds some data on the started child process,
and the read ends of any pipes connected to the child’s stdout/stderr.
import (
    "io"
    "os/exec"
)

type Process struct {
    Cmd        *exec.Cmd
    StdoutPipe io.ReadCloser
    StderrPipe io.ReadCloser
}
With this in place we can implement Popen() as follows.
First we create (but do not yet start) the command:
cmd := exec.Command(argv[0], argv[1:]...)
Again, the arguments stdout, stderr DupFd are constants that determine
how we wish to redirect.
var pipefd io.ReadCloser
var err error
switch stdout {
case DevNull:
    // redirect to /dev/null
    f, err := os.OpenFile(os.DevNull, os.O_WRONLY, 0o600)
    if err != nil {
        return Process{}, err
    }
    cmd.Stdout = f
case Pipe:
    // capture output via pipe
    pipefd, err = cmd.StdoutPipe()
    if err != nil {
        return Process{}, err
    }
default:
    // do not redirect
    cmd.Stdout = os.Stdout
}
Next, start the process and return the Process structure:
proc := Process{
    Cmd:        cmd,
    StdoutPipe: pipefd,
    StderrPipe: pipefd_stderr, // from the (elided) stderr switch
}
// start the process!
if err := proc.Cmd.Start(); err != nil {
    return proc, err
}
return proc, nil
The production version of Popen() includes some things that I left out
here for brevity:
- a similar switch as above, but for redirecting stderr
- redirecting stderr to stdout, and vice versa
- redirecting stdin for input
- redirecting to an open file descriptor for writing to an on-disk file
Now that we have a running process with some pipes attached, not much
happens by itself. We need to read from the pipes to obtain the data.
That’s easy enough, but there is a caveat: the process may write lots of
data to both stdout and stderr. If we go about this naively, it’s quite
possible that internal buffers fill up and the whole thing deadlocks.
Moreover, if we throw input on stdin into the mix as well, it’s easy to
see we have to juggle multiple I/O streams concurrently.
If there is one thing Go is good for, it’s concurrent programming.
Mimicking the Python API again, the function that does the I/O is named
Communicate().
func (self *Process) Communicate() ([]byte, []byte, error)
Communicate reads from the pipes and returns the output as bytes. Not shown here: writing input data to the input pipe.
// captured output data
var b_stdout []byte = []byte{}
var b_stderr []byte = []byte{}
// separate error variables
var err_stdout, err_stderr error
// a waitgroup to synchronize concurrent goroutines
var wg sync.WaitGroup

// note that `self` is a Process struct
if self.StdoutPipe != nil {
    wg.Go(func() {
        b_stdout, err_stdout = io.ReadAll(self.StdoutPipe)
    })
}
// ... left out: same thing for stderr

// wait for all goroutines to finish
wg.Wait()
// ... error handling goes here

// and return the data
return b_stdout, b_stderr, nil
The funny thing about goroutines is that they do not necessarily run
multi-threaded. (Since these are blocking I/O calls, they will end up on
separate threads here.) It is absolutely possible to write this function
in a single-threaded fashion, using I/O multiplexing. Goroutines, however,
make it so much easier.
The final step is cleaning up the process and collecting the exit code.
func (self *Process) Wait() (int, error)
The implementation of Wait() is straightforward, but we need to
take care of the error handling. Go's Command.Wait() method returns
an error of type ExitError when the process exits with a non-zero
exit status. I don’t like that; in my book a process is free to return
any exit code, so I don’t treat it as a hard error.
err := self.Cmd.Wait()
if err != nil {
    // test whether error is an ExitError
    exiterr, ok := err.(*exec.ExitError)
    if ok {
        exitcode := exiterr.ExitCode()
        // non-zero exit is not an error
        return exitcode, nil
    }
    // some other error occurred
    return -1, err
}
// exit code 0
return 0, nil
Note the syntax of the Go type assertion. The error type is actually
a Go interface, implying that there may be many kinds of error.
The type assertion uses (runtime) introspection to verify that this
specific error is an ExitError.
Given all this, it’s easy to put together a fully functional
subprocess.Run() function. One last gotcha: the process must always be
waited on, otherwise we risk creating a zombie process. Either use Golang’s
defer keyword, or avoid returning too early on error.
This post explains the code from beginning to end. It’s unusual to get
everything right the first time around; writing code is an iterative
process where you refine and refactor until you arrive where you want to
be. There was no room in this post to describe those iterations in detail.
Also note the features unique to Go, such as goroutines, that make this
code tick.