Monday, April 1, 2013

strlcpy(): The unsafety of a safe string copy function

Every C programmer should know that the strcpy() function is insecure: if the destination buffer is too small to hold the string, strcpy() will happily overflow the buffer and overwrite whatever follows it in memory. Besides corrupting variables, it may overwrite return addresses on the stack, which spells doom for system security because it can allow an attacker to inject code into the running program, commonly known as ‘exploiting a vulnerability.’

Sure enough, the strcpy() function is a very dangerous thing. OpenBSD came up with the strlcpy() and strlcat() safe string functions to counter the problem. With these, you must always supply the size of the destination buffer. Strings that are too long will not overflow the buffer; any attempt at a buffer overflow is simply stopped. strlcpy() terminates the copied string at the end of the buffer, thereby plugging the hole.
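
To make that behaviour concrete, here is a minimal sketch of the semantics just described. The name sketch_strlcpy() is made up for this post and the code is not OpenBSD's implementation; it only mirrors the contract: copy at most size - 1 bytes, always terminate when size is nonzero, and return the length of the source string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in that mimics the strlcpy() contract; this is
 * not the OpenBSD source. It returns strlen(src), so the caller can
 * tell whether the copy was truncated. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size > 0) {
        /* Copy what fits, leaving room for the terminating NUL. */
        size_t n = (srclen < size - 1) ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

With an 8-byte buffer, copying "hello, world" stores "hello, " (seven characters plus the terminator) and returns 12, the length the full copy would have needed.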

A remarkably simple but effective solution. The strlcpy() and strlcat() functions are widely used in OpenBSD and were adopted in other operating systems as well, like FreeBSD and Mac OS X. But I wouldn’t have known about these functions in the first place if I hadn’t run into code that would not compile on Linux, where strlcpy() is missing from the GNU C library, and maybe rightly so.

However superficially brilliant strlcpy() seems, it is all too easy. The function may truncate the copy of the string, so … It copies the string, except when it doesn’t, then it only partially copies it. Many people consider this incorrect. It isn’t logical to have a copy function that truncates the copy. You can get really weird results from this: for example, when a UTF-8 string gets truncated in the middle of a multibyte character, the result is a corrupted string. What if the string is a file path or a URL that is truncated?
Of course, you should check the return value of the function: strlcpy() returns the length of the source string, so a result greater than or equal to the buffer size means the copy was truncated. Checking return values is standard programming practice that also applies to the traditional string handling functions. But in a strange way, these safe string functions are actually unsafe. A false sense of security creeps in. Actually, not copying the string at all would have been better than truncating.
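
One way to read ‘not copying at all’ is a copy routine that refuses outright when the string doesn’t fit. The following is only a sketch under that assumption; copy_or_fail() is a hypothetical name, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits completely.
 * Returns 0 on success, -1 if the string plus its NUL does not fit.
 * On failure dst becomes the empty string (when there is room for it),
 * so a partial copy is never left behind. */
int copy_or_fail(char *dst, size_t size, const char *src)
{
    size_t srclen = strlen(src);

    if (srclen >= size) {
        if (size > 0)
            dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, srclen + 1);   /* includes the terminating NUL */
    return 0;
}
```

A caller that gets -1 back has to deal with the problem explicitly instead of carrying on with half a file path.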

Nevertheless, strlcpy() remains in use in various code bases (like OpenSSH and rsync). Wouldn’t it be nice if these functions were available just for the sake of portability? In the Linux world, that argument just isn’t good enough.

So, what are your options? For portability, you might want to use autoconf and #ifdef HAVE_STRLCPY, but note that it doesn’t really help you. It’s sugar coating that looks advanced, but it doesn’t make the truncation problem go away. My advice is to steer clear of strlcpy(); just don’t use it. Stick with the traditional string functions and keep checking those buffer lengths.
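
For reference, the autoconf route usually looks something like the sketch below. It assumes configure.ac calls AC_CHECK_FUNCS([strlcpy]) so that HAVE_STRLCPY ends up in config.h; the wrapper name compat_strlcpy() and the snprintf()-based fallback are my own illustration, not taken from any particular project. Note that the fallback truncates just the same.

```c
/* A real project would include its generated config.h before this. */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#ifdef HAVE_STRLCPY
/* The platform provides strlcpy(); use it directly. */
#define compat_strlcpy strlcpy
#else
/* Hypothetical fallback with the same return convention, built on
 * C99 snprintf(), which also reports the length it wanted to write.
 * An snprintf() error is reported as 0 here, which is a simplification. */
static size_t compat_strlcpy(char *dst, const char *src, size_t size)
{
    int n = snprintf(dst, size, "%s", src);
    return (n < 0) ? 0 : (size_t)n;
}
#endif
```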
You might use an external string library that supports growable strings. Personally, I’m good with my own my_strcpy() function, which is basically strlcpy() with a twist: it calls abort() when the destination buffer is too small. It’s not user-friendly, but it gets you out of a bad situation quickly.
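
A rough sketch of that idea follows. The argument order and the error message are assumptions made for illustration; treat it as one possible shape for such a function rather than the definitive one.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a copy that aborts instead of truncating. The signature
 * and the diagnostic text are illustrative assumptions. */
char *my_strcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (srclen >= size) {
        fprintf(stderr, "my_strcpy: destination buffer too small "
                "(need %zu bytes, have %zu)\n", srclen + 1, size);
        abort();   /* fail loudly rather than continue with bad data */
    }
    memcpy(dst, src, srclen + 1);   /* includes the terminating NUL */
    return dst;
}
```

Since abort() normally leaves a core dump, the offending call site is easy to find in a debugger.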
Other than that, accept that C is maybe not the best choice for implementing userland code written by mere mortals. Try a different language, like golang. It has very robust string handling.

Next time we’ll have a look under the hood and examine how buffer overflows work.