Hacking in with stack overflows
Last week my assignment in Systems Programming was to take a vulnerable program (for which we were not provided the source code, only the executable binary) and figure out how to hack it and make it do what we wanted. Specifically, the program, called “server”, would ask us to put in our NetID (Cornell’s equivalent to a username), and it would spit it back out and terminate, like so:
[alex@linus ~]$ server
What is your NetID? ais46
Because there is an environment variable on Cornell’s Linux machines that is set to your NetID when you log in, the program could detect and respond accordingly if you entered something that was not your NetID. Behold:
[alex@linus ~]$ server
What is your NetID? ahc45
Nice try ais46, but you can't fool me!
My mission was to get server to print something else, namely, “All your base are belong to ais46”, proving that old internet memes never die, they just get recycled into college assignments by bored grad students.
How to go about this? The string “All your base are belong to” doesn’t appear anywhere in the program’s code, and it’s not as if the string I enter can contain hidden code, because all it does is go to a variable which is compared for equality with “ais46”… right?
Well, in a perfect world, yes. We don’t live in a perfect world, though: we live in a world where it is sometimes possible to exploit a buffer overflow. A buffer overflow occurs when data is placed in a variable that is too small to contain it, and the data “overflows” into other portions of memory. For example, if I were to declare an array big enough to hold ten characters, and then put the string “My name is Alex Slover” into that array, I would be causing a buffer overflow. What happens to the part of memory that is overflowed into? Much of the time it will contain another variable in use by your program, which will now be changed to some new (essentially random) value, causing unexpected program behavior or a crash. This in itself is bad enough, but it is sometimes possible to use a buffer overflow not only to make a program crash, but to take it over entirely.
(You may be wondering why buffer overflows are even possible, and why computers aren’t designed to simply keep track of how big a program’s variables are, and refuse to allow writing data beyond the boundary of a variable. The answer is that this is possible, and in fact it’s built into most of the newer programming languages: Java, Python, C#, etc. The problem is that enforcing variable boundaries (called bounds checking) takes extra time and extra memory, which is not acceptable in situations where code needs to run as fast as possible, such as in the kernel of an operating system. Thus, older languages (like C) and languages that are newer but designed to be used in low-level systems programming environments (like C++) do not require bounds checking, and instead place the onus on the programmer to be extremely careful not to allow buffer overflows to occur. This does not mean that all programs written in C and C++ are vulnerable, of course, just that a great deal of extra caution is required; it would’ve been pretty easy to write the server program so that it did not create a weak spot and allow someone to break in, but that would’ve made for a rather unfair assignment.)
The next question is: how can a buffer overflow let you control a program and make it do whatever you want? Surely variables and executable code are stored in distant enough locations in memory as to make manipulating the executable code just by changing variables impossible. The answer is that code and data are indeed stored far, far apart; in fact, on a typical modern system, changing the executable code of a program once it has started running is not possible: that segment of memory is normally marked as read-only, and any attempt to write to it will cause the program to crash. (You may remember, back in the bad old days of Windows, getting that inexplicable error message “This program has performed an illegal operation and will be terminated”, and wondering whether your software had broken the law. What was actually going on was that the program, probably due to a careless programmer oversight, had attempted to write into a segment of memory it was not allowed to write to, and the operating system had killed it for security reasons.) So if we can’t change the executable program, how can we inject our own code? The answer lies in the stack, which is how all (or virtually all) programs running on a computer are organized in memory.
To understand the stack, remember that programs running on a computer typically consist of functions that call other functions. In C, for example, all programs start at the main() function. If you were in the main() function and wanted to print something to the screen, you’d call the printf() function, which would cause the program flow to jump to that function, execute whatever instructions make up the printf() function, and then jump back to main() when finished. But how does the program know where to jump back to when finished? The answer is that this value is stored on the stack, which is a special part of program memory. Every time a function is called, that function gets its own stack frame, which is a self-contained chunk of memory on the stack. So if I call the function printf(), then printf() will get its own stack frame, which is created when printf() starts and destroyed when it ends. The stack frame for a function contains all the local variables which are used by that function and that function alone (global variables, accessible between functions, are handled in a different manner), and it also contains the return address, which is where program flow jumps back to once the function is finished. For example, if I am in the main() function, and I call printf() when I am at location 0x84ae0000, the return address would be set to (for example) 0x84ae0004, which is where main() picks up and resumes executing once printf() is finished. (If the preceding hexadecimal notation was unfamiliar to you, seek guidance on Wikipedia.)
So: return addresses (which tell the program where to jump once the function has finished) are stored on the stack, and local variables, which can overflow in some cases, are also stored on the stack. Are you seeing the answer? The trick to hijacking the server program is to put so much data into the variable which is supposed to contain my NetID that it overflows onto the return address. For example, if I were to make my injected data just a long string of zeroes, then instead of the real return address, the program would try to jump to address 0x00000000. Now, forcing the program to jump to return address 0x00000000 isn’t very useful: I can’t write any evil code there, and jumping to it would cause a program crash identical to the one described two paragraphs earlier. What to do? Well, through careful analysis and use of a debugger, I can figure out the memory address corresponding to the beginning of the variable that I am overflowing. Why not use that as the return address? Thus the answer becomes clear: turn the evil code to be executed into a series of bytes and use those bytes themselves as the padding to fill up the stack and let me write over the existing return address, substituting for that address the address of the beginning of my code. It basically loops back on itself!
Because my systems programming class may use this assignment in the future and probably wouldn’t take kindly to me giving away the answer, and because I think figuring things out for yourself is always more fun and useful than having someone else feed it to you, I’m not going to give a play-by-play of what I did. Suffice it to say, I carefully crafted some data corresponding to the evil instructions I wanted to execute, then appended a return address to the end of that data. I then passed in that data where the program was expecting my NetID, and bam: my data overflowed its variable, spilling onto the rest of the stack, causing it to be corrupted. Nothing happened immediately, but when the current function ended, it did its job and looked at the return address on the stack to know where to jump back to, which had been replaced with the address of my malicious code. It dutifully jumped to that location, began executing the instructions it saw, and that was that. Of course, the only thing my instructions did was cause server to print a goofy message, but in the real world, it would not have been difficult to craft code that did something genuinely dangerous, such as getting “root” privileges to have full control over the entire computer.
This sort of attack was only made possible because of the artificial nature of the assignment; in the real world, things are not so simple. Nevertheless, buffer overflow attacks, even if they are more complicated than the one I have just described, are among the most common entry vectors for a hacker with bad intentions to gain control of a computer system. We’re doing assignments like this not because Cornell wants to secretly train a private army of hackers, but because by understanding exactly how these attacks work, we’re better equipped to stop them.
I’ll probably post some more things along these lines in the future, but there is a vast quantity of resources available if you want to learn more. I especially recommend The Art of Exploitation, a no-nonsense book that teaches you exactly how stacks, programs, and computer networks work, and how to look for ways to exploit them. This book is particularly good because it’s not a “cookbook” that says, “Do this to hack into a computer” (such a book would be useless anyway, as whatever vulnerabilities it explained would quickly be patched and made useless). Rather, it explains how computers work at a fundamental level, why vulnerabilities occur, and where to find them. The book is useful even if you’ve never done a day of programming in your life: the first chapter consists of enough C lessons to get you moving. (Just to be perfectly clear: this is an academic interest on my part. I would never do anything illegal with the knowledge I have, and neither should you. Hacking is like the Force: it should be used for knowledge and defense, never for attack.)