01/08/2016
A basic overview of computers, including their hardware and software components, is given here. While most of this is well known, it is included as background information.
Knowledge of how a computer works and processes instructions is key to a solid career in general IT, and this is especially true in security research. One does not need the knowledge of an electrical or computer engineer, with an intimate understanding of how every piece of hardware works and interacts, but the ability to trace execution at the software level is a must.
With the Internet of Things and embedded devices, lower-level attacks will become more and more prevalent. The 2010 attack on Iran’s nuclear program is the perfect example: the Stuxnet worm forced centrifuges to spin at a much higher rate while they appeared to run normally. Attacks on pacemakers, airplanes, and other small devices could cause death and destruction. Imagine a hospital’s life support systems attacked by malware and held for ransom, with threats to turn off the devices unless millions are paid.
Computer Components
A computer, whether that is a Raspberry Pi, an Apple Mac, or a standard PC, has a number of common components.
- The Motherboard – the motherboard connects all the components of a computer, hence the name mother.
- The CPU – Central Processing Unit – the brains of the computer and where all instructions are processed.
- The RAM – Random Access Memory – volatile memory that holds data for fast read and write access.
- The Cards – There are a number of cards connected to the motherboard like audio and graphics. Sometimes how these systems connect can be vulnerable.
- Hard Drives – The hard drive is where data is stored long term, since it keeps its contents when the power is off.
Operating System Components
The main piece of software on a computer is the Operating System. Examples include Windows, Unix, FreeBSD, GNU/Linux and Minix. The OS handles memory management, the drivers that sit between the OS and the hardware, I/O management such as input from the keyboard and output to the screen, and the application programming interface (API) that allows software applications on the computer to communicate with the OS. This API will come into play when we develop software or look for ways around security mitigations. Device drivers run with elevated privileges, so a vulnerable driver can sometimes be a path to an administrator account.
The CPU and RAM
The above information is well known, but it lets us communicate on the same level with common nomenclature. A more rigorous definition and discussion of computing can now be developed.
John Von Neumann was a mathematician turned computer scientist who lived from 1903 to 1957. He made numerous contributions in many areas and worked on the Manhattan Project during WWII.
The Von Neumann Architecture lays out a theoretical definition for a modern computer system. The image to the right shows the standard architecture, and with the exception of the Control Unit and Arithmetic Logic Unit, the other terms are well known. The CU and ALU are now contained in the CPU. The green portion can be thought of as the motherboard with the lines representing a Bus, or connections for sending and receiving data.
The control unit essentially manages all communication within the CPU. The CPU works on a Fetch, Decode, Execute cycle: it fetches an instruction from memory, decodes it, and executes it. The CPU also contains registers, which are grouped into data, pointer and index registers.
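The Fetch, Decode, Execute process can be sketched with a toy machine. This is only an illustration, not real x86: the opcodes `LOAD`, `ADD` and `HALT`, the `run` function and the register names `acc` and `ip` are all made up for this example.

```python
# Toy machine, NOT real x86. Opcodes and register names are hypothetical.
def run(program):
    """Run a list of (opcode, operand) tuples and return the final registers."""
    registers = {"acc": 0, "ip": 0}  # accumulator and instruction pointer
    while True:
        # Fetch: read the instruction the instruction pointer refers to.
        instruction = program[registers["ip"]]
        registers["ip"] += 1  # ip now refers to the *next* instruction
        # Decode: split the instruction into an operation and its operand.
        opcode, operand = instruction
        # Execute: carry out the operation.
        if opcode == "LOAD":
            registers["acc"] = operand
        elif opcode == "ADD":
            registers["acc"] += operand
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [("LOAD", 2), ("ADD", 3), ("HALT", 0)]
final_state = run(program)
print(final_state)  # {'acc': 5, 'ip': 3}
```

Note that the instruction pointer is advanced before the instruction executes, so it always holds the location of the next instruction.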
Here the focus will be on Intel processors and the Instruction Set Architecture (ISA) that Intel uses. The ISA describes the instructions and encodings that a processor understands. The binary form of these instructions is known as machine code, and assembly language is its human-readable representation.
The CPU Registers
Intel processors come in a 64-bit architecture (x64) and a 32-bit architecture (x86). The 64-bit registers start with an ‘R’, while the 32-bit registers start with an ‘E’. The focus here will be exclusively on the x86 architecture.
EAX – The AX is the accumulator register and is used in math and I/O operations.
EBX – The BX is the base register used in index addressing.
ECX – The CX is the counter used for looping and for shift and rotate instructions.
EDX – The DX is used for I/O and math ops on large numbers.
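Each register can be broken down further. The low 16 bits of EAX, for example, are accessible as AX, which in turn splits into two 8-bit sub-registers: AH (the high byte) and AL (the low byte).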
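On x86, the low 16 bits of EAX can be addressed as AX, and AX itself splits into the 8-bit halves AH and AL. A minimal sketch of that layout using bit masks follows. The function name `split_register` and the sample value are only for illustration.

```python
def split_register(eax):
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    ax = eax & 0xFFFF       # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF   # high byte of AX
    al = ax & 0xFF          # low byte of AX
    return ax, ah, al


eax = 0x12345678  # arbitrary example value
ax, ah, al = split_register(eax)
print(hex(ax), hex(ah), hex(al))  # 0x5678 0x56 0x78
```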
The Pointer Registers
The pointer registers also use R for 64-bit and E for 32-bit and can be further decomposed. The IP and SP are two of the more important registers and the ones we will focus on during exploitation of stack-based buffer overflows.
EIP – The Instruction pointer stores the *address* of the next instruction to be executed.
ESP – The Stack pointer points to the top of the stack, and the data around it is what we will first attempt to overwrite.
EBP – The Base pointer, also known as the stack base pointer or frame pointer, points to the base of the current stack frame.
The Stack and Heap Structures
In order to understand how the registers work and how instructions are processed, two structures must be understood.
The Stack is a data structure used in computer science and can be understood as a last in first out data structure. If you have ever visited a lunch room, and seen a stack of plates where the top plate is pulled off, you have seen this common structure in action!
When a program runs, it must store data somewhere. Certain types of data used in a program are stored on the stack. The plates here can be thought of as the data that is pushed on and popped off. The instruction pointer discussed above is how the location of the currently executing instruction is tracked.
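The plate analogy can be tried out directly. The sketch below uses a Python list as a stand-in for a stack. It only shows the last in, first out behaviour and does not model how a CPU lays the stack out in memory.

```python
stack = []           # an empty stack of "plates"
stack.append("A")    # push A
stack.append("B")    # push B
stack.append("C")    # push C, which is now on top
top = stack.pop()    # pop removes the most recently pushed item
print(top)           # C
print(stack)         # ['A', 'B']
```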
The stack is divided into logical sections called stack frames. The Base Pointer points to the base of the current frame and the Stack Pointer points to the top of the current frame. When a new stack frame is added, the previous frame's pointers (its base pointer and the return address) are pushed onto the stack so that, once the new frame finishes executing, they can be popped off the stack and used to resume the previous frame.
In this sample stack, we can visualize how the stack looks and behaves. The first thing to note is that the numbers on the right go from highest to lowest. This is because a stack moves from a higher address to a lower address. In other words, it grows toward smaller addresses.
The second thing to note is the active frame. The active frame, here N, is the one currently in use. Once this frame is done, the yellow field points back to the area the program will return to.
The final detail to take note of is the pointer. This pointer will be very important in our stack smashing coming up.
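To make the frame bookkeeping concrete, here is a small simulation of entering and leaving a stack frame. It is a sketch under simplifying assumptions: the class `ToyStack`, its method names, and the addresses `0x1000` and `0x08048400` are all made up, and a real calling convention involves more detail than this.

```python
WORD = 4  # bytes per stack slot on a 32-bit machine


class ToyStack:
    """Simulated downward-growing stack. Addresses and values are made up."""

    def __init__(self, top_address):
        self.memory = {}         # address -> stored value
        self.esp = top_address   # stack pointer
        self.ebp = top_address   # base (frame) pointer

    def push(self, value):
        self.esp -= WORD         # the stack grows toward smaller addresses
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory.pop(self.esp)
        self.esp += WORD
        return value

    def enter_frame(self, return_address, local_bytes):
        self.push(return_address)  # where to resume once the frame is done
        self.push(self.ebp)        # save the previous frame's base pointer
        self.ebp = self.esp        # base of the new frame
        self.esp -= local_bytes    # reserve room for local variables

    def leave_frame(self):
        self.esp = self.ebp        # discard the local variables
        self.ebp = self.pop()      # restore the previous base pointer
        return self.pop()          # return address, which would go into EIP


stack = ToyStack(top_address=0x1000)
stack.enter_frame(return_address=0x08048400, local_bytes=8)
esp_inside, ebp_inside = stack.esp, stack.ebp
returned_to = stack.leave_frame()
print(hex(esp_inside), hex(ebp_inside))  # 0xff0 0xff8
print(hex(returned_to), hex(stack.esp))  # 0x8048400 0x1000
```

The saved return address sits just above the saved base pointer, which is why data written past the end of a local variable can eventually reach it.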
Note that addresses are written in hexadecimal. So, following the above, a stack may grow from 0xF (15 in decimal) down to 0xA (10 in decimal). On a traditional 32-bit Linux system, for example, the highest memory address on the stack is 0xBFFFFFFF. At this point, a general understanding of the stack is all that is necessary. The stack will continue to be discussed over many articles.
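Hexadecimal values are easy to check in an interpreter. The short sketch below converts the addresses mentioned above. The variable name `top_of_stack` is just for illustration.

```python
top_of_stack = int("BFFFFFFF", 16)  # parse a hexadecimal string
print(top_of_stack)                 # 3221225471
print(hex(15), hex(10))             # 0xf 0xa
print(0xF - 0xA)                    # 5
```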
The Heap
The heap is another area where data can be stored for a program during execution. The heap is far more complex in implementation and in exploitation. Further, the way a heap is implemented differs based on the OS and language. The heap is used for dynamically allocated memory. For now, as a simplified model, let’s consider a heap to be a data structure known as a tree.
The top node, here the 100, is our root node. From the root node we have two choices: we can go left or right. The two choices indicate that this is a binary tree, in which each node has at most two children. The tree can be traversed by following the paths, or lines, between nodes, and the nodes can be moved or rearranged.
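A binary tree like this can be sketched in a few lines. Only the root value of 100 comes from the discussion above. The other node values, the `Node` class and the `preorder` function are hypothetical and chosen purely for illustration.

```python
class Node:
    """A binary tree node with at most two children."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node):
    """Visit a node first, then its left subtree, then its right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


# Hypothetical tree: only the root value 100 is taken from the text.
root = Node(100, left=Node(19, left=Node(17), right=Node(3)), right=Node(36))
visited = preorder(root)
print(visited)  # [100, 19, 17, 3, 36]
```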
The implementation of a heap can become quite complex. The heap can use any number of data structures, each with a rigorous set of mathematics to back it up. In fact, whole branches of math, like graph theory, set theory and combinatorics, contribute to the study of tree structures.
Regardless of the implementation (actual implementations will be examined in future articles), the heap is used to store dynamically allocated memory. Unlike the stack, the heap grows from lower addresses to higher addresses. The heap is often used when a program cannot know in advance how much data it must be prepared to take in.
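The opposite growth directions of the heap and the stack can be pictured with a toy address space. This is a deliberately simplified sketch: the class `ToyMemory`, its methods and the start addresses are invented for the example, and real allocators are far more involved.

```python
class ToyMemory:
    """Toy address space: heap allocations move up, stack pushes move down."""

    def __init__(self, heap_start, stack_top):
        self.heap_end = heap_start  # next free heap address
        self.esp = stack_top        # current top of the stack

    def heap_alloc(self, size):
        address = self.heap_end
        self.heap_end += size       # heap grows toward higher addresses
        return address

    def stack_push(self, size):
        self.esp -= size            # stack grows toward lower addresses
        return self.esp


memory = ToyMemory(heap_start=0x1000, stack_top=0x9000)
first_block = memory.heap_alloc(16)
second_block = memory.heap_alloc(32)
first_slot = memory.stack_push(4)
second_slot = memory.stack_push(4)
print(hex(first_block), hex(second_block))  # 0x1000 0x1010
print(hex(first_slot), hex(second_slot))    # 0x8ffc 0x8ff8
```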
The heap has some additional properties. Its memory is stored in RAM, just like the stack, but is much slower to create and destroy. Data here must be manually created and destroyed, which is a common source of fragmentation and memory leaks.
The heap will be explored in much more depth in later articles. It is set aside for now because, once the stack has been used and exploited in depth, the heap will be much easier to understand.
Up next we begin learning about PE and ELF files, forms of executables and some basic assembly.