Note: this article first covers the elementary properties of projects and quickly moves on to more advanced topics and tools. The basics section is here to put us all on the same page and to make the project more accessible to hackers who might have missed some of the basics but could benefit from more detail.
The Basics of a Program
In the Windows world, a compiled .NET program is known as an assembly or executable. The main difference between a .exe and a .dll is that a .exe has a Main() method that tells the program where to start. A .dll is a library: it has no entry point of its own, but an .exe can reference it to extend the program. Typically when developing a program we divide it up into logical sections, or projects. The collection of projects is known as a solution.
To the left we can see a common example of a solution. The Database project contains the logic for talking to the database and is compiled as database.dll. Business Logic is the project that references Database, uses it to fetch data (or to send data back), and then operates on that data. Finally, Main is our interface: it receives the request from the user and calls the Business Logic project to handle it. The data is then presented in the Main program.
Example. Main has a grid to display users. A user asks to see all users that are female by typing that into a search field and clicking a button. That request is sent to Business Logic, which creates the search request and passes it to Database to make the actual call. Database makes the call, receives the data back and places it into something called a Model or a Data Transfer Object. This is then passed back to Business Logic. Business Logic might transform it into a Presentation Model or View Model, for example by combining First Name and Last Name into Full Name. Lastly, the shaped data is sent to Main and presented to the user.
While that is a very simple example, it highlights how we separate different logical sections of code. In a small program, this could be segregated by folders. A subtlety here is that Main can’t call Database directly; only Business Logic can call it. Database can’t call any other project (note that it holds the network connection to the actual database).
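The request flow described above can be sketched in a few lines. This is a minimal illustration in Python rather than C#, and every name in it (UserDto, UserViewModel, db_find_users, search_users, main_search_clicked, the sample rows) is hypothetical and not taken from any real solution. Each function stands in for one project, and each one only calls the layer directly beneath it.

```python
from dataclasses import dataclass


# --- "Database" project: the only layer that touches storage ---
@dataclass
class UserDto:
    """Data Transfer Object: plain data exactly as it comes back from storage."""
    first_name: str
    last_name: str
    gender: str


# Stand-in for a real database table (hypothetical sample rows).
_USERS_TABLE = [
    UserDto("Ada", "Lovelace", "female"),
    UserDto("Alan", "Turing", "male"),
    UserDto("Grace", "Hopper", "female"),
]


def db_find_users(gender: str) -> list[UserDto]:
    """Run the 'query' and hand back DTOs. Calls no other project."""
    return [user for user in _USERS_TABLE if user.gender == gender]


# --- "Business Logic" project: references Database only ---
@dataclass
class UserViewModel:
    """View Model: data shaped for display."""
    full_name: str


def search_users(search_text: str) -> list[UserViewModel]:
    """Build the search request, call Database, then shape the result."""
    dtos = db_find_users(search_text.strip().lower())
    return [UserViewModel(f"{dto.first_name} {dto.last_name}") for dto in dtos]


# --- "Main" project: references Business Logic only, never Database ---
def main_search_clicked(search_text: str) -> list[str]:
    """Handle the button click and return the rows the grid would display."""
    return [view_model.full_name for view_model in search_users(search_text)]


if __name__ == "__main__":
    for row in main_search_clicked("Female"):
        print(row)
```

Nothing in Main imports or calls the database function directly, which is the same restriction that project references enforce in a real solution.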
Damn Vulnerable Thick Client Example
Since we just came from testing DVTA, let’s use that solution as our first example. In the main folder we see DVTA.sln, a text file that tells us about the solution, i.e. which projects are included. The subfolders DB, DBAccess and DVTA are the projects of the solution. Within each is a projectname.csproj, an XML file telling us about the project.
Building and Running an Assembly (under the hood)
All of the .NET languages are compiled, just not in the standard sense: they compile into the Microsoft Intermediate Language (IL) rather than into native machine code, as C or C++ would. The IL is then compiled again into true native code in a process known as Just In Time compilation when the program is run. This entire process is managed by the Common Language Runtime (CLR), which is why the languages and the code they produce are known as managed.
IL has some resemblance to assembly and some to C#/VB. When the compiler converts C# to IL, it produces metadata about the code and the actual IL code. This is known as a netmodule. A netmodule is a collection of compiled code that cannot run on its own but must be linked together with other netmodules into an assembly. An assembly is what is deployed as an EXE or DLL, packaged in the Portable Executable (PE) format.
The metadata described above, which we will see in more detail, is just a set of tables that describes what is in the IL code: data types, members, references, and so on. Once placed in the assembly, the metadata and the IL cannot be separated if the program is expected to run.
The CLR
The Common Language Runtime manages the entire execution of a .NET assembly, from running the program to loading the services the program needs. The languages that run under the CLR are thus known as managed, since the runtime manages the entire interaction of the program with the OS and the user. The CLR provides cross language integration, cross language exception handling, and garbage collection (memory management). It avoids the use of raw pointers by using delegates, and it provides common types through the Common Type System.
We have seen how the CLR uses the IL, thus allowing multiple languages to interact. It even provides an “unsafe” context that permits raw pointers, along with interop facilities that allow unmanaged code to be called. Garbage collection is a feature of Java and .NET that allows objects to be created without calling malloc() or calloc() and cleaned up without calling free() or delete. Once an object is no longer reachable it becomes eligible for collection, and its memory is eventually reclaimed by the runtime for reuse.
Delegates are reference types (.NET only has value types and reference types) that are similar to function pointers. However, delegates are type safe and not prone to the dangers of pointers in C and C++.
There is an entire specification for C#, published by Ecma International as ECMA-334. ECMA-335 covers the Common Language Infrastructure, which includes the Common Intermediate Language (CIL), the virtual machine that the Common Language Runtime implements, and even a full specification for the Common Type System.
So the entire process of building and running the application can now be defined.
First, using a text editor or an Integrated Development Environment (IDE) like Visual Studio, a dev writes valid C# (or another .NET language). The C# is then compiled into the intermediate language and some metadata is created. Both are placed within the Windows PE/COFF format (see below).
Next, when the user clicks on the .exe file, the CLR takes the IL, along with the metadata that helps interpret it, and compiles the code again in a process called just in time (JIT) compilation. This produces native machine code that is then run just like a native application.
While this seems like a long process that would slow the program down, it really only results in a slightly larger PE file with near native execution speed.
The Portable Executable File
A portable executable (PE) file follows the format that Windows requires for any code the OS will execute. PE is based on the Microsoft Common Object File Format (COFF). All C and C++ compiled programs use this format. .NET uses this format too, with slight modifications. Other file types, such as .ocx, use it as well. Note the image below is 64 bit. We will stick with 32 bit as much as possible.
Headers
The first header is the DOS Header. It starts with e_magic = MZ and ends with e_lfanew, the file offset of the NT Header. Otherwise the DOS Header is really useless for our needs.
typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
    WORD e_magic;      // Magic number
    WORD e_cblp;       // Bytes on last page of file
    WORD e_cp;         // Pages in file
    WORD e_crlc;       // Relocations
    WORD e_cparhdr;    // Size of header in paragraphs
    WORD e_minalloc;   // Minimum extra paragraphs needed
    WORD e_maxalloc;   // Maximum extra paragraphs needed
    WORD e_ss;         // Initial (relative) SS value
    WORD e_sp;         // Initial SP value
    WORD e_csum;       // Checksum
    WORD e_ip;         // Initial IP value
    WORD e_cs;         // Initial (relative) CS value
    WORD e_lfarlc;     // File address of relocation table
    WORD e_ovno;       // Overlay number
    WORD e_res[4];     // Reserved words
    WORD e_oemid;      // OEM identifier (for e_oeminfo)
    WORD e_oeminfo;    // OEM information; e_oemid specific
    WORD e_res2[10];   // Reserved words
    LONG e_lfanew;     // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
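To make the two interesting fields concrete, here is a small Python sketch that reads e_magic and e_lfanew from the first 64 bytes of a file. The function names and the synthetic sample header are assumptions made up for illustration; only the field layout comes from the struct above (30 WORDs followed by one LONG, all little endian).

```python
import struct

# 30 WORD fields (e_magic .. e_res2[10]) followed by the LONG e_lfanew.
DOS_HEADER_FORMAT = "<30Hl"
DOS_HEADER_SIZE = struct.calcsize(DOS_HEADER_FORMAT)  # 64 bytes
E_MAGIC_MZ = 0x5A4D  # the bytes "MZ" read as a little-endian WORD


def parse_dos_header(data: bytes) -> dict:
    """Return e_magic and e_lfanew from the start of a PE file."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("file is too short to contain a DOS header")
    fields = struct.unpack_from(DOS_HEADER_FORMAT, data, 0)
    e_magic, e_lfanew = fields[0], fields[-1]
    if e_magic != E_MAGIC_MZ:
        raise ValueError("missing MZ signature")
    return {"e_magic": e_magic, "e_lfanew": e_lfanew}


def make_sample_dos_header(e_lfanew: int = 0x80) -> bytes:
    """Build a synthetic, otherwise empty DOS header for demonstration."""
    header = bytearray(DOS_HEADER_SIZE)
    header[0:2] = b"MZ"
    struct.pack_into("<l", header, 0x3C, e_lfanew)  # e_lfanew lives at 0x3C
    return bytes(header)


if __name__ == "__main__":
    info = parse_dos_header(make_sample_dos_header())
    print(hex(info["e_magic"]), hex(info["e_lfanew"]))
```

To try it on a real assembly, read the file with open(path, "rb").read() and pass the bytes to parse_dos_header.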
The next section is the DOS Stub. The DOS Stub is an actual executable image that prints the error “This program cannot be run in DOS mode”. Literally a program in the program! If you copy the hex for the DOS Header and stub and save it to a file.exe, you can actually run that program under DOS.
Next is the NT Header, which is composed of three subsections: Signature, File Header, and Optional Header. The Signature just specifies that the file is a PE. The File Header follows the COFF format and holds some basic PE data. The Optional Header is called optional because some formats, such as object files, do not require it, but executables do, and it provides important information to the OS loader. In the data directory section of the Optional Header we have an address to the .NET header.
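The three parts of the NT Header can be walked with a short Python sketch. It assumes the layout from the public PE format documentation: a 4 byte PE signature, a 20 byte File Header, then an Optional Header whose data directories start at offset 96 (PE32) or 112 (PE32+), with entry 14 pointing at the .NET (CLR) header. The function names and the synthetic sample values are hypothetical.

```python
import struct

PE32_MAGIC = 0x10B       # 32 bit optional header
PE32_PLUS_MAGIC = 0x20B  # 64 bit optional header
CLR_DIRECTORY_INDEX = 14  # data directory entry for the .NET header


def parse_nt_headers(data: bytes, e_lfanew: int) -> dict:
    """Read Signature, File Header and the CLR data directory entry."""
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    file_header_offset = e_lfanew + 4
    (machine, number_of_sections, _timestamp, _symbol_table, _symbol_count,
     size_of_optional_header, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, file_header_offset)

    optional_offset = file_header_offset + 20
    (magic,) = struct.unpack_from("<H", data, optional_offset)
    if magic == PE32_MAGIC:
        directories_offset = optional_offset + 96
    elif magic == PE32_PLUS_MAGIC:
        directories_offset = optional_offset + 112
    else:
        raise ValueError("unknown optional header magic")

    # NumberOfRvaAndSizes sits directly before the directory array.
    (directory_count,) = struct.unpack_from("<I", data, directories_offset - 4)
    clr_rva = clr_size = 0
    if directory_count > CLR_DIRECTORY_INDEX:
        clr_rva, clr_size = struct.unpack_from(
            "<II", data, directories_offset + 8 * CLR_DIRECTORY_INDEX)
    return {
        "machine": machine,
        "number_of_sections": number_of_sections,
        "size_of_optional_header": size_of_optional_header,
        "is_64_bit": magic == PE32_PLUS_MAGIC,
        "clr_header_rva": clr_rva,
        "clr_header_size": clr_size,
        "is_dotnet": clr_rva != 0 and clr_size != 0,
    }


def make_sample_nt_headers(clr_rva: int = 0x2008, clr_size: int = 0x48) -> bytes:
    """Build synthetic 32 bit NT headers (made-up values) for demonstration."""
    file_header = struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 224, 0x0102)
    optional = bytearray(224)  # 96 fixed bytes + 16 directories * 8 bytes
    struct.pack_into("<H", optional, 0, PE32_MAGIC)
    struct.pack_into("<I", optional, 92, 16)  # NumberOfRvaAndSizes
    struct.pack_into("<II", optional, 96 + 8 * CLR_DIRECTORY_INDEX,
                     clr_rva, clr_size)
    return b"PE\0\0" + file_header + bytes(optional)


if __name__ == "__main__":
    print(parse_nt_headers(make_sample_nt_headers(), 0))
```

A native executable would leave directory entry 14 empty, so is_dotnet gives a quick first guess at whether a file is a managed assembly.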
The secret Rich Header
There is actually a header between the DOS Stub and the NT Header. It is not documented and is only inserted when a PE is built with Visual Studio. The header can be zeroed out with no consequences. The end of the header is marked by the word Rich. The struct for this header was only known after a leak of the Windows 2000 source code.
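Locating and zeroing this header can be sketched as follows. Since the header is undocumented, the layout used here is an assumption based on community reverse engineering: the entries are XORed with a 32 bit key, the decoded header begins with the marker DanS, and the literal Rich is followed by the key itself. The function names and the synthetic sample are hypothetical.

```python
import struct

DOS_HEADER_SIZE = 0x40
DANS_MARKER = struct.unpack("<I", b"DanS")[0]


def find_rich_header(data: bytes, e_lfanew: int):
    """Return (start, end, xor_key) of the Rich header, or None if absent."""
    rich_offset = data.find(b"Rich", DOS_HEADER_SIZE, e_lfanew)
    if rich_offset == -1 or rich_offset + 8 > e_lfanew:
        return None
    (key,) = struct.unpack_from("<I", data, rich_offset + 4)
    # Walk backwards one DWORD at a time until the decoded value is "DanS".
    position = rich_offset - 4
    while position >= DOS_HEADER_SIZE:
        (value,) = struct.unpack_from("<I", data, position)
        if value ^ key == DANS_MARKER:
            return position, rich_offset + 8, key
        position -= 4
    return None


def zero_rich_header(data: bytes, e_lfanew: int) -> bytes:
    """Return a copy of the file bytes with the Rich header zeroed out."""
    found = find_rich_header(data, e_lfanew)
    if found is None:
        return data
    start, end, _key = found
    return data[:start] + bytes(end - start) + data[end:]


def make_sample_with_rich_header(key: int = 0x1234ABCD):
    """Build a synthetic file prefix (made-up values): header, stub, Rich."""
    dos_header = bytearray(DOS_HEADER_SIZE)
    dos_header[0:2] = b"MZ"
    stub = bytes(0x40)  # placeholder bytes standing in for the DOS stub
    plain = [DANS_MARKER, 0, 0, 0, 0x00010002, 5]  # marker, padding, one entry
    rich = b"".join(struct.pack("<I", value ^ key) for value in plain)
    rich += b"Rich" + struct.pack("<I", key)
    e_lfanew = DOS_HEADER_SIZE + len(stub) + len(rich)
    struct.pack_into("<l", dos_header, 0x3C, e_lfanew)
    return bytes(dos_header) + stub + rich + b"PE\0\0", e_lfanew


if __name__ == "__main__":
    sample, e_lfanew = make_sample_with_rich_header()
    print(find_rich_header(sample, e_lfanew))
```

Zeroing keeps the file length and every later offset unchanged, which is why e_lfanew still points at the NT Header afterwards.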
Sections
The section header table tells us which sections the PE file contains, along with each section’s address and size, among other important data. It is laid out as an array of section headers.
Here we can see this executable has .text, .rsrc, and .reloc sections. The sections, of which there can be many, are where the actual executable content lives. Each element of the array is shown as struct _IMAGE_SECTION_HEADER[n], where n is an integer. So _IMAGE_SECTION_HEADER[0] is our .text section.
.text is typically at array[0] and is where our code is stored. The program’s entry point, the address of its first instruction, normally falls inside this section.
.rsrc is a resource container and contains objects like images or icons used in the program.
.reloc contains the base relocation table, which lists the addresses the loader must fix up when the image is not loaded at its preferred base address.
In the table above, we see the Characteristics column. This tells us whether the section is executable, read only, read and write, and so on. Section names are limited to 8 ASCII characters, but sections can be renamed and new ones can be defined by the user.
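The section header array and its Characteristics flags can be decoded with a short Python sketch. It assumes the documented 40 byte IMAGE_SECTION_HEADER layout and a handful of the documented flag bits; the function names and the three sample sections are made up for illustration.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics
SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"
SECTION_HEADER_SIZE = struct.calcsize(SECTION_HEADER_FORMAT)  # 40 bytes

CHARACTERISTIC_FLAGS = {
    0x00000020: "code",
    0x20000000: "execute",
    0x40000000: "read",
    0x80000000: "write",
}


def parse_section_headers(data: bytes, offset: int, count: int) -> list[dict]:
    """Decode `count` section headers starting at `offset`."""
    sections = []
    for n in range(count):
        (raw_name, virtual_size, virtual_address, raw_size, raw_pointer,
         _relocs, _lines, _reloc_count, _line_count,
         characteristics) = struct.unpack_from(
            SECTION_HEADER_FORMAT, data, offset + n * SECTION_HEADER_SIZE)
        sections.append({
            # Names are padded with NUL bytes up to 8 characters.
            "name": raw_name.rstrip(b"\0").decode("ascii", errors="replace"),
            "virtual_address": virtual_address,
            "virtual_size": virtual_size,
            "raw_pointer": raw_pointer,
            "raw_size": raw_size,
            "flags": [label for bit, label in CHARACTERISTIC_FLAGS.items()
                      if characteristics & bit],
        })
    return sections


def make_sample_section_table() -> bytes:
    """Three synthetic headers with made-up addresses and sizes."""
    rows = [
        (b".text", 0x1000, 0x2000, 0x1000, 0x0200, 0x60000020),
        (b".rsrc", 0x0400, 0x4000, 0x0400, 0x1200, 0x40000040),
        (b".reloc", 0x000C, 0x6000, 0x0200, 0x1600, 0x42000040),
    ]
    return b"".join(
        struct.pack(SECTION_HEADER_FORMAT, name, vsize, vaddr, rsize, rptr,
                    0, 0, 0, 0, flags)
        for name, vsize, vaddr, rsize, rptr, flags in rows)


if __name__ == "__main__":
    for section in parse_section_headers(make_sample_section_table(), 0, 3):
        print(section["name"], hex(section["virtual_address"]), section["flags"])
```

In a real file the table starts right after the Optional Header, and the count comes from NumberOfSections in the File Header.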
Other Sections
While a program may have as many sections as can be defined, there are 9 predefined sections.
- Executable Code Section (commonly named .text)
- Data Sections (of which .data, .rdata, .pdata, .bss are types)
- Resources Section (commonly named .rsrc)
- Export Data Section (commonly named .edata)
- Import Data Section (commonly named .idata)
- Debug Information Section (commonly named .debug)
dnSpy
The first tool we can use on a .NET assembly is dnSpy. Running the tool, we can import DVTA.exe, right click the assembly and go to View MD table.
First notice e_magic, 0x5A4D. Breaking this down, 0x indicates hexadecimal, followed by the bytes 5A and 4D. 5A is Z and 4D is M. The value is stored little endian, so the low byte 4D comes first in the file and we actually read MZ. MZ is the initials of Mark Zbikowski and is a carryover from the 16 bit DOS days.
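The byte order is easy to check for yourself. This short Python sketch reads the two bytes MZ once as a little endian WORD and once as a big endian WORD; the variable names are arbitrary.

```python
import struct

raw = b"MZ"  # the first two bytes of the file: 0x4D ('M'), then 0x5A ('Z')

(little_endian,) = struct.unpack("<H", raw)  # low byte first, as on x86
(big_endian,) = struct.unpack(">H", raw)     # high byte first

print(hex(little_endian))  # 0x5a4d, the value the tool shows for e_magic
print(hex(big_endian))     # 0x4d5a

# Writing the WORD back out in little-endian order restores the file bytes.
assert little_endian.to_bytes(2, "little") == b"MZ"
```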
ILSpy
ILSpy is almost identical in its features to dnSpy. I like how ILSpy shows the metadata immediately from the start. In ReadyToRun images you can also choose between Intel and AT&T assembly language syntax.
CodeTrack
CodeTrack is a very versatile profiler that can debug and analyze code. You can capture events like garbage collection, among other options.
After loading the program, you use it like normal, then exit and analyze. There you will find a full analysis of what occurred.
You can see the decompiled code as well. The flame tab shows all the mouse clicks, methods called and Windows Forms events like callbacks. The timeline can help profile and find slow spots. This is one amazing tool.
GarbageMan
GarbageMan is a set of tools for performing heap analysis. Anything the program can touch on the heap, this tool can find and analyze.
This makes it an excellent tool for malware analysis, including samples that rely on IL generation or process injection.
.NET Reactor Slayer
This is a deobfuscation tool with many powerful features. It can quickly decrypt strings, remove confusing junk, and find embedded assemblies.
A simple obfuscator for studying can be found here: https://github.com/pjc0247/lookatme. It produces more confusing code that is harder to understand when observed in dnSpy, though not impossible.
Now, let’s download and build Yet Another Obfuscator and run the program.
Let’s open it in dnSpy and check what we see.
Now that is certainly confusing. But for a determined attacker, is that enough? Considering we can add a breakpoint and step through the code, we could certainly recover the logic. In fact, one could argue that a standard C++ program compiled to native assembly is probably going to be far harder for the average person.
Let’s try out our deobfuscator and see how close we can get. We get nowhere; it crashes.
The tool looks very cool, but I cannot seem to get it to deobfuscate anything.
pestudio
PeStudio is a tool that is used extensively in malware analysis. Loading an assembly gives a quick clean overview of the exe.
CFF Explorer VIII
CFF Explorer is another app that gives a quick overview of a PE and its structures.
Notice that MZ starts the file and that the phrase “This program cannot be run in DOS mode” always sits at an offset below 0xB0, usually exactly where it is located now.
PE Bear
PE Bear is yet another fine tool for working with PE files. The best feature I found is the compare feature. Sometimes the ability to compare two objects, especially two different assemblies, can save massive amounts of time.
DIE
Detect It Easy is a simple app that gives you brief information about an executable.
WinDbg
WinDbg is a native Windows debugger. It is not just for unmanaged apps, as can be seen by running DVTA under it. This app takes a long time to master and will be a requirement with C/C++ apps. I am including it here since it can be a major help in a tight spot.
Console Based Applications
de4dot is a console based deobfuscator and unpacker that can get close to the original project code.
RunDotNetDLL allows a single method from a dll to be run!
sfextract extracts the files bundled inside a .NET single-file executable.
FLOSS extracts obfuscated strings from an exe (malware).
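To show what string extraction starts from, here is a minimal Python sketch of the plain static pass: it pulls printable ASCII and UTF-16LE runs out of a byte buffer. This is not how FLOSS itself is implemented and it does not decode obfuscated strings; it is only the baseline that such tools build on. The function name and the sample buffer are hypothetical. The UTF-16LE pass matters for .NET because string literals are stored that way in the assembly.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return printable ASCII runs, then printable UTF-16LE runs."""
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    utf16_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    found = [match.group().decode("ascii")
             for match in ascii_pattern.finditer(data)]
    found += [match.group().decode("utf-16-le")
              for match in utf16_pattern.finditer(data)]
    return found


if __name__ == "__main__":
    # Synthetic buffer: one ASCII string, one UTF-16LE string, some noise.
    sample = (b"\x00\x01Hello, PE\x00\xff"
              + "UserName".encode("utf-16-le")
              + b"\x00\x00ab\x00")
    print(extract_strings(sample))
```

Runs shorter than min_length are dropped, which filters out most of the accidental matches in binary data.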
Telerik JustDecompile
This tool has saved me in my professional career multiple times. The program can take an assembly and completely decompile it into a new solution. Just an amazing tool from an amazing company.
As an aside, I was at Microsoft Ignite or TechEd, the big developer conference in Texas in 2014, and Telerik handed out a one year pass for their MVC components. The add-ins are amazing and can make any website look like the enterprise dashboard of your dreams. You may have also used their testing proxy tool, Fiddler.
Final Thoughts
There are many tools for working with .NET assemblies beyond Visual Studio, including a whole host not mentioned here. The best advice I can give is to use all the tools and find out which ones work best for your situations. This way you can call them up at a moment’s notice and perform the task at hand.
One final mention is the Microsoft Sysinternals suite. The tools are well worth digging into and are easily downloaded from Microsoft.