This page was last updated on March 13th, 2020(UTC) and it is currently May 30th, 2023(UTC).
That means this page is 3 years, 78 days, 3 hours, 47 minutes and 53 seconds old. Please keep that in mind.
30 - Actually Doing Something in Assembly
Next, you're going to want some reference materials (hopefully Wikipedia gets its shit together with its political BS so I don't have to worry about it going down). I would suggest sticking to the 80386 instruction set (which includes the 80286 and everything before it). I'm going to be referencing instructions from that page and only commenting on what isn't obvious. We need to set up a function that prints out numbers (we could cheat and use printf, but then I'd have to explain printf, which means explaining ints, floats, etc). From here on out, for simplicity's sake, we're not going to be using the 16bit x86 running modes unless we absolutely have to. This cuts down on what you have to learn right now, which is for the best if you aren't doing lots of DOS or kernel development.
The general purpose registers in this mode are the ax, bx, cx, and dx registers. They are 16bit, but have 32bit "extended forms" if you refer to them with "e" as a prefix (for example, "eax"). The "low byte" can be accessed by replacing "x" with "l" ("al"), or you could grab the "high byte" of the 16bit register via "h" instead of "l" ("ah"). Keep in mind that x86 is a little-endian processor, meaning that when writing to RAM, you'll find that the byte order is swapped: "mov WORD PTR [0], 0x1234" will have 0x34 at the first byte of RAM, followed by 0x12 (there are a lot of complaints about this, but if you keep it in mind, you'll find this rarely matters, and when it does, it's not as difficult to fix as it is to simply remember that the issue exists). Some combinations of instructions are shorter when assembled if you use the registers for their named tasks (AX for math, BX for pointers, CX for counting and loops, and DX for extra data).
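To make the register naming and the byte order concrete, here is a small model written in C (the language we're headed to next). It's only a sketch: the function names low_byte, high_byte, low_word, and store_word_le are made up for illustration, and the code imitates what the processor does rather than touching any real registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models "al": the low 8 bits of eax. */
uint8_t low_byte(uint32_t eax)
{
    return (uint8_t)(eax & 0xFF);
}

/* Models "ah": bits 8 through 15 of eax (the high byte of ax). */
uint8_t high_byte(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xFF);
}

/* Models "ax": the low 16 bits of eax. */
uint16_t low_word(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFF);
}

/* Models "mov WORD PTR [mem], value" on a little-endian machine:
   the low byte lands first, the high byte right after it. */
void store_word_le(uint8_t *mem, uint16_t value)
{
    assert(mem != NULL);
    mem[0] = (uint8_t)(value & 0xFF);
    mem[1] = (uint8_t)(value >> 8);
}
```

With eax holding 0x12345678, this model gives al = 0x78, ah = 0x56, and ax = 0x5678, and storing 0x1234 leaves 0x34 in the first byte and 0x12 in the second.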
Next are the index registers: si, di, sp, bp. Of course, you normally use these in their extended forms (e prefix), but they do not have 8bit forms like the general registers. SI is for "source index", while DI is for "destination index." They're used most usefully with the "string instructions", but can be used as general registers in a pinch. ESP is used for the PUSH and POP instructions, and, in theory, could be used as a general purpose register, but then you basically break the stack (the push, pop, call, ret, int, and iret instructions). Please, do not do this. EBP can be used as a general purpose register if you want, but I recommend not doing so, no matter how tempting.
Next we have the segment selectors. Unlike the above registers, you cannot "mov" a value directly into these: the source must be a register (or a memory location). These are mostly used for the protection modes of the OS, so unless you're doing kernel dev, leave these alone, else face the wrath of security features. IP (and EIP) is not directly accessible, but you can control it with jmp, call, ret, etc. And lastly, we have flags/eflags. Only certain special instructions give you access to this register, and I honestly wouldn't recommend doing anything with it beyond experimentation. Now there are also "machine registers" and "memory mapped registers" but we aren't going to have access to those, and they can't be guaranteed to be present. If you're on a modern x86, whether that's 64bit or higher, be aware that these registers exist there, too, and more.
With that said, in addition to the "offset" we also have "dword ptr" (and "word ptr" and "byte ptr") to specify sizes where ambiguous. Operands with the [] around them are referencing data stored at a memory location (with the exception of "lea", which uses the brackets to work out the address itself and puts that address in the destination, instead of loading whatever is stored there). Now, let us try to use this information to make a function that will print numbers onto the screen. Also keep in mind, GNU AS has 2 types of comments: the ones that begin with "#" and continue until the end of the line, and ones that begin with "/*" and end with "*/" and can span over many lines.
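If the size specifiers and the brackets feel abstract, this C sketch may help. It is a model under assumptions, not real assembly: the helper names (store_byte_zero, store_dword_zero, load_dword, load_address) are hypothetical, and each comment names the instruction the helper is meant to imitate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models "mov BYTE PTR [mem], 0": exactly one byte is written. */
void store_byte_zero(uint8_t *mem)
{
    assert(mem != NULL);
    mem[0] = 0;
}

/* Models "mov DWORD PTR [mem], 0": four bytes are written. */
void store_dword_zero(uint8_t *mem)
{
    assert(mem != NULL);
    memset(mem, 0, 4);
}

/* Models "mov eax, [table + 4*index]": reads the value stored there. */
uint32_t load_dword(const uint32_t *table, size_t index)
{
    assert(table != NULL);
    return table[index];
}

/* Models "lea eax, [table + 4*index]": computes the address, reads nothing. */
const uint32_t *load_address(const uint32_t *table, size_t index)
{
    assert(table != NULL);
    return &table[index];
}
```

The first pair shows why the assembler needs a size when the operands alone don't say: writing a 0 can touch one byte or four. The second pair shows the difference the brackets make for "mov" versus "lea".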
.intel_syntax noprefix
.section .text
/***********************************************************************\
| EXPORTS |
\***********************************************************************/
.global _main
.global _printnum
/***********************************************************************\
| int main(int argc, char** argv) |
\***********************************************************************/
_main: pushd offset format
call _puts
add esp, 4
pushd 12345
call _printnum
add esp, 4 #CABI says we clean up our own mess.
xor eax, eax #CABI also says we return 0 if the program is OK.
ret
/***********************************************************************\
| void printnum(unsigned int num) |
\***********************************************************************/
_printnum:
#Initialization
pushf #Let's preserve flags to be safe since we're directly playing with them.
xor edx, edx
mov ebx, 10
mov eax, [esp+8]
mov edi, offset _printnum_string+31
std #Makes string instructions work backwards.
#Convert to string.
1: xor dl, dl #Clear remainder
div ebx
xchg al, dl
add al, 0x30
stosb
xchg al, dl
or eax, eax
jnz 1b
#Now actually print the result.
inc edi #Off by 1
popf #Put the flags back (direction flag included) before calling into C.
push edi
call _puts
add esp, 4
ret
/***********************************************************************\
| DATA |
\***********************************************************************/
.section .data
format: .asciz "The number is "
_printnum_string: .asciz "                                " #32 spaces, then the terminating 0.
#The stack is normally used for data like the string, but this is easier.
The only stuff that should surprise you is the printnum function. Since we don't know much about the DOS ABI (Application Binary Interface: the standard by which we mix ASM, C, and C++), I think it's smart to play it safe and preserve the flags register, since we're actually going to manipulate it. We could use "cld", but how do we know what clib expects it to be? Does it ever use the string functions? We'll "popf" once we're done with the string instructions, which restores the flags and gets them off the stack.
The way this works is, "div" does a 64bit division on eax and edx, where edx is the high 4 bytes and eax is the low 4 bytes, and it cannot take an "immediate value" (number) as a source operand (edx and eax are implied destination operands). EAX will hold the result of the division, while edx will hold the "modulus" or the "remainder" from the division. So we want to clear edx as an extended register to be safe, then we put the number we're dividing by (10) into ebx so we can use it as the source operand. esp+8 holds the number we want to convert (esp+0 is the flags we just pushed, and esp+4 is the return address that gets popped into EIP), and we want to start at almost the end of _printnum_string (but we don't want to overwrite the 0). 32 is just an arbitrary number that I chose for the size of the null-terminated string. Next we set the direction flag so it all goes backwards, then we start down "the loop."
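Here is a rough model of what "div" with a 32bit operand does, written in C as a sketch. The function name model_div32 and its return convention are my own invention for illustration. The real instruction raises a divide error instead of returning 0, which happens when the divisor is zero or the quotient doesn't fit in eax, and that second case is why leftover junk in edx is dangerous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "div r32".
   Returns 1 on success and 0 where the real instruction would raise a
   divide error. On failure, eax and edx are left untouched. */
int model_div32(uint32_t *eax, uint32_t *edx, uint32_t divisor)
{
    assert(eax != NULL && edx != NULL);
    if (divisor == 0)
        return 0;

    /* edx is the high 4 bytes, eax is the low 4 bytes. */
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    uint64_t quotient = dividend / divisor;
    if (quotient > UINT32_MAX)
        return 0; /* quotient too big for eax */

    *edx = (uint32_t)(dividend % divisor); /* remainder */
    *eax = (uint32_t)quotient;             /* result */
    return 1;
}
```

Starting with eax = 12345 and edx = 0, dividing by 10 leaves eax = 1234 and edx = 5. Starting with edx = 10 instead, the quotient would need more than 32 bits, so the model reports failure.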
A "loop" is a programming concept where code is executed over and over and over again, as if driving on a looping road, like a NASCAR race track. In this loop, we first make sure the remainder is erased, then we do the division, which puts the remainder in edx (dl, since our divisor is small enough), and our division result in eax. Now, the rub is that stosb, a really convenient instruction, only stores from al (its bigger siblings, stosw and stosd, use ax and eax), so we're temporarily exchanging al and dl, adding the ASCII value for the character representing zero, then doing the stosb, then exchanging dl and al back so we can continue dividing. After which, we use "or eax, eax" to set the zero flag if eax is zero. If it's not zero (Jump if Not Zero), we go back to the previous 1.
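The same loop can be sketched in C, which may make the order of operations easier to follow. This is an illustration under assumptions, not a translation the assembler would produce: format_unsigned and DIGIT_BUF_SIZE are names I made up, and the C version steps the pointer back before each store, so it doesn't need the "off by 1" fix-up that stosb forces on the assembly version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DIGIT_BUF_SIZE = 32 }; /* same arbitrary size as _printnum_string */

/* Writes the decimal digits of num into buf from right to left and returns
   a pointer to the first digit. buf must hold DIGIT_BUF_SIZE + 1 bytes. */
char *format_unsigned(unsigned int num, char *buf)
{
    assert(buf != NULL);

    memset(buf, ' ', DIGIT_BUF_SIZE); /* start from a string of spaces */
    char *p = buf + DIGIT_BUF_SIZE;
    *p = '\0';                        /* the terminating 0 stays put */

    do {
        *--p = (char)('0' + num % 10); /* remainder becomes an ASCII digit */
        num /= 10;                     /* quotient feeds the next pass */
    } while (num != 0);                /* same test as "or eax, eax / jnz" */

    return p;
}
```

Handing the returned pointer to puts would print just the digits, the same way the assembly pushes edi before the call.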
Then we add 1 to edi, because it naturally goes 1 too far, then we push edi as an operand for puts, then we call puts, clean up, and return back to main. The only thing left here is that when you use the numbers "1" through "9" as labels, you can use them multiple times without the assembler seeing a conflict. You use "b" (back) to refer to the previous label using that number, and you use "f" (forward) to refer to the next one of that number. This setup should provide you with a basic setup for testing out the various things you can do with the instructions available to you. A nifty little trick is that dosbox will give you register dumps if the program crashes, and one nice way to do this is to use the breakpoint instruction ("int3" aka 0xCC). You can use that to get hex output. Beware, this printnum function is incapable of dealing with negative numbers. You should definitely experiment before moving on. There are GNU AS directives you don't know (like ".include") and you don't know most of the instructions. Now is a good time to play around before moving on, because we're moving into the territory of C and C++ next.
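As a starting point for the negative number problem, here is one way it could be handled, sketched in C. This is my own suggestion rather than anything the printnum listing does: format_signed and SIGNED_BUF_SIZE are hypothetical names, and the idea is simply to convert the magnitude as an unsigned number and then put a "-" in front of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SIGNED_BUF_SIZE = 32 }; /* arbitrary, matches the unsigned buffer */

/* Writes num in decimal into buf from right to left and returns a pointer
   to the first character. buf must hold SIGNED_BUF_SIZE + 1 bytes. */
char *format_signed(int num, char *buf)
{
    assert(buf != NULL);

    /* Negate in unsigned arithmetic so the most negative int is safe. */
    unsigned int magnitude =
        (num < 0) ? 0u - (unsigned int)num : (unsigned int)num;

    char *p = buf + SIGNED_BUF_SIZE;
    *p = '\0';

    do {
        *--p = (char)('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);

    if (num < 0)
        *--p = '-'; /* the sign goes in last, so it ends up first */

    return p;
}
```

In assembly the same idea would mean testing the sign of the argument, negating it with "neg" if needed, running the existing loop, and storing one extra "-" character at the end.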
©Copyright 2010-2023. All rights reserved.