Data Transfer, Addressing, Arithmetic
Operand types
There are three types of operands:
- Immediate : uses a numeric literal expression
- Register : uses a named register in the CPU
- Memory : references a memory location
Direct Memory Operands
A direct memory operand is a named reference to a storage location in memory.
.data
var1 BYTE 10h
The MOV instruction
The MOV instruction copies data from a source operand to a destination operand.
Format: MOV destination,source
The rules for using the MOV instruction are:
- Both operands must be the same size.
- Both operands cannot be memory operands.
- The instruction pointer (IP, EIP, RIP) cannot be a destination.
List of standard MOV formats:
- MOV reg,reg
- MOV mem,reg
- MOV reg,mem
- MOV mem,imm
- MOV reg,imm
Memory to Memory
A single MOV cannot copy a value directly from one variable to another. First load the value into a register, then store the register into the other variable.
.data
var1 WORD xxxx
var2 WORD xxxx
.code
mov ax,var1 ; first move from var1 into a register
mov var2,ax ; then move from the register into var2
MOVZX (move with zero-extend)
When a smaller value is copied into a larger destination, the MOVZX instruction fills (extends) the upper bits of the destination with zeros. To use this instruction:
- The destination must be a register.
- The instruction is used only with unsigned integers.
- The source operand cannot be a constant.
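The notes give no MOVZX example, so here is a minimal sketch of the rules above. It assumes 32-bit MASM with the flat memory model and the Windows ExitProcess function; the variable name byteVal is hypothetical.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
byteVal BYTE 10001111b          ; hypothetical 8-bit source (8Fh)

.code
main PROC
    mov   bx,0A69Bh             ; 16-bit source value
    movzx eax,bx                ; EAX = 0000A69Bh (upper 16 bits zeroed)
    movzx edx,bl                ; EDX = 0000009Bh (upper 24 bits zeroed)
    movzx cx,byteVal            ; CX  = 008Fh (memory source, upper 8 bits zeroed)
    INVOKE ExitProcess,0
main ENDP
END main
```

In every case the destination is a register and the source is a smaller register or memory operand, never a constant.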
MOVSX (move with sign-extend)
The MOVSX instruction fills the upper bits of the destination with a copy of the sign bit of the source operand.
mov var1,10000111b ; var1 is a BYTE; its sign bit (bit 7) is 1
movsx ax,var1 ; sign extension: AX = 1111111110000111b (0FF87h)
- The destination must be a register.
- The instruction is used only with signed integers.
LAHF and SAHF instruction
The LAHF (load status flags into AH) instruction copies the low byte of the EFLAGS register into AH. That byte holds the Sign, Zero, Auxiliary Carry, Parity, and Carry flags. Using this instruction you can easily save a copy of the flags for later use.
.data
saveflag BYTE ?
.code
lahf ; loads flags into AH
mov saveflag,ah ; save them to a variable for later use
The SAHF (store AH into status flags) instruction copies AH into the low byte of EFLAGS. For example, we can restore the flag values saved earlier in the variable:
mov ah,saveflag ; load the saved flags into AH
sahf ; copying into flag register
XCHG instruction (exchange data)
The XCHG instruction exchanges the contents of two operands. Its rules are the same as those for MOV, except that XCHG does not accept immediate operands. At least one operand must be a register.
.data
var1 WORD 1000h
var2 WORD 2000h
xchg ax,bx
xchg var1,bx
To exchange two memory operands, use a register as a temporary container and combine MOV with XCHG:
mov ax,var1
xchg ax,var2
mov var1,ax
Direct-Offset Operands
We can add a displacement to the name of a variable, creating a direct-offset operand. This lets you access memory locations that may not have explicit labels.
arrayB BYTE 10h,20h,30h,40h,50h ; an array of BYTE elements
mov al,arrayB ; AL = 10h (the first byte of the array)
mov al,[arrayB+1] ; AL = 20h (direct offset to the next byte)
mov al,[arrayB+2] ; AL = 30h
A program that rearranges the values of three doubleword values in an array:
.data
arrayD DWORD 1,2,3
.code
mov eax,arrayD ; EAX = 1
xchg eax,[arrayD+4] ; EAX = 2, arrayD = 1,1,3
xchg eax,[arrayD+8] ; EAX = 3, arrayD = 1,1,2
mov arrayD,eax ; arrayD = 3,1,2
Addition and subtraction
INC and DEC instructions
The INC (increment) and DEC (decrement) instructions add 1 to or subtract 1 from a register or a memory operand. The syntax is:
INC reg/mem ; one is added
DEC reg/mem ; one is subtracted
Some examples:
.data
myWord WORD 1000h
.code
inc myWord ;myWord = 1001h
mov bx,myWord
dec bx ; BX=1000h
Add instruction
The ADD instruction adds a source operand to a destination operand of the same size and stores the sum in the destination.
ADD dest,source
Here is an example:
.data
var1 DWORD 10000h
var2 DWORD 20000h
.code
mov eax,var1 ; EAX=10000h
add eax,var2 ; EAX=30000h
SUB instruction
The SUB instruction works like ADD, but it subtracts the source operand from the destination operand.
SUB dest,source
Here is a short example:
.data
var1 DWORD 30000h
var2 DWORD 10000h
.code
mov eax,var1 ; EAX=30000h
sub eax,var2 ; EAX=10000h
NEG instruction (negate)
The NEG (negate) instruction reverses the sign of a number by converting the number to its two’s complement. The following forms are permitted:
NEG reg
NEG mem
Example:
.data
valB BYTE -1
valW WORD +32767
.code
mov al,valB ; AL = -1
neg al ; AL = +1
neg valW ; valW = -32767
Zero Flag (ZF)
The zero flag is set when the result of the operation produces zero in the destination operand.
mov cx,1
sub cx,1 ; CX = 0, so ZF = 1
Sign Flag
The Sign flag is set when the result in the destination operand is negative. The flag is clear when the result is positive or zero.
mov cx,0
sub cx,1 ; CX = -1 so SF is set or SF=1
add cx,2 ; CX = 1 , SF =0
Carry flag
The Carry flag is set when the result of an operation generates an unsigned value that is out of range (too big or too small for the destination operand).
mov al,0FFh
add al,1 ; CF = 1, AL = 00h
;; Trying to go below zero
mov al,0
sub al,1 ; CF = 1, AL = FFh
Overflow flag (OF)
The Overflow flag is set when the signed result of an operation is invalid or out of range.
mov al,+127
add al,1 ; OF = 1, AL = 80h (-128 as a signed byte)
;; Example 2
mov al,7Fh
add al,1 ; OF = 1, AL = 80h
Remember that when adding, the Overflow flag is set only when:
- Two positive operands are added and their sum is negative.
- Two negative operands are added and their sum is positive.
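The two rules can be checked with a short sketch. It assumes 32-bit MASM with the flat memory model and the Windows ExitProcess function; the operand values are chosen only for illustration.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov al,-128         ; AL = 80h
    add al,-1           ; two negatives, positive sum: AL = 7Fh, OF = 1, CF = 1
    mov al,+100         ; AL = 64h
    add al,+50          ; two positives, negative sum: AL = 96h, OF = 1, CF = 0
    mov al,+100         ; AL = 64h
    add al,-50          ; operands of opposite sign: AL = 32h, OF = 0, CF = 1
    INVOKE ExitProcess,0
main ENDP
END main
```

The last case shows that adding operands with different signs never sets the Overflow flag, even though the Carry flag is set.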
Data Related Operations and Directives
OFFSET operator
The OFFSET operator returns the distance in bytes of a label from the beginning of its enclosing segment. Offsets are 32 bits in protected mode and 16 bits in real mode.
Protected-mode programs can be written using only a single segment (the flat memory model).
For example:
.data
bval BYTE ? ; BYTE is 8 bits (1 byte)
wval WORD ? ; WORD is 16 bits (2 bytes)
dval DWORD ? ; DWORD is 32 bits (4 bytes)
dval2 DWORD ?
Now suppose that bval is located at offset 00404000h. The offset of each following label grows by the size of the variable declared before it.
.code
mov esi,OFFSET bval ; ESI = 00404000h
mov esi,OFFSET wval ; ESI = 00404001h (bval occupies 1 byte)
mov esi,OFFSET dval ; ESI = 00404003h (wval occupies 2 bytes)
mov esi,OFFSET dval2 ; ESI = 00404007h (dval occupies 4 bytes)
PTR operator
It overrides the default type of a label (variable) and provides flexibility to access part of a variable.
.data
myDouble DWORD 12345678h
.code
mov ax,WORD PTR myDouble ; AX = 5678h: a WORD is 16 bits, and the low word is stored first (little-endian)
mov WORD PTR myDouble,4321h ; stores 4321h in the low word: myDouble = 12344321h
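PTR can also be combined with a direct offset to reach individual bytes, or used to read two small variables as one larger value. The following is a sketch under the assumption of 32-bit MASM with the flat memory model and the Windows ExitProcess function; wordList is a hypothetical name.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
myDouble DWORD 12345678h        ; stored in memory as 78h,56h,34h,12h
wordList WORD  5678h,1234h      ; hypothetical pair of words

.code
main PROC
    mov al,BYTE PTR myDouble        ; AL  = 78h (lowest byte)
    mov al,BYTE PTR [myDouble+1]    ; AL  = 56h
    mov ax,WORD PTR [myDouble+2]    ; AX  = 1234h (high word)
    mov eax,DWORD PTR wordList      ; EAX = 12345678h (two words read as one doubleword)
    INVOKE ExitProcess,0
main ENDP
END main
```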
Type operator
The TYPE operator returns the size of a single element of a data declaration.
.data
var1 BYTE ?
var2 WORD ?
var3 DWORD ?
var4 QWORD ?
.code
mov eax,TYPE var1 ; EAX = 1 (1 byte)
mov eax,TYPE var2 ; EAX = 2 (2 bytes)
mov eax,TYPE var3 ; EAX = 4 (4 bytes)
mov eax,TYPE var4 ; EAX = 8 (8 bytes)
LENGTHOF Operator
The LENGTHOF operator counts the number of elements in a single data declaration.
.data
byte1 BYTE 10,20,30 ; LENGTHOF will be 3
SIZEOF Operator
The SIZEOF operator returns a value that is equivalent to multiplying LENGTHOF by TYPE.
.data
byte1 BYTE 10,20,30 ; SIZEOF is 3 as TYPE x LENGTHOF = 1 x 3
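The relationship between TYPE, LENGTHOF, and SIZEOF is easier to see with elements larger than one byte. This is a small sketch, assuming 32-bit MASM with the flat memory model and the Windows ExitProcess function; array1 and array2 are hypothetical names.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
byte1  BYTE  10,20,30
array1 WORD  30 DUP(?)              ; hypothetical array of 30 words
array2 DWORD 5 DUP(3 DUP(?))        ; hypothetical array of 15 doublewords

.code
main PROC
    mov eax,SIZEOF byte1            ; EAX = 3  (1 x 3)
    mov eax,TYPE array1             ; EAX = 2
    mov eax,LENGTHOF array1         ; EAX = 30
    mov eax,SIZEOF array1           ; EAX = 60 (2 x 30)
    mov eax,LENGTHOF array2         ; EAX = 15 (5 x 3)
    mov eax,SIZEOF array2           ; EAX = 60 (4 x 15)
    INVOKE ExitProcess,0
main ENDP
END main
```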
LABEL Directive
- Assigns an alternate label name and type to an existing storage location.
- LABEL does not allocate any storage of its own.
- Removes the need for the PTR operator.
.data
dwlist LABEL DWORD
wordlist LABEL WORD
intlist BYTE 00h,10h,00h,20h
.code
mov eax,dwlist ;20001000h
mov cx,wordlist ; 1000h
mov dl,intlist ; 00h
Basic Program Structure/Directives
Following is a basic program structure:
.model ; selects the memory model the program uses
.stack 100h ; reserves stack storage (100h = 256 bytes in this case)
.data ; variables are defined below this directive
.code ; marks the start of the code
Main proc ; start of the main procedure (function)
Main endp ; end of the main procedure
End main ; end of the program; main is the entry point
Declaration of String
.data
str db "This is a string",'$' ; db defines byte-sized data
; dw defines a word-sized variable (16 bits)
; dd defines a doubleword (32 bits) variable
; '$' is the string terminator expected by DOS INT 21h function 09h;
; Irvine32's WriteString expects a 0 (null) terminator instead
Displaying the string
; OFFSET gives the address of the string, which is where output starts
.code
mov edx,offset str ; move the address of the string into EDX
call writestring
Stacks
Stacks are used for:
- Temporarily saving register values
- Storing local variables
- Passing function parameters and holding return addresses
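The first use, temporarily saving register values, can be sketched with PUSH and POP. The example assumes 32-bit MASM with the flat memory model and the Windows ExitProcess function; the register values are arbitrary.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov  eax,11111111h
    mov  ebx,22222222h
    push eax            ; save EAX on the stack
    push ebx            ; save EBX (now on top of the stack)
    mov  eax,0          ; the registers are free for other work
    mov  ebx,0
    pop  ebx            ; restore in reverse order: EBX = 22222222h
    pop  eax            ; EAX = 11111111h
    INVOKE ExitProcess,0
main ENDP
END main
```

Because the stack is last-in, first-out, the values must be popped in the reverse of the order in which they were pushed.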
Registers
- EAX, EBX, EDX : Generic registers that can be used for any integer, Boolean, logical, or memory operation.
- ECX : Used as a counter by repetitive instructions.
- ESI / EDI : Generic registers, frequently used as source / destination pointers in instructions that copy memory. (SI stands for source index, DI stands for destination index.)
- EBP : Can be used as a generic register, but is mostly used as the stack base pointer.
- ESP : The CPU stack pointer.
Function Calls
Function calls are implemented using two basic instructions in assembly language. The CALL instruction calls a function, and the RET instruction returns to the caller. The CALL instruction pushes the return address (the address of the instruction that follows the CALL) onto the stack so that it is later possible to return to the caller, and then jumps to the specified address. The function’s address can be specified just like any other operand: as an immediate, a register, or a memory address. The following is the general layout of the CALL instruction:
call {function address}
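A minimal sketch of CALL and RET follows. It assumes 32-bit MASM with the flat memory model and the Windows ExitProcess function; AddTwo is a hypothetical helper procedure, not part of any library.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov  eax,5
    mov  ebx,7
    call AddTwo             ; pushes the return address, then jumps to AddTwo
                            ; execution resumes here with EAX = 12
    mov  esi,OFFSET AddTwo  ; the function address can also be held in a register
    call esi                ; EAX = 12 + 7 = 19
    INVOKE ExitProcess,0
main ENDP

AddTwo PROC                 ; hypothetical helper: returns EAX + EBX in EAX
    add  eax,ebx
    ret                     ; pops the return address back into EIP
AddTwo ENDP
END main
```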
CMP
The CMP instruction compares its two operands. Consider the following sequence:
cmp ebx,0xf020
jnz 10025009
The first instruction is CMP, which compares the two operands specified. In this case CMP is comparing the current value of register EBX with a constant: 0xf020 (the “0x” prefix indicates a hexadecimal number), or 61,472 in decimal. As you already know, CMP is going to set certain flags to reflect the outcome of the comparison. The instruction that follows is JNZ. JNZ is a version of the Jcc (conditional branch) group of instructions described earlier. The specific version used here will branch if the zero flag (ZF) is not set, which is why the instruction is called JNZ (jump if not zero). Essentially what this means is that the instruction will jump to the specified code address if the operands compared earlier by CMP are not equal. That is why JNZ is also called JNE (jump if not equal). JNE and JNZ are two different mnemonics for the same instruction; they share the same opcode in the machine language.
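The same compare-and-branch pattern can be written in MASM syntax, where the hexadecimal constant is spelled 0F020h rather than 0xf020 and the jump target is a label instead of a raw address. This is a sketch assuming 32-bit MASM with the flat memory model and the Windows ExitProcess function; the labels NotEqual and Done are hypothetical.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov  ebx,0F020h
    cmp  ebx,0F020h     ; computes EBX - 0F020h = 0, so ZF = 1 (EBX is unchanged)
    jnz  NotEqual       ; not taken, because ZF is set
    mov  eax,1          ; reached when the operands are equal
    jmp  Done
NotEqual:
    mov  eax,0          ; reached only when the operands differ
Done:
    INVOKE ExitProcess,0
main ENDP
END main
```

Replacing jnz with jne would assemble to the same machine code.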
mov edi,[ecx+0x5b0]
mov ebx,[ecx+0x5b4]
imul edi,ebx
This sequence starts with a MOV instruction that reads a value from memory into register EDI. The brackets indicate that this is a memory access, and the specific address to be read is specified inside the brackets. In this case, MOV will take the value of ECX, add 0x5b0 (1456 in decimal), and use the result as a memory address. The instruction will read 4 bytes from that address and write them into EDI. You know that 4 bytes are going to be read because of the register specified as the destination operand. If the instruction were to reference DI instead of EDI, you would know that only 2 bytes were going to be read. EDI is a full 32-bit register (see Figure 2.3 for an illustration of IA-32 registers and their sizes).
The following instruction reads another memory address, this time from ECX plus 0x5b4 into register EBX. You can easily deduce that ECX points to some kind of data structure. 0x5b0 and 0x5b4 are offsets to some members within that data structure. If this were a real program, you would probably want to try and figure out more information regarding this data structure that is pointed to by ECX. You might do that by tracing back in the code to see where ECX is loaded with its current value.
The final instruction in this sequence is an IMUL (signed multiply) instruction. IMUL has several different forms, but when specified with two operands as it is here, it means that the first operand is multiplied by the second, and that the result is written into the first operand. This means that the value of EDI will be multiplied by the value of EBX and that the result will be written back into EDI.
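The same load-and-multiply pattern can be sketched on a small scale in MASM syntax. The example assumes 32-bit MASM with the flat memory model and the Windows ExitProcess function; pairData is a hypothetical two-member structure standing in for whatever ECX points to in the original code, and its offsets (0 and 4) are chosen for illustration.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
pairData DWORD 6,7              ; hypothetical structure with two 4-byte members

.code
main PROC
    mov  ecx,OFFSET pairData    ; ECX points to the structure
    mov  edi,[ecx]              ; EDI = 6 (member at offset 0)
    mov  ebx,[ecx+4]            ; EBX = 7 (member at offset 4)
    imul edi,ebx                ; EDI = EDI * EBX = 42 (2Ah)
    INVOKE ExitProcess,0
main ENDP
END main
```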
push eax
push edi
push ebx
push esi
push dword ptr [esp+0x24]
call 0x10026eeb
This sequence pushes five values onto the stack using the PUSH instruction and then calls a function. The first four values being pushed are all taken from registers. The fifth and final value is taken from a memory address at ESP plus 0x24. In most cases, this would be a stack address (ESP is the stack pointer), which would indicate that this address is either a parameter that was passed to the current function or a local variable. To accurately determine what this address represents, you would need to look at the entire function and examine how it uses the stack.
Global Variables
Consider, for example, the following instruction, which accesses a global variable.
MOV EAX, DWORD PTR [pGlobalVariable]
The preceding instruction is a typical global variable access. The storage for such a global variable is reserved inside the executable image (because many variables have a pre-initialised value).