Basics of Assembly language : Part 3

A51F221B

Apr 9, 2022

Data Transfer, Addressing, Arithmetic

Operand types

There are three types of operands:

Immediate: a numeric literal expression
Register: a named register in the CPU
Memory: a reference to a memory location

Direct Memory Operands

A direct memory operand is a named reference to a storage location in memory.

.data
   var1 BYTE 10h

The MOV instruction

The MOV instruction copies data from a source operand to a destination operand. Format:

MOV destination,source

The rules for using MOV are:

Both operands must be the same size.
Both operands cannot be memory operands.
The instruction pointer (IP, EIP, RIP) cannot be a destination.

Standard MOV formats:

MOV reg,reg
MOV mem,reg
MOV reg,mem
MOV mem,imm
MOV reg,imm

Memory to Memory

We cannot move a value directly from one variable to another. First copy the value into a register, then copy it from the register into the other variable.

.data
var1 WORD xxxx
var2 WORD xxxx
.code
mov ax,var1  ; first move from variable 1 into a register
mov var2,ax  ; then move from the register into variable 2

MOVZX or move zero extend

When we copy a smaller value into a larger destination, the MOVZX instruction fills (zero-extends) the upper part of the destination with zeros. To use it:

The destination must be a register.
The instruction is used only with unsigned integers.
The source operand cannot be an immediate (constant) value.
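The rules above can be seen in a short sketch. It assumes 32-bit MASM on Windows with the flat memory model and ExitProcess linked from kernel32.lib. The variable name byteVal is made up for illustration.

```asm
; Sketch only: assumes 32-bit MASM on Windows, linked with kernel32.lib.
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
byteVal BYTE 10001111b      ; hypothetical variable, 8Fh

.code
main PROC
    mov   bl,byteVal        ; BL  = 8Fh
    movzx ax,bl             ; AX  = 008Fh      (8-bit register to 16-bit register)
    movzx eax,bl            ; EAX = 0000008Fh  (8-bit register to 32-bit register)
    movzx ecx,byteVal       ; ECX = 0000008Fh  (8-bit memory to 32-bit register)
    INVOKE ExitProcess,0
main ENDP
END main
```

In every case the source keeps its value and only the added upper bits of the destination are cleared.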

MOVSX or Sign Extension

The MOVSX instruction fills the upper part of the destination with copies of the source operand's sign bit.

mov var1,10000111b ; var1 is a BYTE; its highest bit (the sign bit) is 1
movsx ax,var1      ; sign extension: AX = 1111111110000111b (FF87h)

The destination must be a register.
The instruction is used only with signed integers.

LAHF and SAHF instruction

The LAHF (load status flags into AH) instruction copies the low byte of the EFLAGS register into AH. Using this instruction, you can easily save a copy of the flags for later use.

.data 
saveflag BYTE ?
.code
    lahf ; loads flags into AH
    mov saveflag,ah ; save them to a variable for later use

The SAHF (store AH into status flags) instruction copies AH into the low byte of EFLAGS. For example, we can restore the flag values saved in the variable earlier:

mov ah,saveflag ; load the saved flags into AH
sahf            ; copy AH into the flags register

XCHG instruction (exchange data)

The XCHG instruction exchanges the contents of two operands. Its rules are the same as those for MOV, except that XCHG does not accept immediate operands. At least one operand must be a register.

.data
var1 WORD 1000h
var2 WORD 2000h
.code
xchg ax,bx    ; exchange two registers
xchg var1,bx  ; exchange a memory operand with a register

To exchange two memory operands, use a register as a temporary container and combine MOV with XCHG:

mov ax,var1   ; AX = var1
xchg ax,var2  ; var2 = old var1, AX = old var2
mov var1,ax   ; var1 = old var2

Direct-Offset Operands

We can add a displacement to the name of a variable, creating a direct-offset operand. This lets you access memory locations that have no label of their own.

arrayB BYTE 10h,20h,30h,40h,50h ; an array of five bytes
mov al,arrayB       ; AL = 10h (the first byte of the array)
mov al,[arrayB+1]   ; AL = 20h (a direct offset of 1 reaches the next byte)
mov al,[arrayB+2]   ; AL = 30h

A program that rearranges three doubleword values in an array

.data
arrayD DWORD 1,2,3
.code
mov eax,arrayD      ; EAX = 1
xchg eax,[arrayD+4] ; EAX = 2, array = 1,1,3
xchg eax,[arrayD+8] ; EAX = 3, array = 1,1,2
mov arrayD,eax      ; array = 3,1,2

Addition and subtraction

INC and DEC instructions

The INC (increment) and DEC (decrement) instructions add 1 to or subtract 1 from a register or memory operand. The syntax is:

INC reg/mem ; one is added
DEC reg/mem ; one is subtracted

Some examples are

.data
myWord WORD 1000h
.code
    inc myWord  ;myWord = 1001h
    mov bx,myWord 
    dec bx      ; BX=1000h

Add instruction

The ADD instruction adds a source operand to a destination operand of the same size and stores the sum in the destination. The syntax is:

ADD dest,source

Here is an example:

.data
var1 DWORD 10000h
var2 DWORD 20000h
.code
    mov eax,var1        ; EAX=10000h
    add eax,var2        ; EAX=30000h

SUB instruction

The SUB instruction works like ADD, except that it subtracts the source operand from the destination operand. The syntax is:

SUB dest,source

Here is a short example:

.data
var1 DWORD 30000h
var2 DWORD 10000h
.code
    mov eax,var1        ; EAX=30000h
    sub eax,var2        ; EAX=20000h

NEG instruction (negate)

The NEG (negate) instruction reverses the sign of a number by converting it to its two’s complement. The following forms are permitted:

NEG reg
NEG mem

Example:

.data
valB BYTE -1
valW WORD +32767
.code
    mov al,valB     ; AL = -1
    neg al          ; AL = +1
    neg valW        ; valW = -32767

Zero Flag (ZF)

The zero flag is set when the result of the operation produces zero in the destination operand.

mov cx,1
sub cx,1  ; as cx=0 the ZF becomes 1

Sign Flag

The Sign flag is set when the result in the destination operand is negative. The flag is clear when the result is positive.

mov cx,0
sub cx,1    ; CX = -1 so SF is set or SF=1
add cx,2    ; CX = 1 , SF =0

Carry flag

The Carry flag is set when the result of an operation generates an unsigned value that is out of range (too big or too small for the destination operand).

mov al,0FFh
add al,1    ; CF = 1, AL = 00h
; Trying to go below zero
mov al,0
sub al,1    ; CF = 1, AL = FFh

Overflow flag (OF)

The Overflow flag is set when the signed result of an operation is invalid or out of range.

mov al,+127
add al,1        ; OF = 1, AL = 80h (-128 when read as a signed byte)
; Example 2
mov al,7Fh
add al,1        ; OF = 1, AL = 80h

Remember that, when two operands are added, the Overflow flag is set only when:

Two positive operands are added and their sum is negative.
Two negative operands are added and their sum is positive.
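The second rule has no example in the text, so here is a small sketch of it. It assumes 32-bit MASM on Windows with the flat memory model and ExitProcess linked from kernel32.lib. The operand values are chosen only for illustration.

```asm
; Sketch only: assumes 32-bit MASM on Windows, linked with kernel32.lib.
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    ; Two negative operands whose sum does not fit in a signed byte
    mov al,-128         ; AL = 80h
    add al,-1           ; 80h + FFh = 17Fh, so AL = 7Fh (+127): OF = 1, CF = 1

    ; Operands with different signs can never overflow
    mov al,+127         ; AL = 7Fh
    add al,-1           ; 7Fh + FFh = 17Eh, so AL = 7Eh (+126): OF = 0, CF = 1
    INVOKE ExitProcess,0
main ENDP
END main
```

The second addition still sets the Carry flag, which shows that CF (unsigned range) and OF (signed range) are evaluated independently.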

Data Related Operations and Directives

OFFSET operator

The OFFSET operator returns the distance in bytes of a label from the beginning of its enclosing segment. In protected mode an offset is 32 bits; in real mode it is 16 bits.

Protected-mode programs can be written using only a single segment (the flat memory model).

For example

.data
bval BYTE ?     ; BYTE is 8 bits, or 1 byte
wval WORD ?     ; WORD is 16 bits, or 2 bytes
dval DWORD ?    ; DWORD is 32 bits, or 4 bytes
dval2 DWORD ?
; Suppose the data segment begins at 00404000h. Each offset is the previous
; offset plus the size of the variable declared before it.
.code
mov esi,OFFSET bval     ; ESI = 00404000h
mov esi,OFFSET wval     ; ESI = 00404001h (1 byte after bval)
mov esi,OFFSET dval     ; ESI = 00404003h (2 bytes after wval)
mov esi,OFFSET dval2    ; ESI = 00404007h (4 bytes after dval)

PTR operator

It overrides the default type of a label (variable) and provides flexibility to access part of a variable.

.data 
myDouble DWORD 12345678h
.code 
    mov ax,WORD PTR myDouble    ; AX = 5678h, the low 16 bits (x86 stores the low bytes first)
    mov WORD PTR myDouble,4321h ; overwrites the low word: myDouble = 12344321h
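PTR can be combined with a direct offset to reach any part of a variable. The sketch below assumes 32-bit MASM on Windows with the flat memory model and ExitProcess linked from kernel32.lib. It declares its own copy of myDouble so that it stands alone.

```asm
; Sketch only: assumes 32-bit MASM on Windows, linked with kernel32.lib.
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
myDouble DWORD 12345678h    ; stored in memory as the bytes 78h 56h 34h 12h

.code
main PROC
    mov al,BYTE PTR myDouble        ; AL = 78h   (lowest byte)
    mov al,BYTE PTR [myDouble+1]    ; AL = 56h
    mov ax,WORD PTR [myDouble+2]    ; AX = 1234h (high word)
    INVOKE ExitProcess,0
main ENDP
END main
```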

Type operator

The TYPE operator returns the size of a single element of a data declaration.

.data
var1 BYTE ?
var2 WORD ?
var3 DWORD ?
var4 QWORD ?
.code
    mov eax,TYPE var1    ; EAX = 1 (1 byte)
    mov eax,TYPE var2    ; EAX = 2 (2 bytes)
    mov eax,TYPE var3    ; EAX = 4 (4 bytes)
    mov eax,TYPE var4    ; EAX = 8 (8 bytes)

LENGTHOF Operator

The LENGTHOF operator counts the number of elements in a single data declaration.

.data
byte1 BYTE 10,20,30   ; LENGTHOF will be 3

SIZEOF Operator

The SIZEOF operator returns a value that is equivalent to multiplying LENGTHOF by TYPE.

.data
byte1 BYTE 10,20,30  ; SIZEOF is 3 as TYPE x LENGTHOF = 1 x 3
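With a BYTE array, LENGTHOF and SIZEOF give the same number, so the difference is easier to see with a wider element type. The sketch below assumes 32-bit MASM on Windows with the flat memory model and ExitProcess linked from kernel32.lib. The array name arrayW is made up for illustration.

```asm
; Sketch only: assumes 32-bit MASM on Windows, linked with kernel32.lib.
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
arrayW WORD 10h,20h,30h,40h     ; hypothetical array of four words

.code
main PROC
    mov eax,TYPE arrayW         ; EAX = 2 (bytes per element)
    mov ebx,LENGTHOF arrayW     ; EBX = 4 (number of elements)
    mov ecx,SIZEOF arrayW       ; ECX = 8 (TYPE x LENGTHOF = 2 x 4)
    INVOKE ExitProcess,0
main ENDP
END main
```

All three operators are evaluated by the assembler, so each MOV above simply loads a constant.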

LABEL Directive

The LABEL directive assigns an alternate label name and type to an existing storage location.
LABEL does not allocate any storage of its own.
It removes the need for the PTR operator.

.data
dwlist LABEL DWORD
wordlist LABEL WORD
intlist BYTE 00h,10h,00h,20h
.code
    mov eax,dwlist   ; EAX = 20001000h
    mov cx,wordlist  ; CX = 1000h
    mov dl,intlist   ; DL = 00h

Basic Program Structure/Directives

Following is a basic program structure:

.model small ; memory model (small is used here as an example); it sets how code and data segments are laid out
.stack 100h ; reserves stack storage (100h = 256 bytes in this case)
.data ; variables are defined below this directive
.code ; marks the start of the code
main proc ; start of the main procedure
main endp ; end of the main procedure
end main ; end of the program; main is the entry point

Declaration of String

.data
msg db "This is a string",'$'  ; db defines byte-sized data
; dw defines a word-sized variable (16 bits)
; dd defines a doubleword variable (32 bits)
; '$' is the string terminator used by DOS INT 21h function 09h
; OFFSET gives the address at which the data starts
; Displaying the string:
.code
mov edx,offset msg ; move the address of the string into EDX
call writestring   ; if this is Irvine32's WriteString, end the string with 0 instead of '$'

Stacks

Stacks are used for:

Temporarily saving register values
Storing local variables
Passing function parameters and holding return addresses
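The first use, saving register values temporarily, can be sketched with PUSH and POP. The sketch assumes 32-bit MASM on Windows with the flat memory model and ExitProcess linked from kernel32.lib. The register values are arbitrary.

```asm
; Sketch only: assumes 32-bit MASM on Windows, linked with kernel32.lib.
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
main PROC
    mov  eax,11111111h
    mov  ebx,22222222h
    push eax            ; ESP decreases by 4 and EAX is stored at [ESP]
    push ebx            ; ESP decreases by 4 again and EBX is stored on top
    mov  eax,0          ; the registers are now free for other work
    mov  ebx,0
    pop  ebx            ; EBX = 22222222h (last in, first out)
    pop  eax            ; EAX = 11111111h
    INVOKE ExitProcess,0
main ENDP
END main
```

The values must be popped in the reverse of the order in which they were pushed, otherwise the registers end up swapped.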

Registers

EAX, EBX, EDX: These are all generic registers and can be used for any integer, Boolean, logical, or memory operation.
ECX: Used as a counter by repetitive instructions.
ESI / EDI: Generic registers frequently used as source / destination pointers in instructions that copy memory (SI stands for source index, DI stands for destination index).
EBP: Can be used as a generic register, but is mostly used as the stack base pointer.
ESP: The CPU stack pointer.

Function Calls

Function calls are implemented using two basic instructions in assembly language. The CALL instruction calls a function, and the RET instruction returns to the caller. The CALL instruction pushes the address of the instruction that follows it (the return address) onto the stack, so that it is later possible to return to the caller, and then jumps to the specified address. The function’s address can be specified just like any other operand, as an immediate, register, or memory address. The following is the general layout of the CALL instruction:

call {function address}
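A minimal sketch of a call and return follows. It assumes 32-bit MASM on Windows with the flat memory model and ExitProcess linked from kernel32.lib. The procedure AddTwo is hypothetical, and passing its inputs in EAX and EBX is just one possible convention.

```asm
; Sketch only: assumes 32-bit MASM on Windows, linked with kernel32.lib.
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.code
AddTwo PROC             ; hypothetical procedure: EAX = EAX + EBX
    add eax,ebx
    ret                 ; pops the return address from the stack into EIP
AddTwo ENDP

main PROC
    mov  eax,5
    mov  ebx,7
    call AddTwo         ; pushes the address of the next instruction, then jumps to AddTwo
    ; execution resumes here with EAX = 12
    INVOKE ExitProcess,0
main ENDP
END main
```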

CMP

cmp ebx,0xf020
jnz 10025009

The first instruction is CMP, which compares the two operands specified. In this case CMP is comparing the current value of register EBX with a constant: 0xf020 (the “0x” prefix indicates a hexadecimal number), or 61,472 in decimal. As you already know, CMP is going to set certain flags to reflect the outcome of the comparison. The instruction that follows is JNZ. JNZ is a version of the Jcc (conditional branch) group of instructions described earlier. The specific version used here will branch if the zero flag (ZF) is not set, which is why the instruction is called JNZ (jump if not zero). Essentially what this means is that the instruction will jump to the specified code address if the operands compared earlier by CMP are not equal. That is why JNZ is also called JNE (jump if not equal). JNE and JNZ are two different mnemonics for the same instruction; they actually share the same opcode in the machine language.
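The same compare-and-branch pattern can be written in MASM syntax as a sketch. It assumes 32-bit MASM on Windows with the flat memory model and ExitProcess linked from kernel32.lib. The variable outcome and the labels notEqual and done are made up for illustration.

```asm
; Sketch only: assumes 32-bit MASM on Windows, linked with kernel32.lib.
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
outcome DWORD 0         ; hypothetical variable recording which path ran

.code
main PROC
    mov ebx,0F020h
    cmp ebx,0F020h      ; computes EBX - 0F020h = 0 and discards it: ZF = 1
    jne notEqual        ; JNE/JNZ jumps only when ZF = 0, so it is not taken here
    mov outcome,1       ; reached because the operands were equal
    jmp done
notEqual:
    mov outcome,2       ; reached only when the operands differ
done:
    INVOKE ExitProcess,0
main ENDP
END main
```

CMP behaves like SUB except that it leaves both operands unchanged and only updates the flags.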

mov edi,[ecx+0x5b0] 
mov ebx,[ecx+0x5b4] 
imul edi,ebx

This sequence starts with a MOV instruction that reads a value from memory into register EDI. The brackets indicate that this is a memory access, and the specific address to be read is specified inside the brackets. In this case, MOV will take the value of ECX, add 0x5b0 (1456 in decimal), and use the result as a memory address. The instruction will read 4 bytes from that address and write them into EDI. You know that 4 bytes are going to be read because of the register specified as the destination operand. If the instruction were to reference DI instead of EDI, you would know that only 2 bytes were going to be read. EDI is a full 32-bit register (see Figure 2.3 for an illustration of IA-32 registers and their sizes).

The following instruction reads another memory address, this time from ECX plus 0x5b4 into register EBX. You can easily deduce that ECX points to some kind of data structure. 0x5b0 and 0x5b4 are offsets to some members within that data structure. If this were a real program, you would probably want to try and figure out more information regarding this data structure that is pointed to by ECX. You might do that by tracing back in the code to see where ECX is loaded with its current value.

The final instruction in this sequence is an IMUL (signed multiply) instruction. IMUL has several different forms, but when specified with two operands as it is here, it means that the first operand is multiplied by the second, and that the result is written into the first operand. This means that the value of EDI will be multiplied by the value of EBX and that the result will be written back into EDI.
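The whole load, load, multiply pattern can be reproduced in a small sketch. It assumes 32-bit MASM on Windows with the flat memory model and ExitProcess linked from kernel32.lib. The two-member structure dims and its offsets 0 and 4 are hypothetical stand-ins for the unknown structure at offsets 0x5b0 and 0x5b4.

```asm
; Sketch only: assumes 32-bit MASM on Windows, linked with kernel32.lib.
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
dims DWORD 6,7          ; hypothetical structure with two 4-byte members

.code
main PROC
    mov  ecx,OFFSET dims    ; ECX points to the structure
    mov  edi,[ecx]          ; EDI = 6 (member at offset 0)
    mov  ebx,[ecx+4]        ; EBX = 7 (member at offset 4)
    imul edi,ebx            ; EDI = EDI * EBX = 42
    INVOKE ExitProcess,0
main ENDP
END main
```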

push eax  
push edi  
push ebx  
push esi  
push dword ptr [esp+0x24]
call 0x10026eeb

This sequence pushes five values onto the stack using the PUSH instruction. The first four values being pushed are all taken from registers. The fifth and final value is taken from a memory address at ESP plus 0x24. In most cases, this would be a stack address (ESP is the stack pointer), which would indicate that this address is either a parameter that was passed to the current function or a local variable. To accurately determine what this address represents, you would need to look at the entire function and examine how it uses the stack.

Global Variables

Consider, for example, the sequence that accesses a global variable.

MOV EAX, DWORD PTR [pGlobalVariable]

The preceding instruction is a typical global variable access. The storage for such a global variable is allocated inside the executable image (because many variables have a preinitialised value).
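A sketch of how such a variable might be declared and used in MASM is shown below. It assumes 32-bit MASM on Windows with the flat memory model and ExitProcess linked from kernel32.lib. The name gCounter and its initial value are made up for illustration.

```asm
; Sketch only: assumes 32-bit MASM on Windows, linked with kernel32.lib.
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
gCounter DWORD 100      ; hypothetical global; the value 100 is stored in the executable image

.code
main PROC
    mov eax,DWORD PTR [gCounter]    ; EAX = 100, read from a fixed address
    inc eax                         ; EAX = 101
    mov DWORD PTR [gCounter],eax    ; gCounter = 101
    INVOKE ExitProcess,0
main ENDP
END main
```

Because the variable lives at a fixed address, the access needs no register to hold a pointer, which is what makes global accesses easy to spot in disassembled code.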