Selecting Instructions to Implement

We have focused a lot of attention on the simple movement of data from one place to another in our machine. This was done on purpose. We need to get used to the idea that we cannot reach in and make things happen in this machine. We need to set up simple components, connect them with wires, and somehow control the actions in the machine that end up with results we humans can interpret using our programming languages.

Let’s look at a few really simple programs and see what instructions are used:

Moving Data

Something as simple as this code will show how we move data:

testmov.c
int x=5;
int y;

int main( void ) {
    y=x;
}

Note

I used global data here on purpose. We will learn about function local data later.

Running this program will not produce anything interesting, but we can see what assembly code the compiler produces from it by using a handy option available on all gcc-based compilers.

$ avr-gcc -S testmov.c

The compiler is being directed to produce an assembly language output file, normally named with the same base name as the source file, with a .s extension.

Note

The -S option is available on all gcc-based compilers, and is a nice way to look at assembly language for a particular chip. I use this to study code produced from C/C++ programs running on Pentium, AVR, and even ARM processors.

Here is the testmov.s file produced by that command:

testmov.s
	.file	"testmov.c"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
.global	x
	.data
	.type	x, @object
	.size	x, 2
x:
	.word	5
	.comm	y,2,1
	.text
.global	main
	.type	main, @function
main:
	push r28
	push r29
	in r28,__SP_L__
	in r29,__SP_H__
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
	lds r24,x
	lds r25,x+1
	sts y+1,r25
	sts y,r24
	ldi r24,0
	ldi r25,0
/* epilogue start */
	pop r29
	pop r28
	ret
	.size	main, .-main
	.ident	"GCC: (GNU) 7.2.0"
.global __do_copy_data
.global __do_clear_bss

This is assembly language for the AVR chip, but the code you see was not something a human would write. It was produced by the compiler and intended to immediately be processed into a form closer to the machine, using the assembler that is part of the gcc tool chain. Much of what you see here is not relevant to what we want to study.

However, we can find a few important parts.

Defining Data

Lines 7-13 are where the global data is set up. One variable is initialized with the value 5 and the other is uninitialized. If you look closely, you will discover that simple integers on this chip are 16-bit items. That means two bytes, and we will see that in action later. Notice line 8, which has .data on it. That is a “directive” telling the assembler that we are setting up initialized data here. The variable x is used as a label, and a .word (16-bit) value of 5 is recorded at that address. We will ignore how y was set up in this example, since we will not do things this way in our code.

Note

We will set up our data in a much more human-friendly way, so do not worry about this notation (remember, only the compiler loves this code!).

Data Movement

Lines 26-29 are where the data is actually moved. Since we are moving 16 bits in an 8-bit machine, we do things in two steps. The compiler chose to load the current value stored in x (and x+1) into two internal registers: R24 and R25. The LDS instruction (“load direct from data space”) does the loading. Next, the data stored in those two registers is moved back into memory using the STS instruction (“store direct to data space”). In all of these lines the variable names x and y are being used. Internally, those names refer to addresses in memory where the variables will live as the program runs. At this point the names are convenient aliases for those final addresses.
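
To make the two-step move concrete, here is a minimal C sketch of how a simulator might model these four instructions. The names data, reg, lds, sts, and move_x_to_y, and the addresses X_ADDR and Y_ADDR, are assumptions made up for illustration. In a real build the linker picks the addresses.

```c
#include <stdint.h>

/* Hypothetical model: a small data memory and the 32 AVR registers.
 * X_ADDR and Y_ADDR are made-up addresses for the variables x and y. */
enum { DATA_SIZE = 256, X_ADDR = 0x60, Y_ADDR = 0x62 };

static uint8_t data[DATA_SIZE]; /* data space, one byte per address */
static uint8_t reg[32];         /* R0..R31, each 8 bits wide        */

/* LDS Rd,K : copy the byte at address K into register Rd. */
static void lds(unsigned d, uint16_t k) { reg[d] = data[k]; }

/* STS K,Rr : copy register Rr into the byte at address K. */
static void sts(uint16_t k, unsigned r) { data[k] = reg[r]; }

/* y = x for a 16-bit int, one byte at a time, in the listing's order. */
static void move_x_to_y(void) {
    lds(24, X_ADDR);      /* low byte of x  -> R24 */
    lds(25, X_ADDR + 1);  /* high byte of x -> R25 */
    sts(Y_ADDR + 1, 25);  /* R25 -> high byte of y */
    sts(Y_ADDR, 24);      /* R24 -> low byte of y  */
}
```

With the two bytes of x set to 5 and 0, calling move_x_to_y() leaves the same two bytes in y, and R24 and R25 still hold the copied value.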

Note

My convention in my lecture notes is to use upper case to show Mnemonics for instructions. In actual code, I use lower case. This just helps when reading the notes.

Coding the IF Structure

Let’s use the same approach to figure out how an If-Statement works:

testif.c
unsigned char x=5,y=0,z;

int main( void ) {
    if(x == y) {
        z=1;
    } else {
        x=2;
    }
}


Here is the compiler’s code:

testif.s
	.file	"testif.c"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
.global	x
	.data
	.type	x, @object
	.size	x, 1
x:
	.byte	5
.global	y
	.section .bss
	.type	y, @object
	.size	y, 1
y:
	.zero	1
	.comm	z,1,1
	.text
.global	main
	.type	main, @function
main:
	push r28
	push r29
	in r28,__SP_L__
	in r29,__SP_H__
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
	lds r25,x
	lds r24,y
	cp r25,r24
	brne .L2
	ldi r24,lo8(1)
	sts z,r24
	rjmp .L3
.L2:
	ldi r24,lo8(2)
	sts x,r24
.L3:
	ldi r24,0
	ldi r25,0
/* epilogue start */
	pop r29
	pop r28
	ret
	.size	main, .-main
	.ident	"GCC: (GNU) 7.2.0"
.global __do_copy_data
.global __do_clear_bss

This time the compiler generated another useful line in setting up our data.

Line 14 is a common directive that sets up a block of “uninitialized data”. This kind of data is separated from initialized data so the object file produced from this code can hold simple data saying how big a chunk of uninitialized data is needed. Initialized data has to be included in the object file, so the memory can be properly set up as the code gets loaded when it is run.

Note

Why is it called .bss? That is a historical convention you can look up. I have my own theory, which is not found in the literature!

Comparing Two Values

The IF statement begins by performing a comparison. The two data items are loaded into registers in lines 32 and 33. Note that this time, the data is simple 8-bit data which fits in a single register.

The actual compare is done with the CP instruction. This instruction performs a subtraction, but only sets status flags. It does not store the result of the subtraction.
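
A minimal sketch of that idea in C follows. The Flags struct and the cp function are hypothetical names, and only the Z, N, and C flags are modeled here to keep the example short. The other flags CP touches are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status flags: only three of them are modeled. */
typedef struct {
    bool z; /* result was zero                                  */
    bool n; /* bit 7 of the result is set                       */
    bool c; /* a borrow was needed (Rr > Rd as unsigned values) */
} Flags;

/* CP Rd,Rr : compute Rd - Rr, record the flags, discard the result. */
static Flags cp(uint8_t rd, uint8_t rr) {
    uint8_t result = (uint8_t)(rd - rr); /* 8-bit subtraction, wraps on borrow */
    Flags f;
    f.z = (result == 0);
    f.n = (result & 0x80) != 0;
    f.c = (rr > rd);
    return f; /* note: result itself is never stored anywhere */
}
```

For the program above, x is 5 and y is 0, so cp(5, 0) leaves Z clear. That is the flag the branch instruction looks at next.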

Branching

Processors always support at least two styles of branching instructions. One is unconditional, and we just move to a new address for our next “fetch”. The other branching instruction uses status flag values to decide if we branch, or simply fall through to the next instruction in sequence. This example code shows both forms of branching.

As we saw when doing the C Program Tear-Down, we check the result of our comparison and conditionally branch around the block of code to be processed if the comparison resulted in a True value. The AVR uses the BRNE (“branch if not equal”) instruction to jump over the “true” block if the result was False. This sends the processor to the start of the “false block” at label .L2.

At the end of the “true block”, we see an unconditional branch to a label at the end of the entire statement (.L3). We add this branch so we do not fall into the “false block” after processing the “true block”.
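
Both styles of branch come down to choosing the next value of the program counter. The sketch below assumes the AVR rule PC <- PC + K + 1, with the PC counted in instruction words and K a signed offset. The function names rjmp_next_pc and brne_next_pc are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* RJMP K : always branch. K is a signed offset relative to PC + 1. */
static uint16_t rjmp_next_pc(uint16_t pc, int16_t k) {
    return (uint16_t)(pc + k + 1);
}

/* BRNE K : branch only when Z is clear; otherwise fall through to PC + 1. */
static uint16_t brne_next_pc(uint16_t pc, int16_t k, bool z) {
    return z ? (uint16_t)(pc + 1) : (uint16_t)(pc + k + 1);
}
```

A negative K branches backward, which is how a loop gets back to its start.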

The code to process if the condition evaluates to “true” is lines 36-37. There is some weird notation here. What is that lo8(1) stuff about? The compiler treats all literal numbers as 16-bit values, and we only want the “low 8” bits from that number to store in the variable named z. The lo8() notation the compiler generated asks the assembler for exactly those bits. We will not be using that kind of notation in our simple assembly language.
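
In C terms, taking the low 8 bits is just a mask. The sketch below defines C functions named after the assembler's lo8() and hi8() operators. They are illustrations only and are not part of any compiler library.

```c
#include <stdint.h>

/* Illustrative C versions of the assembler's lo8() and hi8() operators. */
static uint8_t lo8(uint16_t value) { return (uint8_t)(value & 0xFF); }
static uint8_t hi8(uint16_t value) { return (uint8_t)(value >> 8); }
```

So lo8(1) is simply 1, while for a larger value such as 0x1234 the low byte is 0x34 and the high byte is 0x12.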

Do we have enough instructions to build a loop? Let’s see!

Coding a While Loop

Here is our next example code:

testwhile.c
unsigned char x = 0;
unsigned char y = 0;
unsigned char z = 0;

int main( void ) {
    while( x == y) {
        z++;
    }
}

This will run for a while, and I bet we overflow z pretty quickly. What do you think will happen when that occurs?

And here is what the compiler produced:

testwhile.s
	.file	"testwhile.c"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
.global	x
	.section .bss
	.type	x, @object
	.size	x, 1
x:
	.zero	1
.global	y
	.type	y, @object
	.size	y, 1
y:
	.zero	1
.global	z
	.type	z, @object
	.size	z, 1
z:
	.zero	1
	.text
.global	main
	.type	main, @function
main:
	push r28
	push r29
	in r28,__SP_L__
	in r29,__SP_H__
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
	rjmp .L2
.L3:
	lds r24,z
	subi r24,lo8(-(1))
	sts z,r24
.L2:
	lds r25,x
	lds r24,y
	cp r25,r24
	breq .L3
	ldi r24,0
	ldi r25,0
/* epilogue start */
	pop r29
	pop r28
	ret
	.size	main, .-main
	.ident	"GCC: (GNU) 7.2.0"
.global __do_clear_bss

What the compiler produced is interesting, but not something a human would have written. The generated code works, but looks odd. The loop starts at line 35, with a branch to line 40. There we see the same setup for the CP instruction we saw before. However, this test is being done to see if we loop or not. If we are to loop (“true”), we need to find the loop body. That starts at line 36 (.L3). As arranged here, after we process that body, we will fall into the comparison code again and see if we keep looping.
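
The compiler's layout may be easier to follow when written back into C using goto, with labels named after the ones in the listing. This is only a sketch. The function run_loop and its limit parameter are assumptions added so that the example terminates, since the original loop never ends while x equals y.

```c
/* The while loop, arranged the way the compiler laid it out.
 * limit is hypothetical: once z reaches it, x is changed so the loop
 * exits. The original program has no such exit. */
static unsigned char run_loop(unsigned char limit) {
    unsigned char x = 0, y = 0, z = 0;

    goto L2;                 /* rjmp .L2 : jump straight to the test */
L3:                          /* loop body                            */
    z++;                     /* lds / subi / sts                     */
    if (z == limit) x = 1;   /* not in the original: forces an exit  */
L2:                          /* loop test                            */
    if (x == y) goto L3;     /* cp + breq .L3                        */
    return z;                /* fell through: the loop is done       */
}
```

Putting the test at the bottom means each pass through the loop needs only one branch, the conditional one that goes back to the body.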

I suspect you can convince yourself that this is working correctly, but you would not have set things up this way. The compiler’s job is to produce “correct” code, not “friendly” code!

The only really new instruction we see in this example is the BREQ (“branch if equal”) instruction. Like BRNE, it checks the Z flag to decide whether to branch.

What About the Increment?

I skipped the loop body, where I simply wanted to increment a simple variable. This is definitely weird! The compiler decided to implement that operation using a strange approach!

On line 37, it loads the current value stored in z (in memory) into the R24 register. It then subtracts -1 from that register using the SUBI instruction!

What?

Once again, this is technically correct, but very human unfriendly. I cannot explain why it decided to do that, when we have a perfectly acceptable alternative instruction, INC, available. Go ask the compiler designers why they did this!
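
Whatever the reason, the arithmetic is easy to check. In 8 bits, -1 is the byte 0xFF, and subtracting 0xFF modulo 256 gives the same result as adding 1. The sketch below uses two hypothetical helpers, subi and increment, and models only the result, not the status flags.

```c
#include <stdint.h>

/* SUBI Rd,K : Rd <- Rd - K, computed in 8 bits (result only, no flags). */
static uint8_t subi(uint8_t rd, uint8_t k) {
    return (uint8_t)(rd - k);
}

/* z++ as the compiler wrote it: subtract lo8(-1), which is 0xFF. */
static uint8_t increment(uint8_t z) {
    return subi(z, 0xFF);
}
```

This also shows what happens when the loop runs long enough: increment(255) wraps around to 0.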

We will implement the INC instruction instead, and we might as well implement the DEC instruction at the same time!

ALU Instructions

You explored these in your homework assignment.

Instruction Summary

In summary, we need these instructions (plus the ALU instructions we decide to add) to build simple programs for our simulator:

Mnemonic  Operands  Operation                          Flags      Cycles
LDS       Rd,K      Rd <- [K]                          None       2
STS       K,Rr      [K] <- Rr                          None       2
CP        Rd,Rr     Rd - Rr                            Z,N,V,C,H  1
RJMP      K         PC <- PC + K + 1                   None       2
BRNE      K         if (Z = 0) then PC <- PC + K + 1   None       1/2
BREQ      K         if (Z = 1) then PC <- PC + K + 1   None       1/2
INC       Rd        Rd <- Rd + 1                       Z,N,V      1
DEC       Rd        Rd <- Rd - 1                       Z,N,V      1

Note

Conditional branches take one clock cycle if the PC is not modified (meaning we do not branch) and two clock cycles if it is modified (and we branch).

This set of instructions is sufficient for the construction of our simple simulator. We will derive our system based on this set. If needed, we may add additional instructions later.
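
To give a feel for how little is needed, here is a rough C sketch of an execute loop for this instruction set. Everything in it is an assumption made for illustration: the Instr encoding is not real AVR machine code, only the Z flag is modeled, and OP_HALT is a made-up instruction that lets the sketch stop. It is not the design we will derive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: one struct per instruction, not real opcodes. */
typedef enum { OP_LDS, OP_STS, OP_CP, OP_RJMP, OP_BRNE, OP_BREQ,
               OP_INC, OP_DEC, OP_HALT } Op;

typedef struct {
    Op  op;
    int a; /* first operand: register, address, or branch offset */
    int b; /* second operand, when one is needed                 */
} Instr;

typedef struct {
    uint8_t  reg[32];
    uint8_t  data[256];
    bool     z;      /* only the Z flag is modeled */
    uint16_t pc;
    unsigned cycles;
} Machine;

/* Fetch and execute until OP_HALT is reached. */
static void run(Machine *m, const Instr *prog) {
    for (;;) {
        Instr i = prog[m->pc];
        uint16_t next = (uint16_t)(m->pc + 1);
        switch (i.op) {
        case OP_LDS: m->reg[i.a] = m->data[i.b]; m->cycles += 2; break;
        case OP_STS: m->data[i.a] = m->reg[i.b]; m->cycles += 2; break;
        case OP_CP:  m->z = (m->reg[i.a] == m->reg[i.b]); m->cycles += 1; break;
        case OP_INC: m->reg[i.a]++; m->z = (m->reg[i.a] == 0); m->cycles += 1; break;
        case OP_DEC: m->reg[i.a]--; m->z = (m->reg[i.a] == 0); m->cycles += 1; break;
        case OP_RJMP: next = (uint16_t)(m->pc + i.a + 1); m->cycles += 2; break;
        case OP_BRNE:
        case OP_BREQ: {
            bool taken = (i.op == OP_BREQ) ? m->z : !m->z;
            if (taken) next = (uint16_t)(m->pc + i.a + 1);
            m->cycles += taken ? 2 : 1; /* 1 if we fall through, 2 if we branch */
            break;
        }
        case OP_HALT: return;
        }
        m->pc = next;
    }
}

/* Demo: count data[0] down to zero, then store the result in data[1]. */
static const Instr countdown[] = {
    { OP_LDS,  24,  0 },  /* R24 <- [0]                */
    { OP_DEC,  24,  0 },  /* R24 <- R24 - 1            */
    { OP_BRNE, -2,  0 },  /* if Z = 0, back to the DEC */
    { OP_STS,   1, 24 },  /* [1] <- R24                */
    { OP_HALT,  0,  0 },
};
```

Starting from a zeroed Machine with data[0] set to 3, run(&m, countdown) executes the DEC three times, takes the BRNE twice, and finishes with 12 cycles counted, using the cycle costs from the table.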

Note

I started using this technique years ago when I first started working with these small systems. That was back when the tools were very new. This has been how I learned to craft pure assembly language programs for all of the systems I own. I do not give up on "structured programming" just because I am working with assembly language. Good coding techniques keep you out of trouble. Assembly language lets you get into all the trouble you can stand, if you are not careful. It makes no sense to make things more difficult for ourselves by introducing weird code no human can follow. Thanks to Dijkstra, we gave up that kind of programming decades ago!