Modeling Our Simulator Components

Read time: 167 minutes (41804 words)

Each of the actual components we will need to model in our system shares the basic structure we have used in the Component design. What makes each one unique is the behavior of each component. We need to provide a way to create each needed component part, while retaining the interconnection strategy we have been using up to now.

The answer is to use C++ inheritance to create each part.

C++ Inheritance

If we need all of the features of one class, but we need ot change some aspect of how things work by adding new attributes, or by providing methods that operate differently, we can “derive” a new class from an existing one. We call the existing class the “parent” class, or the “base” class, and the new class the “child” or “derived” class.

The child class can add attributes and methods, but it cannot remove any attributes or methods from the base class. The child class can choose to alter how those methods operate by “overriding” them with new implementations. It is even possible to create a new method that uses the old one to do some of the needed work, then completes actions needed.

Our needs will be pretty simple. We will not add any new attributes, but we will provide a new implementation of the existing tick method. That is where all the action occurs!

Here is the basic scheme:

class Component {
    ...
};

class PC : public Component {
    ...
};

Actually, we have control over which attributes methods in the base class the derived class can access. Here is a guide that tries to explain how this works:

Class Access Control

C++ has three kinds of access specifiers:

  • public
  • private
  • protected.

Here is a simple diagram showing how derived classes can use data from the base class:

Access public protected private
members of the same class yes yes yes
members of derived class yes yes no
not members yes no no

This table says that you can create methods in the base class that are protected, and derived class methods can call those methods to get at the private data in the base class. Clear! (Yeah, right!)

Since we will want our derived components to access all of the connection attributes and methods, we will need to provide access to them. So, we need to change the protection we are using a bit.

We will do that as we build the individual parts, and we will introduce the specialized component as we work through the four steps of processing.

Component Models

Here are the basic components we will use to build out machine:

ALU parts

These are used for incrementing and for math and general boolean functions:

../_images/ALUcomponent.png

Multiplexors

We will need several multiplexors in the system. This one selects from one of two signals to pass through to the output. We will need a four-input MUX as well.

../_images/MUX2component.png

Registers

There are simple registers in the system. They hold single values.

../_images/REG2component.png

Memory Components:

We have several kinds of memory modules in the system. The instruction memory is normally read-only, but we will need to be able to load it from a data file. The general data memory supports notmal read/write operations. We may add a separate stack memory to simplify the logic needed for the stack. Finally we will need a dual-port register memory to model the internal registers. This last memory supports dual reads on a single operation.

../_images/MEM2component.png

Each stage is separated by a multi function “barrier” that marks the end of a single step in the four-step sequence.

../_images/BARcomponent.png

re a few additional parts needed, but these are enough for our basic machine as a start.

Simulator State Machine

The classic four-step loop processed by the machine can be expressed as a state machine, a common way to define how machines work.

The machine is in some “state” at any given moment in time. Some event will happen while the machine is in that state that causes the machine to transition to another state. There can be more than one event recognized in each statThe machine, causing the machine to transition ot several other possible next states.

The machine remains in that new state until yet another event occurs that causes another transition.

Four Steps

Our machine has several states, four of which we have already discussed. There are a couple of additoonal states to consider.

Here is our state machine:

../_images/state.png

Reset Logic

When the machine first powers up, the internal digital circuits come to life in some unknown condition. It is common to add circuitry that sets a special signal to the machine to get it going. In our simple machine, all we need to do is “reset” the program counter to a predefined address, so the first fetch can start looking for an instruction at the right place in the instruction memory.

The attiny85 begins looking at address zero. The Pentium, on the other hnd, begins looking for an instruction in the BIOS chip, which lives at a very different address.

We will call this state RS, the machine has been “reset”

Fetch Logic

The “reset” signal causes the program counter to be properly set. That address is held in the PC register. Registers cause the work of the machine to stop until the next clock tick comes along. At that point, the register copies its input signal to its output port, and the next stage will begin work.

When that next clock tick comes along, we transition to the fetch state (IF). Here the goal is to retrieve the next instruction to process. That process involves passing the address on to the instruction memory. In many machines, the actual fetch will be from slow memory, and that process can take many clock ticks to complete.

Note

That problem led to the invention of the caching system, designed to speed up this delay.

In the atiny85 chip, the internal instruction memory is fast enough we can ignore that delay, and treat instruction memory as a combinational device. It takes an address in, and dellivers the 16-bit instruction out. That instruction lands in the instruction register (IF/ID) where it waits for the next clock tick.

For simplicity, we will add a bit of logic to this step to calculate the next address for a normal sequential instruction flow, but we will allow for that next address to be overridden by some other address, calculated later in the four-step process.

Decode Logic

With the instruction in hand, we are ready to decode the instruction. This step is critical to later stages. We must validate the instruction, and signal if an illegal instruction is to be processed. We must break down the instruction, and route the components of that instruction along different “wires” to later stages. Finally, we must configure the control unit so that the various parts of the instruction move along to the right places. This is where the machine gets a bit messy, since there are different pathways to follow for each instruction in the machine!

To figure all of this out, we need to analyze each instruction in detail.

SInce many instruction reference the internal register s, we will pass the decoded register addresses on to the register memory in this phase of processing.

Register memory is designed so two simultaneous registers can be “read” at the same time. However, only one register can be written at any givem time. The teo data items read are sent along to the next step for processing there.

THe decoded instruction, together with the register data, all land in a decode register (ID/IE) where it waits for further processing.

Execute Logic

The execute step is where the ALU action occurs. The ALU in our simple machine is driven by an opcode derived from the instruction, and register data also obtained during the decode step. Some instructions may contain literal data that provided data for processing, so we need to select between a register and this immediate value if needed.

The ALU is another combinational device, so it does its work and delivers the result to the next step. The data ands in another holding register (IE/WB)

Write-Back Logic

The last step in normal processing send the final data wherever it needs to go. Some data will be routed back to the registers, some may head back as a new address to be used when “branching”. Other data will end up in the data memory.

When this step completes, we transition back to the IF state and continue work.

Halting

In this simple system, the machine never halts. It will run until power is removed. Some machines accept a halt instruction, which may well bring processing to a halt. Internally, the clock could still be running, but essentially, nothing is going on.

Interrupts

A very important change in this classic four-step process can occur after we complete work in the WB state.

Most machines will examine a set of possible signals, looking for a trigger that is to cause processing to change. The best way to view this event is that an asynchronous subroutine will be called in response to this signal. What happens next depends totally on what that special subroutine needs to do.

Since the machine may go off an do totally new processing, we need to protect the internal state of the machine from harm, otherwise the normal program will break.

Saving the current state must occur before we hand the machine to that special subroutine, which we will call am interrupt handler. Once hat handler finishes work, we restore the internal state of the machine to the condition is was before, and then transition ot eh IF* state for further processing.

We will examine interrupts in more detail in a late lecture.

This discussion sets up the actual design of our machine. In the next few steps, we will examine each “state” and see what real parts we need tor each one to do the required work.

Before leaving this discussion, lets look at some example C++ code we can use to control a state machine:

C++ State Machine Code

Normally, we will name each state using a C++ enumeration. This construct in C/C++ lets you set up a collection of names and associate each name with a simple integer number. The numbers are not important, but the names are. We can use those names later in our code, as we shall see.

Here is an example of state machine code we can use in our simulator:

state-machine.cpp
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
#include <iostream>

int main(void) {
    std::cout << "CPU State Machine" << std::endl;

    enum State {
        RESET,
        FETCH,
        DECODE,
        EXECUTE,
        WRITEBACK,
        INTERRUPT
    };

    State state = RESET;
    int cycle = 0;
    while(true) {
        switch(state) {
            case RESET:
                std::cout << "\tRESET" << std::endl;
                state = FETCH;
                break;
            case FETCH:
                std::cout << "cycle = " << cycle << std::endl;
                std::cout << "\tFETCH" << std::endl;
                state = DECODE;
                break;
            case DECODE:
                std::cout << "\tDECODE" << std::endl;
                state = EXECUTE;
                break;
            case EXECUTE:
                std::cout << "\tEXECUTE" << std::endl;
                state = WRITEBACK;
                break;
            case WRITEBACK:
                std::cout << "\tWRITEBACK" << std::endl;
                state = FETCH;
                if(cycle == 2) state = INTERRUPT;
                cycle++;
                break;
            case INTERRUPT:
                std::cout << "\tInterrupt hhandler" << std::endl;
                state = FETCH;
                break;
            default:
                std::cout << "\tIllegal state - resetting" << std::endl;
                state = RESET;
        }
        if(cycle == 4) break;
    }
    std::cout << "Machine halted" << std::endl;
}

Here is the output from this code:

g++ -o test state-machine.cpp
rblack@MacTeach ~/_acc/cosc2325/attiny85sim/code/states ->
 07:41 AM Mon Apr 09$ ./test
CPU State Machine
        RESET
cycle = 0
        FETCH
        DECODE
        EXECUTE
        WRITEBACK
cycle = 1
        FETCH
        DECODE
        EXECUTE
        WRITEBACK
cycle = 2
        FETCH
        DECODE
        EXECUTE
        WRITEBACK
        Interrupt hhandler
cycle = 3
        FETCH
        DECODE
        EXECUTE
        WRITEBACK
Machine halted

Step 6: The PC Register

Our first real component is the register we will use as the PC. This register has an input, an output, and a reset signal. We need to consider how we will build this part, using the basic Component class we have developed.

Here is the header file for the Component class, augmented a bit for this step:

Component.h
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#pragma once
#include "Pin.h"
#include <string>
#include <map>

class Component {
    public:
        // constructor
        Component();    // default

        Component(std::string n);

        // destructor
        ~Component();

        // input pins
        std::map<std::string, uint8_t> in_pins;

        // output pins
        std::map<std::string, uint8_t> out_pins;

        // mutators
        void add_in_pin(std::string name);
        void add_out_pin(std::string name);
        void tick(void);

        // accessors
        void dump_pins(void);
        std::string get_name(void);

    private:
        std::string name;
};

Notice that I have added a default constructor. The compiler complained when I created a derived class without this. The default constructor does nothing here, since we are really only using the constructor which takes a component name as a parameter.

The Register Class

We need a new class for a Register, and we will make is a general class we can use elsewhere. Here is the header we need:

Register.h
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
#pragma once
#include "Component.h"
#include <string>

class Register : public Component {
    public:
        Register(std::string name);
        ~Register();
        void tick(void);
        void reset(void);
};

In this specification, we see a constructor that will be used to name the register, just as we have been doing with the Component class. The name parameter will need to be passed along to the base class constructor.

The Register class is derived from the Component class, and really only overrides the tick method in that class. This is where we will create the specific behaviour of a new part for the simulator.

At present, I am not setting up any input or output ports here. That will be done when we put the machine together.

The implementation of this class is pretty simple for now. All we need to do is copy the input register (and there should only be one of those) to the output register (again only one). I am not adding code to check that we have the proper number of registers, so be careful.

Here is the implementation file:

Register.cpp
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
#include "Register.h"
#include "Component.h"

Register::Register(std::string n) : Component(n) {
}

Register::~Register() {
}

void Register::tick(void) {
    out_pins[0]->set_val(in_pins[0]->get_val() + 1);
}

void Register::reset(void) {
    in_pins[0]->set_val(0);
}

The constructor with the name string passes the incoming name string on the the COmponent class constructor. It does no additional work, but it could.

The tick method copied the input value on in_pin[0] to the output pin out_pins[0]. For now, it does not check that these are available, which is a bad design, but sufficient for this test code.

Notice that I incremented the value copied in this code. That is just so I can see that something happened in the output this test will generate.

I made one additional change in the Wire class code. I want to see the vallue moved along the wire when it does its work. Here is the modified code:

Wire.cpp
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include "Wire.h"
#include "Pin.h"
#include <cstddef>
#include <iostream>

Wire::Wire() {
    driven = nullptr;
}

Wire::~Wire() {
}

void Wire::attach_driven(Pin * pin) {
    driven = pin;
}

void Wire::attach_drives(Pin * pin) {
    drives.push_back(pin);
}

void Wire::tock(void) {
    val = driven->get_val();
    for(int i=0; i < drives.size(); i++) {
        drives[i]->set_val(val);
        std::cout << drives[i]->get_name()
            << " <-(" << val << ")- " << driven->get_name()
            << std::endl;;
    }
}

Finally, here is the new driver code that will exercise this new setup:

main.cpp
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#include <iostream>
#include <iomanip>
#include "argParse.h"
#include "Register.h"
#include "Wire.h"
#include "Machine.h"


int main(int argc, char *argv[]) {
    std::cout << "attiny85sim (v0.1)" << std::endl;

    // crete basic machine - from hdl file
    Machine cpu;

    int exit_code;
    int argcode = argParse(&cpu, argc, argv);
    if(argcode != OK) {
        argError(argcode);
    } else {
        std::cout << "\trunning..." << std::endl;
        cpu.boot();
        exit_code = cpu.run();
        if(exit_code == 0) {
            std::cout
                << "Unexpected error"
                << std::endl;
        } else if(exit_code < 0) {
            std::cout
                << "\tprocessor fault"
                << std::endl;
        }
        std::cout
            << "\tProcessing finished..."
            << std::endl;
    }
    return 0;
}

Running this example produced this output:

$ make run
./attiny85sim
attiny85sim (v0.1)
MADDR <-(1)- PCO
Pin state:
    PC.PCI: 0
    PC.PCO: 1
Pin state:
    IM.MADDR: 1
    IM.IMO: 2

This is a good start on deriving a new part from the basic component class.

Note

I chased an ugly bug in writing this code. I was getting an error about trying to free an object that was not allocated. This error was coming from a destructor somewhere, but no clue was offered about which object was broken, other than the address of the offending object. To find this bug, I needed to identify the object being deleted, which I did by adding output code to the destructor in each class that showed the address of the object being deleted. In the end, I discovered that I was using incorrect code to copy data across the register and zapped the pointer that was being used to point to a pin. Fixing the problem was easy after finding the offending object. That was fun (not!)

Step 1: Fetch

The focus in this stage is on the program counter and the instruction memory. In the attiny85, instruction memory is 16-bits wide, and we have 8 kilobytes (4 kilowords) of instruction memory. The program counter needs to generate an address big enough to span this entire memory area, which means it needs to be at least 12 bits big. The AVR family uses 16-bit registers, and some AVR chips have much more memory. The Arduino for example, has 32 kilobytes of memory available. You will be surprised in what you can do with just 4096 instructions in the machine.

Note

You Windows programmers probably have never constructed a real program this small, because actually interfacing with the OS takes a lot of code. Oh well, that is why we have gigabytes of RAM these days!

Lets start off with a simple register which will be our Program Counter. PC for short:

Note

That small triangle at the bottom of this component indicates that this is a “clocked” component, one that does its work only when the clock “tick” signal is received. This is a “sequential” component. Any component not showing this triangle is a “combinational” component, one that simply does its internal work when input signals change.

The program counter is a simple register, meaning that when the clock tick comes along, what is on the input side (next addr) is copied, untouched, to the output side (next addr). We already know that the output from the PC register passes on to the instruction memory unit to retrieve the next instruction to process.

We need to consider what comes in on that input side. Obviously, that will be the next address to be fetched. For most instructions in our attiny85 chip, that instruction is the next “word” in the instruction memory. Since we will address out instruction memory in “word” chinks (not bytes) that means we need to increment the current address and put it back into the next address input. How we do this depends on how we want to design the machine.

Building an increment unit, which is a very simple ALU that only knows one trick, is so easy, it is common to put such a device near to the PC so that next address calculation can happen quickly, even on the same clock tick.

So, we could build something like this:

This arrangement is fine if all instructions are exactly the same size. However, we know that some instructions are two words in size, so this will not work. Even worse, to allow for branches, we need to allow the PC to be updated from some incoming source, probably determined further down the processing chain.

Let’s introduce an incoming address and allow a control signal to determine if we simply increment, or use the new value. This is a job for a multiplexor, controlled by a control signal.

The addition of a “branch address” input requires a control signal that decides if we use the “next address” calculated by the increment component, or this new address signal coming from elsewhere in the system. The “multiplexor” will select the correct “next address” source based on the control signal “branch”.

To complete this setup, we need to add in the “instruction memory” unit. Since this is a read only unit (at this point), it can be treated like a combinational unit, meaning that we feed in an address and it immediately (well close) delivers a new instruction.

A high level view of this entire setup looks like this:

Note

We will not actually build this unit this way, since we need to work through how to construct the machine out of very simple parts.

This is a fetch unit we can work with. We can show this setup as a verilog module that looks like this:

fetch unit
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
module fetch(clk, baddr, inst, reset);
    output reg [15:0] inst = 0;
    input wire signed [15:0] naddr;
    input wire clk, reset;

    always @(posedge clk) begin
        if (reset) begin
            $display("reset")
            out <= 0;
        end else begin
            $display("pc = %d", naddr);
            out <= naddr;
        end
    end
endmodule

Notice there is one more signal in this code, not shown in the diagram. A reset line is typically added to allow the PC to be set to zero, which for the AVR family, is where all code starts.

Building the Fetch Unit

We have enough of a definition for this stage to build the fetch unit. Your next lab project will begin building this unit by setting up a program counter register.

Step 2: Decode

The Instruction Register holds the first part of any instruction. That part is 16-bits wide and contains all the bits needed to uniquely identify each instruction

The actual encoding of these bits is pretty complex, and is driven by a number of factors involved in laying the chips out for manufacturing. We will not worry about that detail here, but we do need to look at the encoding forms Atmel came up with for this chip.

Note

Although we do not really need to worry about how the real chip is encoded, there is no real reason not to use the scheme the manufacturer came up with. That will let us use the AVR assembler, with a limited instruction set, to program our simulator.

In our work, only a few instructions actually need a second 16-bits to provide the complete instruction. Unfortunately, detecting that second chunk is not as simple as using one bit in the instruction.

Instruction Encoding

The AVR Supports a number of addressing “modes” which group instructions according to how they use SRAM or registers. Here are the modes supported for our attiny85 chip:

  • Direct register addressing, single register
  • Direct register addressing, single register with bit Address Operand
  • Direct register addressing, two 8-bit registers
  • Direct register addressing, Two 16-bit registers
  • Direct I/O addressing (including SREG)
  • Direct SRAM addressing
  • Immediate 8-bit source with register destination (R15 < Rd)
  • Immediate 6-bit constant, 16-bit register (r24:r25,X,Y,Z)
  • Immediate, 4 bit constant
  • Indirect SRAM with displacement
  • Indirect SRAM with pre and post increment
  • Indirect program memory addressing
  • Control transfer direct addressing
  • Control transfer indirect addressing
  • Control transfer relative addressing
  • MCU controls

That is a lot of different ways you can access memory, registers, or constants. We are certainly not going to deal with all of these. In fact, out of all the available instructions in this chip, we will select only 32 of them (for now). Here is a table showing the ones we will initially use (more might be added if we find a need for them).

Mnemonic OP1 OP2 RTL  
IN Rd PORT Rd <- PORT
OUT PORT Rr PORT <- Rr
         
LDI Rd k Rd <- k
LDS Rd k Rd <= [k]
STS k Rr [k] <- Rr
MOV Rd Rr Rd <- Rr
SBI PORT b PORT[b] <- 1
CBI PORT b PORT[b] <- 0
LSL Rd   Rd[n+1] <- Rd[n],Rd[0] <- 0
         
RCALL k   [STACK] <- PC, PC <- PC + k + 1
RET     PC <= [STACK]
PUSH Rr   [STACK] <- Rr
POP Rd   Rd <- [STACK]
         
RJMP k   PC <= PC + k + 1
BRNE K   if (Z == 1) PC <- PC + k + 1
SBRS Rr b if (Rb[b] == 1) PC <= PC + 2 or 3
SBRC Rr b if(Rr[b] == 0) PC <- PC + 2 or 3
         
NOP    
 
DEC Rd   Rd <- Rd + 1
INC Rd   Rd <- Rd - 1
ADD Rd Rr Rd = Rd + Rr
SUB Rd Rr Rd <- Rd - Rr
         
AND Rd Rr Rd <- Rd AND Rr
OR Rd Rr Rd <- Rd OR Rd
EOR Rd Rd Rd <- Rd XOR Rr
COM Rd   Rd <- OxFF - Rd 1(1)
         
SEI     I <- 1
CLI     I <- 0
RETI     PC <- [STACK]

Instruction Encoding

The actual encoding Atmel came up with is detailed in this ref. Note that not all of the instructions in this reference are actually available in the attiny85

N1 N2 N3 N4 MNEM OP1 OP2 W16
1001 010d dddd 0000 com Rd    
1001 010d dddd 0011 inc Rd    
1001 010d dddd 0110 lsr Rd    
1001 010d dddd 1010 dec Rd    
               
1111 111r rrrr 0bbb sbrs Rr b  
               
0000 11rd dddd rrrr add Rd Rr  
0001 01rd dddd rrrr cp Rd Rr  
0001 10rd dddd rrrr sub Rd Rr  
0010 00rd dddd rrrr and Rd Rr  
0010 10rd dddd rrrr or Rd Rr  
0010 11rd dddd rrrr mov Rd Rr  
0010 01rd dddd rrrr eor Rd Rr  
               
1011 0AAd dddd AAAA in Rd A  
1001 1AAr rrrr AAAA out A Rd  
               
1001 1000 AAAA Abbb cbi A b  
1001 1010 AAAA Abbb sbi A b  
               
1001 0100 0111 1000 sei      
1001 0100 1111 1000 cli      
               
1001 000d dddd 0000 lds Rd k k16
1001 001r rrrr 0000 sts k Rr k16
               
0101 kkkk dddd kkkk subi Rd k  
0110 kkkk dddd kkkk ori Rd k  
0111 kkkk dddd kkkk andi Rd k  
               
1001 0101 0000 1000 ret      
1001 0101 0001 1000 reti      
               
1100 kkkk kkkk kkkk rjmp k    
1101 kkkk kkkk kkkk rcall k    
               
1111 00kk kkkk k001 breq k    
1111 01kk kkkk k001 brne k    
               
               
1001 001r rrrr 1111 push Rr    
1001 000d dddd 1111 pop Rd    
               
0000 0000 0000 0000 nop      

This table was created using Microsoft Excel. Here is that spreadsheet for reference:

Register Movement Instructions

Most ALU instructions, and data movement instructions specify two registers as operands. Here is the encoding for those:

Here, the bits colored blue represent the specific instruction opcode. The green bits select the destination register, and the yellow bits the source register.

Decode Logic

After deciding how instructions are to be encoded, we can set up the machinery needed to decode tan instruction. Here is the addion we need for this step:

../_images/decode.png

This is a bit incomplete. We have not shown how the decoded manages to go back and get the constant K from the instruction memory.

Step 3: Execute

In the next phase of processing we activate the ALU to do the “number crunching”. We should already understand how that unit works, and modeling it is simple. However, we do need to make sure that we understand how data flows through the machine. Let’s take this opportunity to trace one instruction completely through the machine. We will use “Register Transfer Language” to do this.

For this exercise, we will trace the ADD instruction through the machine.

  • ADD R0, R1 ; R0 <- R0 + R1

This instruction is encoded as follows:

  • 0000-11rd-dddd-rrrr

Timing

In the discussion that follows we will indicate clock ticks with a time marker t. After those clocks, data will move over wires during a wire “tock” indicated by a w marker.

Fetch:

During the fetch phase, we transfer the instruction from instruction memory to the F.D barrier:

  • t0: PC.out <- PC.in (next addr)
  • w0: IM.addr <- PC.out, INCR1.in <= PC.out
  • t1: IM.out <- [addr]
  • w1: F/D.in1 <- IM.out

At this point, we have the first 16-bit chink of instruction code available for decoding.

Decode:

The decode phase is responsible for breaking out the instruction parts. In this instruction we have two register numbers to decode:

  • t2: F/D.out1 <- F/D.in1
  • w2: DCD.in1 <- F/D.out1 ; instruction is at decoder block
  • t3: DCD.op1 < R0, DCD.op2 <- R1, DCD.aluop < ADD
  • w3: REG.in1 <- R0, REG.in2 <- R1, D/E.in3 <- ADD
  • t4: REG.out1 <= [R0], REG.out2 <= [R1]
  • W4: D/E.in1 <= [R0], D/E.in2 <- [R1]

Execute:

In this phase, we perform the calculation in the ALU:

  • t5: D/E.out1 <- R0, D/E.out2 <- R1, D/E.out3 <- ADD
  • w5: ALU.op1 <- R0, ALU.op1 <- R1, ALU.op = ADD
  • t6: ALU.res <- R0 + R1, ALU.Z <- zero?
  • w6: E/W.in1 <- res, E/W.in1 <- Z, E/W.in3 <- R0

Write-Back:

Finally, we need to save the result back in a register

  • t7: E/W.out1 <- res, E/W.out2 <- Z, E/W.out3 <- R0
  • w7: REG.in3 <- res, REG.in1 <- R0, REG.wr <- write, CTRL.z <- Z
  • T7: [R0] <- res, CTRL.[z] <- Z

Cycle Ends

At this point a complete instruction cycle has been completed. All calculations have been safely stored somewhere and the machine is “idle”. It is at this point that we can begin another instruction, or interrupt the normal p=instruction flow, and send the machine to another instruction entirely.

Let’s do another one, this time one that bypasses the ALU:

  • LDS Rd, k ; Rd <- [k]
  • 10001 000d dddd 0000 k16

This one is more complicated, since it needs another chink from the instruction memory to do the job. It also involves the data memory component

Fetch

There is not much different in this step:

  • t0: PC.out <- PC.in (next addr)
  • w0: IM.addr <- PC.out, INCR1.in <= PC.out
  • t1: IM.out <- [addr]
  • w1: F/D.in1 <- IM.out

Decode

For this instruction, we do not have two registers identified, but we do have the extra complication of needing another work from the IM.

  • t2: F/D.out1 <- F/D.in1
  • w2: DCD.in1 <- F/D.out1 ; instruction is at decoder block
  • t3: DCD.op1 < R0

There is more to do here. We need the PC so we can go back and get another word. That means we really need to move the PC across that first barrier as well. Add this RTL on the w0 step:

  • w0: F/D.in2 = PC

And on t2 add this:

  • t2: F/D.out2 <- F/D.in2

We need to increment the PC and get another word from the IM:

  • w3: INCR1.in1 <- FD.out2 (PC)
  • t4: INCR1.out <- PC + 1
  • w5: IM.addr <- INCR1.out (PC_n)
  • t6: IM.out <= [addr] = K

Now, we have all the parts needed for this step. Notice that we do not extract anything from register memory, since the register specified is just the destination.

  • w7: D/E.in1 = R0, D/E.in3 = IM.out (k)

Note

Notice that the RTL for each stage ends by placing signals in the barrier register. That register stops the actions within a stage, and prepares the machine for the next step. The next clock tick will make that happen.

Execute

There is nothing to do involving the ALU in this step. In our current thinking, the only thing needed here is to fetch the required data from the memory.

Some designers introduce another stage here, one that only accesses the data memory. We will not do that, but will do the work here to get the required data:

  • t8: D/E.out1 <- R0, D/E.out2 <- k
  • w5: DM.addr <- k, DM.RD = 1
  • t9: DM.out <- [k]
  • w6: E/W.in1 <- [k], E/W.in3 <- R0

Write-back

Finally, we again need to save the result back in a register

  • t7: E/W.out1 <- [k], E/W.out3 <- R0
  • w7: REG.in3 <- R0, REG.in1 <- R0, REG.wr <- write
  • T8: [R0] <- res

There is one missing piece to this puzzle. I said that each step ends with a wire action. We have not included that in either of these instructions. In fact, we need to make sure the PC is updated for the correct nect instruction.

For the ADD instruction, we already have the next address calculated. It was placed on the increment component back in the fetch stage. All we need to do there is complete the action to move that value back into the PC register.

  • t9: PC.in <- INCR1.out

However, for the LDS instruction, we need to figure out the next address.

  • w9 INCR.in <- PC_n
  • t10: INCR1.out <= PC_n + 1

Now we feed this value to the to the program counter:

  • t10: PC.in <- PC_n

Examining Instructions

In reviewing this RTL definition of how one instruction works, we see that the needed wires are defined by where signals need to travel. If we build a matrix of these pieces of information, we can deduce how the machine is to be constructed. That will introduce both simple wires, and multiplexors to the system. If we see that two wires in this set need to place their values on the same component input pin, we need a multiplexor, set by the control unit, to figure out which wire should get through for this particular instruction. The control unit will be responsible for that, and the needed signals will be established during the decoding step.

Ultimately, we need to perform this kind of analysis for every instruction we propose including in the machine. It is not difficult, but it is tedious to do. This is part of the reason the HDL languages have been developed. These languages let you describe the flow f the data through the machine. The HDL compiler figures out the needed wires, and add in the multiplexors as needed to complete the machine.

Neat!

Notice that we discussed this data flow without referring to a wiring diagram. We did postulate the existence of a set of basic parts, and defined the input and output points on each component. The signals defined must travel over wires between the two component “ports”.

Streamlining the RTL

Let’s simplify the RTL and get rid of port designations and data values. We will focus just on register names. We will also skip passing data across a register, and just show the operations involved inside (the math).

ADD RTL

t0: IM <- PC
t1: DCD <- IM[PC]
t2: DCD[OP1] <= INST[0-3,9] ; Rd,
    DCD[OP2] <- INST[4-8]; Rr,
    ALUOP <= add
t3: ALUres <- REG[Rd] + REG[Rr]
t4: REG[Rd] <- ALUres
t5: PC <- PC + 1

Much shorter. It is possible to derive the resulting machine and get more detailed in the design fro this.

LDS RTL

t0: IM <- PC
t1: DCD [IM[PC}
t2: DCD[op2] <- INST[4-8]; Rd
t3: PC <= PC +1
t4: k <- IM[PC]
t5: MEMres <- DM[k]
t6: REG[Rd] <= MEMres
t7: PC <- PC + 1

How is the PC really going to be updated in this? Perhaps we need two increment nits. One for the final update, and another to deal with the two-word instructions.

Laying Out the Machine

In real circuit design, a wiring diagram is defined using something called a “net-list”, which is just a spreadsheet holding all the parts, connection points, and needed connections. Special software takes this net-list and performs the geometric analysis to lay out the parts so the lengths of the wires can be minimized. In building electronic boards, there may be several layers of printed circuit pathways (wires) that move between component attach points. I saw a supercomputer board with a dozen layers, all designed to keep data paths as short as possible:

Short wires make fast machines!

Note

I had the idea of adapting this technique to our simulator. If we define the machine using RTL, we might be able to derive the machine diagram automatically. Building a tool to do this is not that complex (we had a start in our first HLD parser). The TikzBuilder project is working in that direction.

As a final look at the entire system, only for another processor. Here is a full diagram. From this you can see how they use barrier registers to control the movement of signals from one stage to the next.

../_images/CPUdiagram.png

Step 4: Store

The last stage is where the result is written back into the location indicated by the instruction. At this stage, it may be that we have a new value we need to store in the Program Counter register. IN this case, we need to route that value back tot he program counter register where is will be held until the next clock tick.

Refactoring the Design

As you build a program, you make many design decisions along the way. Over time, it might occur to you that your decisions have resulted in code that is getting overly complex. In our case, the decision to manage a variable number of objects at points in our system using the C++ vector class seemed natural at first, but as we get into building the actual simulator we run into problems. Maybe it is time to revisit those decisions.

Reviewing our Design

We have a basic scheme proposed for the simulation. The original idea came from the silly game we played at the beginning of this adventure.

You played the role of a “component” with a “mission” to perform. You had I/O points where “signals” could arrive and leave. The “signals” were a stack of cards. Those “signals were traveling between “components”. We did not use “wires” for that, but imagine the table between each of you where the cards could be placed. That is our “wire”!.

The game was played by asking every “component” to look at the available “signals” and work their “mission”. We had to manage that, or there would have been fights over each stack of cards as two components attacked the same stack of cards. (Ignore that detail for now!)

We took that idea, and set off to build a collection of C++ “components” that could be given a mission. The “signals” will just be digital data of some sort that travels over some kind of C++ “Wire”. We will “attach” “wires” to the “components” to allow them to communicate. In the design we have now, we introduced a “pin” to help manage the attachment process.

So what is wrong?

In using a C++ vector we plce objects into a variable sized array that we can work on using an integer index. You do that all the time, so it seems an obvious way to manage things. However, in our simulator the objects we are managing have names that we are tracking as well. If we want to access an object using that name, we cannot just use the right index, since we do not know what that index is. We have to search the vector to locate the right object by name.

This is not really hard, but doing that over and over ends up creating messy code.

In many places in the current design, the objects I am tracking have names, but the objects themselves are sitting in a magic array, indexed by some arbitrary number. If I want to find an object based on the name, I have a problem. I have to search for it. This is not overly hard, but it is messy to do over an over.

Is there a better way to manage a collection of objects, each with a name, and access the object by that name? And will that new way still allow me to have a variable number of objects in the collection?

Python Dictionaries

Python developers have a solution. This is such a common problem in computing that Python provides a special data type that works like an array, only it uses strings as indices (keys is the proper term here).

The Python data type is called a dictionary. The dictionary can hold any kind of thing, but the “index”, or “key”, is a string. How this is managed internally is not something most Python developers even think about, but it is an interesting programming problem, one often faced by C++ developers learning about more advanced data structures

Fortunately, there is a C++ data type we can use to accomplish the same thing. The Standard Template Library, available on all C++ tools, includes a map data type that does what we want.

Let’s consider ditching the vector and try to use the map instead

Before we get into that, let’s look at a simple example of a map at work:

Warning

Exploring things like this in C++ may make you appreciate Python even more!

Declaring a Map

Here is a simple example of creating a map object:

map_example.cpp
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#include <iostream>
#include <map>
#include <cstdint>
#include <iterator>


int main(void) {
    int val;
    std::map<std::string, uint8_t> pins;
    //insert a pin
    pins["PCI"] = 5;
    pins["PCO"]= 7;

    std::map<std::string,uint8_t>::iterator it = pins.begin();
    while(it != pins.end()) {
        val = it->second;
        std::cout
            << it->first
            << ":"
            << val
            << std::endl;
        it++;
    }
}

We cannot simply set up a for loop to work through the items in the pins container, because the indices are strings. Even worse, we really do not know in what order those indices will appear (it looks to be alphabetical, but do not depend on that!).

The iterator is the C++ way to deal with this. There is some C++ magic working behind the scenes to make this loop work. For our purposes, just use this pattern whenever you need to loop over all pins and things will work fine.

Note

When you start using anything new, practice setting up a sample of the code you want to try in a separate work area. I create a directory named sandbox in my projects for this purpose. That directory may not even make it onto GitHub, since I use it only for small testing exercises.

Here is the output from this example:

$ g++ -o test map_example.cpp
$ ./test
PCI:5
PCO:7

If we want to use the map instead of the vector, we need to review how we are assembling our system.

Components

A component is a basic digital part that performs some kind of simple action. Each component supports a collection of input signals attachments, and output signal attachments. We also provide a basic method that tells the component to do its work.

We set up a place to name each component object, and we will need a potentially large number of these objects in our simulation. The overall machine needs to store all of these named components. It looks like a map of components will be useful here.

Pins

We proposed an object that could be used as an attachment point for a signal “wire”, and proposed storing the current signal value on that “pin”. We gave each “pin” a unique name, so it looks like we can replace the `vector used for input pins and output pins with just a simple map. However, doing so means that we “attach: wires to components by using a combination of the component name and the pin name.

Signals

Digital systems communicate with each other by passing bits over wires connected between them. The medium we use to connect is called a “wire”. In real digital systems, a wire can move only one bit between components, so we normally gather a set of wires together and attach all of them between components, so we cna move a collection of bits in paralle. Much fasted. Formally, this collection of wires is called a “bus”. That means that we may be moving as many bits at one time as we like between components, and we have to manage that.

Wires

We will model wires in a simple way. A wire has a single value traveling on it at any given moment in time, but we will allow our wire to move an entire collection of bits. Since we are modeling an 8-bit machine, it seems natural to creata a wire that can move eight bits at a time.

There is also a place in our system where being able to move 16-bit data will be needed. The instruction memory unit in our example machine delivers 16 bits of data at ne time. So we also need a 16 bit wire.

Wires can be “driven” by only one component, meaning that a wire can be “attached” to only one output signal at a time.

However, wires can “drive” several components. This means that a single wire can be attached to any number of input ports on components as we like,

Components

Our components are simple boxes, each with a name and a collection of input signal ports, and output signal ports. Internally, there is some “behavior” modeled by a method named “tick” that transforms the input signals to the output signals.

Since the Component can have many input signals and many output signals, we proposed adding a vector of “pins” to each component.

Pins

A pin is at attachment point. It also provides a place to store the current value of the signal available on that pin (either coming from somewhere on an input pin, or going somewhere on an output pin.)

Pins?

We created the Pin class as a way to track pin names and current values. Guess what? We do not even need Pins any more. The new data type stores both of these items, so we can ditch the Pin class in this revision.

Here is a revised Component class using a map:

Component.h
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#pragma once
#include "Pin.h"
#include <string>
#include <map>

class Component {
    public:
        // constructor
        Component();    // default

        Component(std::string n);

        // destructor
        ~Component();

        // input pins
        std::map<std::string, uint8_t> in_pins;

        // output pins
        std::map<std::string, uint8_t> out_pins;

        // mutators
        void add_in_pin(std::string name);
        void add_out_pin(std::string name);
        void tick(void);

        // accessors
        void dump_pins(void);
        std::string get_name(void);

    private:
        std::string name;
};

The changes were minor here. The implementation is where the map is used:

Component.cpp
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
// implementation file for Component class
#include "Component.h"
#include <iostream>
#include <iomanip>

// default constructor
Component::Component() {}

Component::Component(std::string n) {
    name = n;
}

std::string Component::get_name(void) {
    return name;
}

Component::~Component() {
}

void Component::dump_pins(void) {
    std::cout << "Input pin state:" << std::endl;
    std::map<std::string,uint8_t>::iterator it = in_pins.begin();
    while(it != in_pins.end()) {
        val = it->second;
        std::cout
            << it->first
            << ":"
            << val
            << std::endl;
        it++;
    }
    std::cout << "Output pin state:" << std::endl;
    std::map<std::string,uint8_t>::iterator it = out_pins.begin();
    while(it != out_pins.end()) {
        val = it->second;
        std::cout
            << it->first
            << ":"
            << val
            << std::endl;
        it++;
    }
}

void Component::add_in_pin(std::string n) {
    in_pins[n] = 0;
}

void Component::add_out_pin(std::string n) {
    out_pins[n] = 0;
}

void Component::tick(void) {
    // dummy -
hould be virtual

Step 5: Simulation Loop

The clock in our simulator runs all the time. In fact, it is the one unit that really makes our machine work. We want our clock to be fast, because we want our machine to be fast. However, fast is a relative thing. The machine has no sense of time internally. To the machine, each tick is just a command to do something, send to all the parts in the machine. What happens is the challenge in making a machine do something useful!

Digital Parts

We know that the machine is constructed out of a potentially huge number of simple parts. Each part (or “Component” as we are calling it) has a set of input signals it will use to do internal work, and a set of output signals it will generate as a result of whatever internal processing the part does.

Signals are carried over one or more wires used to connect parts together. These signals arrive at each part at some point in time. The part can react to changes on those signal lines immediately, or it can wait until told to react to them based on some signal (clock tick). We call the first kind of part a “combinational” part, and the second kind a “sequential” part.

The part reacts by beginning to process the input signals to produce the required set of output value. Those outputs will travel over other wires to other components, but that it not our concern now.

The time it takes to complete the internal processing is something we need to consider. The internal work is not instantaneous, and may vary depending on the exact part we are considering. A simple register cna be very quick, but a memory module, especially one based on dynamic ram, is going to be pretty slow.

Since we are attempting to build a software program that models a real machine, we need a way to model this passing of time, and make sure each part “works” in a realistic way.

A simple change to our model will allow us to set a time value on each part, and use that to assess how “fast” our machine will be. We will get to that idea in a bit.

Simple Simulation Loop

Here is the most primitive simulation loop we can write:

for(int time = 0; time < stop_time, time++) {
    for(int p = 0; p<parts_count; p++) {
        part[p].tick();
    }
}

Since our simulation scheme depends of letting the wires have a turn at moving signals around, we really need to give the wires a chance. That looks like this:

for(int time = 0; time < stop_time, time++) {
    for(int p = 0; p<parts_count; p++) {
        part[p].tick();
    }
    for(int w = 0; w < wires_count; w++) {
        wire[w].tock();
    }
}

This simple loop scheme is all we need as a start, but it can be very inefficient. We are assuming that all parts have some initial value on the input signal “ports”, and we begin processing by giving all of the system parts a chance to generate new output signals. These new signals are then passed along to the input side of the connected parts, and we keep going to watch the system run.

There is a significant problem with this basic idea. Not all parts need to do something on each “tick” of this simulated system clock.

We have talked about giving each part an “enable” signal, which will decide if we are to respond or not. The basic idea is shown in this code fragment:

Component::tick(void) {
    if(enable) {
        // do something
    }
    // return, we are done!
}

If the enable signal, (one of our input signals), has a value of “false”, we do nothng on this call to tick. If it is true we do whatever processing is required.

The source for each of these enable controls is the control unit. The signals will be sent to each component as needed to control the actual action each time a clock “tick” happens. The control unit figures out which parts to activate and which parts to disable during instruction decoding. The result is a very large number of control lines wandering from the control unit to each componentin the system (along with the clock). That is a lot of wires.

Generating Timing Data

If we add data indicating how long each part will take to produce its output results, we can use that data to We can track how long the processing took. We need to be careful about this, though. In the real machine, each component does not sit there doing nothing while all other parts do their thing. Instead, they all work in parallel. That means we need to find the largest amount of time consumed by any part in the machine during one “tick” cycle. Here is a simple change to our simulation loop to track this:

int total_time = 0;
int max_tick_time = 0;
int tic_time = 0;

for(int time = 0; time < stop_time, time++) {
    for(int p = 0; p<parts_count; p++) {
        tick_time = part[p].tick();
        if(tick_time > max_tick_time) max_tick_time = tick_time;
    }
    for(int w = 0; w < wires_count; w++) {
        wire[w].tock();
    }
    total_time += max_tick_time;
}

Here we are asking the “tick” routines to return an integer indicating how long each took to complete the internal processing.

We could extend this idea and track other useful design data. How much electricity did you consume? How much heat did you generate. Each of these is important in designing a real system. If the machine gets too hot and it will fail. If it consumes too much energy, and it will not do in a portable/mobile device.

Simulating Component Control

Our simple simulator loop calls the “tick” routine in each component, then the “tock” routine for each wire on every pass through the simulation loop. Is this really needed.

If a component is disabled, there is no work to do in that component, so even calling it to do nothing is a waste of time. Furthermore, any of the output “wires” connected to that component will have no change in their values, so nothing really needs to be transferred from the idle component to any connected other part. That means we are wasting a lot of our host machine’s processing doing calls to functions that are not needed.

How can we manage all of this in a more effective way?

The answer is to switch from the simple loop structure shown here to another common form of simulation. This one is called “Event Driven Simulation”

Event Driven Simulation

In Event Driven Simulation, each component in the system is given an initial chance to start work from some starting condition.