Computer Architecture: ECE 3055 Sample Problems and Solutions
Transcription
Computer Architecture: ECE 3055 Sample Problems and Solutions
Computer Architecture: ECE 3055 Sample Problems and Solutions 1 Consider the execution of the following block of SPIM code. The text segment starts at 0x00400000 and that the data segment starts at 0x10010000. .data start: .word 0x32, 44, Top, 0x33 str: .asciiz “Top” .align 8 end: .space 16 .text .globl main main: Top: exit: li $t3, 4 la $t0, start la $t1, end lw $t5, 0($t0) sw $t5, 0($t1) addi $t0, $t0, 4 addi $t1, $t1, 4 addi $t3, $t3, -1 bne $t3, $zero, Top li $v0 syscall 1 (a) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding contents. Data Segment Addresses Data Segment Contents 0x10010000 0x00000032 0x10010004 0x0000002c 0x10010008 0x00400010 0x1001000c 0x00000033 0x10010010 0x00706f54 0x10010014 0x00000000 0x10010018 0x00000000 0x1001001c 0x00000000 0x10010020 0x00000000 0x10010024 0x00000000 The value in 0x10010008 depends on how the preceding instructions are emcoded. “la $t1, end” is encoded into two instructions while “la $t0, start” is encoded into one instruction. As long as your assumptions are consistent you shouldnot lose points. Everything else is 0x00000000 upto and beyond 0x10010100 (this is the point of the .align 8 directive) 1 (b) What are the values of the following. exit 0x00400028 Top 0x00400010 addi $t3, $t3, -1 (encoding) 0x216bffff 1 (c) Provide a set of data directives that will perform the memory allocation implied by the following high level language declarations. • character My_Char; integer Limit; .data .byte 0x00 .align 2 .word 0x0 • integer array[10]; .data .space 40 ** Note there are many solutions * 1 (d) Given the following sequences, what is the range of feasible addresses for the label target on the MIPS architecture? start: . . . target: beq$1, $2, target add$4, $5, $6 sub$6, $4, $8 start + 8 <= target =< start+4 +215-1 (words) 1 (e) What i s the difference between the j and jal instructions? Both instructions work identically with regard to the program control flow, i.e., which instruction is executed next and how he address fields of the instruction are endcoded. The jal instruction also causes the value of PC+4 to be stored in register $31. 2 The following is block of SPIM code to compute the sum of the elements of a diagonal in a matrix stored as shown. For the matrix shown below the sum in $t2 should end up being 1+12+23+34 = 70. Fill in any missing pieces of code to complete the program. Use any additional registers as necessary to keep track of the number of loop iterations and any other information you require. You may complete your code on the next page as necessary. If does not have to fit on the spaces shown. You cannot change the loop body. Remember there are many distinct, correct, feasible solutions! .data row1: row1: row1: row1: .word 1,2,3,4 .word 11,12,13,14 .word 21,22,23,24 .word 31,32,33,34 .text la $t0, row1 li $t6, 20 # load offset to next diagonal element (16+4) li $t7, 4 # load counter for number of elements loop: lw $t1, 0($t0) add $t2, $t2, $t1 #loop body that performs the sum add $t0, $t0, $t6 addi $t7, $t7, -1 bne $t7, $0, loop # check if all elements have been accessed li $v0, 10 # exit the program syscall 3 Consider the following MIPs code taken from some program. We wish to turn this piece of code into a procedure. Note that no procedure call conventions have been followed and the procedure must be re-written to follow the MIPs procedure call conventions. procA: lw $12, 0($8) sw $12, 0($15) addi $8, $8, 4 addi $15, $15, 4 addi $16, $16, -1 bne $16, $0, procA jr $31 3 (a) What registers will have to be saved on the stack by the procedure if the MIPS register procedure call convention is to be followed and you do not modify the code shown above? $s0, $fp (assuming we use the frame pointer). $ra does not have to saved since no other procedure/ function is called. 3 (b) Which registers are used to pass parameters to this procedure? How many registers do you think will be required for parameter passing ? The values of $8, $15, and $16 would be required in the procedure and are passed using registers $a0, $a1, and $a2. 3 (c) If the procedure is stored in memory starting at location 0x00400020, what will be the encoding of the jal procA instruction that is in the calling program? 0x0c100008 4 Consider the execution of the following block of SPIM code on a multicycle datapath. The text segment starts at 0x00400000 and that the data segment starts at 0x10010000. Assume immediate instructions take 4 cycles. start: str: main: move: end: done: .data .word 21, 22, 23,24 .asciiz “CmpE” .align 4 .word 24, 0x77 .text .globl main li $t3, 4 lui $t0, 0x1001 lui $t1, 0x1002 lw $t5, 0($t0) sw $t5, 0($t1) addiu $t0, $t0, 4 addiu $t1, $t1, 4 addi $t3, $t3, -1 bne $t3, $zero, move 4 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle of program execution is cycle number 0! Cycle ALUSrcA ALUOp RegWrite Instruction Register IorD 14 1 00 0 0x8d0d0000 X 35 1 01 0 0x1560fffb X 4 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding contents. Data Segment Addresses Data Segment Contents 0x10010000 0x00000015 0x10010004 0x00000016 0x10010008 0x00000017 0x1001000c 0x00000018 0x10010010 0x45706d43 0x10010014 0x00000000 0x10010018 0x00000000 0x1001001c 0x00000000 0x10010020 0x00000018 0x10010024 0x00000077 4 (c) Show a sequence of SPIM instructions that will branch to the label loop if the contents of register $t0 is 13. addi $t1, $t0, -13 beq $t1, $zero, loop 4 (d) What are the values of end, done, start, and main? start: 0x10010000 main: 0x00400000 end: 0x00400020 done: 0x00400024 4 (e) What does the following bit pattern represent assuming that it is one of the following. 10101101 00101101 00000000 00000000. Provide your answer in the form indicated. A single precision IEEE 754 floating point number (show actual value in scientific notation) ____-1.0101101 x 2-37_________ A MIPS instruction (show in symbolic code) _sw $t5, 0($t1)__________ 5 Consider the execution of the following block of SPIM code on a multicycle datapath. The text segment starts at 0x00400000 and that the data segment starts at 0x10010000. Assume immediate instructions take 4 cycles. start: str: main: move: end: done: .data .word 21, 22, 23,24 .asciiz “CmpE” .align 4 .word 24, 0x77 .text .globl main li $t3, 4 lui $t0, 0x1001 lui $t1, 0x1002 lw $t5, 0($t0) sw $t5, 0($t1) addiu $t0, $t0, 4 addiu $t1, $t1, 4 addi $t3, $t3, -1 bne $t3, $zero, move 5 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle of program execution is cycle number 0! Cycle ALUSrcB ALUOp RegDst Instruction Register PCWriteCond 9 11 00 X 0x3c091002 0 20 00 00 0 0xad2d0000 0 5 (b) Show a sequence of SPIM instructions to implement the following: x = a[i*4], where a[i] is the access of element i from array a[]. Use whatever register allocation you need. .text add $t5, $t0, $t0 # $t0 contains the value of i add $t5, $t5, $t5 # two additions to compute i*4 add $t5, $t5, $t5 # two additions to compute i*8 add $t5, $t5, $t5 # two additions to compute i*16 add $t1, $t5, $t3 # base address contained in $t3 lw $t2, 0($t1) # value of x stored in $t2 5 (c) What does the following bit pattern represent assuming that it is one of the following. 00111100 00001101 00010000 00000000. Provide your answer in the form indicated. A single precision IEEE 754 floating point number (show actual value in scientific notation) ____1.00011010001 x 2-7_________ A MIPS instruction (show in symbolic code) _lui $t5, 0x1000__________ 6 Answer the following questions with respect to the MIPS program shown below. Assume that the data segment starts at 0x10010000 and the text segment starts at 0x00400000. .data L1: .word 0x32, 104 .asciiz “Test 1” .align 2 Blank: .word L2: .space 64 main: loop: .text li $t0, 4 move $t2, $zero li $a0, 1 jal solo addi $t0, $t0, -1 addi $a0, $a0, 1 add $t2, $t2, $v0 slt $t1, $t0, $zero bne $t1, $zero, loop li $v0, 10 syscall 6 (a) How much space in bytes does the program occupy in the data segment and text segment? Data segment (21 words, 84 bytes) - remember must also count reserved space Text segment (11 words, 44 bytes) 6 (b) With respect to the above program, • what are the values of the following? L2 _____0x10010014____________ loop ______0x0040000c___________ memory location 0x1001000c ______0x00003120__________ • What information would be stored in the symbol table for the above program? Be specific about the above program. Do not provide a generic answer. Upon completion of compilation, the label solo and its address (relative to the start of this module). This is information to be used by the linker to find and link the module corresponding to solo. 6 (c) What is the encoding of the following instructions? bne $t1, $zero, loop ____0x1520fffa____________ add $t0, $t0, $v0 _____0x01024020__________ 6 (d) Which of the instructions in the above program, if any, must be identified as requiring relocation information ? The jal instruction since this is the only instruction that relies on absolute addresses. 6 (e) Suppose the procedure solo is independently compiled as a separate module. When it is linked with the main program the first instruction in solo is placed in memory at address 0x0040100c. Provide the hexadecimal encoding of the jal solo instruction. 0x0c100403 7 Consider the program shown on the previous question. You have probably noticed that it does not follow the MIPS register saving convention. Rewrite the code showing any changes that would be required by the MIPS register saving conventions. .data L1: .word 0x32, 104 .asciiz “Test 1” .align 2 Blank: .word L2: .space 64 main: loop: .text li $t0, 4 move $t2, $zero li $a0, 1 subu $sp, $sp, 32 sw $t0, 0($sp) sw $t2, 8($sp) sw $a0, 12($sp) sw $t1, 16($sp) jal solo lw $t0, 0($sp) lw $t2, 8($sp) lw $t1, 16($sp) lw $a0, 12($sp) addu $sp, $sp, 32 addi $t0, $t0, -1 addi $a0, $a0, 1 add $t2, $t2, $v0 slt $t1, $t0, $zero bne $t1, $zero, loop li $v0, 10 syscall 7 (a) What is the difference between the jal and j instructions? The execution of the jal instruction also causes the return address to be saved in $ra. The target address is computed in the same manner for both instructions. 8 Answer the following questions with respect to the MIPS program shown below. For the purpose of this test, assume that each instruction can be stored in one word, i.e., do not worry about pseudo instructions! Further, assume the data segment starts at 0x1001000 and that the text segment starts are 0x04000000. L2: .data .word 24, 0x16 .byte, 77,66,55,44 .text move $t0,$0 loop: mul $t8, $t0, 4 add $t1, $a0, $t8 sw $0, 0($t1) addi $t0, $t0, 1 slt $t7, $t0, $a1 bne $t7, $0, loop 8 (a) What are the values of the labels L2 and Loop? L2 __0x10010008__________ loop ___0x04000004___________ 8 (b) Provide the hexadecimal encodings of the following instructions? bne $t7, $0, loop 0x15e0fffa add $t1, $a0, $t8 0x00984820 8 (c) Show the contents of the data segment. Show both addresses and values at those addresses. 0x10010000 0x18 0x10010004 0x16 0x10010008 0x2c37424d 8 (d) The instruction BLT (Branch on less than or equal too) is not a native instruction. Show the SPIM code that could be used to implement this test. ble $t2, $t3, loop slt $t1, $t3, $t2 beq $t1, $0 loop 9 (20 pts) This question deals with procedure calls. Consider the following SPIM program segment. For the purpose of this test, assume that each instruction can be stored in one word, i.e., do not worry about pseudo instructions! Further, assume the data segment starts at 0x1001000 and that the text segment starts are 0x04000000. .data first: .word 0x33, 0x2 .byte 4, 3, 0, 0 str: .ascii”Final” .text main: li $t0, 4 move $a0, $t0 loop jal func move $a0, $v0 slt $a0, $0, $t9 bne $t0, $0,loop li $v0, 10 syscall func: addi $a0, $a0, -1 move $v0, $a0 jr $ra 9 (a) Consider the first time the procedure func is called. What is the contents of $ra when the program is executing func. 0x0400000c 9 (b) Based on the MIPS register saving conventions which registers should be stored on the stack in the calling stack frame when func is called. $t0, $t9, $a0 Note in this case $a0 is stored as a matter of convention since the calling procedure is not relying on the availavbility of the value that $a0 contained before the call. 10 Answer the following questions with respect to the MIPS program shown below. Assume that the data segment starts at 0x10010000 and that the text segment starts at 0x00400000. Assume that the instruction done is encoded in one word. .data label: .word 24, 28 .byte 64, 32 .asciiz “Example Program” .text main: jal push jal pop done pop: ret1: lw $fp, 0($sp) lw $ra, 4($sp) addiu $sp, $sp, 32 jr $ra push: subiu $sp, $sp, 32 sw $fp, 0($sp) sw $ra, 4($sp) ret2: jr $ra 10 (a) Write the values of the words stored at the following memory locations. Provide your answer in hexadecimal notation. Word Address Value _0x10010008________ ___78452040____ _0x0040000c________ ___8fbe0000____ _0x00400000________ ___0c100007_________ 10 (b) Consider the process of assembly of the above program. • What would the symbol table contain? All of the labels that the linker needs to know about: push and pop and main (for programs where the loader creates code that jumps to main). We do not need to store labels from the data segment. • Assuming that the memory map is fixed, what relocation information would be recorded if any? The jal instructions generate absolute addresses and must have relocation information stored to permit placement of the program starting elsewhere in memory. 10 (c) What are the values of the following labels? ret1 ____0x00400018___________ push ____0x0040001c___________ 10 (d) Assume that the procedures push and pop were assembled in a distinct file and linked with the main program. Further assume that the procedures were placed in memory starting at location 0x00400040. Provide the encoding of the ‘jal pop’ instruction. 0x0c100010 11 (a) Provide the MIPS code to do the following. • implement the greater-than-or-equal-to-zero test using only native instructions. Example for a general test of $t0 greater-than-or-equal-to $t1 slt $t2, $t0, $t1 bne $t2, $0, False j True Where True and False are the labels of the instructions that correspond to the blocks of code to be executed depending on the outcome of the test. • initialize a register to the value 0x33334444. move $t0, $zero lui $t0 0x3333 ori $t0, $t0, 0x4444 11 (b) Provide two advantages and two disadvantages of RISC vs. CISC machines. From text 12 Answer the following questions with respect to the MIPS program shown below. Assume that the data segment starts at 0x10010000 and the text segment starts at 0x00400000. .data first: .word 0x21, 32 .byte 4, 3 .align 2 str: .ascii”Test” .text main: li $4, 4 loop: jal func slt $7, $0, $4 bne $7, $0,loop end: li $v0, 10 syscall func: addi $4, $4, -1 jr $31 12 (a) What function does the last two instructions in the main program realize? Program Exit. Ref SPIM documentation 12 (b) With respect to the above program, • what are the values of the following symbols? end __0x00400014_____________ loop __0x04000004_____________ func __0x04000018_____________ str __0x1001000c_____________ • What is the total number of bytes required for this program, both data segment and text segment storage? data segment - 16 text segment - 32 12 (c) What is the encoding of the following instructions? bne $7, $0, loop __0x14e0fffd__________ jal func ___0x0c100006________ addi $4, $4, -1 ___0x2084ffff_________ 12 (d) Using only native instructions, how can you implement the following test: branch-ongreater-than-or-equal-to-5? Example: slt $t2, $t0, $t1 bne $t2, $0, False j True Where True and False are the labels of the instructions that correspond to the blocks of code to be executed depending on the outcome of the test. 13 Consider the case where Procedure A calls procedure B, and procedure B calls procedure C. Procedures A and B both use registers $15, $16, $17 and $24, and procedure C uses registers $9, $10, and $11. The top of the stack initially points to the top of an active frame. 13 (a) Show the contents of the stack when Procedure C is executing. Assume the MIPS register saving conventions as defined in the text. State, or clearly indicate the number of stack frames and which registers are saved by each procedure. 7FFFFFFC $sp $15 $24 saved by A $16 $17 stack grows down saved by B $15 $24 C’s stack frame Note: I have not shown the procedures saving $ra and $fp which should be illustrated. Procedure C does not use any of the $sx registers and does not call other procedures. Therefore C does not need to save any of the registers. 13 (b) Would callee save have been more efficient? Justify your answer. No. Each procedure would have to save all the registers used in the procedure. This is clearly less efficient, e.g., look at procedure C. 14 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the text segment starts at 0x00400000 and that the data segment starts at 0x10010000. Assume that registers $7, $8 and $9 have initial values 16, 0x10010020, and 0x10020020 respectively. str: move: end: .data .asciiz “Start” .align 2 .word 24, 0x6666 .text lw $12, 0($8) sw $12, 0($9) addiu $8, $8, 4 addiu $9, $9, 4 addi $7, $7, -1 bne $7, $0, move 14 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle of program execution is cycle number 1 Cycle ALUSelA ALUOp PCWrite IR (in HEX!) ALUOutput 8 1 00 0 0xad2c0000 0x10020020 20 1 00 0 0x20e7ffff 0xf 14 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding contents. Data Segment Addresses Data Segment Contents 0x10010000 0x72617453 0x10010004 0x00000074 0x10010008 0x00000018 0x1001000c 0x00006666 14 (c) Assume the exception/interrupt handler is stored at location 0x80000020. Provide the SPIM instruction(s) for transferring program control to begin executing instructions at this address. Do not worry about setting the return address. lui $1, 0x8000 addi $1, $1, 0x0020 jr $1 14 (d) If the above code was implemented as a procedure, following the MIPS register saving conventions, which registers, if any, would be saved by the procedure? All of the registers which are used are temp registers and therefore are not saved by the procedure. Since no other procedure is called, there is no need to save the return address regaietrs, $ra. Using the MIPS convention only the frame pointer would be saved in this example. There are sufficient argument registers 14 (e) What does the following bit pattern represent assuming that it is one of the following. 10101101 11110010 00000000 00000000. Provide your answer in the form indicated. A single precision IEEE 754 floating point number (show in scientific notation) ___-1.111001 x 2-36___________ A MIPS instruction (show in symbolic code) __sw $18, 0($15)___________ 15 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the text segment starts at 0x00400000 and that the data segment starts at 0x10010000. Assume that registers $7, $8 and $9 have initial values 16, 0x10010020, and 0x10020020 respectively. str: move: end: .data .asciiz “Exams” .align 4 .word 24, 0x6666 .text lw $12, 0($8) sw $12, 0($9) addiu $8, $8, 4 addiu $9, $9, 4 addi $7, $7, -1 bne $7, $0, move 15 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle of program execution is cycle number 1 Cycle ALUSelB ALUOp RegDst IR (in HEX!) MemToReg 7 11 00 X 0xad2c0000 X 17 10 00 0 0x25290004 0 15 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding contents. Data Segment Addresses Data Segment Contents 0x10010000 0x6d617845 0x10010004 0x00000073 0x10010008 0x00000000 0x1001000c 0x00000000 0x10010010 0x00000018 0x10010014 0x00006666 15 (c) What is the difference between signed and unsigned instructions and what is the moti- vation for making this distinction? Signed instructions interpret operands as 2’s complement numbers They also will set conditions such as overflow, negative, etc. Unsigned numbers will be treated as n bit binary numbers. They will typically not set conditions such as overflow. Certain classes of numbers natually involve unsigned arithmetic, e.g., memory address computations. In order to detect these conditions in some instances and ignore them in others, MIPS uses two different types of instructions. 15 (d) What are the values of end and str? str: 0x10010000 end: 0x00400014 15 (e) What does the following bit pattern represent assuming that it is one of the following. 00101001 10111010 00000000 00000000. Provide your answer in the form indicated. A single precision IEEE 754 floating point number (show in scientific notation) ____1.011101 x 2-44_________ A MIPS instruction (show in symbolic code) _slti $26, $13, 0__________ 16 Answer the following questions with respect to the MIPS program shown below. Note that this simple program does not use the register saving conventions followed in class. Assume that each instruction is a native instruction and can be stored in one word! Further, assume the data segment starts at 0x10001000 and that the text segment starts are 0x00400000. .data label: .word 8,16,32,64 .byte 64, 32 .text .globl main main: la $4, label li $5, 16 jal func done func: loop: move $2, $4 move $3, $5 add $3, $3, $2 move $9, $0 lw $22, 0($2) add $9, $9, $22 addi $2, $2, 4 slt $8, $2, $3 bne $8, $0, loop move $2, $9 jr $31 16 (a) What does the above program do? Sum the elements stored starting at address label 16 (b) State the values of the labels loop and main? 0x00400020 loop __________________________ 0x00400000 main ___________________________ 16 (c) what are the hexadecimal encodings of the following instructions? bne $8, $0, loop 0x1500fffb add $9, $9, $22 0x01364820 16 (d) The above program does not implement any register saving conventions on a procedure call. Assume we will use the MIPS register saving convention i.e., Caller/Callee Save. Ignore the storage of local variables on the stack. What will be the contents of the top frame on the stack when the procedure func is executing? $fp, $22, since no other procedure is called $ra is not stored 16 (e) Identify the local and global labels in the program? main is a global label all other labels are local labels 17 (a) What is an unresolved reference? How and when does it become resolved? A reference to a label that is undefined It is resolved by the linker which has access to all of the labels accessed by the programs being linked 17 (b) Write the SPIM instruction sequence that you could use to implement the branch-ongreater-than-or-equal-to-test. For example, how would implement “bge $7, $6, loop”. slt $1, $7, $6 beq $1, $0, loop 18 Consider the following isolated block of code. Note that some initialization code is missing. .text move: lw $12, 0($8) sw $12, 0($9) addiu $8, $8, 4 addiu $9, $9, 4 addi $7, $7, -1 end: bne $7, $0, move 18 (a) If the text segment starts at 0x04000000, what is the value of the label end?. 0x04000014 18 (b) Let is assume we wish to implement the above function as a procedure. • How many arguments would the procedure have and how would they be passed to this procedure from a calling program? Three passed in argument registers $a0-$a2 • name the registers that would be stored on the stack after this procedure is invoked $7, $8, $9, $12 would have to be stored by the caller, if at all. $fp is the only register that is stored following the steps in the text. 18 (c) Show the hexadecimal encoding of the last instruction. 0x14e0ffffa 18 (d) What does the following bit pattern represent assuming that it is one of the following. 00101100 01110000 11000000 00000000 A single precision IEEE 754 floating point number (in scientific notation)) _1.111000011 x 2-39_________ A MIPS instruction (symbolic code!) _sltiu $16, $3, 0xc000________ 19 Consider the following isolated block of SPIM code. Ignore initialization code. The data segment starts at 0x1001000 and the text segment starts at 0x00400000. Note that this block of code does not use registers saving conventions on a procedure call. start: end: mystery: loop: skip: exit: addi $2, $0, 85 addi $3, $0, 1 jal mystery j exit add $4, $0, andi $5, $2, beq $5, $0, add $4, $4, srl $2, $2, bne $2, $0, jr $31 li $v0, 10 syscall $0 1 skip $3 1 loop 19 (a) What are the values of the following labels?. loop ________0x00400014________ skip _________0x00400020________ 19 (b) What are the encodings of the following instructions? addi $3, $0, 1 __0x20030001______________ beq $5, $2, skip __0x10a00001______________ 19 (c) (6 pts) Let us suppose the procedure mystery is independently compiled and linked when the program is run. After linking the program is stored in memory and mystery has a value of 0x00400010. What is the encoding of the jal mystery instruction. 0x0c100004 19 (d) Which of the following are valid MIPS instructions? If they are invalid you must state the correct form or clearly state why it is incorrect. • lw $t0, $t1($t3) The offset must be a label or numeric constant. Examples of the correct form are: lw $t0, 4($t3) or lw $t0, label2($t3) • sltiu $t1, $t2, 0x44 This is correct. Immediates can be specified in hexadecimal notation or base 10 notations. • bne $t1, label, loop The arguments to the instruction must be registers, or one of the arguments can actually be an immediate. A correct form would be bne $t1, $s3, loop 19 (e) Suppose I want to store the following information in the data segment in the order shown. Show a sequence of SPIM data directives that will correctly do so. The text string, each reserved word and the byte array should be labeled so that programs may reference them easily. • the text string “Enter a SPIM instruction” • reserve space for 4 words • store an array of 16 bytes (pick your own values) str: L1: L2: L3: L4: array: .data .asciiz “Enter a SPIM instruction” .align 2 .space 4 .space 4 .space 4 .space 4 .byte 1,2,3,4,5,6,7,8,9,11,22,33,44,55,66,77 19 (f) (2 pts). In the preceding question (1(f)) what assembler directive would cause the byte array to start on a 64 byte boundary? .align 6 6 this moves allocation to the next 2 byte boundary 19 (g) What does the program in question 1 do? You have to specific and state the function or operation performed. No partial credit. Counts the number of bit set to 1 in $2. 20 (a) What is the difference between native instructions and pseudoinstructions? Native instructions are implemented by the underlying hardware. pseudoinstructions are translated into native instructions by the assembler. pseudoinstructions provide more powerful instructions to the programmer without affecting the complexity of the hardware since they are not interpreted by the hardware. 20 (b) Let us consider the program shown in the previous question. The procedure mystery does not appear to follow any of the SPIM procedure call conventions. If you were to rewrite mystery to follow the MIPS caller/callee saving convention, what registers would you expect to find in the stack frame for mystery while it is executing? Based on our class discussions only $fp. 20 (c) Suppose that a 32-bit integer A is stored in memory, starting at the address given in register $t1, and that another 32-bit integer B is stored in the next sequential word after A. Write a MIPS procedure Swap that exchanges the values A and B in memory if A > B. The procedure should return to its caller when it is finished. Assume a purely caller-save convention for preserving registers is being used. Full credit is the program fits in the table below. Provide comments. label instruction Swap: lw $t0, 0($t1) lw $t2, 4($t1) slt $t3, $t2, $t0 beq $t3, $0, exit sw $t0, 4($t1) sw $t2, 0($t1) exit: jr $31 comment Load A Load B Check if B<A If not true (result=0)exit Swap A and B Return Another solutiojn would expect to find the address of A in $a0 and use this register instead of $t1. Since we are using caller-save and this procedure does not call anyone else, the return address does not need to be saved. With no saving requirements there is no stack frame that needs to be allocated/deallocated. 21 (a) Show the single precision normalized floating point representations of the following numbers. Use the 754 standard and present your solution in hexadecimal notation. -0.875 0.0 0xbf600000__________________________________________________ ____ ___________________________________________________________ -1.03125 ___0xbf840000____________________________________________ 21 (b) Provide the hexadecimal representation of a denormalized number in double precision IEEE 754 notation. What is the purpose of denormalized numbers? 0x00001000 Denormalized numbers are numbers with a value betwee 0 and the smallest number that can be represented. The IEEE 754 convention identifies denormalized numbers with an exponent value of 0 and a non-zero significand. 21 (c) Assuming IEEE 754 rules and conventions, what is the result of the following operation in hexadecimal notation. • 1.101 x 2112 X 1.01 x 212 (note that the exponent values are represented in base 10). Note that the exponents are biased. Thus when we add the exponents we must substract 127 to obtain the correct value of the biased exponent of the product. However when we do this we find that we have 112+12-127 = -3. Since the value of the biased exponent is less that 0, we have underflow. 21 (d) Assuming that we have 6 significant digits (including the implicit bit), does the following calculation produce any error? If so what is the magnitude of the error? 1.001 x 2121 + 1.1 x 2128 After adjusting the expoent value of the smaller number have 1.1000000000 x 2128 + 0.0000001001 x 2128 ------------------1.1000001001 x 2128 With 6 significant digits the answer is 1.1 x 2128 The magnitude of the error is 1.001 x 2121 22 For the following questions, consider the single cycle datapath shown in Figure 5.24 on page 314. The value of loop is 0x00400004, the contents of register $t0, $t1, and $t2 are 0x01000033, 0x00004008, and 0x000000c0, respectively, and the contents of register $1 is 0x10010008 22 (a) Fill in the values of control signals in the following table for each instruction. Instruction j loop sw $t2, 8($at) ALUSrc X 1 ALUOp XX 00 RegDst X X MemToReg X X Write Data Input to Data Memory contents of $16 0x000000c0 The jump instruction does not utilize any portion of the datapath other than the portion that updates the PC with the jump address. However, the register file does utilize bits of the encoded address as register addresses for rs and rt. The input to the write data port of memory is the contents of register rt. Therefore first encode the j instruction to find the value of rt. It will be 16. 22 (b) Errors in fabrication are a major concern in the design of a datapath. Suppose that MemToReg and RegDst multiplexor control signals were stuck at 0, i.e., you cannot assert these signals due to a fabrication defect. Which of the instructions supported by the datapath will still work? sw, beq, and j. This follows from the truth table for the controller. Which rows can still produce valid control signal values? 22 (c) This question deals with the changes to the datapath required to implement the addition of the jal instruction. Full credit requires you to be specific. • what are the structural changes, if any, that must be made to the datapath - We need $31 as a hardwired input address to the RegDst Mux. RegDst is now a 2 bit control signal - We need PC+4 as an input to the MemToReg Mux since it is now a candidate for writing into the register file (into $31). The MemToReg signal is now a 2 bit control signal - The jal control signal is used to select the jump address from the PC source mux. This mux is is expanded to a 2 bit signal. • Show the changes required to the main controller truth table and the PLA implementation of this controller. RegDst AluSrc MemToReg RegWrite MemRead MemWrite Br AluOp Jal 10 X 10 1 0 0 0 XX 1 The PLA implementation illustarted in Figure C.2.5 will be changed to add another gate to recognize the jal instruction. The RegDst and MemToReg outputs must be modified to produce two output signals. The most significant bit is simply the jal while the least significant remains as before. We also introduce a jal output. 23 Consider the state of following procedure executing within the single cycle SPIM data path shown in Figure 5.24 on pg. 314. The first instruction is at address 0x00400440 and registers $4 and $5 contain the values 0x10010440 and 0x00000040 respectively. Assume that all instructions are native instructions. func: loop: end: add $2, $4, $0 add $3, $5, $0 add $3, $3, $2 add $9, $0, $0 add $22, $0, $0 lw $22, 0($2) add $9, $9, $22 addi $2, $2, 4 beq $3, $2, end j loop move $2, $9 jr $31 23 (a) Fill in the values of the signals shown for the following instructions (first time through the loop). Instruction ALUOutput Value at the Write Data Input of Data Memory beq $3, $2, end 0x0000003c 0x100100444 X 01 0 X lw $22, 0($2) 0x10010440 0x00000000 0 00 1 1 RegDst ALUOp ALUSrc MemTo Reg 23 (b) Suppose in our design the controller was implemented in a separate chip. Due to a fabrication defect the MemToReg control signal has been found to be defective. What modifications can we make to any part of the datapath (excluding the controller) such that the datapath still functions correctly. (Hint: This is equivalent to asking how can I get rid of the MemToReg control signal). Invert the RegDst signal and use it as MemToReg 23 (c) Now suppose the above program was executed on a multicycle datapath. Assume that the first cycle for fetching the first instruction is numbered cycle 0. Assume immediate instructions take the same number of cycles as the R-format instructions. On which cycle do the following events take place. • add $9, $0, $0 completes execution ____15_______(the end of the instuction execution not the EX cycle) • the branch address for beq $2, $3, end is computed (for the first time through the loop) _____34____ 23 (d) What the principal advantage of the multicycle datapath over the single cycle datapath. Be specific and provide an example to illustrate your point. Each instruction only takes as long as it needs to. For example, if the clock cycle was 10 ns and the load takes 5 cycles (50 ns) branches now only take 30 ns (3 cycles and R-format instuctionsonly take 40 ns (4 cycles). The net time savings over the execution fo a program can be substantial. 24 Show the hexadecimal value of the single precision normalized floating point representations of the following numbers. Use the IEEE 754 standard.Your representation should be fully compliant with the definition. 0x7f800000 Infinity 0xc0900000 ______________________ -4.5 24 (a) Perform the following floating point operations assuming that the number representations are IEEE 754 compliant. Your answers can be represented in scientific notation. • multiplication: 0xbe000000 x 0xbf800000 ( -0.125 x -1.0 = 0.125 • 1.001 x 2131 + 1.11001 x 2126 1.001 x 2131 + 0.0000111001 x 2131 = 1.0010111001 x 2131 24 (b) Provide an example of a number that is too large to be represented in single precision notation but can be represented in double precision. 1.0 x 2277 25 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the fetch cycle for the first instruction is cycle number 0. Assume that addiu and addi require the same number of states as an add instruction. add $t4, $0, $0 addi $t3, $0, 4 lw $t0, 0($t1) add $t4, $t4, $t0 addiu $t1, $t1, 4 addi $t3, $t3, -1 bne $t3, $0, loop loop: 25 (a) How many cycles does this code sequence take to execute to completion? 88 (the loop takes 20 cycles and executes 4 times). 25 (b) On what cycle is the add instruction in the body of the loop executed for the second time (remember the first cycle is cycle number 0!)? Cycle no. 35 for the EX cycle of the add instruction. 25 (c) Fill in the following values during the requested cycle (remember the first cycle is cycle number 0!). Cycle 13 27 ALUSrcB ALUOp RegDst PCWrite MemToReg PCSource 01 00 00 01 0 0 1 0 0 0 00 01 The first row corresponds to the fetch cycle of the add instruction. The second row corresponds to third cycle of the branch instruction. 26 (a) Show the single precision normalized floating point representations of the following numbers. Use the 754 standard and present your solution in hexadecimal notation. -0.125 0xbe000000 ___________________________________________________________ 0x41208000 ___________________________________________________________ 10.03125 ___________________________________________________________ 0.0 26 (b) How would you interpret the following hexadecimal number assuming that it is a single precision IEEE 754 floating point number: 0x00140000? It is a denormalized number since the exponent is 0, but the significand is non-zero. 26 (c) Assuming IEEE 754 rules and conventions, what is the result of the following operation in hexadecimal notation. • 1.101 x 2142 X 1.01 x 2122 (note that the exponent values are represented in base 10). 1.000001 x 2138 26 (d) Assuming that we have 6 significant digits (including the implicit bit). What is the value of the guard, round and sticky bits in the course of the following computation? Repeat for 4 significant digits. 1.001 x 2121 + 1.1 x 2128 After adjusting the expoent value of the smaller number have 1.1000000000 x 2128 + 0.0000001001 x 2128 ------------------1.1000001001 x 2128 With 6 significant digits we have the guard, round and sticky bits as 011 With 4 significant digits we have 001. Note the sticky bit is set if here are non-zero bits to the right of the guard and round bits. 27 For the following questions, consider the single cycle datapath shown in Figure 5.24 on page 314. The value of loop is 0x00400004, the contents of register $t0, $t1, and $t2 are 0x01000033, 0x00004008, and 0x000000c0, respectively, and the contents of register $1 is 0x10010008 27 (a) Fill in the values of control signals in the following table for each instruction. Instruction jal loop bne $t1, $t2, loop ALUSrc ALUOp RegDst MemToReg x 0 x 01 10 X 10 X Write Data Input to Data Memory contents of $16 0x000000c0 The jal instruction requires some additions to the datapath. - We need $31 as a hardwired input address to the RegDst Mux. RegDst is now a 2 bit control signal - We need PC+4 as an input to the MemToReg Mux since it is now a candidate for writing into the register file (into $31). The MemToReg signal is now a 2 bit control signal However, the register file does utilize bits of the encoded instruction as register addresses for rs and rt. The input to the write data port of memory is the contents of register rt. Therefore encode the jal instuction to find the value of rt. It will be 16. 27 (b) Your are designer and wish to reduce the number of control signals in this datapath. Provide one hardware modification that will eliminate the MemToReg control signal. Use the complement of RegDst as the value of this signal. This is not the only solution but one which is readily apparent from the truth table. 27 (c) This question deals with the changes to the datapath required to implement the addition of the addi instruction. Full credit requires you to be specific. • what are the structural changes, if any, that must be made to the datapath none are required. We have the connections necessary to add an immediate operand to the contents of a register. The issue is simply one of control. • Show the changes required to the main controller truth table and the PLA implementation of this controller. RegDst AluSrc MemToReg RegWrite MemRead MemWrite Br AluOp 0 1 0 1 0 0 0 00 The values of AluSrc and RegWrite become modified to include another OR term. Note the change in these two columns on the truth table. Also we have an additional gate on the input to recognize the addi instruction. 28 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the fetch cycle for the first instruction is cycle number 0. Assume that li and addi require the same number of states as an add instruction, that jal and slt require the same number of states as a bne instruction. main: loop: .text li $t0, 2 add $t2, $zero, $zero li $a0, 1 jal solo addi $t0, $t0, -1 addi $a0, $a0, 1 add $t2, $t2, $v0 slt $t1, $t0, $zero bne $t1, $zero, loop 28 (a) How many cycles does this code sequence take to execute to completion? 33 28 (b) On what cycle is the slt instruction in the body of the loop fetched for the first time (remember the first cycle is cycle number 0!)? 27 28 (c) Fill in the following values during the requested cycle (remember the first cycle is cycle number 0!). Cycle 13 26 ALUSrcB 11 00 IorD 0 0 PCWrite 0 0 RegDst 0 1 MemToReg 0 0 PCSource 00 00 The first row corresponds to a decode cycle. The second row corresponds to a write back cycle. 29 (a) Show the hexadecimal value of the double precision normalized floating point representations of the following numbers. Use the IEEE 754 standard. Infinity 0x7ff0000000000000________ -1.0 _0xbff0000000000000_______ 29 (b) Assuming a total of 4 significant digits including in the implicit bit, is there any inaccuracy in the following IEEE 754 computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the magnitude of the error if any. (1.01 x 2125) X (1.001 x 2129) 1.001000 x 1.01 --------------1.01101 x 2125+129-127=127 Rounding down gives us 1.011 x 2127 There is an error and the magnitude of the error is 0.00001 x 2127 29 (c) When using Booth’s algorithm, does it matter which of the operands are the mutliplier? Justify your answer. The number of addition operations depends on the distribution of 1’s in the multiplier. We can pick the operand that will produce the minimum number of addition operations. If shifts are faster than additions, we can speed up the operaton with this optimization. 29 (d) Consider the operation 58 * 4 using version 3 of the multiplication algorithm in chapter 4 (not Booth’s algorithm). Recall that this is the most efficient version of this algorithm. What are the initial and final contents of the multiplicand and product registers? Assume that the multiplier and multiplicand have the same number of bits. Initial Final multiplicand 000100 000100 Product 000000111010 000011101000 30 Consider a base, multi-cycle SPIM machine operating at 200 Mhz, with the following program execution characteristics. Instruction class CPI Frequency R-Type + Immediate 4 45% lw 5 25% sw 4 20% beq 3 7% j 3 3% Each of the following parts are independent of each other. 30 (a) The designers decided to add a new addressing mode to help with array addressing where we often add a byte offset to a base address stored in a register. Therefore a sequence of instructions such as code block 1 below can be replaced by the instruction shown in code block 2. We have 40% of the lw instructions affected in this manner. Assuming that the cycle time does not change and the new instruction also takes 5 cycles, what is the speedup produced by the use of this new addressing mode? add $t2, $t1, $t0 lw $t3, 0($t2) becomes lwx $t2, $t1+$t0 code block 1 code block 2 CPI_old = 0.45*4 + 0.25*5 + 0.20*4 + 0.07*3 + 0.03*3 = 4.15 Ex_old = #Instr * 4.15 * clock_cycle_time With the optimization the number of instructions has been changed. If 40% of the lw instructions have been affected that means 0.4 * 0.25 = 0.1 or 10% of the total number of instructions have been affected. For each of these instructions we can eliminate one instruction by combining two instructions. The new total number of instructions is therefore (0.9 * #Instr). The eliminated instructions are all R-format instructions. Therefore the percentage of such instructions are reduced by 10%. CPI_new = 0.35*4 + 0.25*5 + 0.2*4 + 0.07*3 + 0.03*3)/0.9 = 4.17 Ex_new = (0.9 * #Instr) * 4.17 * clock_cycle_time Comparing the execution times we can see that this is a good trade-off. 30 (b) Is it possible to halve the execution time of programs by simply speeding up R-format and immediate instructions, i.e., realize a speedup of 2? Justify your answer. From Amdahl’s Law we have Speedup = 1/(s + ((1-s)/n)) where s is the fraction that is unaffected, and n is the factor by which this component can be sped up. Let us assume that R-format and immediate instructions are infinitely sped up, i.e., n = infinity. In this case we have Speedup = 1/(0.55) < 2. Therefore the answer is no. 30 (c) What is the maximum MIPS rating of the multicycle datapath? The fastest instruction takes 3 cycles, i.e., has a CPI of 3. This is the theoretically lowest CPI possible for any program (although it is unrealistic to have a program that only branches opr jumps!) MIPS Rating = (Clock Rate in MHZ)/CPI = 200/3 = 66.7 MIPS 31 We wish to make our datapath implement instructions such as those that are found in the Power PC. Specifically we are interested in adding an instruction that will increment a counter, and branch to a label if the counter value is 0: icb $t0, loop.This instructon operates just as a branch instruction would if the branch is taken. Asume that this instruction uses the sameformat as a branch instuction (with the rt field set to 0). Assume that the opcode for this new instruction is 101000. Explicitly state any assumptions you need to answer the following questions. 31 (a) State any additions/modifications required to the hardware data path of Figure 5.28 or describe how the instruction can be implemented with no physical modifications. The execution of icb requires the contents of the register to be decremented. We can provide this value of 1 as an input to the AluSrcB mux which makes the number of inputs 5, and therefore the AluSrcB mux requires a three bit control signal. The RegDst field must be exapanded to a 2 bit field since the rs register must now be written using rs as a destination address. Thus this instruction has a WB state. 31 (b) Show the modified state machine that will handle this instruction. Provide the values of relevant control signals in each additional state. Any control signals not specified will assumed to be X. from decode AluSrcA = 1 AluSrcB = 100 AluOp = 00 PCWriteCond PCSource = 10 to fetch RegWrite RegDst = 10 MemToReg = 0 31 (c) Show the additions to the microcode to implement this instruction. The existing microcoded implementation of the state machine before addition of this instruction is shown below. You need only show additions/modifications. Note that there is more than one approach to solving this problem. Soem solutions may require additions to the microinstruction format in your text. If so state them explicitly. Dispatch ROM 1 Dispatch ROM 2 Op Name Value Op Name Value 000000 R-format 0110 100011 lw 0011 000010 jmp 1001 101011 sw 0101 000100 beq 1000 100011 lw 0010 101011 sw 0010 101000 icb 1010 label ALU control src1 src2 fetch add PC 4 add PC extshft dispatch-1 add A extend dispatch-2 Mem1 Register Control LW2 Memory PCwrite Control Sequencing read PC ALU seq read ALU seq write MDR SW2 fetch write ALU Rformat1 Func A fetch B seq write ALU BEQ1 Subt A B JUMP1 ICB add A +1 write ALU fetch ALUOut-Cond fetch jmp addr fetch ALUOut-cond seq fetch 32 Consider the manner in which exceptions are accommodated in the SPIM datapath, and the manner in which the data path controller is modified to support exceptions. 32 (a) In order to support n exceptions, how many additional states should be added to the state machine? How many additional microinstructions should added? n 32 (b) Consider an illegal memory address exception. Show the changes to the state machine for this exception. Make any assumptions that you feel may be necessary. For each additional state clearly show the values of relevant control signals. Any signals left unspecified will be assumed to be X (don’t care). from - State 0 - State 3 - State 5 code for illegal memory address is 10 to fetch IntCause = 10 CauseWrite AluSrcA = 0 AluSrcB = 01 AluOp = 01 EPCWrite PCwrite PCSource = 11 The illegal memory address exception is detected in any state that performs a memory access. 32 (c) What is the minimum number of cycles from the occurrence of an exception to the time the first instruction of the exception handler is fetched? One. Exceptions are handled in one state in this datapath (ref. Figure 5.39 and Figure 5.40). The next state is the fetch state. 33 (a) Show the hexadecimal value of the double precision normalized floating point representations of the following numbers. Use the IEEE 754 standard. 0x7ff0000000000000 (more than one answer:significand must be nonzero) NaN _________________________________ -0.125 _0xbfc0000000000000______________ 33 (b) Assuming a total of 4 significant digits including in the implicit bit, is there any inaccuracy in the following IEEE 754 computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the magnitude of the error if any. (1.01 x 2125) + (1.001 x 2129) 0.000101 x 2129 1.001000 x 2129 --------------1.001101 x 2129 Rounding up gives us 1.01 x 2129 There is an error and the magnitude of the error is 0.000011 x 2129 33 (c) Using Booths algorithm and n bit multipliers and multiplicands, how many steps are required? n steps 33 (d) Consider the operation 18 * 8 using version 3 of the multiplication algorithm in chapter 4 (not Booth’s algorithm). Recall that this is the most efficient version of this algorithm. What are the initial and final contents of the multiplicand and product registers? Assume that the multiplier and multiplicand have the same number of bits. Initial Final Multiplicand 10010 10010 Product 0000001000 0010010000 Note that I have left out the sign bits for these numbers. The solution could also have been presented using 6 bits for the multiplicand and 11 bits for the product (10 bit result and the sign bit). 34 Consider a base, multi-cycle SPIM machine operating at 200 Mhz, with the following program execution characteristics. Instruction class CPI Frequency R-Type + Immediate 4 45% lw 5 25% sw 4 20% beq 3 7% j 3 3% The following parts are independent of each other. 34 (a) The designers decided to add a new addressing mode wherein registers can be automatically incremented by 4. Therefore a sequence such as code block 1 below can be replaced by the instruction shown in code block 2. We have 30% of the lw instructions affected in this manner. Assuming that the cycle time does not change and the new instruction also takes 5 cycles, what is the speedup produced by the use of this new addressing mode? lw $t0, 0($t1) addi $t1, $t1, 4 becomes lw $t0, ($t1)+ code block 1 code block 2 CPI_old = 0.45*4 + 0.25*5 + 0.20*4 + 0.07*3 + 0.03*3 = 4.15 Ex_old = #Instr * 4.15 * clock_cycle_time With the optimization the number of instructions has been changed. If 30% of the lw instructions have been affected that means 0.3 * 0.25 = 0.075 or 7.5% of the total number of instructions have been affected. For each of these instructions we can eliminate one instruction by combining two instructions. The new total number of instructions is therefore (0.925 * #Instr). The eliminated instructions are all immediate instructions. Therefore the percentage of such instructions are reduced by 7.5%. CPI_new = 0.375*4 + 0.25*5 + 0.2*4 + 0.07*3 + 0.03*3)/0.925 = 4.17 Ex_new = (0.925 * #Instr) * 4.17 * clock_cycle_time Comparing the execution times we can see that this is a good trade-off. 34 (b) New ALU implementation technology has made it possible to double the speed of the R-format and immediate instructions. What is overall the speedup that can be realized for programs with the instructions statistics shown in the table? From Amdahl’s Law we have Speedup = 1/(s + ((1-s)/n)) where s is the fraction that is unaffected, and n is the factor by which this component can be sped up. Plugging in the values from the table we have Speedup = 1/(0.55 + (0.45/2)) = 1.29 or 29% speedup. 34 (c) What is the maximum MIPS rating of the multicycle datapath? The fastest instruction takes 3 cycles, i.e., has a CPI of 3. This is the theoretically lowest CPI possible for any program (although it is unrealistic to have a program that only branches opr jumps!) MIPS Rating = (Clock Rate in MHZ)/CPI = 200/3 = 66.7 MIPS 35 We wish to make our datapath implement instructions such as those that are found in the Power PC. Specifically we are interested in adding the following instruction: lwrf $t0, $t1+$t2. As you may recall, this instruction uses the sum of the contents of registers $t1 and $t2 as addresses. Assume that this instruction uses R-format encodings where the shamt and func fields are ignored and an opcode 101000. Explicitly state any assumptions you need to answer the following questions. 35 (a) State any additions/modifications required to the hardware data path of Figure 5.39 or describe how the instruction can be implemented with no physical modifications. No changes are required. Simply add rs and rt to obtain the memory address. 35 (b) Show the modified state machine that will handle this instruction. Provide the values of relevant control signals in each additional state. Any control signals not specified will assumed to be X. from decode ALUSrcA = 1 ALUSrcB = 00 ALUOp = 00 to fetch MemRead IorD = 1 RegDst = 1 RegWrite MemToReg = 1 35 (c) Show the additions to the microcode to implement this instruction. The existing microcoded implementation of the state machine before addition of this instruction is shown below. You need only show additions/modifications. Note that there is more than one approach to solving this problem. If any changes are necessary to the microinstruction format, state them explicitly. Dispatch ROM 1 Dispatch ROM 2 Op Name Value Op Name Value 000000 R-format 0110 100011 lw 0011 000010 jmp 1001 101011 sw 0101 000100 beq 1000 100011 lw 0010 101011 sw 0010 101000 lw+ 1010 label ALU control src1 src2 fetch add PC 4 add PC extshft dispatch-1 add A extend dispatch-2 Mem1 Register Control LW2 Memory PCwrite Control Sequencing read PC ALU seq read ALU seq write MDR SW2 fetch write ALU Rformat1 Func A fetch B seq write ALU BEQ1 Subt A fetch B JUMP1 LWRF add A B ALUOut-Cond fetch jmp addr fetch seq read ALU write MDR-rd seq fetch 36 Consider the manner in which exceptions are accommodated in the SPIM datapath, and the manner in which the data path controller is modified to support exceptions. 36 (a) How are exceptions different from procedure calls? • Procedure calls perform state saving operations and make use of the stack Exceptions need not be concerned with state saving operations • Exceptions must encode the cause of the exception or use a vectored implementation to decode the exception • Exception logic stores the address of the offending instruction rather than the next instruction. • Procedure calls are normal changes in the flow of control rather than unexpected changes. • Exceptions can lead to unexpected program termination. 36 (b) Consider an illegal instruction exception. Show the changes to the state machine for this exception. Make any assumptions that you feel may be necessary. For each additional state clearly show the values of relevant control signals. Any signals left unspecified will be assumed to be X (don’t care). from decode code for illegal instruction is 10 to fetch IntCause = 0 CauseWrite AluSrcA = 0 AluSrcB = 01 AluOp = 01 EPCWrite PCwrite PCSource = 11 The illegal instruction exception can be detected in the decode state. A transition is made to the exception state where the code is recorded and the state machine moves to the fetch state to start fetching the first instruction of the exception handler. 36 (c) What is the minimum number of cycles from the occurrence of an exception to the time the first instruction of the exception handler is fetched? One. Exceptions are handled in one state in this datapath (ref. Figure 5.39 and Figure 5.40). The next state is the fetch state. 37 (a) Show the single precision normalized floating point representations of the following numbers. Use the 754 standard and present your solution in hexadecimal notation. -0.0625 0.0 ___________0xbd800000______________________ _________________0x00000000___________________ 37 (b) Assuming IEEE 754, what is the result of the following operations • 1.1101 x 2102 X 1.01 x 2172 (note that the exponent values are represented in base 10). 1.0010001 x 2148 37 (c) Consider a double precision floating point number (IEEE 754). As it turns out, the range of numbers that it can represent is not sufficient for your application. Keeping the total number of bits constant, how would you change the format of the number? The number of bits in the exponent determines the range. This number should be increased at the expense of the number of bits in the significand. 37 (d) What is the smallest positive number you can represent in IEEE 754 double precision format. Represent this number in scientific notation. 1.0 x 2-1022 38 How would you modify the single bit ALU to produce an overflow bit. Clearly state which of the 32 single bit ALUs you would modify and modify the circuit below to show how you would compute the overflow bit. op c_in a 0 binvert 1 b + 2 3 less result set c_out overflow Compute the EX-OR of the carry in to the ALU31 and the carry out from ALU 31. Use a single EX-OR gate. 39 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the fetch cycle for the first instruction is cycle 0. loop: add $t4, $0, $0 addi $t3, $0, 8 lw $t0, 0($t1) add $t4, $t4, $t0 addiu $t1, $t1, 4 addi $t3, $t3, -1 bne $t3, $0, loop 39 (a) Fill in the following values at the end of the requested cycle. The first row corresponds to the WB cycle for the load instruction. The second row corresponds to the decode cycle. Cycle ALUSrcB ALUOp RegDst PCWrite MemToReg PCSource 12 00 00 0 0 1 00 18 11 00 0 0 0 00 39 (b) On what cycle is the branch instruction fetched for the last time? 165. The loop executes 8 times. Each immediate instuctions takes 4 cycles. 40 (a) Show the hexadecimal value of the double precision normalized floating point representations of the following numbers. Use the IEEE 754 standard. ___0xbff0000000000000_______________________________________ -1.0 0.0625 ___0x3fb0000000000000_____________________________ 40 (b) Clearly state two advantages of using Booth’s algorithm for multiplication over the standard multiplication algorithm provided in the book. 1. Potentially reduces the number of addition operations performed depending on the value of the multiplier 2. Naturally handles two’s complement representations performing signed multiplication 40 (c) Assuming a total of 6 significant digits including the implicit bit, is there any inaccuracy in the following computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the magnitude of the error if any. (-1.11 x 2-2) + (1.01 x 23) The rounded value that is obtained in 1.0011 x 23. The difference between this value and the actual value (assuming a sufficient number of bits) is given by 0.0000001. The correct value is 1.0011001 x 23. 41 Consider a base, multi-cycle SPIM machine operating at 50 Mhz, with the following program execution characteristics. Instruction class CPI Frequency R-Type 4 40% lw 5 25% sw 4 25% beq 3 7% j 3 3% We can increase reduce the clock cycle time by 15% at the expense adding an additional cycle for data memory accesses. 41 (a) Is this a good tradeoff. Provide a clear and quantitative justification for your answer. CPIold = 0.4*4 + 0.25*5 + 0.25*4 + 0.07*3 + 0.03*3 = 4.15 Execution Timeold = I * 4.15 * clock_cycle CPInew = 0.4*4 + 0.25*6 + 0.25 * 5 + 0.07*3 + 0.03*3 = 4.65 Execution Timenew = I * 4.65 * 0.85* clock_cycle time Execution Timenew < Execution Timeold 41 (b) What is the minimum improvement in clock cycle time that is necessary for this approach to be profitable? x* 4.65 < 4.15 x < 4.15/4.65 = 0.892. 42 Consider the multicycle datapath. 42 (a) What is the minimum number of cycles for the execution of any instruction? 3 42 (b) For a datapath with a N state controller, what is the size of the microcode store (in number of microinstructions)? 2 log N 42 (c) What is it that determines the duration of the clock cycle time? The state (or cycle) in the datapath that has the longest execution path or delay. 43 (a) Show the hexadecimal value of the double precision normalized floating point representations of the following numbers. Use the IEEE 754 standard. -1.5 0xbff800000000000 0.03125 0x3fa0000000000000 43 (b) Using Booths algorithm and n bit multipliers and multiplicands, what is the range of numbers that can be accurately multiplied? -2n-1 to 2n-1-1 43 (c) Assuming a total of 6 significant digits including the implicit bit, is there any inaccuracy in the following computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the magnitude of the error if any. (-1.1 x 2-1) + (1.001 x 24) Correct answer: 1.000101 x 24 Rounded answer: 1.00011 x 24 Error: 0.000001 44 (20 pts) Consider a base, multi-cycle SPIM machine operating at 50 Mhz, with the following program execution characteristics. Instruction class CPI Frequency R-Type 4 45% lw 5 25% sw 4 20% beq 3 7% j 3 3% 44 (a) With clever algorithms for register allocation we can reduce the number of lw and sw instructions by 10%. What is the improvement in execution time? CPIold = 4*0.45 + 5*0.25 + 4*0.2 + 3*0.07 + 3*0.03 = 4.15 CPInew = (0.45*4 + 5*0.25*0.9 + 4*0.2*0.9 + 3*0.07 + 3*0.03)/I” = 3.945/I” I” = 0.45 + 0.25*0.9 + 0.2*0.9 + 0.07 + 0.03 = 0.955 CPInew = 3.945/0.955 = 4.13 Execution Time Old = I * 4.15 * clock_cycle_time Execution Time New = 0.955* I * 4.13 * clock_cycle_time Compare! 44 (b) Would a 10% reduction in clock cycle time justify a single cycle increase in the number of cycles for memory accesses? CPInew = (0.45*4 + 6*0.25 + 5*0.2 + 3*0.07 + 3*0.03) = 4.6 Execution Time New = I * 4.6 * 0.9 * clock_cycle_time = I * 4.14 * clock_cycle_time Barely better 45 (a) Show the single precision normalized floating point representations of the following numbers. Use the 754 standard and present your solution in hexadecimal notation. 0.375 -1.5 ___0x3ec00000______________________________________________ ___0xbfc00000_____________________________________________ 45 (b) Assuming IEEE 754, what is the result of the following operations • 1.1101 x 282 X 1.01 x 2122 (note that the exponent values are represented in base 10). 1.0010001 x 278 (do not forget to substract the extra 127 bias!) • 1.001 x 294 + 1.1 x 297 (note that the exponent values are represented in base 10). 1.101001 x 297 46 Consider a base, multi-cycle SPIM machine operating at 100 Mhz, with the following program execution characteristics. Compiler optimizations reduce the number of instructions in each class to the amount shown. Instruction class CPI Frequency Compiler Opt R-Type 4 45% 95% lw 5 30% 90% sw 4 20% 95% beq 3 5% 75% 46 (a) What is the base CPI with and without compiler optimizations? Without Opt = 0.45*4 + 0.3*5 + 0.2*4 + 0.05*3 = 4.25 With Optimization = Cycles/#instr =(0.45*4*0.95+0.3*5*0.9+0.2*4*0.95+0.05*3*0.75)/(0.45*0.95+0.3*0.9+0.2*0.95+0.05*0.75) = CPInew 46 (b) What is the speedup due to compiler optimizations? Speedup = Told/Tnew = (I * 4.25 * cycle_time)/(I” * CPInew* cycle_time) where I” = (0.45*0.95+0.3*0.9+0.2*0.95+0.05*0.75) ( from above: this is the new number of instructions) 47 Consider the annotated multicycle datapath shown in your text. Fill in the fill in the following table for the sequence of instructions shown below. Each entry should contain the value of the requested signal at the end of the requested cycle. The initial contents of registers $1, $2 and $3 are 0x0000088, 0x0000022 and 0x00000033 respectively. The address of Loop is 0x00400004. PC Instruction Cycle ALUOutput Value at the Output of the ALU 0x0040000c beq $1, $2, Loop 3 0x00400004 0x00000066 0x00000022 01 0x00400010 sw $1, 4($2) 4 0x00000026 0x00000026 0x00000088 00 Rt ALUOp 48 (a) Show the double precision normalized floating point representations of the following numbers. Use the 754 standard and present your solution in hexadecimal notation. -0.1875 0.0 _____oxbfc0000000000000___________________________ ___________________________________________________________ 48 (b) Assuming IEEE 754 rules and conventions, what is the result of the following operations. Assume 5 significant digits including the implicit bit. • 1.11 x 212 x 1.001 x 239. 212 x 239. = 251-127 = 2-76 Remember that the exponent is biased and therefore we have to substract 127 from the sum of the exponents. The fact that the result is negative indicates that the result is underflow. • 1.0001 x 2133 + 1.0101 x 2139 1.0001 x 2133 + 1.0101 x 2139 = 0.0000010001 x 2139 + 1.0101 x 2139 = 1.0101 x 2139 (the result is rounded down according to the rules for 754). 48 (c) Provide an example of a number that is too small to be represented with single precision notation but can be represented with double precision. 1.0 x 2-130 This number is just smaller than the smallest number that can be represented in single preceision numbers. 49 For the following questions, consider the single cycle datapath shown in the text. The value of loop is 0x00400004, the contents of register $t1, $t2, and $t3 are 0x01000033, 0x00004008, and 0x000000c0, respectively, and the contents of register $1 is 0x10010008 49 (a) Fill in the values of control signals in the following table for each instruction. Instruction lw $t3, 4($t2) ALUSrc 1 ALUOp 00 RegDst 0 MemToReg 1 Write Data Input to Data Memory 0x000000c0 49 (b) You are designer and wish to reduce the number of control signals that the controller has to generate. Provide one hardware modification that will eliminate the RegDst control signal (but not the mux). Note that the RegsDst signal is the complement of the MemToReg signal. This the RegDst mux can be controlled by the MemToReg signal passed through an inverter. 50 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the fetch cycle for the first instruction is cycle number 0. Assume that la, li and addi require the same number of states as an add instruction. .data .word 2,3,4,5 .space 16 .text L1: L2: la $t8, L1 la $t9, L2 li $t7, 4 lw $t2, 0($t8) sw $t2, 0($t9) addi $t8, $t8, 4 addi $t9, $t9, 4 addi $t7, $t7, -1 bne $t7, $0, move move: end: 50 (a) What would be the CPI and MIPS rating for a 266 MHz machine for the program shown above? The loop executes four times. Each time through the loop the instructions take a total of 24 cycles. The first three instructions take 12 cycles. The total number of cycles is 108. The number of instructions in the loop are 6. Therefore the total number of instructions executed is 6*4+3 = 27. The CPI = 108/27 = 4.0. MIPS rating = 266/CPI 50 (b) Fill in the following values during the requested cycle (remember the first cycle is cycle number 0!). Cycle # 17 40 ALUSrcB 01 XX IorD 0 X PCWrite 1 0 RegDst X 0 MemToReg X 1 PCSource 00 XX The first row corresponds to the fetch cycle for the sw instruction. The second row corresponds to the write back cycle for the lw instuction executed for the second time through the loop. 50 (c) On the figure of the multicycle datapath attached overleaf, show any modifications that would be required to add the jr instruction. Below show the modifications to the state machine how any additional states would fit in the state machine. For each additional state that you have added, show the values of the relevant control signals. Any signals that you omit from the description of a state are assumed to be deasserted. According to the format definition for the jr instruction, the rs field identifies the register. This value will be available in the A register at the end of the decode cycle. Add another signal from the A register to the PCSource mux. The value of PCSource of 11 will now select the contents of the A register to be written to the PC. This writeback can be handled in one extra state after decode with PCWrite = 1 and PCSource = 11. IF ID jr PCWrite = 1 PCSource = 11 51 Consider the execution of the following block of SPIM code on a multicycle datapath. The text segment starts at 0x00400000 and that the data segment starts at 0x10010000. Assume immediate instructions take 4 cycles. start: str: main: move: end: done: .data .word 21, 22, 0x44, 0x12 .asciiz “Done ” .align 4 .byte 0x88, 0x32 .text .globl main li $t3, 2 lui $t0, 0x1001 lui $t1, 0x1001 addi $t1, $t1, 8 lw $t5, 0($t0) lw $t6, 0($t1) add $t7, $t5, $t6 addi $t0, $t0, 4 addi $t1, $t1, 4 addi $t3, $t3, -1 bne $t3, $zero, move 51 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle of program execution is cycle number 0! Cycle ALUSrcB MemToReg RegWrite Branch Address RegDst 12 01 X 0 0x00400030 X 28 00 X 0 0x0041e05c X 51 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding contents. Data Segment Addresses Data Segment Contents 0x10010000 0x00000015 0x10010004 0x00000016 0x10010008 0x00000044 0x1001000c 0x00000012 0x10010010 0x656e6f44 0x10010014 0x00000020 0x10010018 0x00000000 0x1001001c 0x00000000 0x10010020 0x00003288 0x10010024 51 (c) Show the SPIM program to implement a for-loop that executes 12 times. . loop: li $t1, 12 # loop body # loop body .. .. addi $t1, $t1, -1 bne $t1, $zero, loop 51 (d) What are the values of move, str and end? move end str 0x00400010 0x00400028 0x10010010 51 (e) What does the following bit pattern represent assuming that it is one of the following. 00111100 00001001 01000000 00000000. Provide your answer in the form indicated. A single precision IEEE 754 floating point number (show actual value in scientific notation) ____1.000100101 x 2-7_______________ A MIPS instruction (show in symbolic code) __lui $t1, 0x4000________ 52 (a) Show the hexadecimal value of the single precision normalized floating point representations of the following numbers. Use the IEEE 754 standard. 0x7f800000 Infinity a denormalized number __0x00011000_________ 52 (b) Assuming a total of 4 significant digits including in the implicit bit, is there any inaccuracy in the following IEEE 754 computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the magnitude of the error if any. (1.11 x 2114) X (1.01 x 2119) The product is 1.00011 x 2107 With only 4 significant digits and the guard and round bit being 11, we round up to give 1.001 x 2107 The maginitude of the error is 0.00001 x 2107 52 (c) Consider the operation 17 * 4 using version 3 of the multiplication algorithm in chapter 4 (not Booth’s algorithm). Recall that this is the most efficient version of this algorithm. What are the initial and final contents of the multiplicand and product registers? Assume that the multiplier and multiplicand have the same number of bits. Use the minimum number of bits as determined by the above operands. Initial Final Multiplicand 010001 010001 Product 000000000100 000001000100 52 (d) Compare the critical path delays for carry lookahead logic and ripple carry logic for a 32 bit ALU. Ripple carry delay through single bit ALU is 2 gate delays. Delay through a single node in the carry lookahead tree is also 2 gate delays. Ripple carry = 32 * 2 = 64 gate delays. Carry lookahead = 1 (for generate and propagate signal generation) + 5 *2 (up the tree) + 4 * 2 (down the tree) + 2 (through the last ALU) = 21 gate delays 53 Consider a base, multi-cycle SPIM machine operating at 300 Mhz, with the following program execution characteristics. Instruction class CPI Frequency R-Type or Immediate 4 40% lw 5 25% sw 4 20% bne 3 10% j 3 5% Questions 48(a), 48(b), and 48(c) are independent of each other. 53 (a) The designers propose adding a new branch instruction, decrement-and-branch-if-zero: dcb $t0, loop. This instructions decrements the register $t0. If the resulting value is equal to 0 then the program branches to the instruction at label loop. Therefore a sequence of instructions such as code block 1 below can be replaced by the instruction shown in code block 2. We have 35% of the branch instructions affected in this manner. Assuming that the cycle time does not change and the new instruction takes 4 cycles, is this a good idea? addi $t2, $t1, -1 bne $t2, $0, loop code block 1 becomes dcb $t2, loop code block 2 CPI_old = 0.4*4 + 0.25*5 + 0.20*4 + 0.10*3 + 0.05*3 = 4.10 Ex_old = #Instr * 4.1 * clock_cycle_time With the optimization the number of instructions has been changed. If 35% of the branch instructions have been affected that means 0.35 * 0.1 = 0.035 or 3.5% of the total number of instructions have been affected. For each of these instructions we can eliminate one instruction by combining two instructions. The new total number of instructions is therefore (0.965 * #Instr). The eliminated instructions are all immediate instructions. Therefore the percentage of such instructions are reduced by 3.5%. However they are replaced by instructions taking 4 cycles. Therefore the net effect is the reduction of the number of branch instructions by 35% (or to 65% of the original). CPI_new = 0.365*4 + 0.25*5 + 0.2*4 + 0.1*3*0.65 + 0.05*3 + 4*0.035)/0.965 = 4.1 Ex_new = (0.965 * #Instr) * 4.1* clock_cycle_time Comparing the execution times we can see that this is a good trade-off. 53 (b) If improved register allocation techniques in the compiler can reduce the number of lw and sw instructions by 10% what is the improvement in the MIPS rating? Current MIPS rating is 300/4.1 = 73.17 MIPS With the reduced number of instructions the new CPI is given by (0.4*4 + 0.225*5 + 0.18*4 + 0.1*3 + 0.05*3)/I’ where I’ is given by (0.4+ 0.9*0.25 + 0.9*0.2 + 0.1 + 0.05) The new MIPS rating is 300/CPInew The improvement is given by their differences. 53 (c) Suppose in a typical program executable, 20% of the instructions is from system libraries and implementing operating system related functionality.What is the maximum speedup that can be obtained by speeding up the application code sequences? This goverened by Amdahl’s Law. We the maximum speedup as 1/0.2 = 5. 54 We wish to have our multicycle datapath implement the slti $t0, $t1, 0x4000 instruction. Explicitly state any assumptions you need to answer the following questions. 54 (a) Either state any additions/modifications required to the hardware data path of Figure 5.33, or describe how the instruction can be implemented with no physical modifications. We can simply have the ALU perform a slt than operation on the inputs and select the ALU inputs accordingly. The key problem is getting the ALU controller to issue a slt opcode to the ALU (111). This is the main hardware modification that is required. Suppose we use the unused ALUOp value of 11 to signifiy this. Note that the truth table for the ALU controller must be reimplemented since we do not have don’t cares in one of the ALUOp bit fields any more. 54 (b) Show the modified state machine that will handle this instruction. Provide the values of relevant control signals in each additional state. Any control signals not specified will assumed to be X. from decode to fetch ALUSrcA = 1 ALUSrcB = 10 ALUOp = 11 RegDst = 0 RegWrite MemToReg = 0 54 (c) Show the additions to the microcode to implement this instruction. The existing microcoded implementation of the state machine before addition of this instruction is shown below. You need only show additions/modifications. Note that there is more than one approach to solving this problem. Some solutions may require additions to the microinstruction format in the text. If so state them explicitly. Dispatch ROM 1 Dispatch ROM 2 Op Name Value Op Name Value 000000 R-format 0110 100011 lw 0011 000010 jmp 1001 101011 sw 0101 000100 beq 1000 100011 lw 0010 101011 sw 0010 001010 slti 1010 label ALU control src1 src2 fetch add PC 4 add PC extshft dispatch-1 add A extend dispatch-2 Mem1 Register Control LW2 Memory PCwrite Control Sequencing read PC ALU seq read ALU seq write MDR SW2 fetch write ALU Rformat1 Func A fetch B seq write ALU BEQ1 Subt A B JUMP1 SLTI SLTOP A Extend fetch ALUOut-Cond fetch jmp addr fetch seq Write ALU-rt fetch 55 Consider the manner in which exceptions are accommodated in the SPIM datapath, and the manner in which the data path controller is modified to support exceptions. You certainly do not want your programs to jump to addresses in the data segment, or for that matter let us say to any address greater than 0x10010000. If j instructions generate such an illegal address you want to generate an exception.This questions addresses the implementation of such an exception in the SPIM datapath. 55 (a) How can you test for the occurrence of this exception? The occurrence of the exception is detected by using an ALU to compare the jump address with 0x10010000 and generating a single bit execption single to denote the result of this comparison. 55 (b) Describe the hardware modifications required to control to support such an exception. From state 1001 we now make a transition to the exception state. To make this transition we must modify the next state logic of the state machine implementation. Currentlky the next state after state 1001 is state 0000. This must now be modified to move to either state 0000 (if no exception) or state 1010 (exception state) of an exception occured. You can use a table with these two alternatives just as we did for the next state logic out of state 0010. The input to the table is a single bit signal denoting whether the exception occured or not (from 5(a)). 55 (c) Show the required modifications to the state machine to support such an exception.All control signals left unspecified are assumed to be X. \ from jump code for illegal address is 10 IntCause = 10 CauseWrite AluSrcA = 0 AluSrcB = 01 AluOp = 01 EPCWrite PCwrite PCSource = 11 to fetch