Computer Architecture: ECE 3055 Sample Problems and Solutions

Transcription

Computer Architecture: ECE 3055
Sample Problems and Solutions
1 Consider the execution of the following block of SPIM code. The text segment starts at
0x00400000 and that the data segment starts at 0x10010000.
.data
start:
.word 0x32, 44, Top, 0x33
str:
.asciiz “Top”
.align 8
end:
.space 16
.text
.globl main
main:
Top:
exit:
li $t3, 4
la $t0, start
la $t1, end
lw $t5, 0($t0)
sw $t5, 0($t1)
addi $t0, $t0, 4
addi $t1, $t1, 4
addi $t3, $t3, -1
bne $t3, $zero, Top
li $v0
syscall
1 (a) Show the addresses and corresponding contents of the locations in the data segment that
are loaded by the above SPIM code. Clearly associate addresses and corresponding contents.
Data Segment Addresses
Data Segment Contents
0x10010000
0x00000032
0x10010004
0x0000002c
0x10010008
0x00400010
0x1001000c
0x00000033
0x10010010
0x00706f54
0x10010014
0x00000000
0x10010018
0x00000000
0x1001001c
0x00000000
0x10010020
0x00000000
0x10010024
0x00000000
The value in 0x10010008 depends on how the preceding instructions are
emcoded. “la $t1, end” is encoded into two instructions while “la $t0, start” is
encoded into one instruction. As long as your assumptions are consistent you
shouldnot lose points. Everything else is 0x00000000 upto and beyond
0x10010100 (this is the point of the .align 8 directive)
1 (b) What are the values of the following.
exit
0x00400028
Top
0x00400010
addi $t3, $t3, -1 (encoding)
0x216bffff
1 (c) Provide a set of data directives that will perform the memory allocation implied by the
following high level language declarations.
• character My_Char;
integer Limit;
.data
.byte 0x00
.align 2
.word 0x0
• integer array[10];
.data
.space 40
** Note there are many solutions *
1 (d) Given the following sequences, what is the range of feasible addresses for the label target on the MIPS architecture?
start:
.
.
.
target:
beq$1, $2, target
add$4, $5, $6
sub$6, $4, $8
start + 8 <= target =< start+4 +215-1 (words)
1 (e) What i s the difference between the j and jal instructions?
Both instructions work identically with regard to the program control flow, i.e.,
which instruction is executed next and how he address fields of the instruction are
endcoded. The jal instruction also causes the value of PC+4 to be stored in register $31.
2 The following is block of SPIM code to compute the sum of the elements of a diagonal in a
matrix stored as shown. For the matrix shown below the sum in $t2 should end up being
1+12+23+34 = 70. Fill in any missing pieces of code to complete the program. Use any
additional registers as necessary to keep track of the number of loop iterations and any
other information you require. You may complete your code on the next page as necessary.
If does not have to fit on the spaces shown. You cannot change the loop body. Remember
there are many distinct, correct, feasible solutions!
.data
row1:
row1:
row1:
row1:
.word 1,2,3,4
.word 11,12,13,14
.word 21,22,23,24
.word 31,32,33,34
.text
la $t0, row1
li $t6, 20 # load offset to next diagonal element (16+4)
li $t7, 4 # load counter for number of elements
loop:
lw $t1, 0($t0)
add $t2, $t2, $t1
#loop body that performs the sum
add $t0, $t0, $t6
addi $t7, $t7, -1
bne $t7, $0, loop # check if all elements have been accessed
li $v0, 10 # exit the program
syscall
3 Consider the following MIPs code taken from some program. We wish to turn this piece of
code into a procedure. Note that no procedure call conventions have been followed and the
procedure must be re-written to follow the MIPs procedure call conventions.
procA: lw $12, 0($8)
sw $12, 0($15)
addi $8, $8, 4
addi $15, $15, 4
addi $16, $16, -1
bne $16, $0, procA
jr $31
3 (a) What registers will have to be saved on the stack by the procedure if the MIPS register
procedure call convention is to be followed and you do not modify the code shown above?
$s0, $fp (assuming we use the
frame pointer). $ra does not have
to saved since no other procedure/
function is called.
3 (b) Which registers are used to pass parameters to this procedure? How many registers do
you think will be required for parameter passing ?
The values of $8, $15, and $16 would be required in
the procedure and are passed using registers $a0, $a1,
and $a2.
3 (c) If the procedure is stored in memory starting at location 0x00400020, what will be the
encoding of the jal procA instruction that is in the calling program?
0x0c100008
4 Consider the execution of the following block of SPIM code on a multicycle datapath. The
text segment starts at 0x00400000 and that the data segment starts at 0x10010000. Assume
immediate instructions take 4 cycles.
start:
str:
main:
move:
end:
done:
.data
.word 21, 22, 23,24
.asciiz “CmpE”
.align 4
.word 24, 0x77
.text
.globl main
li $t3, 4
lui $t0, 0x1001
lui $t1, 0x1002
lw $t5, 0($t0)
sw $t5, 0($t1)
addiu $t0, $t0, 4
addiu $t1, $t1, 4
addi $t3, $t3, -1
bne $t3, $zero, move
4 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle
of program execution is cycle number 0!
Cycle
ALUSrcA
ALUOp
RegWrite
Instruction Register
IorD
14
1
00
0
0x8d0d0000
X
35
1
01
0
0x1560fffb
X
4 (b) Show the addresses and corresponding contents of the locations in the data segment that
are loaded by the above SPIM code. Clearly associate addresses and corresponding contents.
0x10010000
0x00000015
0x10010004
0x00000016
0x10010008
0x00000017
0x1001000c
0x00000018
0x10010010
0x45706d43
0x10010014
0x00000000
0x10010018
0x00000000
0x1001001c
0x00000000
0x10010020
0x00000018
0x10010024
0x00000077
4 (c) Show a sequence of SPIM instructions that will branch to the label loop if the contents
of register $t0 is 13.
addi $t1, $t0, -13
beq $t1, $zero, loop
4 (d) What are the values of end, done, start, and main?
start: 0x10010000
main: 0x00400000
end: 0x00400020
done: 0x00400024
4 (e) What does the following bit pattern represent assuming that it is one of the following.
10101101 00101101 00000000 00000000. Provide your answer in the form indicated.
A single precision IEEE 754 floating point number
(show actual value in scientific notation)
____-1.0101101 x 2-37_________
A MIPS instruction (show in symbolic code)
_sw $t5, 0($t1)__________
start:
str:
main:
move:
end:
done:
.data
.word 21, 22, 23,24
.asciiz “CmpE”
.align 4
.word 24, 0x77
.text
.globl main
li $t3, 4
lui $t0, 0x1001
lui $t1, 0x1002
lw $t5, 0($t0)
sw $t5, 0($t1)
addiu $t0, $t0, 4
addiu $t1, $t1, 4
addi $t3, $t3, -1
Cycle
ALUSrcB
ALUOp
RegDst
Instruction Register
PCWriteCond
9
11
00
X
0x3c091002
0
20
00
00
0
0xad2d0000
0
5 (b) Show a sequence of SPIM instructions to implement the following: x = a[i*4], where
a[i] is the access of element i from array a[]. Use whatever register allocation you need.
.text
add $t5, $t0, $t0 # $t0 contains the value of i
add $t5, $t5, $t5 # two additions to compute i*4
add $t1, $t5, $t3 # base address contained in $t3
lw $t2, 0($t1) # value of x stored in $t2
5 (c) What does the following bit pattern represent assuming that it is one of the following.
____1.00011010001 x 2-7_________
_lui $t5, 0x1000__________
6 Answer the following questions with respect to the MIPS program shown below. Assume
that the data segment starts at 0x10010000 and the text segment starts at 0x00400000.
.data
L1:
.word 0x32, 104
.asciiz “Test 1”
.align 2
Blank:
.word
L2:
.space 64
main:
loop:
.text
li $t0, 4
move $t2, $zero
li $a0, 1
jal solo
addi $t0, $t0, -1
addi $a0, $a0, 1
add $t2, $t2, $v0
slt $t1, $t0, $zero
bne $t1, $zero, loop
li $v0, 10
syscall
6 (a) How much space in bytes does the program occupy in the data segment and text segment?
Data segment (21 words, 84 bytes) - remember must also count reserved space
Text segment (11 words, 44 bytes)
6 (b) With respect to the above program,
• what are the values of the following?
L2
_____0x10010014____________
loop
______0x0040000c___________
memory location 0x1001000c ______0x00003120__________
• What information would be stored in the symbol table for the above program? Be specific
about the above program. Do not provide a generic answer.
Upon completion of compilation, the label solo and its address (relative to
the start of this module). This is information to be used by the linker to
find and link the module corresponding to solo.
6 (c) What is the encoding of the following instructions?
____0x1520fffa____________
add $t0, $t0, $v0
_____0x01024020__________
6 (d) Which of the instructions in the above program, if any, must be identified as requiring
relocation information ?
The jal instruction since this is the only instruction that relies on absolute
addresses.
6 (e) Suppose the procedure solo is independently compiled as a separate module. When it is
linked with the main program the first instruction in solo is placed in memory at address
0x0040100c. Provide the hexadecimal encoding of the jal solo instruction.
0x0c100403
7 Consider the program shown on the previous question. You have probably noticed that it
does not follow the MIPS register saving convention. Rewrite the code showing any
changes that would be required by the MIPS register saving conventions.
.data
L1:
.word 0x32, 104
.asciiz “Test 1”
.align 2
Blank:
.word
L2:
.space 64
main:
loop:
.text
li $t0, 4
move $t2, $zero
li $a0, 1
subu $sp, $sp, 32
sw $t0, 0($sp)
sw $t2, 8($sp)
sw $a0, 12($sp)
sw $t1, 16($sp)
jal solo
lw $t0, 0($sp)
lw $t2, 8($sp)
lw $t1, 16($sp)
lw $a0, 12($sp)
addu $sp, $sp, 32
addi $t0, $t0, -1
addi $a0, $a0, 1
add $t2, $t2, $v0
slt $t1, $t0, $zero
li $v0, 10
syscall
7 (a) What is the difference between the jal and j instructions?
The execution of the jal instruction also causes the return address to
be saved in $ra. The target address is computed in the same manner
for both instructions.
8 Answer the following questions with respect to the MIPS program shown below. For the
purpose of this test, assume that each instruction can be stored in one word, i.e., do
not worry about pseudo instructions! Further, assume the data segment starts at
0x1001000 and that the text segment starts are 0x04000000.
L2:
.data
.word 24, 0x16
.byte, 77,66,55,44
.text
move $t0,$0
loop: mul $t8, $t0, 4
add $t1, $a0, $t8
sw $0, 0($t1)
addi $t0, $t0, 1
slt $t7, $t0, $a1
bne $t7, $0, loop
8 (a) What are the values of the labels L2 and Loop?
L2 __0x10010008__________
loop ___0x04000004___________
8 (b) Provide the hexadecimal encodings of the following instructions?
bne $t7, $0, loop
0x15e0fffa
add $t1, $a0, $t8
0x00984820
8 (c) Show the contents of the data segment. Show both addresses and values at those
addresses.
0x10010000
0x18
0x10010004
0x16
0x10010008
0x2c37424d
8 (d) The instruction BLT (Branch on less than or equal too) is not a native instruction. Show
the SPIM code that could be used to implement this test.
ble $t2, $t3, loop
slt $t1, $t3, $t2
beq $t1, $0 loop
9 (20 pts) This question deals with procedure calls. Consider the following SPIM program
segment. For the purpose of this test, assume that each instruction can be stored in one
word, i.e., do not worry about pseudo instructions! Further, assume the data segment
starts at 0x1001000 and that the text segment starts are 0x04000000.
.data
first: .word 0x33, 0x2
.byte 4, 3, 0, 0
str:
.ascii”Final”
.text
main: li $t0, 4
move $a0, $t0
loop jal func
move $a0, $v0
slt $a0, $0, $t9
bne $t0, $0,loop
li $v0, 10
syscall
func: addi $a0, $a0, -1
move $v0, $a0
jr $ra
9 (a) Consider the first time the procedure func is called. What is the contents of $ra when the
program is executing func.
0x0400000c
9 (b) Based on the MIPS register saving conventions which registers should be stored on the
stack in the calling stack frame when func is called.
$t0, $t9, $a0
Note in this case $a0 is stored as a matter of convention since
the calling procedure is not relying on the availavbility of the
value that $a0 contained before the call.
that the data segment starts at 0x10010000 and that the text segment starts at 0x00400000.
Assume that the instruction done is encoded in one word.
.data
label: .word 24, 28
.byte 64, 32
.asciiz “Example Program”
.text
main: jal push
jal pop
done
pop:
ret1:
lw $fp, 0($sp)
lw $ra, 4($sp)
addiu $sp, $sp, 32
jr $ra
push: subiu $sp, $sp, 32
sw $fp, 0($sp)
sw $ra, 4($sp)
ret2: jr $ra
10 (a) Write the values of the words stored at the following memory locations. Provide your
answer in hexadecimal notation.
Word Address
Value
_0x10010008________
___78452040____
_0x0040000c________
___8fbe0000____
_0x00400000________
___0c100007_________
10 (b) Consider the process of assembly of the above program.
• What would the symbol table contain?
All of the labels that the linker needs to know about: push and pop and main (for
programs where the loader creates code that jumps to main). We do not
need to store labels from the data segment.
• Assuming that the memory map is fixed, what relocation information would be
recorded if any?
The jal instructions generate absolute addresses and must have relocation
information stored to permit placement of the program starting elsewhere
in memory.
10 (c) What are the values of the following labels?
ret1 ____0x00400018___________
push ____0x0040001c___________
10 (d) Assume that the procedures push and pop were assembled in a distinct file and linked
with the main program. Further assume that the procedures were placed in memory starting
at location 0x00400040. Provide the encoding of the ‘jal pop’ instruction.
0x0c100010
11 (a) Provide the MIPS code to do the following.
• implement the greater-than-or-equal-to-zero test using only native instructions.
Example for a general test of $t0 greater-than-or-equal-to $t1
slt $t2, $t0, $t1
bne $t2, $0, False
j True
Where True and False are the labels of the instructions that
correspond to the blocks of code to be executed
depending on the outcome of the test.
• initialize a register to the value 0x33334444.
move $t0, $zero
lui $t0 0x3333
ori $t0, $t0, 0x4444
11 (b) Provide two advantages and two disadvantages of RISC vs. CISC machines.
From text
that the data segment starts at 0x10010000 and the text segment starts at 0x00400000.
.data
first: .word 0x21, 32
.byte 4, 3
.align 2
str:
.ascii”Test”
.text
main: li $4, 4
loop: jal func
slt $7, $0, $4
bne $7, $0,loop
end:
li $v0, 10
syscall
func: addi $4, $4, -1
jr $31
12 (a) What function does the last two instructions in the main program realize?
Program Exit. Ref SPIM documentation
12 (b) With respect to the above program,
• what are the values of the following symbols?
end __0x00400014_____________
loop __0x04000004_____________
func __0x04000018_____________
str __0x1001000c_____________
•
What is the total number of bytes required for this program, both data segment and
text segment storage?
data segment - 16
text segment - 32
12 (c) What is the encoding of the following instructions?
bne $7, $0, loop
__0x14e0fffd__________
jal func
___0x0c100006________
addi $4, $4, -1
___0x2084ffff_________
12 (d) Using only native instructions, how can you implement the following test: branch-ongreater-than-or-equal-to-5?
Example:
slt $t2, $t0, $t1
bne $t2, $0, False
j True
Where True and False are the labels of the instructions that
correspond to the blocks of code to be executed
depending on the outcome of the test.
13 Consider the case where Procedure A calls procedure B, and procedure B calls procedure C.
Procedures A and B both use registers $15, $16, $17 and $24, and procedure C uses registers $9, $10, and $11. The top of the stack initially points to the top of an active frame.
13 (a) Show the contents of the stack when Procedure C is executing. Assume the MIPS
register saving conventions as defined in the text. State, or clearly indicate the number of
stack frames and which registers are saved by each procedure.
7FFFFFFC
$sp
$15
$24
saved by A
$16
$17
stack grows
down
saved by B
$15
$24
C’s stack frame
Note: I have not shown the procedures saving $ra and $fp which should be
illustrated. Procedure C does not use any of the $sx registers and does not
call other procedures. Therefore C does not need to save any of the registers.
13 (b) Would callee save have been more efficient? Justify your answer.
No. Each procedure would have to save all the registers used in
the procedure. This is clearly less efficient, e.g., look at procedure C.
14 Consider the execution of the following block of SPIM code on a multicycle datapath.
Assume that the text segment starts at 0x00400000 and that the data segment starts at
0x10010000. Assume that registers $7, $8 and $9 have initial values 16, 0x10010020, and
0x10020020 respectively.
str:
move:
end:
.data
.asciiz “Start”
.align 2
.word 24, 0x6666
.text
lw $12, 0($8)
sw $12, 0($9)
addiu $8, $8, 4
addiu $9, $9, 4
addi $7, $7, -1
bne $7, $0, move
of program execution is cycle number 1
Cycle
ALUSelA
ALUOp
PCWrite
IR (in HEX!)
ALUOutput
8
1
00
0
0xad2c0000
0x10020020
20
1
00
0
0x20e7ffff
0xf
14 (b) Show the addresses and corresponding contents of the locations in the data segment
that are loaded by the above SPIM code. Clearly associate addresses and corresponding
contents.
0x10010000
0x72617453
0x10010004
0x00000074
0x10010008
0x00000018
0x1001000c
0x00006666
14 (c) Assume the exception/interrupt handler is stored at location 0x80000020. Provide the
SPIM instruction(s) for transferring program control to begin executing instructions at this
address. Do not worry about setting the return address.
lui $1, 0x8000
addi $1, $1, 0x0020
jr $1
14 (d) If the above code was implemented as a procedure, following the MIPS register saving
conventions, which registers, if any, would be saved by the procedure?
All of the registers which are used are temp registers and therefore are not saved
by the procedure. Since no other procedure is called, there is no need to save the
return address regaietrs, $ra. Using the MIPS convention only the frame pointer
would be saved in this example. There are sufficient argument registers
(show in scientific notation)
___-1.111001 x 2-36___________
__sw $18, 0($15)___________
Assume that the text segment starts at 0x00400000 and that the data segment starts at
0x10010000. Assume that registers $7, $8 and $9 have initial values 16, 0x10010020, and
0x10020020 respectively.
str:
move:
end:
.data
.asciiz “Exams”
.align 4
.word 24, 0x6666
.text
lw $12, 0($8)
sw $12, 0($9)
addiu $8, $8, 4
addiu $9, $9, 4
addi $7, $7, -1
bne $7, $0, move
of program execution is cycle number 1
Cycle
ALUSelB
ALUOp
RegDst
IR (in HEX!)
MemToReg
7
11
00
X
0xad2c0000
X
17
10
00
0
0x25290004
0
contents.
0x10010000
0x6d617845
0x10010004
0x00000073
0x10010008
0x00000000
0x1001000c
0x00000000
0x10010010
0x00000018
0x10010014
0x00006666
15 (c) What is the difference between signed and unsigned instructions and what is the moti-
vation for making this distinction?
Signed instructions interpret operands as 2’s complement numbers
They also will set conditions such as overflow, negative, etc. Unsigned
numbers will be treated as n bit binary numbers. They will typically
not set conditions such as overflow. Certain classes of numbers
natually involve unsigned arithmetic, e.g., memory address computations.
In order to detect these conditions in some instances and ignore them
in others, MIPS uses two different types of instructions.
15 (d) What are the values of end and str?
str: 0x10010000
end: 0x00400014
(show in scientific notation)
____1.011101 x 2-44_________
_slti $26, $13, 0__________
16 Answer the following questions with respect to the MIPS program shown below. Note that
this simple program does not use the register saving conventions followed in class.
Assume that each instruction is a native instruction and can be stored in one word!
Further, assume the data segment starts at 0x10001000 and that the text segment starts are
0x00400000.
.data
label: .word 8,16,32,64
.byte 64, 32
.text
.globl main
main: la $4, label
li $5, 16
jal func
done
func:
loop:
move $2, $4
move $3, $5
add $3, $3, $2
move $9, $0
lw $22, 0($2)
add $9, $9, $22
addi $2, $2, 4
slt $8, $2, $3
bne $8, $0, loop
move $2, $9
jr $31
16 (a) What does the above program do?
Sum the elements stored starting at address label
16 (b)
State the values of the labels loop and main?
0x00400020
loop __________________________
0x00400000
main ___________________________
16 (c) what are the hexadecimal encodings of the following instructions?
bne $8, $0, loop
0x1500fffb
add $9, $9, $22
0x01364820
16 (d) The above program does not implement any register saving conventions on a procedure call. Assume we will use the MIPS register saving convention i.e., Caller/Callee
Save. Ignore the storage of local variables on the stack. What will be the contents of the top
frame on the stack when the procedure func is executing?
$fp, $22, since no other procedure
is called $ra is not stored
16 (e) Identify the local and global labels in the program?
main is a global label
all other labels are local labels
17 (a) What is an unresolved reference? How and when does it become resolved?
A reference to a label that is undefined
It is resolved by the linker which has
access to all of the labels accessed by the
programs being linked
17 (b) Write the SPIM instruction sequence that you could use to implement the branch-ongreater-than-or-equal-to-test. For example, how would implement “bge $7, $6, loop”.
slt $1, $7, $6
beq $1, $0, loop
18 Consider the following isolated block of code. Note that some initialization code is missing.
.text
move: lw $12, 0($8)
sw $12, 0($9)
addiu $8, $8, 4
addiu $9, $9, 4
addi $7, $7, -1
end: bne $7, $0, move
18 (a) If the text segment starts at 0x04000000, what is the value of the label end?.
0x04000014
18 (b) Let is assume we wish to implement the above function as a procedure.
• How many arguments would the procedure have and how would they be passed to
this procedure from a calling program?
Three
passed in argument registers $a0-$a2
•
name the registers that would be stored on the stack after this procedure is invoked
$7, $8, $9, $12 would have to be stored by the caller, if at all.
$fp is the only register that is stored following the steps in
the text.
18 (c) Show the hexadecimal encoding of the last instruction.
0x14e0ffffa
18 (d) What does the following bit pattern represent assuming that it is one of the following.
00101100 01110000 11000000 00000000
(in scientific notation))
_1.111000011 x 2-39_________
A MIPS instruction (symbolic code!)
_sltiu $16, $3, 0xc000________
19 Consider the following isolated block of SPIM code. Ignore initialization code. The data
segment starts at 0x1001000 and the text segment starts at 0x00400000. Note that this
block of code does not use registers saving conventions on a procedure call.
start:
end:
mystery:
loop:
skip:
exit:
addi $2, $0, 85
addi $3, $0, 1
jal mystery
j exit
add $4, $0,
andi $5, $2,
beq $5, $0,
add $4, $4,
srl $2, $2,
bne $2, $0,
jr
$31
li $v0, 10
syscall
$0
1
skip
$3
1
loop
19 (a) What are the values of the following labels?.
loop ________0x00400014________
skip _________0x00400020________
19 (b) What are the encodings of the following instructions?
addi $3, $0, 1
__0x20030001______________
beq $5, $2, skip
__0x10a00001______________
19 (c) (6 pts) Let us suppose the procedure mystery is independently compiled and linked
when the program is run. After linking the program is stored in memory and mystery has
a value of 0x00400010. What is the encoding of the jal mystery instruction.
0x0c100004
19 (d) Which of the following are valid MIPS instructions? If they are invalid you must state
the correct form or clearly state why it is incorrect.
• lw $t0, $t1($t3)
The offset must be a label or numeric constant. Examples of the correct
form are:
lw $t0, 4($t3) or
lw $t0, label2($t3)
• sltiu $t1, $t2, 0x44
This is correct. Immediates can be specified in hexadecimal notation or base 10
notations.
• bne $t1, label, loop
The arguments to the instruction must be registers, or one of the arguments can actually
be an immediate. A correct form would be
bne $t1, $s3, loop
19 (e) Suppose I want to store the following information in the data segment in the order
shown. Show a sequence of SPIM data directives that will correctly do so. The text string,
each reserved word and the byte array should be labeled so that programs may reference
them easily.
• the text string “Enter a SPIM instruction”
• reserve space for 4 words
• store an array of 16 bytes (pick your own values)
str:
L1:
L2:
L3:
L4:
array:
.data
.asciiz “Enter a SPIM instruction”
.align 2
.space 4
.space 4
.space 4
.space 4
.byte 1,2,3,4,5,6,7,8,9,11,22,33,44,55,66,77
19 (f) (2 pts). In the preceding question (1(f)) what assembler directive would cause the byte
array to start on a 64 byte boundary?
.align 6
6
this moves allocation to the next 2 byte boundary
19 (g) What does the program in question 1 do? You have to specific and state the function or
operation performed. No partial credit.
Counts the number of bit set to 1 in $2.
20 (a) What is the difference between native instructions and pseudoinstructions?
Native instructions are implemented by the underlying hardware. pseudoinstructions
are translated into native instructions by the assembler. pseudoinstructions provide
more powerful instructions to the programmer without affecting the complexity of the
hardware since they are not interpreted by the hardware.
20 (b) Let us consider the program shown in the previous question. The procedure mystery
does not appear to follow any of the SPIM procedure call conventions. If you were to
rewrite mystery to follow the MIPS caller/callee saving convention, what registers
would you expect to find in the stack frame for mystery while it is executing?
Based on our class discussions only $fp.
20 (c)
Suppose that a 32-bit integer A is stored in memory, starting at the address given in
register $t1, and that another 32-bit integer B is stored in the next sequential word after A.
Write a MIPS procedure Swap that exchanges the values A and B in memory if A > B.
The procedure should return to its caller when it is finished. Assume a purely caller-save
convention for preserving registers is being used. Full credit is the program fits in the table
below. Provide comments.
label
instruction
Swap: lw $t0, 0($t1)
lw $t2, 4($t1)
slt $t3, $t2, $t0
beq $t3, $0, exit
sw $t0, 4($t1)
sw $t2, 0($t1)
exit: jr $31
comment
Load A
Load B
Check if B<A
If not true (result=0)exit
Swap A and B
Return
Another solutiojn would expect to find the address of A in $a0 and use this register
instead of $t1. Since we are using caller-save and this procedure does not call anyone
else, the return address does not need to be saved. With no saving requirements there is
no stack frame that needs to be allocated/deallocated.
21 (a) Show the single precision normalized floating point representations of the following
numbers. Use the 754 standard and present your solution in hexadecimal notation.
-0.875
0.0
0xbf600000__________________________________________________
____
___________________________________________________________
-1.03125 ___0xbf840000____________________________________________
21 (b) Provide the hexadecimal representation of a denormalized number in double precision
IEEE 754 notation. What is the purpose of denormalized numbers?
0x00001000
Denormalized numbers are numbers with a value betwee 0 and the smallest number
that can be represented. The IEEE 754 convention identifies denormalized
numbers with an exponent value of 0 and a non-zero significand.
21 (c) Assuming IEEE 754 rules and conventions, what is the result of the following operation
in hexadecimal notation.
• 1.101 x 2112 X 1.01 x 212 (note that the exponent values are represented in base
10).
Note that the exponents are biased. Thus when we add the exponents we must
substract 127 to obtain the correct value of the biased exponent of the
product. However when we do this we find that we have
112+12-127 = -3. Since the value of the biased exponent is less that 0, we
have underflow.
21 (d) Assuming that we have 6 significant digits (including the implicit bit), does the following calculation produce any error? If so what is the magnitude of the error?
1.001 x 2121 + 1.1 x 2128
After adjusting the expoent value of the smaller number have
1.1000000000 x 2128 +
0.0000001001 x 2128
------------------1.1000001001 x 2128
With 6 significant digits the answer is 1.1 x 2128
The magnitude of the error is 1.001 x 2121
22 For the following questions, consider the single cycle datapath shown in Figure 5.24 on
page 314. The value of loop is 0x00400004, the contents of register $t0, $t1, and $t2 are
0x01000033, 0x00004008, and 0x000000c0, respectively, and the contents of register $1 is
0x10010008
22 (a) Fill in the values of control signals in the following table for each instruction.
Instruction
j loop
sw $t2, 8($at)
ALUSrc
X
1
ALUOp
XX
00
RegDst
X
X
MemToReg
X
X
Write Data
Input to Data
Memory
contents of $16
0x000000c0
The jump instruction does not utilize any portion of the datapath other than the portion that
updates the PC with the jump address. However, the register file does utilize bits of the
encoded address as register addresses for rs and rt. The input to the write data port of memory
is the contents of register rt. Therefore first encode the j instruction to find the value of rt. It
will be 16.
22 (b) Errors in fabrication are a major concern in the design of a datapath. Suppose that
MemToReg and RegDst multiplexor control signals were stuck at 0, i.e., you cannot assert
these signals due to a fabrication defect. Which of the instructions supported by the datapath will still work?
sw, beq, and j. This follows from the truth table for the
controller. Which rows can still produce valid control signal
values?
22 (c) This question deals with the changes to the datapath required to implement the addition
of the jal instruction. Full credit requires you to be specific.
• what are the structural changes, if any, that must be made to the datapath
- We need $31 as a hardwired input address to the RegDst Mux. RegDst is
now a 2 bit control signal
- We need PC+4 as an input to the MemToReg Mux since it is now a candidate
for writing into the register file (into $31). The MemToReg signal is now a 2 bit
control signal
- The jal control signal is used to select the jump address from the PC source mux.
This mux is is expanded to a 2 bit signal.
• Show the changes required to the main controller truth table and the PLA implementation
of this controller.
RegDst AluSrc MemToReg RegWrite MemRead MemWrite Br AluOp Jal
10
X
10
1
0
0
0
XX 1
The PLA implementation illustarted in Figure C.2.5 will be changed to add another gate
to recognize the jal instruction. The RegDst and MemToReg outputs must be modified
to produce two output signals. The most significant bit is simply the jal while the least
significant remains as before. We also introduce a jal output.
23 Consider the state of following procedure executing within the single cycle SPIM data path
shown in Figure 5.24 on pg. 314. The first instruction is at address 0x00400440 and registers $4 and $5 contain the values 0x10010440 and 0x00000040 respectively. Assume that
all instructions are native instructions.
func:
loop:
end:
add $2, $4, $0
add $3, $5, $0
add $3, $3, $2
add $9, $0, $0
add $22, $0, $0
lw $22, 0($2)
add $9, $9, $22
addi $2, $2, 4
beq $3, $2, end
j loop
move $2, $9
jr $31
23 (a) Fill in the values of the signals shown for the following instructions (first time through
the loop).
Instruction
ALUOutput
Value at the
Write Data
Input of Data
Memory
beq $3, $2, end
0x0000003c
0x100100444
X
01
0
X
lw $22, 0($2)
0x10010440
0x00000000
0
00
1
1
RegDst
ALUOp
ALUSrc
MemTo
Reg
23 (b) Suppose in our design the controller was implemented in a separate chip. Due to a fabrication defect the MemToReg control signal has been found to be defective. What modifications can we make to any part of the datapath (excluding the controller) such that the
datapath still functions correctly. (Hint: This is equivalent to asking how can I get rid of the
MemToReg control signal).
Invert the RegDst signal and use it as MemToReg
23 (c) Now suppose the above program was executed on a multicycle datapath. Assume that
the first cycle for fetching the first instruction is numbered cycle 0. Assume immediate
instructions take the same number of cycles as the R-format instructions. On which cycle
do the following events take place.
•
add $9, $0, $0 completes execution
____15_______(the end of the instuction execution not the EX cycle)
•
the branch address for beq $2, $3, end is computed (for the first time through the loop)
_____34____
23 (d) What the principal advantage of the multicycle datapath over the single cycle datapath.
Be specific and provide an example to illustrate your point.
Each instruction only takes as long as it needs to. For example, if the clock cycle
was 10 ns and the load takes 5 cycles (50 ns) branches now only take 30 ns (3
cycles and R-format instuctionsonly take 40 ns (4 cycles). The net time savings
over the execution fo a program can be substantial.
24 Show the hexadecimal value of the single precision normalized floating point representations of the following numbers. Use the IEEE 754 standard.Your representation should be
fully compliant with the definition.
0x7f800000
Infinity
0xc0900000
______________________
-4.5
24 (a) Perform the following floating point operations assuming that the number representations are IEEE 754 compliant. Your answers can be represented in scientific notation.
•
multiplication: 0xbe000000 x 0xbf800000 (
-0.125 x -1.0 = 0.125
•
1.001 x 2131 + 1.11001 x 2126
1.001 x 2131 + 0.0000111001 x 2131
= 1.0010111001 x 2131
24 (b) Provide an example of a number that is too large to be represented in single precision
notation but can be represented in double precision.
1.0 x 2277
Assume that the fetch cycle for the first instruction is cycle number 0. Assume that addiu
and addi require the same number of states as an add instruction.
add $t4, $0, $0
addi $t3, $0, 4
lw $t0, 0($t1)
add $t4, $t4, $t0
addiu $t1, $t1, 4
addi $t3, $t3, -1
bne $t3, $0, loop
loop:
25 (a) How many cycles does this code sequence take to execute to completion?
88 (the loop takes 20 cycles and executes 4 times).
25 (b) On what cycle is the add instruction in the body of the loop executed for the second
time (remember the first cycle is cycle number 0!)?
Cycle no. 35 for the EX cycle of the add instruction.
25 (c) Fill in the following values during the requested cycle (remember the first cycle is cycle
number 0!).
Cycle
13
27
ALUSrcB
ALUOp
RegDst
PCWrite
MemToReg
PCSource
01
00
00
01
0
0
1
0
0
0
00
01
The first row corresponds to the fetch cycle of the add instruction. The second row corresponds to third cycle of the branch instruction.
-0.125
0xbe000000
___________________________________________________________
0x41208000
___________________________________________________________
10.03125 ___________________________________________________________
0.0
26 (b) How would you interpret the following hexadecimal number assuming that it is a single precision IEEE 754 floating point number: 0x00140000?
It is a denormalized number since the exponent is 0, but the significand
is non-zero.
26 (c) Assuming IEEE 754 rules and conventions, what is the result of the following operation
in hexadecimal notation.
• 1.101 x 2142 X 1.01 x 2122 (note that the exponent values are represented in
base 10).
1.000001 x 2138
26 (d) Assuming that we have 6 significant digits (including the implicit bit). What is the
value of the guard, round and sticky bits in the course of the following computation?
Repeat for 4 significant digits.
1.001 x 2121 + 1.1 x 2128
After adjusting the expoent value of the smaller number have
1.1000000000 x 2128 +
0.0000001001 x 2128
------------------1.1000001001 x 2128
With 6 significant digits we have the guard, round and sticky bits as 011
With 4 significant digits we have 001.
Note the sticky bit is set if here are non-zero bits to the right of the guard and round
bits.
27 For the following questions, consider the single cycle datapath shown in Figure 5.24 on
page 314. The value of loop is 0x00400004, the contents of register $t0, $t1, and $t2 are
0x01000033, 0x00004008, and 0x000000c0, respectively, and the contents of register $1 is
0x10010008
Instruction
jal loop
bne $t1, $t2, loop
ALUSrc
ALUOp
RegDst
MemToReg
x
0
x
01
10
X
10
X
Write Data
Input to Data
Memory
contents of $16
0x000000c0
The jal instruction requires some additions to the datapath.
- We need $31 as a hardwired input address to the RegDst Mux. RegDst is
now a 2 bit control signal
- We need PC+4 as an input to the MemToReg Mux since it is now a candidate
for writing into the register file (into $31). The MemToReg signal is now a 2 bit
control signal
However, the register file does utilize bits of the encoded instruction as register addresses for
rs and rt. The input to the write data port of memory is the contents of register rt. Therefore
encode the jal instuction to find the value of rt. It will be 16.
27 (b) Your are designer and wish to reduce the number of control signals in this datapath.
Provide one hardware modification that will eliminate the MemToReg control signal.
Use the complement of RegDst as the value of this signal. This is not the
only solution but one which is readily apparent from the truth table.
27 (c) This question deals with the changes to the datapath required to implement the addition
of the addi instruction. Full credit requires you to be specific.
• what are the structural changes, if any, that must be made to the datapath
none are required. We have the connections necessary to add an
immediate operand to the contents of a register. The issue is simply
one of control.
• Show the changes required to the main controller truth table and the PLA implementation
of this controller.
RegDst AluSrc MemToReg RegWrite MemRead MemWrite Br AluOp
0
1
0
1
0
0
0
00
The values of AluSrc and RegWrite become modified to include another OR term. Note
the change in these two columns on the truth table. Also we have an additional gate on the
input to recognize the addi instruction.
Assume that the fetch cycle for the first instruction is cycle number 0. Assume that li and
addi require the same number of states as an add instruction, that jal and slt require the
same number of states as a bne instruction.
main:
loop:
.text
li $t0, 2
add $t2, $zero, $zero
li $a0, 1
jal solo
addi $t0, $t0, -1
addi $a0, $a0, 1
add $t2, $t2, $v0
slt $t1, $t0, $zero
28 (a) How many cycles does this code sequence take to execute to completion?
33
28 (b) On what cycle is the slt instruction in the body of the loop fetched for the first time
(remember the first cycle is cycle number 0!)?
27
28 (c) Fill in the following values during the requested cycle (remember the first cycle is cycle
number 0!).
Cycle
13
26
ALUSrcB
11
00
IorD
0
0
PCWrite
0
0
RegDst
0
1
MemToReg
0
0
PCSource
00
00
The first row corresponds to a decode cycle. The second row corresponds to a write back
cycle.
29 (a) Show the hexadecimal value of the double precision normalized floating point representations of the following numbers. Use the IEEE 754 standard.
Infinity 0x7ff0000000000000________
-1.0
_0xbff0000000000000_______
29 (b) Assuming a total of 4 significant digits including in the implicit bit, is there any inaccuracy in the following IEEE 754 computation assuming the use of the guard and round
bit? Justify your answer by showing how this value is calculated, and identifying the magnitude of the error if any.
(1.01 x 2125) X (1.001 x 2129)
1.001000 x
1.01
--------------1.01101 x 2125+129-127=127
Rounding down gives us
1.011 x 2127
There is an error and the magnitude of the error is
0.00001 x 2127
29 (c) When using Booth’s algorithm, does it matter which of the operands are the mutliplier?
Justify your answer.
The number of addition operations depends on
the distribution of 1’s in the multiplier. We can
pick the operand that will produce the minimum
number of addition operations. If shifts are
faster than additions, we can speed up the operaton with this optimization.
29 (d) Consider the operation 58 * 4 using version 3 of the multiplication algorithm in chapter
4 (not Booth’s algorithm). Recall that this is the most efficient version of this algorithm.
What are the initial and final contents of the multiplicand and product registers? Assume
that the multiplier and multiplicand have the same number of bits.
Initial
Final
multiplicand
000100
000100
Product
000000111010
000011101000
30 Consider a base, multi-cycle SPIM machine operating at 200 Mhz, with the following program execution characteristics.
Instruction class
CPI
Frequency
R-Type + Immediate
4
45%
lw
5
25%
sw
4
20%
beq
3
7%
j
3
3%
Each of the following parts are independent of each other.
30 (a) The designers decided to add a new addressing mode to help with array addressing
where we often add a byte offset to a base address stored in a register. Therefore a
sequence of instructions such as code block 1 below can be replaced by the instruction
shown in code block 2. We have 40% of the lw instructions affected in this manner. Assuming that the cycle time does not change and the new instruction also takes 5 cycles, what is
the speedup produced by the use of this new addressing mode?
add $t2, $t1, $t0
lw $t3, 0($t2)
becomes
lwx $t2, $t1+$t0
code block 1
code block 2
CPI_old = 0.45*4 + 0.25*5 + 0.20*4 + 0.07*3 + 0.03*3 = 4.15
Ex_old = #Instr * 4.15 * clock_cycle_time
With the optimization the number of instructions has been changed. If 40% of the lw
instructions have been affected that means 0.4 * 0.25 = 0.1 or 10% of the total number of
instructions have been affected. For each of these instructions we can eliminate one
instruction by combining two instructions. The new total number of instructions is therefore (0.9 * #Instr). The eliminated instructions are all R-format instructions. Therefore the
percentage of such instructions are reduced by 10%.
CPI_new = 0.35*4 + 0.25*5 + 0.2*4 + 0.07*3 + 0.03*3)/0.9 = 4.17
Ex_new = (0.9 * #Instr) * 4.17 * clock_cycle_time
Comparing the execution times we can see that this is a good trade-off.
30 (b) Is it possible to halve the execution time of programs by simply speeding up R-format
and immediate instructions, i.e., realize a speedup of 2? Justify your answer.
From Amdahl’s Law we have Speedup = 1/(s + ((1-s)/n)) where s is the fraction
that is unaffected, and n is the factor by which this component can be sped up. Let
us assume that R-format and immediate instructions are infinitely sped up, i.e., n
= infinity. In this case we have
Speedup = 1/(0.55) < 2.
Therefore the answer is no.
30 (c) What is the maximum MIPS rating of the multicycle datapath?
The fastest instruction takes 3 cycles, i.e., has a CPI of 3. This is the theoretically lowest
CPI possible for any program (although it is unrealistic to have a program that only
branches opr jumps!)
MIPS Rating = (Clock Rate in MHZ)/CPI = 200/3 = 66.7 MIPS
31 We wish to make our datapath implement instructions such as those that are found in the
Power PC. Specifically we are interested in adding an instruction that will increment a
counter, and branch to a label if the counter value is 0: icb $t0, loop.This instructon operates just as a branch instruction would if the branch is taken. Asume that this instruction
uses the sameformat as a branch instuction (with the rt field set to 0). Assume that the
opcode for this new instruction is 101000. Explicitly state any assumptions you need to
answer the following questions.
31 (a) State any additions/modifications required to the hardware data path of Figure 5.28 or
describe how the instruction can be implemented with no physical modifications.
The execution of icb requires the contents of the register to be decremented. We can
provide this value of 1 as an input to the AluSrcB mux which makes the number of
inputs 5, and therefore the AluSrcB mux requires a three bit control signal. The
RegDst field must be exapanded to a 2 bit field since the rs register must now be
written using rs as a destination address. Thus this instruction has a WB state.
31 (b) Show the modified state machine that will handle this instruction. Provide the values of
relevant control signals in each additional state. Any control signals not specified will
assumed to be X.
from
decode
AluSrcA = 1
AluSrcB = 100
AluOp = 00
PCWriteCond
PCSource = 10
to fetch
RegWrite
RegDst = 10
MemToReg = 0
31 (c) Show the additions to the microcode to implement this instruction. The existing microcoded implementation of the state machine before addition of this instruction is shown
below. You need only show additions/modifications. Note that there is more than one
approach to solving this problem. Soem solutions may require additions to the microinstruction format in your text. If so state them explicitly.
Dispatch ROM 1
Dispatch ROM 2
Op
Name
Value
Op
Name
Value
000000
R-format
0110
100011
lw
0011
000010
jmp
1001
101011
sw
0101
000100
beq
1000
100011
lw
0010
101011
sw
0010
101000
icb
1010
label
ALU
control
src1
src2
fetch
add
PC
4
add
PC
extshft
dispatch-1
add
A
extend
dispatch-2
Mem1
Register
Control
LW2
Memory
PCwrite
Control
Sequencing
read PC
ALU
seq
read ALU
seq
write MDR
SW2
fetch
write ALU
Rformat1 Func
A
fetch
B
seq
write ALU
BEQ1
Subt
A
B
JUMP1
ICB
add
A
+1
write ALU
fetch
ALUOut-Cond
fetch
jmp addr
fetch
ALUOut-cond
seq
fetch
32 Consider the manner in which exceptions are accommodated in the SPIM datapath, and the
manner in which the data path controller is modified to support exceptions.
32 (a) In order to support n exceptions, how many additional states should be added to the
state machine? How many additional microinstructions should added?
n
32 (b) Consider an illegal memory address exception. Show the changes to the state machine
for this exception. Make any assumptions that you feel may be necessary. For each additional state clearly show the values of relevant control signals. Any signals left unspecified
will be assumed to be X (don’t care).
from
- State 0
- State 3
- State 5
code for illegal
memory address is 10
to fetch
IntCause = 10
CauseWrite
AluSrcA = 0
AluSrcB = 01
AluOp = 01
EPCWrite
PCwrite
PCSource = 11
The illegal memory address exception is detected in any state
that performs a memory access.
32 (c) What is the minimum number of cycles from the occurrence of an exception to the time
the first instruction of the exception handler is fetched?
One. Exceptions are handled in one state in this datapath (ref. Figure
5.39 and Figure 5.40). The next state is the fetch state.
0x7ff0000000000000
(more than one answer:significand must be nonzero)
NaN
_________________________________
-0.125
_0xbfc0000000000000______________
(1.01 x 2125) + (1.001 x 2129)
0.000101 x 2129
1.001000 x 2129
--------------1.001101 x 2129
Rounding up gives us
1.01 x 2129
There is an error and the magnitude of the error is
0.000011 x 2129
33 (c) Using Booths algorithm and n bit multipliers and multiplicands, how many steps are
required?
n steps
33 (d) Consider the operation 18 * 8 using version 3 of the multiplication algorithm in chapter
that the multiplier and multiplicand have the same number of bits.
Initial
Final
Multiplicand
10010
10010
Product
0000001000
0010010000
Note that I have left out the sign bits for these numbers. The solution could also have been presented using 6 bits for the multiplicand and 11 bits for the product (10 bit result and the sign bit).
Instruction class
CPI
Frequency
R-Type + Immediate
4
45%
lw
5
25%
sw
4
20%
beq
3
7%
j
3
3%
The following parts are independent of each other.
34 (a) The designers decided to add a new addressing mode wherein registers can be automatically incremented by 4. Therefore a sequence such as code block 1 below can be replaced
by the instruction shown in code block 2. We have 30% of the lw instructions affected in
this manner. Assuming that the cycle time does not change and the new instruction also
takes 5 cycles, what is the speedup produced by the use of this new addressing mode?
lw $t0, 0($t1)
addi $t1, $t1, 4
becomes
lw $t0, ($t1)+
code block 1
code block 2
CPI_old = 0.45*4 + 0.25*5 + 0.20*4 + 0.07*3 + 0.03*3 = 4.15
With the optimization the number of instructions has been changed. If 30% of the lw
instructions have been affected that means 0.3 * 0.25 = 0.075 or 7.5% of the total number
of instructions have been affected. For each of these instructions we can eliminate one
instruction by combining two instructions. The new total number of instructions is therefore (0.925 * #Instr). The eliminated instructions are all immediate instructions. Therefore
the percentage of such instructions are reduced by 7.5%.
CPI_new = 0.375*4 + 0.25*5 + 0.2*4 + 0.07*3 + 0.03*3)/0.925 = 4.17
Ex_new = (0.925 * #Instr) * 4.17 * clock_cycle_time
34 (b) New ALU implementation technology has made it possible to double the speed of the
R-format and immediate instructions. What is overall the speedup that can be realized for
programs with the instructions statistics shown in the table?
From Amdahl’s Law we have Speedup = 1/(s + ((1-s)/n)) where s is the fraction
that is unaffected, and n is the factor by which this component can be sped up.
Plugging in the values from the table we have
Speedup = 1/(0.55 + (0.45/2)) = 1.29 or 29% speedup.
34 (c) What is the maximum MIPS rating of the multicycle datapath?
The fastest instruction takes 3 cycles, i.e., has a CPI of 3. This is the theoretically lowest
CPI possible for any program (although it is unrealistic to have a program that only
branches opr jumps!)
MIPS Rating = (Clock Rate in MHZ)/CPI = 200/3 = 66.7 MIPS
35 We wish to make our datapath implement instructions such as those that are found in the
Power PC. Specifically we are interested in adding the following instruction: lwrf $t0,
$t1+$t2. As you may recall, this instruction uses the sum of the contents of registers $t1
and $t2 as addresses. Assume that this instruction uses R-format encodings where the
shamt and func fields are ignored and an opcode 101000. Explicitly state any assumptions
you need to answer the following questions.
35 (a) State any additions/modifications required to the hardware data path of Figure 5.39 or
describe how the instruction can be implemented with no physical modifications.
No changes are required. Simply add rs and rt to obtain the memory address.
assumed to be X.
from
decode
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 00
to fetch
MemRead
IorD = 1
RegDst = 1
RegWrite
MemToReg = 1
approach to solving this problem. If any changes are necessary to the microinstruction format, state them explicitly.
Dispatch ROM 1
Dispatch ROM 2
Op
Name
Value
Op
Name
Value
000000
R-format
0110
100011
lw
0011
000010
jmp
1001
101011
sw
0101
000100
beq
1000
100011
lw
0010
101011
sw
0010
101000
lw+
1010
label
ALU
control
src1
src2
fetch
add
PC
4
add
PC
extshft
dispatch-1
add
A
extend
dispatch-2
Mem1
Register
Control
LW2
Memory
PCwrite
Control
Sequencing
read PC
ALU
seq
read ALU
seq
write MDR
SW2
fetch
write ALU
Rformat1 Func
A
fetch
B
seq
write ALU
BEQ1
Subt
A
fetch
B
JUMP1
LWRF
add
A
B
ALUOut-Cond
fetch
jmp addr
fetch
seq
read ALU
write MDR-rd
seq
fetch
manner in which the data path controller is modified to support exceptions.
36 (a) How are exceptions different from procedure calls?
• Procedure calls perform state saving operations and make use of the
stack Exceptions need not be concerned with state saving operations
• Exceptions must encode the cause of the exception or use a vectored
implementation to decode the exception
• Exception logic stores the address of the offending instruction rather
than the next instruction.
• Procedure calls are normal changes in the flow of control rather than
unexpected changes.
• Exceptions can lead to unexpected program termination.
36 (b) Consider an illegal instruction exception. Show the changes to the state machine for
this exception. Make any assumptions that you feel may be necessary. For each additional
state clearly show the values of relevant control signals. Any signals left unspecified will
be assumed to be X (don’t care).
from decode
code for illegal
instruction is 10
to fetch
IntCause = 0
CauseWrite
AluSrcA = 0
AluSrcB = 01
AluOp = 01
EPCWrite
PCwrite
PCSource = 11
The illegal instruction exception can be detected in the decode
state. A transition is made to the exception state where the code
is recorded and the state machine moves to the fetch state to
start fetching the first instruction of the exception handler.
36 (c) What is the minimum number of cycles from the occurrence of an exception to the time
the first instruction of the exception handler is fetched?
One. Exceptions are handled in one state in this datapath (ref. Figure
5.39 and Figure 5.40). The next state is the fetch state.
-0.0625
0.0
___________0xbd800000______________________
_________________0x00000000___________________
37 (b) Assuming IEEE 754, what is the result of the following operations
base 10).
1.0010001 x 2148
37 (c) Consider a double precision floating point number (IEEE 754). As it turns out, the
range of numbers that it can represent is not sufficient for your application. Keeping the
total number of bits constant, how would you change the format of the number?
The number of bits in the exponent determines the range. This number
should be increased at the expense of the number of bits in the significand.
37 (d) What is the smallest positive number you can represent in IEEE 754 double precision
format. Represent this number in scientific notation.
1.0 x 2-1022
38 How would you modify the single bit ALU to produce an overflow bit. Clearly state which
of the 32 single bit ALUs you would modify and modify the circuit below to show how you
would compute the overflow bit.
op
c_in
a
0
binvert
1
b
+
2
3
less
result
set
c_out
overflow
Compute the EX-OR of the carry in to the ALU31 and the
carry out from ALU 31. Use a single EX-OR gate.
Assume that the fetch cycle for the first instruction is cycle 0.
loop:
add $t4, $0, $0
addi $t3, $0, 8
lw $t0, 0($t1)
add $t4, $t4, $t0
addiu $t1, $t1, 4
addi $t3, $t3, -1
bne $t3, $0, loop
39 (a) Fill in the following values at the end of the requested cycle.
The first row corresponds to the WB cycle for the load instruction. The second row corresponds to the decode cycle.
Cycle
ALUSrcB
ALUOp
RegDst
PCWrite
MemToReg
PCSource
12
00
00
0
0
1
00
18
11
00
0
0
0
00
39 (b) On what cycle is the branch instruction fetched for the last time?
165. The loop executes 8 times. Each immediate instuctions takes 4 cycles.
___0xbff0000000000000_______________________________________
-1.0
0.0625
___0x3fb0000000000000_____________________________
40 (b) Clearly state two advantages of using Booth’s algorithm for multiplication over the
standard multiplication algorithm provided in the book.
1. Potentially reduces the number of addition
operations performed depending on the
value of the multiplier
2. Naturally handles two’s complement
representations performing signed
multiplication
40 (c) Assuming a total of 6 significant digits including the implicit bit, is there any inaccuracy in the following computation assuming the use of the guard and round bit? Justify
your answer by showing how this value is calculated, and identifying the magnitude of the
error if any.
(-1.11 x 2-2)
+
(1.01 x 23)
The rounded value that is obtained in 1.0011 x 23. The difference between this value
and the actual value (assuming a sufficient number of bits) is given by 0.0000001.
The correct value is 1.0011001 x 23.
41 Consider a base, multi-cycle SPIM machine operating at 50 Mhz, with the following program execution
characteristics.
Instruction class
CPI
Frequency
R-Type
4
40%
lw
5
25%
sw
4
25%
beq
3
7%
j
3
3%
We can increase reduce the clock cycle time by 15% at the expense adding an additional cycle for data memory accesses.
41 (a) Is this a good tradeoff. Provide a clear and quantitative justification for your answer.
CPIold = 0.4*4 + 0.25*5 + 0.25*4 + 0.07*3 + 0.03*3 = 4.15
Execution Timeold = I * 4.15 * clock_cycle
CPInew = 0.4*4 + 0.25*6 + 0.25 * 5 + 0.07*3 + 0.03*3 = 4.65
Execution Timenew = I * 4.65 * 0.85* clock_cycle time
Execution Timenew < Execution Timeold
41 (b) What is the minimum improvement in clock cycle time that is necessary for this approach to be profitable?
x* 4.65 < 4.15
x < 4.15/4.65 = 0.892.
42 Consider the multicycle datapath.
42 (a) What is the minimum number of cycles for the execution of any instruction?
3
42 (b) For a datapath with a N state controller, what is the size of the microcode store (in number of microinstructions)?
2
log N
42 (c) What is it that determines the duration of the clock cycle time?
The state (or cycle) in the datapath that has the longest execution path or delay.
-1.5
0xbff800000000000
0.03125
0x3fa0000000000000
43 (b) Using Booths algorithm and n bit multipliers and multiplicands, what is the range of
numbers that can be accurately multiplied?
-2n-1 to 2n-1-1
43 (c) Assuming a total of 6 significant digits including the implicit bit, is there any inaccuracy in the following computation assuming the use of the guard and round bit? Justify
your answer by showing how this value is calculated, and identifying the magnitude of the
error if any.
(-1.1 x 2-1) + (1.001 x 24)
Correct answer: 1.000101 x 24
Rounded answer: 1.00011 x 24
Error: 0.000001
44 (20 pts) Consider a base, multi-cycle SPIM machine operating at 50 Mhz, with the following program execution characteristics.
Instruction class
CPI
Frequency
R-Type
4
45%
lw
5
25%
sw
4
20%
beq
3
7%
j
3
3%
44 (a) With clever algorithms for register allocation we can reduce the number of lw and sw
instructions by 10%. What is the improvement in execution time?
CPIold = 4*0.45 + 5*0.25 + 4*0.2 + 3*0.07 + 3*0.03 = 4.15
CPInew = (0.45*4 + 5*0.25*0.9 + 4*0.2*0.9 + 3*0.07 + 3*0.03)/I” = 3.945/I”
I” = 0.45 + 0.25*0.9 + 0.2*0.9 + 0.07 + 0.03 = 0.955
CPInew = 3.945/0.955 = 4.13
Execution Time Old = I * 4.15 * clock_cycle_time
Execution Time New = 0.955* I * 4.13 * clock_cycle_time
Compare!
44 (b) Would a 10% reduction in clock cycle time justify a single cycle increase in the number
of cycles for memory accesses?
CPInew = (0.45*4 + 6*0.25 + 5*0.2 + 3*0.07 + 3*0.03) = 4.6
Execution Time New = I * 4.6 * 0.9 * clock_cycle_time
= I * 4.14 * clock_cycle_time
Barely better
0.375
-1.5
___0x3ec00000______________________________________________
___0xbfc00000_____________________________________________
45 (b) Assuming IEEE 754, what is the result of the following operations
base 10).
1.0010001 x 278 (do not forget to substract the extra 127 bias!)
• 1.001 x 294 + 1.1 x 297 (note that the exponent values are represented in base
10).
1.101001 x 297
46 Consider a base, multi-cycle SPIM machine operating at 100 Mhz, with the following program execution characteristics. Compiler optimizations reduce the number of instructions
in each class to the amount shown.
Instruction class
CPI
Frequency
Compiler Opt
R-Type
4
45%
95%
lw
5
30%
90%
sw
4
20%
95%
beq
3
5%
75%
46 (a) What is the base CPI with and without compiler optimizations?
Without Opt = 0.45*4 + 0.3*5 + 0.2*4 + 0.05*3 = 4.25
With Optimization = Cycles/#instr
=(0.45*4*0.95+0.3*5*0.9+0.2*4*0.95+0.05*3*0.75)/(0.45*0.95+0.3*0.9+0.2*0.95+0.05*0.75)
= CPInew
46 (b) What is the speedup due to compiler optimizations?
Speedup = Told/Tnew
= (I * 4.25 * cycle_time)/(I” * CPInew* cycle_time)
where
I” = (0.45*0.95+0.3*0.9+0.2*0.95+0.05*0.75) ( from above: this is the new number of
instructions)
47 Consider the annotated multicycle datapath shown in your text. Fill in the fill in the following table for the sequence of instructions shown below. Each entry should contain the value
of the requested signal at the end of the requested cycle. The initial contents of registers $1,
$2 and $3 are 0x0000088, 0x0000022 and 0x00000033 respectively. The address of Loop
is 0x00400004.
PC
Instruction
Cycle
ALUOutput
Value at the
Output of
the ALU
0x0040000c
beq $1, $2, Loop
3
0x00400004
0x00000066
0x00000022
01
0x00400010
sw $1, 4($2)
4
0x00000026
0x00000026
0x00000088
00
Rt
ALUOp
48 (a) Show the double precision normalized floating point representations of the following
-0.1875
0.0
_____oxbfc0000000000000___________________________
___________________________________________________________
48 (b) Assuming IEEE 754 rules and conventions, what is the result of the following operations. Assume 5 significant digits including the implicit bit.
• 1.11 x 212 x 1.001 x 239.
212 x 239. = 251-127 = 2-76
Remember that the exponent is biased and therefore we have to substract 127 from
the sum of the exponents. The fact that the result is negative indicates that the result
is underflow.
• 1.0001 x 2133 + 1.0101 x 2139
1.0001 x 2133 + 1.0101 x 2139 =
0.0000010001 x 2139 + 1.0101 x 2139 = 1.0101 x 2139 (the result is rounded down
according to the rules for 754).
48 (c) Provide an example of a number that is too small to be represented with single precision notation but can be represented with double precision.
1.0 x 2-130 This number is just smaller than the smallest number that can
be represented in single preceision numbers.
49 For the following questions, consider the single cycle datapath shown in the text. The
value of loop is 0x00400004, the contents of register $t1, $t2, and $t3 are 0x01000033,
0x00004008, and 0x000000c0, respectively, and the contents of register $1 is 0x10010008
Instruction
lw $t3, 4($t2)
ALUSrc
1
ALUOp
00
RegDst
0
MemToReg
1
Write Data
Input to Data
Memory
0x000000c0
49 (b) You are designer and wish to reduce the number of control signals that the controller
has to generate. Provide one hardware modification that will eliminate the RegDst control
signal (but not the mux).
Note that the RegsDst signal is the complement of the MemToReg signal. This
the RegDst mux can be controlled by the MemToReg signal passed through an
inverter.
Assume that the fetch cycle for the first instruction is cycle number 0. Assume that la, li
and addi require the same number of states as an add instruction.
.data
.word 2,3,4,5
.space 16
.text
L1:
L2:
la $t8, L1
la $t9, L2
li $t7, 4
lw $t2, 0($t8)
sw $t2, 0($t9)
addi $t8, $t8, 4
addi $t9, $t9, 4
addi $t7, $t7, -1
bne $t7, $0, move
move:
end:
50 (a) What would be the CPI and MIPS rating for a 266 MHz machine for the program
shown above?
The loop executes four times. Each time through the loop the instructions take a total
of 24 cycles. The first three instructions take 12 cycles. The total number of cycles is
108. The number of instructions in the loop are 6. Therefore the total number of
instructions executed is 6*4+3 = 27. The CPI = 108/27 = 4.0.
MIPS rating = 266/CPI
50 (b) Fill in the following values during the requested cycle (remember the first cycle is cycle
number 0!).
Cycle #
17
40
ALUSrcB
01
XX
IorD
0
X
PCWrite
1
0
RegDst
X
0
MemToReg
X
1
PCSource
00
XX
The first row corresponds to the fetch cycle for the sw instruction. The second row corresponds to the write back cycle for the lw instuction executed for the second time
through the loop.
50 (c) On the figure of the multicycle datapath attached overleaf, show any modifications that
would be required to add the jr instruction. Below show the modifications to the state
machine how any additional states would fit in the state machine. For each additional state
that you have added, show the values of the relevant control signals. Any signals that you
omit from the description of a state are assumed to be deasserted.
According to the format definition for the jr instruction, the rs field identifies the register.
This value will be available in the A register at the end of the decode cycle. Add another signal from the A register to the PCSource mux. The value of PCSource of 11 will now
select the contents of the A register to be written to the PC. This writeback can be handled in
one extra state after decode with PCWrite = 1 and PCSource = 11.
IF
ID
jr
PCWrite = 1
PCSource = 11
start:
str:
main:
move:
end:
done:
.data
.word 21, 22, 0x44, 0x12
.asciiz “Done ”
.align 4
.byte 0x88, 0x32
.text
.globl main
li $t3, 2
lui $t0, 0x1001
lui $t1, 0x1001
addi $t1, $t1, 8
lw $t5, 0($t0)
lw $t6, 0($t1)
add $t7, $t5, $t6
addi $t0, $t0, 4
addi $t1, $t1, 4
addi $t3, $t3, -1
Cycle
ALUSrcB
MemToReg
RegWrite
Branch Address
RegDst
12
01
X
0
0x00400030
X
28
00
X
0
0x0041e05c
X
contents.
0x10010000
0x00000015
0x10010004
0x00000016
0x10010008
0x00000044
0x1001000c
0x00000012
0x10010010
0x656e6f44
0x10010014
0x00000020
0x10010018
0x00000000
0x1001001c
0x00000000
0x10010020
0x00003288
0x10010024
51 (c) Show the SPIM program to implement a for-loop that executes 12 times.
.
loop:
li $t1, 12
# loop body
# loop body
..
..
addi $t1, $t1, -1
51 (d) What are the values of move, str and end?
move
end
str
0x00400010
0x00400028
0x10010010
____1.000100101 x 2-7_______________
__lui $t1, 0x4000________
52 (a) Show the hexadecimal value of the single precision normalized floating point representations of the following numbers. Use the IEEE 754 standard.
0x7f800000
Infinity
a denormalized number __0x00011000_________
(1.11 x 2114) X (1.01 x 2119)
The product is 1.00011 x 2107
With only 4 significant digits and the guard and round bit
being 11, we round up to give
1.001 x 2107
The maginitude of the error is 0.00001 x 2107
52 (c) Consider the operation 17 * 4 using version 3 of the multiplication algorithm in chapter
that the multiplier and multiplicand have the same number of bits. Use the minimum number of bits as determined by the above operands.
Initial
Final
Multiplicand
010001
010001
Product
000000000100
000001000100
52 (d) Compare the critical path delays for carry lookahead logic and ripple carry logic for a
32 bit ALU. Ripple carry delay through single bit ALU is 2 gate delays. Delay through a
single node in the carry lookahead tree is also 2 gate delays.
Ripple carry = 32 * 2 = 64 gate delays.
Carry lookahead = 1 (for generate and propagate signal generation) + 5 *2 (up the
tree) + 4 * 2 (down the tree) + 2 (through the last ALU) = 21 gate delays
Instruction class
CPI
Frequency
R-Type or Immediate
4
40%
lw
5
25%
sw
4
20%
bne
3
10%
j
3
5%
Questions 48(a), 48(b), and 48(c) are independent of each other.
53 (a) The designers propose adding a new branch instruction, decrement-and-branch-if-zero:
dcb $t0, loop. This instructions decrements the register $t0. If the resulting value is equal to
0 then the program branches to the instruction at label loop. Therefore a sequence of
instructions such as code block 1 below can be replaced by the instruction shown in code
block 2. We have 35% of the branch instructions affected in this manner. Assuming that the
cycle time does not change and the new instruction takes 4 cycles, is this a good idea?
addi $t2, $t1, -1
bne $t2, $0, loop
code block 1
becomes
dcb $t2, loop
code block 2
CPI_old = 0.4*4 + 0.25*5 + 0.20*4 + 0.10*3 + 0.05*3 = 4.10
With the optimization the number of instructions has been changed. If 35% of the
branch instructions have been affected that means 0.35 * 0.1 = 0.035 or 3.5% of the
total number of instructions have been affected. For each of these instructions we can
eliminate one instruction by combining two instructions. The new total number of
instructions is therefore (0.965 * #Instr). The eliminated instructions are all immediate
instructions. Therefore the percentage of such instructions are reduced by 3.5%. However they are replaced by instructions taking 4 cycles. Therefore the net effect is the
reduction of the number of branch instructions by 35% (or to 65% of the original).
CPI_new = 0.365*4 + 0.25*5 + 0.2*4 + 0.1*3*0.65 + 0.05*3 + 4*0.035)/0.965 = 4.1
Ex_new = (0.965 * #Instr) * 4.1* clock_cycle_time
53 (b) If improved register allocation techniques in the compiler can reduce the number of lw
and sw instructions by 10% what is the improvement in the MIPS rating?
Current MIPS rating is 300/4.1 = 73.17 MIPS
With the reduced number of instructions the new CPI is given by (0.4*4 + 0.225*5 +
0.18*4 + 0.1*3 + 0.05*3)/I’
where I’ is given by (0.4+ 0.9*0.25 + 0.9*0.2 + 0.1 + 0.05)
The new MIPS rating is 300/CPInew
The improvement is given by their differences.
53 (c) Suppose in a typical program executable, 20% of the instructions is from system libraries and implementing operating system related functionality.What is the maximum speedup
that can be obtained by speeding up the application code sequences?
This goverened by Amdahl’s Law. We the maximum speedup as 1/0.2 = 5.
54 We wish to have our multicycle datapath implement the slti $t0, $t1, 0x4000 instruction.
Explicitly state any assumptions you need to answer the following questions.
54 (a) Either state any additions/modifications required to the hardware data path of Figure
5.33, or describe how the instruction can be implemented with no physical modifications.
We can simply have the ALU perform a slt than operation on the inputs and select the
ALU inputs accordingly. The key problem is getting the ALU controller to issue a slt
opcode to the ALU (111). This is the main hardware modification that is required.
Suppose we use the unused ALUOp value of 11 to signifiy this. Note that the truth
table for the ALU controller must be reimplemented since we do not have don’t cares
in one of the ALUOp bit fields any more.
assumed to be X.
from
decode
to fetch
ALUSrcA = 1
ALUSrcB = 10
ALUOp = 11
RegDst = 0
RegWrite
MemToReg = 0
approach to solving this problem. Some solutions may require additions to the microinstruction format in the text. If so state them explicitly.
Dispatch ROM 1
Dispatch ROM 2
Op
Name
Value
Op
Name
Value
000000
R-format
0110
100011
lw
0011
000010
jmp
1001
101011
sw
0101
000100
beq
1000
100011
lw
0010
101011
sw
0010
001010
slti
1010
label
ALU
control
src1
src2
fetch
add
PC
4
add
PC
extshft
dispatch-1
add
A
extend
dispatch-2
Mem1
Register
Control
LW2
Memory
PCwrite
Control
Sequencing
read PC
ALU
seq
read ALU
seq
write MDR
SW2
fetch
write ALU
Rformat1 Func
A
fetch
B
seq
write ALU
BEQ1
Subt
A
B
JUMP1
SLTI
SLTOP
A
Extend
fetch
ALUOut-Cond
fetch
jmp addr
fetch
seq
Write ALU-rt
fetch
manner in which the data path controller is modified to support exceptions. You certainly
do not want your programs to jump to addresses in the data segment, or for that matter let
us say to any address greater than 0x10010000. If j instructions generate such an illegal
address you want to generate an exception.This questions addresses the implementation of
such an exception in the SPIM datapath.
55 (a) How can you test for the occurrence of this exception?
The occurrence of the exception is detected by using an ALU to
compare the jump address with 0x10010000 and generating a single bit execption single to denote the result of this comparison.
55 (b) Describe the hardware modifications required to control to support such an exception.
From state 1001 we now make a transition to the exception state. To make this transition we must modify the next state logic of the state machine implementation. Currentlky the next state after state 1001 is state 0000. This must now be modified to move
to either state 0000 (if no exception) or state 1010 (exception state) of an exception
occured. You can use a table with these two alternatives just as we did for the next state
logic out of state 0010. The input to the table is a single bit signal denoting whether the
exception occured or not (from 5(a)).
55 (c) Show the required modifications to the state machine to support such an exception.All
control signals left unspecified are assumed to be X. \
from jump
code for illegal
address is 10
IntCause = 10
CauseWrite
AluSrcA = 0
AluSrcB = 01
AluOp = 01
EPCWrite
PCwrite
PCSource = 11
to fetch

Computer Architecture: ECE 3055 Sample Problems and Solutions

Transcription

Similar documents

Homework 2

How to install RCS/RK TAGHeuer By Chronelec active loop

TEMPLATE HOW-TO MAKE A GUM PASTE BOW

MIPS Assembly What is MIPS? Operating Systems (CST 1A) Michaelmas 2008

Overview of the Bike Course (radar and Smedleys

Document 6460907

peaked mountain (333 acres) - The Trustees of Reservations

Clivet P-Matic for WLHP

Welcome to the Fitness Trails

Sample Problem Set #1 - SOLUTIONS

RMFL - Blue Force Gear