CPU time = (CPU execution cycles + Memory-stall cycles) × Cycle time

Memory-stall cycles/program = (Memory accesses / Program) × Miss rate × Miss penalty

Memory-stall cycles/instruction = (Memory accesses / Instruction) × Miss rate × Miss penalty
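The stall-cycle and CPU-time formulas can be sketched as two small Python functions. The function names and the sample numbers (1000 accesses, 5% miss rate, 20-cycle penalty, 1 ns cycle time) are illustrative assumptions, not values from the handout.

```python
def memory_stall_cycles(accesses: float, miss_rate: float, miss_penalty: float) -> float:
    """Memory-stall cycles = memory accesses x miss rate x miss penalty."""
    return accesses * miss_rate * miss_penalty


def cpu_time(execution_cycles: float, stall_cycles: float, cycle_time: float) -> float:
    """CPU time = (CPU execution cycles + memory-stall cycles) x cycle time."""
    return (execution_cycles + stall_cycles) * cycle_time


# Hypothetical program: 2000 execution cycles, 1000 memory accesses,
# 5% miss rate, 20-cycle miss penalty, 1 ns cycle time.
stalls = memory_stall_cycles(accesses=1000, miss_rate=0.05, miss_penalty=20)
total_seconds = cpu_time(execution_cycles=2000, stall_cycles=stalls, cycle_time=1e-9)
print(f"stall cycles = {stalls:.0f}, CPU time = {total_seconds:.2e} s")
```

Passing accesses per instruction instead of accesses per program gives stall cycles per instruction with the same function.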
Suppose a program contains 100 instructions, 30% of which are load and store instructions.
                        Number of accesses/program   Number of accesses/instr.
Separate cache
  Instruction cache     100                          1
  Data cache            30                           0.3
Combined cache
  Cache                 130                          1.3
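The access counts for this 100-instruction program can be reproduced with a short sketch. It assumes one instruction fetch per instruction and one data access per load or store; the function name `access_counts` is hypothetical.

```python
def access_counts(instructions: int, load_store_fraction: float) -> dict:
    """Return accesses per program and per instruction for each cache."""
    instr_accesses = float(instructions)                # one fetch per instruction
    data_accesses = instructions * load_store_fraction  # one access per load/store
    per_program = {
        "instruction cache": instr_accesses,
        "data cache": data_accesses,
        "combined cache": instr_accesses + data_accesses,
    }
    # Dividing by the instruction count gives accesses per instruction.
    per_instr = {name: count / instructions for name, count in per_program.items()}
    return {"per_program": per_program, "per_instr": per_instr}


counts = access_counts(instructions=100, load_store_fraction=0.30)
for name in counts["per_program"]:
    print(f"{name}: {counts['per_program'][name]:.0f} accesses, "
          f"{counts['per_instr'][name]:.1f} per instruction")
```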
For separate caches:
CPI_effective = CPI_base + Memory stall per instruction
              = CPI_base + I-cache stall per instruction + D-cache stall per instruction
              = CPI_base + I-cache accesses per instr. × Miss rate × Miss penalty + D-cache accesses per instr. × Miss rate × Miss penalty
For a combined cache:
CPI_effective = CPI_base + Memory stall per instruction
              = CPI_base + (Cache accesses per instr. × Miss rate × Miss penalty)
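A minimal sketch of the two effective-CPI formulas, one for separate instruction and data caches and one for a combined cache. The function names and every number in the usage lines (miss rates, 40-cycle penalty, base CPI of 1) are hypothetical and chosen only to exercise the formulas.

```python
def effective_cpi_separate(cpi_base: float, i_miss_rate: float, d_miss_rate: float,
                           data_accesses_per_instr: float, miss_penalty: float) -> float:
    """CPI_base + I-cache stall per instruction + D-cache stall per instruction."""
    i_stall = 1.0 * i_miss_rate * miss_penalty                      # one fetch per instruction
    d_stall = data_accesses_per_instr * d_miss_rate * miss_penalty
    return cpi_base + i_stall + d_stall


def effective_cpi_combined(cpi_base: float, miss_rate: float,
                           accesses_per_instr: float, miss_penalty: float) -> float:
    """CPI_base + cache accesses per instruction x miss rate x miss penalty."""
    return cpi_base + accesses_per_instr * miss_rate * miss_penalty


# Hypothetical machine: 30% loads/stores, 40-cycle miss penalty.
separate = effective_cpi_separate(1.0, i_miss_rate=0.01, d_miss_rate=0.05,
                                  data_accesses_per_instr=0.3, miss_penalty=40)
combined = effective_cpi_combined(1.0, miss_rate=0.02,
                                  accesses_per_instr=1.3, miss_penalty=40)
print(f"separate caches: CPI = {separate:.2f}")
print(f"combined cache:  CPI = {combined:.2f}")
```

The combined cache sees 1 + 0.3 = 1.3 accesses per instruction, so a single miss rate is applied to all of them.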
Assume the instruction cache miss rate for a program is 2% and the data cache miss rate is 4%. If a processor has a CPI of 2 without any memory stalls and the miss penalty is 100 cycles for all misses, determine how much faster a processor would run with a perfect cache that never missed. Use the instruction frequencies for SPECint2000 (loads and stores: 36%).
Answer
Instruction cache stall per instruction = 1 × 2% × 100 = 2
Data cache stall per instruction = 0.36 × 0.04 × 100 = 1.44
Memory stall per instruction = 2 + 1.44 = 3.44
Effective CPI = Base CPI + Memory stall per instruction = 2 + 3.44 = 5.44
Ratio of the execution times = 5.44 / 2 = 2.72, so the processor with the perfect cache runs 2.72 times faster.
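The arithmetic of this answer can be checked with a few lines of Python. The variable names are illustrative; the numbers are the ones given in the problem statement.

```python
base_cpi = 2.0            # CPI without memory stalls
miss_penalty = 100        # cycles, for all misses
i_miss_rate = 0.02        # instruction cache miss rate
d_miss_rate = 0.04        # data cache miss rate
load_store_frac = 0.36    # SPECint2000 load/store frequency

# Every instruction is fetched once; only loads and stores access the data cache.
i_stall = 1.0 * i_miss_rate * miss_penalty
d_stall = load_store_frac * d_miss_rate * miss_penalty
memory_stall = i_stall + d_stall

effective_cpi = base_cpi + memory_stall
speedup_perfect_cache = effective_cpi / base_cpi

print(f"I-cache stall/instr = {i_stall:.2f}")               # 2.00
print(f"D-cache stall/instr = {d_stall:.2f}")               # 1.44
print(f"Effective CPI       = {effective_cpi:.2f}")         # 5.44
print(f"Speedup             = {speedup_perfect_cache:.2f}")  # 2.72
```

The instruction count and cycle time cancel in the ratio, which is why only the two CPI values are needed.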
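Note: if the base CPI is reduced from 2 to 1:
CPI with memory stalls = 1 + 3.44 = 4.44
Ratio of the execution times = 4.44 / 1 = 4.44
The fraction of execution time spent on memory stalls rises from 3.44 / 5.44 = 63% to 3.44 / 4.44 = 77%.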
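To see how a lower base CPI makes memory stalls more dominant, the sketch below recomputes the stall fraction for base CPIs of 2 and 1 with the same 3.44 stall cycles per instruction. The helper name `stall_fraction` is hypothetical.

```python
def stall_fraction(base_cpi: float, stall_per_instr: float) -> float:
    """Fraction of execution time spent on memory stalls."""
    return stall_per_instr / (base_cpi + stall_per_instr)


STALL_PER_INSTR = 3.44  # memory stall cycles per instruction from the example

for base_cpi in (2.0, 1.0):
    total_cpi = base_cpi + STALL_PER_INSTR
    print(f"base CPI {base_cpi:.0f}: total CPI = {total_cpi:.2f}, "
          f"perfect-cache speedup = {total_cpi / base_cpi:.2f}, "
          f"stall fraction = {stall_fraction(base_cpi, STALL_PER_INSTR):.0%}")
```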
Suppose we increase the performance of the computer in the previous example by doubling its clock rate. Since the main memory speed is unlikely to change, assume that the absolute time to handle a cache miss does not change. How much faster will the computer be with the faster clock, assuming the same miss rate as the previous example?
Answer
Measured in the faster clock cycles, the new miss penalty will be 200 cycles.
Memory stall per instruction = 2% × 200 + 36% × (4% × 200) = 6.88
Faster system with cache misses: CPI = 2 + 6.88 = 8.88
Slower system with cache misses: CPI = 5.44
Comparing the two systems:
Execution time of slow clock / Execution time of fast clock
  = (I × CPI_slow × Cycle time) / (I × CPI_fast × 1/2 × Cycle time)
  = 5.44 / (8.88 × 1/2)
  = 1.23
so the system with the faster clock is 1.23 times faster.
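The effect of scaling the clock while the miss time stays fixed can be sketched as below. The function name `clock_scaling_speedup` and its parameters are assumptions for illustration; with a scaling factor of 2 it reproduces the figures of this answer.

```python
def clock_scaling_speedup(base_cpi: float, stall_per_instr: float, factor: float) -> float:
    """Speedup from multiplying the clock rate by `factor` when the absolute
    miss time is unchanged, so the miss penalty in cycles grows by `factor`."""
    cpi_slow = base_cpi + stall_per_instr
    cpi_fast = base_cpi + stall_per_instr * factor
    # Execution time = I x CPI x cycle time; the fast cycle time is 1/factor
    # of the slow one, and I and the slow cycle time cancel in the ratio.
    return cpi_slow / (cpi_fast / factor)


speedup = clock_scaling_speedup(base_cpi=2.0, stall_per_instr=3.44, factor=2.0)
print(f"CPI with faster clock = {2.0 + 3.44 * 2.0:.2f}")   # 8.88
print(f"Speedup               = {speedup:.2f}")            # 1.23
```

Doubling the clock yields well under a 2x gain because the stall component of the CPI doubles along with it.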
The data cache has a 92% hit rate and a 2-cycle hit latency, and the cache miss penalty is 124 cycles. 30% of instructions are loads and stores. The instruction cache has a hit rate of 90% with a miss penalty of 50 cycles. Assume the base CPI using a perfect memory system is 1.0. Calculate the CPI of the pipeline, assuming everything else is working perfectly. Assume the load never stalls a dependent instruction and assume the processor must wait for stores to finish when they miss the cache. Finally, assume that instruction cache misses and data cache misses never occur at the same time. Show your work.
1. Calculate the additional CPI due to the instruction cache stalls.
2. Calculate the additional CPI due to the data cache stalls.
3. Calculate the overall CPI for the machine.
Answer
(1) The additional CPI due to instruction cache stalls = 1 × 0.1 × 50 = 5
(2) The additional CPI due to data cache stalls = 0.3 × 0.08 × 124 = 2.976
(3) The overall CPI = 1 + 5 + 2.976 = 8.976
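The three parts of this answer can be recomputed from the hit rates as a quick check. This sketch assumes, as the answer does, that each cache has its own miss penalty and that 30% of instructions access the data cache; the helper name `stall_cpi` is hypothetical.

```python
def stall_cpi(accesses_per_instr: float, hit_rate: float, miss_penalty: float) -> float:
    """Additional CPI from one cache: accesses/instr x miss rate x miss penalty."""
    return accesses_per_instr * (1.0 - hit_rate) * miss_penalty


BASE_CPI = 1.0

# (1) Instruction cache: one fetch per instruction, 90% hit rate, 50-cycle penalty.
i_cache_cpi = stall_cpi(accesses_per_instr=1.0, hit_rate=0.90, miss_penalty=50)

# (2) Data cache: 30% loads/stores, 92% hit rate, 124-cycle penalty.
d_cache_cpi = stall_cpi(accesses_per_instr=0.3, hit_rate=0.92, miss_penalty=124)

# (3) The two kinds of miss never overlap, so the stall terms simply add.
overall_cpi = BASE_CPI + i_cache_cpi + d_cache_cpi

print(f"I-cache stall CPI = {i_cache_cpi:.3f}")
print(f"D-cache stall CPI = {d_cache_cpi:.3f}")
print(f"Overall CPI       = {overall_cpi:.3f}")
```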
[Figure: direct-mapped cache. Cache blocks 0 to 3; memory block addresses 0 to 7.]