Average Memory Access Time (AMAT)

For a single-level cache, AMAT is computed as follows:

AMAT = Time for a hit + Miss rate × Miss penalty
For a multilevel cache hierarchy, AMAT is computed as follows:

[Figure: CPU -> L1 cache -> L2 cache -> ... -> Ln cache -> Memory]

               L1    L2    ...   Ln
Hit time       T1    T2    ...   Tn
Miss rate      M1    M2    ...   Mn
Miss penalty   P1    P2    ...   Pn
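$$\mathrm{AMAT} = T_1 + M_1 \times P_1 + M_2 \times P_2 + \dots + M_n \times P_n = T_1 + \sum_{i=1}^{n} M_i \times P_i$$

Here M_i is the global miss rate of level i (misses at level i divided by all CPU references), and P_i is the extra time paid when level i misses.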

Suppose that in 1000 memory references there are 60 misses in the first-level cache, 30 misses in the second-level cache, and 5 misses in the third-level cache. Assume the miss penalty from the L3 cache to memory is 100 clock cycles, the hit time of the L3 cache is 10 clocks, the hit time of the L2 cache is 5 clocks, the hit time of L1 is 1 clock cycle, and there are 1.5 memory references per instruction.
(a) What is the global miss rate for each level of cache?
(b) What is the local miss rate for each level of cache?
(c) What is the average memory access time?
(d) What is the average number of stall cycles per instruction?

Answer
(a) L1 = 60/1000 = 0.06, L2 = 30/1000 = 0.03, L3 = 5/1000 = 0.005
(b) L1 = 60/1000 = 0.06, L2 = 30/60 = 0.5, L3 = 5/30 = 0.167
(c) AMAT = 1 + 0.06 × 5 + 0.03 × 10 + 0.005 × 100 = 2.1 clock cycles
(d) (2.1 - 1) × 1.5 = 1.65 clock cycles
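
The arithmetic in this answer can be checked with a short script. This is only a sketch: the function names are illustrative and not from the text, and it assumes, as the answer does, that each level's miss penalty is the hit time of the next level (with 100 cycles for the final trip to memory).

```python
def global_miss_rates(misses_per_level, references):
    """Global miss rate: misses at a level divided by all CPU references."""
    return [m / references for m in misses_per_level]


def local_miss_rates(misses_per_level, references):
    """Local miss rate: misses at a level divided by accesses reaching it."""
    # Level 1 sees every reference; level i+1 sees only level i's misses.
    accesses = [references] + list(misses_per_level[:-1])
    return [m / a for m, a in zip(misses_per_level, accesses)]


def multilevel_amat(l1_hit_time, global_rates, penalties):
    """AMAT = T1 + sum over levels of (global miss rate x miss penalty)."""
    return l1_hit_time + sum(m * p for m, p in zip(global_rates, penalties))


misses = [60, 30, 5]          # L1, L2, L3 misses per 1000 references
penalties = [5, 10, 100]      # L2 hit time, L3 hit time, memory penalty

g_rates = global_miss_rates(misses, 1000)
l_rates = local_miss_rates(misses, 1000)
amat = multilevel_amat(1, g_rates, penalties)
stall_cycles_per_instr = (amat - 1) * 1.5   # 1.5 references per instruction

print("global:", g_rates)
print("local: ", [round(r, 3) for r in l_rates])
print("AMAT:  ", round(amat, 2), "cycles")
print("stalls:", round(stall_cycles_per_instr, 2), "cycles per instruction")
```

Multiplying the local miss rates of levels 1 through i gives the global miss rate of level i, which is why the formula can be written with global rates alone.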

The Average Memory Access Time (AMAT) equation has three components: hit time, miss rate, and miss penalty. For each of the following cache optimizations, indicate which component of the AMAT equation is improved.
(1) Using a second-level cache
(2) Using a direct-mapped cache
(3) Using a 4-way set-associative cache
(4) Using a virtually-addressed cache
(5) Performing hardware pre-fetching using stream buffers
(6) Using a non-blocking cache
(7) Using larger blocks

Answer
(1) miss penalty
(2) hit time
(3) miss rate
(4) hit time
(5) miss rate
(6) miss penalty
(7) miss rate

Assume that main memory accesses take 70 ns and that memory accesses are 36% of all instructions. The following table shows data for L1 caches attached to each of two processors, P1 and P2.

      L1 size   L1 miss rate   L1 hit time
P1    1 KB      11.4%          0.62 ns
P2    2 KB      8.0%           0.66 ns

(1) Assuming that the L1 hit time determines the cycle times for P1 and P2, what are their respective clock rates?
(2) What is the AMAT for each of P1 and P2?
(3) Assuming a base CPI of 1.0, what is the total CPI for each of P1 and P2? Which processor is faster?

For the next three problems, we will consider the addition of an L2 cache to P1 to presumably make up for its limited L1 cache capacity. Use the L1 cache capacities and hit times from the previous table when solving the following problems. The L2 miss rate indicated is its local miss rate.

L2 size   L2 miss rate   L2 hit time
512 KB    98%            3.22 ns

(4) What is the AMAT for P1 with the addition of an L2 cache? Is the AMAT better or worse with the L2 cache?
(5) Assuming a base CPI of 1.0, what is the total CPI for P1 with the addition of an L2 cache?
(6) Which processor is faster, now that P1 has an L2 cache? If P1 is faster, what miss rate would P2 need in its L1 cache to match P1's performance? If P2 is faster, what miss rate would P1 need in its L1 cache to match P2's performance?

Answer

      (1) Clock rate   (2) AMAT                   (3) Total CPI
P1    1.61 GHz         8.60 ns (13.87 cycles)     18.5
P2    1.52 GHz         6.26 ns (9.48 cycles)      12.54
P2 is faster.

              (4) AMAT                          (5) Total CPI
P1 with L2    8.81 ns (14.21 cycles), worse     18.96

(6) P1 with L2 cache: CPI = 18.96. P2: CPI = 12.54. P2 is still faster than P1 even with an L2 cache. The miss rate for P1 in its L1 cache should be 7.83% to match P2's performance.

Note (3): Each instruction makes 1.36 memory references (one instruction fetch plus 0.36 data accesses), and the miss penalty in cycles is 70 ns divided by the cycle time.
CPI_P1 = 1 + (1.36 × 0.114 × 70/0.62) = 18.5
CPI_P2 = 1 + (1.36 × 0.08 × 70/0.66) = 12.54

Note (4): L2 global miss rate = 0.114 × 0.98 = 0.11172
AMAT = 0.62 + 0.114 × 3.22 + 0.11172 × 70 = 8.81 ns

Note (5): CPI = 1 + 1.36 × (0.114 × 3.22/0.62 + 0.11172 × 70/0.62) = 18.96

Note (6): Suppose the L1 cache miss rate of P1 is M.
CPI for P1 with second-level cache = 1 + 1.36 × (M × 3.22/0.62 + M × 0.98 × 70/0.62) = 1 + 157.54 M
For P1 to match P2's performance, the time per instruction must be the same on both processors:
(1 + 157.54 M) × 0.62 = 12.54 × 0.66
M = 7.83%
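
The figures in this answer can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the notes above (cycle time equals the L1 hit time, 1.36 references per instruction, 70 ns memory access); the function and variable names are illustrative only. Because it carries unrounded intermediate values, the matching miss rate comes out at roughly 0.078, in line with the hand calculation.

```python
MEM_NS = 70.0            # main memory access time
REFS_PER_INSTR = 1.36    # 1 instruction fetch + 0.36 data accesses


def amat_l1_only(hit_ns, miss_rate):
    """AMAT in ns with only an L1 cache in front of memory."""
    return hit_ns + miss_rate * MEM_NS


def amat_with_l2(hit_ns, miss_rate, l2_hit_ns, l2_local_miss):
    """AMAT in ns with an L2 cache; l2_local_miss is the L2 local miss rate."""
    return hit_ns + miss_rate * (l2_hit_ns + l2_local_miss * MEM_NS)


def total_cpi(base_cpi, cycle_ns, amat_ns):
    """Base CPI plus memory stall cycles per instruction."""
    stall_cycles_per_ref = (amat_ns - cycle_ns) / cycle_ns
    return base_cpi + REFS_PER_INSTR * stall_cycles_per_ref


p1_hit, p1_miss = 0.62, 0.114
p2_hit, p2_miss = 0.66, 0.080
l2_hit, l2_miss = 3.22, 0.98

clock_p1_ghz = 1.0 / p1_hit          # cycle time in ns -> clock rate in GHz
clock_p2_ghz = 1.0 / p2_hit

amat_p1 = amat_l1_only(p1_hit, p1_miss)
amat_p2 = amat_l1_only(p2_hit, p2_miss)
cpi_p1 = total_cpi(1.0, p1_hit, amat_p1)
cpi_p2 = total_cpi(1.0, p2_hit, amat_p2)

amat_p1_l2 = amat_with_l2(p1_hit, p1_miss, l2_hit, l2_miss)
cpi_p1_l2 = total_cpi(1.0, p1_hit, amat_p1_l2)

# Solve (1 + k * M) * p1_hit = cpi_p2 * p2_hit for the L1 miss rate M of P1.
k = REFS_PER_INSTR * (l2_hit + l2_miss * MEM_NS) / p1_hit
matching_miss_rate = (cpi_p2 * p2_hit / p1_hit - 1.0) / k

print(round(clock_p1_ghz, 2), round(clock_p2_ghz, 2))
print(round(amat_p1, 2), round(amat_p2, 2), round(amat_p1_l2, 2))
print(round(cpi_p1, 2), round(cpi_p2, 2), round(cpi_p1_l2, 2))
print(round(matching_miss_rate, 4))
```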

To capture the fact that the time to access data for both hits and misses affects performance, designers often use average memory access time (AMAT) as a way to examine alternative cache designs. Average memory access time is the average time to access memory considering both hits and misses and the frequency of different accesses; it is equal to the following:

AMAT = Time for a hit + Miss rate × Miss penalty

AMAT is useful as a figure of merit for different cache systems.
(1) Find the AMAT for a processor with a 2 ns clock, a miss penalty of 20 clock cycles, a miss rate of 0.05 misses per reference, and a cache access time (including hit detection) of 1 clock cycle. Assume that the read and write miss penalties are the same and ignore other write stalls.
(2) Suppose we can improve the miss rate to 0.03 misses per reference by doubling the cache size. This causes the cache access time to increase to 1.2 clock cycles. Using the AMAT as a metric, determine if this is a good trade-off.
(3) If the cache access time determines the processor's clock cycle time, which is often the case, AMAT may not correctly indicate whether one cache organization is better than another. If the processor's clock cycle time must be changed to match that of a cache, is this a good trade-off? Assume the processors are identical except for the clock rate and the number of cache miss cycles; assume 1.5 references per instruction and a CPI without cache misses of 2. The miss penalty is 20 cycles for both processors.

Answer
(1) AMAT = 2 ns + 0.05 × (20 × 2 ns) = 4 ns
(2) AMAT = (1.2 × 2 ns) + (20 × 2 ns × 0.03) = 2.4 ns + 1.2 ns = 3.6 ns
Yes, it is a good choice.
(3) Execution time (old) = 2 ns × IC × (2 + 1.5 × 20 × 0.05) = 7 × IC ns
Execution time (new) = 2.4 ns × IC × (2 + 1.5 × 20 × 0.03) = 6.96 × IC ns
The new design is still slightly faster, so it is a good choice.
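
The comparison can be sketched in Python to make the two metrics explicit. The helper names below are illustrative, not from the text; the script assumes, as the answer does, that the 20-cycle miss penalty is counted in each processor's own clock cycles and that time is reported per instruction (the instruction count IC factors out).

```python
def amat_ns(cycle_ns, hit_cycles, miss_rate, penalty_cycles):
    """AMAT in ns: hit time plus miss rate times miss penalty."""
    return cycle_ns * (hit_cycles + miss_rate * penalty_cycles)


def time_per_instr_ns(cycle_ns, base_cpi, refs_per_instr, miss_rate, penalty_cycles):
    """Execution time per instruction in ns, including memory stall cycles."""
    cpi = base_cpi + refs_per_instr * miss_rate * penalty_cycles
    return cycle_ns * cpi


# Parts (1) and (2): the clock stays at 2 ns, only the hit time grows.
amat_old = amat_ns(2.0, 1.0, 0.05, 20)
amat_new = amat_ns(2.0, 1.2, 0.03, 20)

# Part (3): the clock stretches to 1.2 x 2 ns = 2.4 ns to match the cache.
t_old = time_per_instr_ns(2.0, 2.0, 1.5, 0.05, 20)
t_new = time_per_instr_ns(2.4, 2.0, 1.5, 0.03, 20)

print("AMAT old/new (ns):", round(amat_old, 2), round(amat_new, 2))
print("time per instruction old/new (ns):", round(t_old, 2), round(t_new, 2))
```

AMAT suggests a 10% improvement, while the per-instruction time improves by well under 1%, which illustrates why AMAT alone can overstate the benefit once the cache sets the clock.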

For a data cache with a 92% hit rate and a 2-cycle hit latency, calculate the average memory access latency. Assume that latency to memory and the cache miss penalty together is 124 cycles. Note: The cache must be accessed after memory returns the data.

Answer: AMAT = 2 + 0.08 × 124 = 11.92 cycles

Which of the following is generally true about a design with multiple levels of caches?
1. First-level caches are more concerned about hit time, and second-level caches are more concerned about miss rate.
2. First-level caches are more concerned about miss rate, and second-level caches are more concerned about hit time.

Answer: 1