DRAM Tutorial presentation

Contents

Slide 2

DRAM Module and Chip

Slide 3

Goals

Cost
Latency
Bandwidth
Parallelism
Power
Energy

Slide 4

DRAM Chip

Slide 5

Sense Amplifier

[Figure labels: enable, top, bottom, Inverter]

Slide 6

Sense Amplifier – Two Stable States

[Figure: the two stable states, Logical “1” and Logical “0”. Labels in each state: 1, 0, VDD]

Slide 7

Sense Amplifier Operation

[Figure labels: 0, VT, VB; condition VT > VB; result: 1, 0, VDD]
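The sensing step can be sketched in a few lines of Python. This is only an illustrative model, assuming that VT and VB are the voltages on the top and bottom terminals and that, once enabled, the amplifier drives the higher terminal to VDD and the lower one to 0. The function name `sense` and the value chosen for `VDD` are hypothetical.

```python
from typing import Tuple

VDD = 1.0  # supply voltage in arbitrary units (illustrative value, not from the slides)


def sense(vt: float, vb: float, vdd: float = VDD) -> Tuple[float, float]:
    """Return the (top, bottom) voltages after the amplifier is enabled.

    Assumption: the terminal that starts higher ends at VDD, the other at 0.
    """
    if vt == vb:
        raise ValueError("no voltage difference to amplify")
    return (vdd, 0.0) if vt > vb else (0.0, vdd)


# A small difference (VT > VB) is amplified to a full logic level.
top, bottom = sense(0.55, 0.50)
```

With VT slightly above VB the model settles at (VDD, 0); swapping the inputs settles at the opposite stable state.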

Slide 8

DRAM Cell – Capacitor

Empty State: Logical “0”
Fully Charged State: Logical “1”

1. Small – cannot drive circuits
2. Reading destroys the state

Slide 9

Capacitor to Sense Amplifier

Slide 10

DRAM Cell Operation

[Figure labels: ½VDD, ½VDD, 0, 1, 0, VDD, ½VDD+δ]
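The small deviation δ can be illustrated with a charge-sharing calculation. The sketch below assumes the wire between the cell and the sense amplifier (called the bitline here) is held at ½VDD before the cell is connected, and that charge is conserved when the two are joined. The capacitance values and the function name are hypothetical and chosen only to show that δ is small compared with VDD.

```python
def bitline_after_sharing(v_cell, vdd=1.0, c_cell=1.0, c_bitline=10.0):
    """Voltage on the bitline after the cell capacitor is connected to it.

    Assumptions (not from the slides): the bitline starts at vdd / 2 and
    the capacitance values are illustrative, in arbitrary units.
    """
    v_bitline = vdd / 2
    total_charge = c_cell * v_cell + c_bitline * v_bitline
    return total_charge / (c_cell + c_bitline)


# Fully charged cell: the bitline ends slightly above 1/2 VDD (delta > 0).
delta_one = bitline_after_sharing(1.0) - 0.5

# Empty cell: the bitline ends slightly below 1/2 VDD (delta < 0).
delta_zero = bitline_after_sharing(0.0) - 0.5
```

In this model the sign of δ encodes the stored bit, and the sense amplifier then amplifies that small difference to a full VDD or 0.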

Slide 11

DRAM Subarray – Building Block for DRAM Chip

Row Decoder

Cell Array

Cell Array

Array of Sense Amplifiers (Row Buffer) 8Kb

Slide 12

DRAM Bank

Bank I/O (64b)

Address

Address

Data

Slide 13

DRAM Chip

Shared internal bus

Memory channel - 8 bits
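The widths quoted on the subarray, bank, and chip slides can be related with simple arithmetic. The sketch below assumes that "8Kb" means 8 × 1024 bits and that one 64-bit bank I/O word leaves the chip as consecutive 8-bit transfers; the slides do not state either point explicitly.

```python
ROW_BUFFER_BITS = 8 * 1024  # "8Kb" row buffer, assuming 1 Kb = 1024 bits
BANK_IO_BITS = 64           # bank I/O width
CHANNEL_BITS = 8            # this chip's connection to the memory channel

# Number of 64-bit columns held in one open row.
columns_per_row = ROW_BUFFER_BITS // BANK_IO_BITS

# Number of 8-bit transfers needed to move one 64-bit column off the chip.
transfers_per_column = BANK_IO_BITS // CHANNEL_BITS

print(columns_per_row, transfers_per_column)
```

Under these assumptions one open row holds 128 columns, and each column takes 8 transfers on the chip's 8-bit connection.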

Slide 14

DRAM Operation

1. ACTIVATE Row (Row Address)
2. READ/WRITE Column (Column Address)
3. PRECHARGE

[Figure labels: Bank I/O, Data]
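The three-step sequence can be modelled as a small state machine. This is a toy sketch, not a timing-accurate model: the class name `Bank`, its methods, and the rule that a second ACTIVATE requires a PRECHARGE first are assumptions made for illustration.

```python
class Bank:
    """Toy model of one DRAM bank with a single row buffer."""

    def __init__(self, num_rows, num_cols):
        self.rows = [[0] * num_cols for _ in range(num_rows)]
        self.row_buffer = None
        self.open_row = None

    def activate(self, row):
        # Step 1: copy the addressed row into the row buffer.
        if self.open_row is not None:
            raise RuntimeError("bank must be precharged before ACTIVATE")
        self.row_buffer = list(self.rows[row])
        self.open_row = row

    def _require_open(self):
        if self.open_row is None:
            raise RuntimeError("no row is open; issue ACTIVATE first")

    def read(self, col):
        # Step 2 (READ): return one column of the open row.
        self._require_open()
        return self.row_buffer[col]

    def write(self, col, value):
        # Step 2 (WRITE): update the row buffer and the cells it is driving.
        self._require_open()
        self.row_buffer[col] = value
        self.rows[self.open_row][col] = value

    def precharge(self):
        # Step 3: close the row so another one can be activated.
        self.row_buffer = None
        self.open_row = None


bank = Bank(num_rows=4, num_cols=8)
bank.activate(2)
bank.write(3, 1)
value = bank.read(3)
bank.precharge()
```

Column accesses only make sense while a row is open, which is why every access in this model starts with ACTIVATE and ends with PRECHARGE.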

Slide 15

RowClone

Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization

Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun,
G. Pekhimenko, Y. Luo, O. Mutlu,
P. B. Gibbons, M. A. Kozuch, T. C. Mowry

Vivek Seshadri

Slide 16

Memory Channel – Bottleneck

[Figure: Core, Core, Cache, MC (memory controller), Memory Channel]

Limited Bandwidth

High Energy

Slide 17

Goal: Reduce Memory Bandwidth Demand

[Figure: Core, Core, Cache, MC (memory controller), Memory Channel]

Reduce unnecessary data movement

Slide 18

Bulk Data Copy and Initialization

Bulk Data Copy: src → dst

Bulk Data Initialization: val → dst

Slide 19

Bulk Data Copy and Initialization

Bulk Data Copy: src → dst

Bulk Data Initialization: val → dst

Slide 20

Bulk Copy and Initialization – Applications

Many more

Slide 21

Shortcomings of Existing Approach

[Figure: Core, Core, Cache, MC, Channel, src, dst]

High latency (1046ns to copy 4KB)

Interference

High Energy (3600nJ to copy 4KB)

Slide 22

Our Approach: In-DRAM Copy with Low Cost

[Figure: Core, Core, Cache, MC, Channel, src, dst]

High latency (crossed out)

Interference (crossed out)

High Energy (crossed out)

?

Slide 23

RowClone: In-DRAM Copy

Slide 24

Two Key Observations

1. Any operation on one sense amplifier can be easily performed in bulk
2. Many DRAM cells share the same sense amplifier

Slide 25

Bulk Copy in DRAM – RowClone

[Figure labels: ½VDD, ½VDD, 0, 1, 0, VDD, ½VDD+δ]

Data gets copied
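The copy can be sketched as two steps through the shared row buffer. The model below assumes that the source row is activated first so the sense amplifiers latch its data, and that the destination row is then activated without an intervening PRECHARGE so the amplifiers overwrite its cells. The function name and the list-of-lists representation of a subarray are hypothetical.

```python
def rowclone_copy(rows, src, dst):
    """Copy rows[src] into rows[dst] through a shared row buffer.

    Assumption: both rows share the same sense amplifiers, which the
    slides state as a requirement (same subarray).
    """
    # ACTIVATE src: the sense amplifiers latch the whole source row at once.
    row_buffer = list(rows[src])
    # ACTIVATE dst without PRECHARGE: the amplifiers drive the destination cells.
    rows[dst] = list(row_buffer)
    return rows


subarray = [
    [1, 0, 1, 1],  # row 0 (source)
    [0, 0, 0, 0],  # row 1 (destination)
    [1, 1, 0, 0],  # row 2 (untouched)
]
rowclone_copy(subarray, src=0, dst=1)
```

Every column of the row moves in the same step, which is the bulk behaviour the two key observations describe; no data crosses the memory channel in this model.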

Slide 26

Fast Parallel Mode – Benefits

Bulk Data Copy (4KB across a module)

Latency: 1046ns to 90ns (11X)

Energy: 3600nJ to 40nJ (74X)

No bandwidth consumption

Very few changes to the DRAM chip

Slide 27

Fast Parallel Mode – Constraints

1. Location constraint
Source and destination in same subarray
2. Size constraint
Entire row gets copied (no partial copy)

Can still accelerate many existing primitives
(copy-on-write, bulk zeroing)

Alternate mechanism to copy data across banks
(pipelined serial mode – lower benefits than Fast Parallel)
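The two constraints suggest a simple decision rule for choosing a copy mechanism. The sketch below is one possible reading of the slide, not the authors' implementation: the `RowAddress` type, the `row_bytes` parameter, and the `"conventional"` fallback for cases the slide does not cover are all assumptions.

```python
from typing import NamedTuple


class RowAddress(NamedTuple):
    """Hypothetical decomposition of a DRAM row location."""
    bank: int
    subarray: int
    row: int


def choose_copy_mode(src, dst, num_bytes, row_bytes):
    """Pick a copy mechanism from the constraints listed on the slide."""
    # Size constraint: Fast Parallel Mode copies entire rows only.
    whole_rows = num_bytes > 0 and num_bytes % row_bytes == 0
    # Location constraint: source and destination in the same subarray.
    same_subarray = src.bank == dst.bank and src.subarray == dst.subarray

    if same_subarray and whole_rows:
        return "fast_parallel"
    if src.bank != dst.bank:
        return "pipelined_serial"
    # Not covered by the slide: fall back to copying over the memory channel.
    return "conventional"


mode = choose_copy_mode(RowAddress(0, 1, 5), RowAddress(0, 1, 9), 8192, 8192)
```

A rule like this is also why the operating system's page allocation matters: placing source and destination pages in the same subarray makes the fast path available more often.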

Slide 28

End-to-end System Design

Software interface
memcpy and meminit instructions
Managing cache coherence
Use existing DMA support!
Maximizing use of Fast Parallel Mode
Smart OS page allocation

Slide 29

Applications Summary

File name: DRAM-Tutorial.pptx