Age | Commit message (Collapse) | Author |
|
Typically, a memory controller is assigned an address range of the
form [start, end). This address range might be interleaved and
therefore only a non-continuous subset of the addresses in the address
range is handed by this controller.
Prior to this patch, the DRAM controller was unaware of the
interleaving and as a result the address range could affect the
mapping of addresses to DRAM ranks, rows and columns. This patch
changes the DRAM controller, to transform the input address to a
continuous range of the form [0, size). As a result the DRAM
controller always operates on a dense and continuous address range
regardlesss of the system configuration.
Change-Id: I7d273a630928421d1854658c9bb0ab34e9360851
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/19328
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Wendy Elsasser <wendy.elsasser@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
|
|
The variable *sys in dram_ctrl.cc was only used in an assert() check,
therefore it has been removed to allow building gem5.fast without
errors. A typo in a comment in abstract_mem.hh has also been corrected.
Change-Id: I2663545449ecfdb5a27c3574b79dd42beb4a49c8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21380
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
|
|
Note that this changes the stat format used by the DRAM
controller. Previously, it would have a structure looking a bit like
this:
- system
- dram: Main DRAM controller
- dram_0: Rank 0
- dram_1: Rank 1
This structure can't be replicated with new-world stats since stats
are confined to the SimObject name space. This means that the new
structure looks like this:
- system
- dram: Main DRAM controller
- rank0: Rank 0
- rank1: Rank 1
Change-Id: I7435cfaf137c94b0c18de619d816362dd0da8125
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/21142
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Wendy Elsasser <wendy.elsasser@arm.com>
|
|
Adding an option to enable DRAM low-power states. The low power
states can have a significant impact on application performance
(sim_ticks) on the order of 2-3x, especially for compute-gpu apps.
The options allows for it to easily be enabled/disabled to compare
performance numbers. The option is disabled by default.
Change-Id: Ib9bddbb792a1a6a4afb5339003472ff8f00a5859
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18548
Reviewed-by: Wendy Elsasser <wendy.elsasser@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Tested-by: kokoro <noreply+kokoro@google.com>
|
|
MemObject doesn't provide anything beyond its base ClockedObject any
more, so this change removes it from most inheritance hierarchies.
Occasionally MemObject is replaced with SimObject when I was fairly
confident that the extra functionality of ClockedObject wasn't needed.
Change-Id: Ic014ab61e56402e62548e8c831eb16e26523fdce
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/18289
Tested-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
DRAMCtrl's decodeAddr does not need to modify the packet it
receives, nor should it modify the contents of the class,
and therefore both the packet and the function are made const.
Change-Id: I577f48d9a43611ba54878a9a793cb7b4fbb326f4
Signed-off-by: Daniel R. Carvalho <odanrc@yahoo.com.br>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17540
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
|
For atomic RMW instructions that go directly to memory, we want to put
them on the write queue instead of the read queue. Swap the if/else
condition to accomplish this.
Note: This is ignoring the read latency of the RMW, but these
instructions should usually be handled in caches anyway.
Change-Id: I62dbfff3a16ac470f1ebdb489abe878962b20bb6
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17828
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
|
Replace the getMasterPort, getSlavePort, and getEthPort functions
with getPort, and remove extraneous mechanisms that are no longer
necessary.
Change-Id: Iab7e3c02d2f3a0cf33e7e824e18c28646b5bc318
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/17040
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
A packet queue is typically used to hold on to packets that are
schedules to be sent in the future or when they need to queue behind
younger packets that have been sent out yet. Due to memory order
requirements, some MemObjects need to maintain the order for packet
(mostly responses) that reference the same cache block.
Prior to this patch the ordering requirements where determined when
the packet was scheduled to be sent. This patch moves the parameter to
the constructor.
Change-Id: Ieb4d94e86bc7514f5036b313ec23ea47dd653164
Signed-off-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/15555
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
|
This patch is turning DRAMCtrl a QoS-aware Memory Controller with "no
policy" as a default policy.
Change-Id: I48163da8c8208498cf0398b07094cb840272507f
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/11973
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
|
Packet::checkFunctional also wrote data to/from the packet depending
on if it was read/write, respectively, which the 'check' in the name
would suggest otherwise. This renames it to doFunctional, which is
more suggestive. It also renames any function called checkFunctional
which calls Packet::checkFunctional. These are
- Bridge::BridgeMasterPort::checkFunctional
- calls Packet::checkFunctional
- MSHR::checkFunctional
- calls Packet::checkFunctional
- MSHR::TargetList::checkFunctional
- calls Packet::checkFunctional
- Queue<>::checkFunctional
(of src/mem/cache/queue.hh, not src/cpu/minor/buffers.h)
- Instantiated with Queue<WriteQueueEntry> and Queue<MSHR>
- WriteQueueEntry
- calls Packet::checkFunctional
- WriteQueueEntry::TargetList
- calls Packet::checkFunctional
- MemDelay::checkFunctional
- calls QueuedSlavePort/QueuedMasterPort::checkFunctional
- Packet::checkFunctional
- PacketQueue::checkFunctional
- calls Packet::checkFunctional
- QueuedSlavePort::checkFunctional
- calls PacketQueue::doFunctional
- QueuedMasterPort::checkFunctional
- calls PacketQueue::doFunctional
- SerialLink::SerialLinkMasterPort::checkFunctional
- calls Packet::doFunctional
Change-Id: Ieca2579c020c329040da053ba8e25820801b62c5
Reviewed-on: https://gem5-review.googlesource.com/11810
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
|
This patch has 2 main aspects:
1) Add new parameter to adjust write-to-write delay
2) Enable support of more than 64 banks per controller
Changes for new parameter:
Incorporated a new parameter, tCCD_L_WR, which defaults to tCCD_L.
This parameter can be used to set a unique delay between writes and
between reads.
To incorporate this parameter in the controller, modified the DRAMCtrl
class to have separate variables for read and write column delays.
Used these variables to account for tRTW, tWTR, tBURST, tCCD_L, and tCS
as well as the new tCCD_L_WR parameter.
Changes to support more than 64 banks:
Modified the logic selecting the next command (reorderQueue
and minBankPrep functions). Replaced the unint64_t variables with
a vector of uint32_t elements. There is a uint32_t element defined
per ranks to allow up to 32 banks per rank. This will automatically
scale with ranks without issue.
Change will allow analysis of memory sub-systems beyond the current
landscape.
Change-Id: I0ce466efed58276f843ad90e9ecc0ece6c37d646
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10103
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
|
Self-refresh is entered during a refresh event, when the
rank was previously in a precharge power-down state.
The original code would enter self-refresh after a refresh
was issued. The device subsequently will issue a refresh
on self-refresh entry. On self-refresh exit, the controller
will issue another refresh command.
Devices require at least one additional refresh to be issued
between self-refresh exit and re-entry. This ensures that enough
refreshes occur in the case when the device narrowly missed a
refresh on self-refresh exit.
To minimize the number of refresh operations and still maintain
the device requirement, the current logic does the following:
1) The controller will still enter self-refresh from a refresh
event, when the previous state was precharge power-down.
However, the refresh itself will be bypassed and the controller
will immediately issue a self-refresh entry.
2) On a self-refresh exit, the controller will immediately
issue a refresh command (per the original logic). This ensures
the devices requirements are met and is a convenient way to
kick off the command state machine.
Change-Id: I1c4b0dcbfa3bdafd755f3ccd65e267fcd700c491
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10102
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
|
Removal of unused/barely used 'using namespace' from C++ files.
Change-Id: I66dc548c04506db2e41180b9ea7ab5abd7d5375a
Reviewed-on: https://gem5-review.googlesource.com/9601
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
|
This patch syncs the DRAMPower library of gem5 to the
external github (https://github.com/ravenrd/DRAMPower).
The version pulled in is the commit:
90d6290f802c29b3de9e10233ceee22290907ce6
from 30th Oct. 2016.
This change also modifies the DRAM Ctrl interaction with the
DRAMPower, due to changes in the lib API in the above version.
Previously multiple functions were called to prepare the power
lib before calling the function that would calculate the enery. With
the new API, these functions are encompassed inside the function to
calculate the energy and therefore should now be removed from the
DRAM controller.
The other key difference is the introduction of a new function called
calcWindowEnergy which can be useful for any system that wants
to do measurements over intervals. For gem5 DRAM ctrl that means we
now need to accumulate the window energy measurements into the total
stat.
Change-Id: I3570fff2805962e166ff2a1a3217ebf2d5a197fb
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/5724
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
NOTE: With this change there is a possibility for `DRAMCtrl::Rank`s
event names to not properly match the rank they were generated by. This
could occur if the public rank member is modified after the Rank's
construction. A patch would mean refactoring Rank and `DRAMCtrl`b to
privatize many of the members of Rank behind getters.
Change-Id: I7b8bd15086f4ffdfd3f40be4aeddac5e786fd78e
Signed-off-by: Sean Wilson <spwilson2@wisc.edu>
Reviewed-on: https://gem5-review.googlesource.com/3745
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
|
This change was made so Rank objects have their name assigned
when they are instantiated. Therefore, they can initialize their
member objects with their name and it is less likely to change during
runtime.
(NOTE: I would recommend hiding the fields which would cause the name to
change behind getters. Since modification of `Rank.rank` during runtime
will cause the `name()` to change.)
Change-Id: Id51c3553b40e489792c57950e18b8ce927e43173
Signed-off-by: Sean Wilson <spwilson2@wisc.edu>
Reviewed-on: https://gem5-review.googlesource.com/3742
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
|
|
Assertion in the respondEvent erroneously fired.
The assertion verifies that the controller has not moved to a low-power
state prior to receiving read data from the memory.
The original assertion triggered if the state was not:
PWR_IDLE or PWR_ACT.
In the case that failed, a periodic refresh event occurred around the
read. The REF is stalled until the final read burst is issued
and the subsequent PRE closes the bank. While the PRE will temporarily
move the state to PWR_IDLE, state will immediately transition to PWR_REF
due to the pending refresh operation. This state does not match the
assertion, which is subsequently triggered.
Fixed the assertion by explicitly checking that the state is not a low
power state
!PWR_SREF && !PWR_PRE_PDN && !PWR_ACT_PDN
Change-Id: I82921a733bbeac2bcb5a487c2f981448d41ed50b
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
|
|
Added power-down state transitions to the DRAM controller model.
Added per rank parameter, outstandingEvents, which tracks the number
of outstanding command events and is used to determine when the
controller should transition to a low power state.
The controller will only transition when there are no outstanding events
scheduled and the number of command entries for the given rank is 0.
The outstandingEvents parameter is incremented for every RD/WR burst,
PRE, and REF event scheduled. ACT is implicitly covered by RD/WR
since burst will always issue and complete after a required ACT.
The parameter is decremented when the event is serviced (completed).
The controller will automatically transition to ACT power down,
PRE power down, or SREF.
Transition to ACT power down state scheduled from:
1) The RespondEvent, where read data is received from the memory.
ACT power-down entry will be scheduled when one or more banks is
open, all commands for the rank have completed (no more commands
scheduled), and there are no commands in queue for the rank
Transition to PRE power down scheduled from:
1) respondEvent, when all banks are closed, all commands have
completed, and there are no commands in queue for the rank
2) prechargeEvent when all banks are closed, all commands have
completed, and there are no commands in queue for the rank
3) refreshEvent, after the refresh is complete when the previous
state was ACT power-down
4) refreshEvent, after the refresh is complete when the previous
state was PRE power-down and there are commands in the queue.
Transition to SREF will be scheduled from:
1) refreshEvent, after the refresh is completes when the previous
state was PRE power-down with no commands in queue
Power-down exit commands are scheduled from:
1) The refreshEvent, prior to issuing a refresh
2) doDRAMAccess, to wake-up the rank for RD/WR command issue.
Self-refresh exit commands are scheduled from:
1) The next request event, when the queue has commands for the rank
in the readQueue or there are commands for the rank in the
writeQueue and the bus state is WRITE.
Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
The per rank statistics are periodically updated based on
state transition and refresh events.
Add a method to update these when a dump event occurs to
ensure they reflect accurate values.
Specifically, need to ensure that the low-power state
durations, power, and energy are logged correctly.
Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
Add constraint that all ranks have to be in PWR_IDLE
before signaling drain complete
This will ensure that the banks are all closed and the rank
has exited any low-power states.
On suspend, update the power stats to sync the DRAM power logic
The logic maintains the location of the signalDrainDone
method, which is still triggered from either:
1) Read response event
2) Next request event
This ensures that the drain will complete in the READ bus
state and minimizes the changes required.
Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
Add local variable to stores commands to be issued.
These commands are in order within a single bank but will be out
of order across banks & ranks.
A new procedure, flushCmdList, sorts commands across banks / ranks,
and flushes the sorted list, up to curTick() to DRAMPower.
This is currently called in refresh, once all previous commands are
guaranteed to have completed. Could be called in other events like
the powerEvent as well.
By only flushing commands up to curTick(), will not get out of sync
when flushed at a periodic stats dump (done in subsequent patch).
Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
|
|
This patch introduces the ability of making the coherent crossbar the
point of coherency. If so, the crossbar does not forward packets where
a cache with ownership has already committed to responding, and also
does not forward any coherency-related packets that are not intended
for a downstream memory controller. Thus, invalidations and upgrades
are turned around in the crossbar, and the memory controller only sees
normal reads and writes.
In addition this patch moves the express snoop promotion of a packet
to the crossbar, thus allowing the downstream cache to check the
express snoop flag (as it should) for bypassing any blocking, rather
than relying on whether a cache is responding or not.
|
|
Result of running 'hg m5style --skip-all --fix-control -a'.
|
|
This patch changes the name of a bunch of packet flags and MSHR member
functions and variables to make the coherency protocol easier to
understand. In addition the patch adds and updates lots of
descriptions, explicitly spelling out assumptions.
The following name changes are made:
* the packet memInhibit flag is renamed to cacheResponding
* the packet sharedAsserted flag is renamed to hasSharers
* the packet NeedsExclusive attribute is renamed to NeedsWritable
* the packet isSupplyExclusive is renamed responderHadWritable
* the MSHR pendingDirty is renamed to pendingModified
The cache states, Modified, Owned, Exclusive, Shared are also called
out in the cache and MSHR code to make it easier to understand.
|
|
This patch enforces insertion order transmission of packets on the
response path in the cache. Note that the logic to enforce order is
already present in the packet queue, this patch simply turns it on for
queues in the response path.
Without this patch, there are corner cases where a request-response is
faster than a response-response forwarded through the cache. This
violation of queuing order causes problems in the snoop filter leaving
it with inaccurate information. This causes assert failures in the
snoop filter later on.
A follow on patch relaxes the order enforcement in the packet queue to
limit the performance impact.
|
|
This patch aligns how the memory-system slaves, i.e. the various
memory controllers and the bridge, identify and deal with sinking of
inhibited packets that are only useful within the coherent part of the
memory system.
In the future we could shift the onus to the crossbar, and add a
parameter "is_point_of_coherence" that would allow it to sink the
aforementioned packets.
|
|
This patch unifies how we deal with delayed packet deletion, where the
receiving slave is responsible for deleting the packet, but the
sending agent (e.g. a cache) is still relying on the pointer until the
call to sendTimingReq completes. Previously we used a mix of a
deletion vector and a construct using unique_ptr. With this patch we
ensure all slaves use the latter approach.
|
|
A few minor fixes to issues identified by the clang static analyzer.
|
|
The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.
This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.
Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
|
|
Draining is currently done by traversing the SimObject graph and
calling drain()/drainResume() on the SimObjects. This is not ideal
when non-SimObjects (e.g., ports) need draining since this means that
SimObjects owning those objects need to be aware of this.
This changeset moves the responsibility for finding objects that need
draining from SimObjects and the Python-side of the simulator to the
DrainManager. The DrainManager now maintains a set of all objects that
need draining. To reduce the overhead in classes owning non-SimObjects
that need draining, objects inheriting from Drainable now
automatically register with the DrainManager. If such an object is
destroyed, it is automatically unregistered. This means that drain()
and drainResume() should never be called directly on a Drainable
object.
While implementing the new functionality, the DrainManager has now
been made thread safe. In practice, this means that it takes a lock
whenever it manipulates the set of Drainable objects since SimObjects
in different threads may create Drainable objects
dynamically. Similarly, the drain counter is now an atomic_uint, which
ensures that it is manipulated correctly when objects signal that they
are done draining.
A nice side effect of these changes is that it makes the drain state
changes stricter, which the simulation scripts can exploit to avoid
redundant drains.
|
|
The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.
|
|
This patch updates the command arbitration so that bank group timing
as well as rank-to-rank delays will be taken into account. The
resulting arbitration no longer selects commands (prepped or not) that
cannot issue seamlessly if there are commands that can issue
back-to-back, minimizing the effect of rank-to-rank (tCS) & same bank
group (tCCD_L) delays.
The arbitration selects a new command based on the following priority.
Within each priority band, the arbitration will use FCFS to select the
appropriate command:
1) Bank is prepped and burst can issue seamlessly, without a bubble
2) Bank is not prepped, but can prep and issue seamlessly, without a
bubble
3) Bank is prepped but burst cannot issue seamlessly. In this case, a
bubble will occur on the bus
Thus, to enable more parallelism in subsequent selections, an
unprepped packet is given higher priority if the bank prep can be
hidden. If the bank prep cannot be hidden, the selection logic will
choose a prepped packet that cannot issue seamlessly if one exist.
Otherwise, the default selection will choose the packet with the
minimum bank prep delay.
|
|
This patch adds a simple lookup structure to avoid iterating over the
write queue to find read matches, and for the merging of write
bursts. Instead of relying on iteration we simply store a set of
currently-buffered write-burst addresses and compare against
these. For the reads we still perform the iteration if we have a
match. For the writes, we rely entirely on the set. Note that there
are corner-cases where sub-bursts would actually not be mergeable
without a read-modify-write. We ignore these cases and opt for speed.
|
|
This patch adds eviction notices to the caches, to provide accurate
tracking of cache blocks in snoop filters. We add the CleanEvict
message to the memory heirarchy and use both CleanEvicts and
Writebacks with BLOCK_CACHED flags to propagate notice of clean and
dirty evictions respectively, down the memory hierarchy. Note that the
BLOCK_CACHED flag indicates whether there exist any copies of the
evicted block in the caches above the evicting cache.
The purpose of the CleanEvict message is to notify snoop filters of
silent evictions in the relevant caches. The CleanEvict message
behaves much like a Writeback. CleanEvict is a write and a request but
unlike a Writeback, CleanEvict does not have data and does not need
exclusive access to the block. The cache generates the CleanEvict
message on a fill resulting in eviction of a clean block. Before
travelling downwards CleanEvict requests generate zero-time snoop
requests to check if the same block is cached in upper levels of the
memory heirarchy. If the block exists, the cache discards the
CleanEvict message. The snoops check the tags, writeback queue and the
MSHRs of upper level caches in a manner similar to snoops generated
from HardPFReqs. Currently CleanEvicts keep travelling towards main
memory unless they encounter the block corresponding to their address
or reach main memory (since we have no well defined point of
serialisation). Main memory simply discards CleanEvict messages.
We have modified the behavior of Writebacks, such that they generate
snoops to check for the presence of blocks in upper level caches. It
is possible in our current implmentation for a lower level cache to be
writing back a block while a shared copy of the same block exists in
the upper level cache. If the snoops find the same block in upper
level caches, we set the BLOCK_CACHED flag in the Writeback message.
We have also added logic to account for interaction of other message
types with CleanEvicts waiting in the writeback queue. A simple
example is of a response arriving at a cache removing any CleanEvicts
to the same address from the cache's writeback queue.
|
|
Both open_adaptive and close_adaptive page polices keep the page
open if a row hit is found. If a row hit is not found, close_adaptive
page policy precharges the row, and open_adaptive policy precharges
the row only if there is a bank conflict request waiting in the queue.
This patch makes the checks for above conditions simpler.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch makes the caches and memory controllers consume the delay
that is annotated to a packet by the crossbar. Previously many
components simply threw these delays away. Note that the devices still
do not pay for these delays.
|
|
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
|
|
This patch clarifies the packet timings annotated
when going through a crossbar.
The old 'firstWordDelay' is replaced by 'headerDelay' that represents
the delay associated to the delivery of the header of the packet.
The old 'lastWordDelay' is replaced by 'payloadDelay' that represents
the delay needed to processing the payload of the packet.
For now the uses and values remain identical. However, going forward
the payloadDelay will be additive, and not include the
headerDelay. Follow-on patches will make the headerDelay capture the
pipeline latency incurred in the crossbar, whereas the payloadDelay
will capture the additional serialisation delay.
|
|
This patch fixes a bug where the DRAM controller tried to access the
system cacheline size before the system pointer was initialised. It
also fixes a bug where the granularity is 0 (no interleaving).
|
|
This patch gives the user direct influence over the number of DRAM
ranks to make it easier to tune the memory density without affecting
the bandwidth (previously the only means of scaling the device count
was through the number of channels).
The patch also adds some basic sanity checks to ensure that the number
of ranks is a power of two (since we rely on bit slices in the address
decoding).
|
|
This patch addresses an issue seen with the KVM CPU where the refresh
events scheduled by the DRAM controller forces the simulator to switch
out of the KVM mode, thus killing performance.
The current patch works around the fact that we currently have no
proper API to inform a SimObject of the mode switches. Instead we rely
on drainResume being called after any switch, and cache the previous
mode locally to be able to decide on appropriate actions.
The switcheroo regression require a minor stats bump as a result.
|
|
This patch adds rank-wise refresh to the controller, as opposed to the
channel-wide refresh currently in place. In essence each rank can be
refreshed independently, and for this to be possible the controller
is extended with a state machine per rank.
Without this patch the data bus is always idle during a refresh, as
all the ranks are refreshing at the same time. With the rank-wise
refresh it is possible to use one rank while another one is
refreshing, and thus the data bus can be kept busy.
The patch introduces a Rank class to encapsulate the state per rank,
and also shifts all the relevant banks, activation tracking etc to the
rank. The arbitration is also updated to consider the state of the rank.
|
|
Fix a minor issue that affects multi-rank systems.
|
|
This patch adds a first cut GDDR5 config to accommodate the users
combining gem5 and GPUSim. The config is based on a SK Hynix
datasheet, and the Nvidia GTX580 specification. Someone from the
GPUSim user-camp should tweak the default page-policy and static
frontend and backend latencies.
|
|
|
|
Ensure that we do the proper event scheduling also when the activation
limit is disabled.
|
|
This patch adds the size of the DRAM device to the DRAM config. It
also compares the actual DRAM size (calculated using information from
the config) to the size defined in the system. If these two values do
not match gem5 will print a warning. In order to do correct DRAM
research the size of the memory defined in the system should match the
size of the DRAM in the config. The timing and current parameters
found in the DRAM configs are defined for a DRAM device with a
specific size and would differ for another device with a different
size.
|