|
This patch removes the deprecated RubyMemoryControl. The DRAMCtrl
module should be used instead.
|
|
Previously, when an InvalidateReq snooped a cache with a dirty block or
a pending modified MSHR, the cache would invalidate the block or set the
postInv flag, but it would not send an InvalidateResp, causing memory
order violations. This patch changes this behavior, making the cache
with the dirty block or pending modified MSHR the ordering point.
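A minimal C++ sketch of the snoop-handling decision described above, assuming a much simplified view of the snooped cache. SnoopState, InvalidateSnoopAction and handleInvalidateSnoop are hypothetical names, not gem5's Cache interface.

```cpp
#include <cassert>

// Hypothetical, simplified view of what a snooped cache knows about a line.
struct SnoopState {
    bool hasBlock = false;             // block present in the tags
    bool dirty = false;                // block present and modified
    bool pendingModifiedMshr = false;  // MSHR will bring the line in modified
};

// Hypothetical outcome of handling an InvalidateReq snoop.
struct InvalidateSnoopAction {
    bool invalidateBlock = false;    // drop the local copy now
    bool setPostInvalidate = false;  // invalidate when the MSHR completes
    bool respond = false;            // send an InvalidateResp (ordering point)
};

InvalidateSnoopAction handleInvalidateSnoop(const SnoopState& s) {
    assert(!s.dirty || s.hasBlock);  // a dirty line must be present
    InvalidateSnoopAction a;
    a.invalidateBlock = s.hasBlock;
    a.setPostInvalidate = s.pendingModifiedMshr;
    // Before the fix only the two actions above happened. Now a cache that
    // holds the line dirty, or will soon, also responds and so becomes the
    // ordering point for the invalidation.
    a.respond = s.dirty || s.pendingModifiedMshr;
    return a;
}
```

In this sketch a dirty line is both invalidated and answered with an InvalidateResp, while a clean copy is only invalidated.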
Change-Id: Ib4c31012f4f6693ffb137cd77258b160fbc239ca
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
|
|
Previously, an MSHR with one or more invalidating targets would first
service all targets in the MSHR TargetList and then invalidate the
block. As a result, any snooping targets serviced after an invalidating
target would look up the block in the cache and incorrectly find it.
This patch forces the invalidation to happen when the first
invalidating target is encountered.
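The ordering change can be illustrated with a small sketch, assuming hypothetical Target and Block types that keep only the fields relevant here; this is not gem5's MSHR or TargetList code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified MSHR target.
struct Target {
    bool invalidating;  // e.g. a target that needs an exclusive copy
    bool fromSnoop;     // a deferred snoop that looks the block up
};

// Hypothetical cache block.
struct Block {
    bool valid = true;
};

// Services targets in list order. The block is invalidated as soon as the
// first invalidating target has been serviced, instead of after the whole
// list, so later snoop targets no longer find it. Returns one entry per
// snoop target: whether its lookup found a valid block.
std::vector<bool> serviceTargets(const std::vector<Target>& targets,
                                 Block& blk) {
    std::vector<bool> snoopHits;
    for (const Target& t : targets) {
        if (t.fromSnoop) {
            snoopHits.push_back(blk.valid);
            continue;
        }
        // ... satisfy the cpu-side target using blk here ...
        if (t.invalidating)
            blk.valid = false;
    }
    return snoopHits;
}
```

With the targets {snoop, invalidating, snoop}, only the first snoop still sees the block.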
Change-Id: I9df15de24e1d351cd96f5a2c424d9a03d81c2cce
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
|
|
This patch changes an assertion that previously assumed that a
non-invalidating snoop request should never be serviced by an
InvalidateReq MSHR. The MSHR serves as the ordering point for the
snooping packet. When the InvalidateResp reaches the cache, the
snooping packet snoops the caches above to find the requested
block. One or more of the caches above will have the block, since
the cache has earlier seen a WriteLineReq for it.
Change-Id: I0c147c8b5d5019e18bd34adf9af0fccfe431ae07
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
|
|
When the snoopFilter receives a response, it updates its state using
the hasSharers flag (which indicates whether there is more than one copy
of the block in the caches above). The hasSharers flag of the packet
was previously populated when the request was traversing and snooping
the caches looking for the block.
1) When the response is coming from the memory-side port, its order
with respect to other responses is not necessarily preserved (e.g., a
request that arrived second at the xbar can get its response first). As
a result, the snoopFilter might process responses out of order, updating
its residency information using a stale hasSharers flag that was
populated much earlier.
2) When the response is from an on-chip cache, the MSHRs preserve a
well-defined order and the hasSharers flag should contain valid
information.
This patch changes the snoopFilter to ignore the hasSharers flag
when the response is from the memory-side port.
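A minimal sketch of the filtering rule, assuming a toy snoop filter that tracks holders as a per-line bitmask. SimpleSnoopFilter and its updateResponse signature are hypothetical; only the gating on where the response came from mirrors this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical toy snoop filter: for each line, a bitmask of the cpu-side
// ports whose caches hold it.
class SimpleSnoopFilter {
  public:
    // Called when a response for lineAddr is forwarded to cpu-side `port`.
    // fromMemorySide: the response arrived on the memory-side port.
    // hasSharers: the flag carried by the response packet.
    void updateResponse(uint64_t lineAddr, unsigned port,
                        bool fromMemorySide, bool hasSharers) {
        assert(port < 32);
        uint32_t& holders = lines[lineAddr];
        const uint32_t mask = 1u << port;
        if (!fromMemorySide && !hasSharers) {
            // On-chip response: MSHRs keep responses ordered, so the flag
            // is trusted and the requester is the only holder.
            holders = mask;
        } else {
            // Otherwise keep the existing holders: either other sharers
            // exist, or the response came from the memory side, where it may
            // have been reordered, so its hasSharers flag is ignored.
            holders |= mask;
        }
    }

    uint32_t holders(uint64_t lineAddr) const {
        auto it = lines.find(lineAddr);
        return it == lines.end() ? 0 : it->second;
    }

  private:
    std::unordered_map<uint64_t, uint32_t> lines;
};
```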
Change-Id: Ib2d22a5b7bf3eccac64445127d2ea20ee74bb25b
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
|
|
Previously, a WriteLineReq that missed in a cache would send out an
InvalidateReq if the block lookup failed or an UpgradeReq if the
block lookup succeeded but the block had sharers. This change ensures
that a WriteLineReq always sends an InvalidateReq to invalidate all
copies of the block and satisfy the WriteLineReq.
Change-Id: I207ff5b267663abf02bc0b08aeadde69ad81be61
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
|
|
Change-Id: Ie3beeef25331f84a0a5bcc17f7a791f4a829695b
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
|
|
This patch fixes an issue where an MSHR would incorrectly be perceived
to provide data to targets arriving after an InvalidateReq. To address
this, the InvalidateReq is now treated as isForward, much like an
UpgradeReq that did not hit in the cache.
Change-Id: Ia878444d949539b5c33fd19f3e12b0b8a872275e
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
|
|
Previously DPRINTFs printing information about a packet would use ad hoc
formats. This patch changes all DPRINTFs to use the print function
defined by the packet class, making the packet printing format more
uniform and easier to change.
Change-Id: Idd436a9758d4bf70c86a574d524648b2a2580970
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
|
|
A response to a ReadReq can either be a ReadResp or a
ReadRespWithInvalidate. As we add targets to an MSHR for a ReadReq we
assume that the response will be a ReadResp. When the response is
invalidating (ReadRespWithInvalidate), servicing more than one target
can potentially violate the memory ordering. This change fixes the way
we handle a ReadRespWithInvalidate. When a cache receives a
ReadRespWithInvalidate we service only the first FromCPU target and
all the FromSnoop targets from the MSHR target list. The rest of the
FromCPU targets are deferred and serviced by a new request.
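The split between serviced and deferred targets can be sketched as follows, assuming hypothetical Source and Target types; gem5's MSHR keeps considerably more state per target.

```cpp
#include <cassert>
#include <vector>

// Hypothetical target source, named after the commit's FromCPU/FromSnoop.
enum class Source { FromCPU, FromSnoop };

struct Target {
    Source source;
    int id;  // identifies the target in this sketch
};

// On an invalidating response, service the first FromCPU target and every
// FromSnoop target now; the remaining FromCPU targets are deferred and would
// be serviced by a new request.
void splitOnInvalidatingResponse(const std::vector<Target>& targets,
                                 std::vector<Target>& serviceNow,
                                 std::vector<Target>& deferred) {
    assert(serviceNow.empty() && deferred.empty());
    bool cpuTargetServiced = false;
    for (const Target& t : targets) {
        if (t.source == Source::FromSnoop) {
            serviceNow.push_back(t);
        } else if (!cpuTargetServiced) {
            serviceNow.push_back(t);
            cpuTargetServiced = true;
        } else {
            deferred.push_back(t);
        }
    }
}
```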
Change-Id: I75c30c268851987ee5f8644acb46f440b4eeeec2
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
|
|
Previously the information of whether a response was allocating or not
was a property of the MSHR. This change makes this flag a property of
the TargetList. Different TargetLists, e.g. the targets and the
deferred targets lists, might have different values. Additionally, the
information about whether each target expects an allocating
response is stored inside the TargetList container. This allows for
repopulating the flag in case some of the targets are removed.
Change-Id: If3ec2516992f42a6d9da907009ffe3ab8d0d2021
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
|
|
This patch adds support for repopulating the flags of an MSHR
TargetList. The added functionality makes it possible to remove
targets from a TargetList without leaving it in an inconsistent state.
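A sketch of a TargetList that tracks its own allocation flag and can repopulate it after targets are removed, assuming a hypothetical Target with a per-target allocOnFill field; class and method names are illustrative, not taken from gem5.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical simplified target: does this target want the response to
// allocate a block in the cache?
struct Target {
    int id;
    bool allocOnFill;
};

// Sketch of a TargetList that owns its allocation flag. Because each target
// records its own preference, the flag can be recomputed after removals.
class TargetList {
  public:
    void add(const Target& t) {
        targets.push_back(t);
        allocOnFill = allocOnFill || t.allocOnFill;
    }

    // Remove all targets with the given id, then repopulate the flag so the
    // list is not left in an inconsistent state.
    void remove(int id) {
        targets.remove_if([id](const Target& t) { return t.id == id; });
        populateFlags();
    }

    void populateFlags() {
        allocOnFill = false;
        for (const Target& t : targets)
            allocOnFill = allocOnFill || t.allocOnFill;
    }

    bool allocates() const { return allocOnFill; }
    std::size_t size() const { return targets.size(); }

  private:
    std::list<Target> targets;
    bool allocOnFill = false;
};
```

Two lists (for example the targets and the deferred targets) can then carry different flags, and removing the only allocating target clears the flag.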
Change-Id: I3f7a8e97bfd3e2e49bebad056d11bbfb087aad91
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
|
|
In MessageBuffer the m_not_avail_count member is incremented but not used.
This causes an overflow reported by ASAN. This patch changes its type from an
int to a Stats::Scalar, since the count is useful in debugging finite
MessageBuffers.
|
|
If the cache access mode is parallel, i.e. "sequential_access" parameter
is set to "False", tags and data are accessed in parallel. Therefore,
the hit_latency is the maximum of tag_latency and
data_latency. On the other hand, if the cache access mode is
sequential, i.e. "sequential_access" parameter is set to "True",
tags and data are accessed sequentially. Therefore, the hit_latency
is the sum of tag_latency and data_latency.
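The two cases reduce to a one-line formula. A sketch, with latencies modeled as plain integers (an assumption; gem5 uses its own cycle type):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Latencies as plain cycle counts (an assumption for this sketch).
using CycleCount = uint64_t;

// Parallel access: tags and data overlap, so the slower one dominates.
// Sequential access: data is read only after the tag lookup.
CycleCount hitLatency(CycleCount tagLatency, CycleCount dataLatency,
                      bool sequentialAccess) {
    return sequentialAccess ? tagLatency + dataLatency
                            : std::max(tagLatency, dataLatency);
}
```

With tag_latency = 2 and data_latency = 3, a parallel cache hits in 3 cycles and a sequential one in 5.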
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
|
|
1. Delete unused variable from struct LinkEntry
2. Correct GarnetExtLink and GarnetIntLink inheritance
|
|
not all uses of MachineID initialize its fields, so here we add a default
ctor.
|
|
SequencerMsg is autogenerated by slicc scripts and the MessageSizeType is
initialized to the max enum value by default. The DMASequencer pushes this
message to the mandatory queue and, since the MessageSizeType is uninitialized,
the string_to_MessageSizeType() function used by traces to print the message
fails with a panic. This patch avoids this problem by initializing the
MessageSizeType of SequencerMsg to Request_Control.
|
|
fixes to appease clang++. tested on:
Ubuntu clang version 3.5.0-4ubuntu2~trusty2
(tags/RELEASE_350/final) (based on LLVM 3.5.0)
Ubuntu clang version 3.6.0-2ubuntu1~trusty1
(tags/RELEASE_360/final) (based on LLVM 3.6.0)
the fixes address the following five issues:
1) the exec continuations in gpu_static_inst.hh were marked
as protected when they should be public. here we mark
them as public
2) the Abs instruction uses std::abs() in its execute method.
because Abs is templated, it can also operate on U32 and U64
types, which cause Abs::execute() to pass uint32_t and uint64_t
types to std::abs() respectively. this triggers a warning
because std::abs() has no effect in this case. to remedy this
we add template specializations of the execute() method of Abs
for when its template parameter is U32 or U64.
3) Some protocols that utilize the code in cprintf.hh were missing
includes of BoolVec.hh, which defines operator<< for the BoolVec
type. This would cause issues when the generated code would try
to pass a BoolVec type to a method in cprintf.hh that used
operator<< on an instance of a BoolVec.
4) Surprise, clang doesn't like it when you clobber all the bits
in a newly allocated object. I.e., this code:
    tlb = new GpuTlbEntry[size];
    std::memset(tlb, 0, sizeof(GpuTlbEntry) * size);
Let's use std::vector to track the TLB entries in the GpuTlb now...
5) There were a few variables used only in DPRINTFs, so we mark them
with M5_VAR_USED.
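Fixes 2) and 4) can be sketched in isolation. Abs here is a hypothetical stand-in for the templated GPU instruction, and the GpuTlbEntry fields are made up for the example:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the templated Abs GPU instruction.
template <typename T>
struct Abs {
    T execute(T src) const { return std::abs(src); }
};

// For unsigned operands std::abs() would have no effect (and clang warns),
// so the unsigned specializations return the operand unchanged.
template <>
inline uint32_t Abs<uint32_t>::execute(uint32_t src) const { return src; }

template <>
inline uint64_t Abs<uint64_t>::execute(uint64_t src) const { return src; }

// Stand-in TLB entry with made-up fields; default member initializers
// replace the memset-to-zero.
struct GpuTlbEntry {
    uint64_t vaddr = 0;
    uint64_t paddr = 0;
    bool valid = false;
};

// std::vector value-initializes every entry and owns the storage, unlike
// new[] followed by std::memset().
std::vector<GpuTlbEntry> makeTlb(std::size_t size) {
    return std::vector<GpuTlbEntry>(size);
}
```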
|
|
DMA sequencers and protocols can currently only issue one DMA access at
a time. This patch implements the necessary functionality to support
multiple outstanding DMA requests in Ruby.
|
|
the RequestDesc was previously implemented as a std::pair, which made
the implementation overly complex and error-prone. here we encapsulate the
packet, primary, and secondary types all in a single data structure with
all members properly initialized in a ctor.
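A sketch of such a descriptor, assuming stand-in Packet and ReqType types (not the gem5 Packet or the SLICC-generated request type enum):

```cpp
#include <cassert>

// Stand-ins for illustration only: not gem5's Packet or Ruby's request enum.
struct Packet {};
enum class ReqType { LD, ST, ATOMIC };

// One descriptor instead of nested std::pairs: every member is named and
// the constructor guarantees that all of them are initialized.
struct RequestDesc {
    RequestDesc(Packet* pkt, ReqType primary, ReqType secondary)
        : pkt(pkt), primaryType(primary), secondaryType(secondary)
    {
        assert(pkt != nullptr);
    }

    Packet* pkt;
    ReqType primaryType;
    ReqType secondaryType;
};
```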
|
|
Change-Id: I763cffe0c69f5ebbbf6a6eb12bec5c13d5d0161d
Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
Added power-down state transitions to the DRAM controller model.
Added a per-rank parameter, outstandingEvents, which tracks the number
of outstanding command events and is used to determine when the
controller should transition to a low power state.
The controller will only transition when there are no outstanding events
scheduled and the number of command entries for the given rank is 0.
The outstandingEvents parameter is incremented for every RD/WR burst,
PRE, and REF event scheduled. ACT is implicitly covered by RD/WR
since a burst will always issue and complete after a required ACT.
The parameter is decremented when the event is serviced (completed).
The controller will automatically transition to ACT power down,
PRE power down, or SREF.
Transition to the ACT power-down state is scheduled from:
1) The respondEvent, where read data is received from the memory.
ACT power-down entry will be scheduled when one or more banks are
open, all commands for the rank have completed (no more commands
scheduled), and there are no commands in queue for the rank.
Transition to PRE power-down is scheduled from:
1) respondEvent, when all banks are closed, all commands have
completed, and there are no commands in queue for the rank
2) prechargeEvent when all banks are closed, all commands have
completed, and there are no commands in queue for the rank
3) refreshEvent, after the refresh is complete when the previous
state was ACT power-down
4) refreshEvent, after the refresh is complete when the previous
state was PRE power-down and there are commands in the queue.
Transition to SREF will be scheduled from:
1) refreshEvent, after the refresh completes when the previous
state was PRE power-down with no commands in queue
Power-down exit commands are scheduled from:
1) The refreshEvent, prior to issuing a refresh
2) doDRAMAccess, to wake-up the rank for RD/WR command issue.
Self-refresh exit commands are scheduled from:
1) The next request event, when there are commands for the rank
in the readQueue, or there are commands for the rank in the
writeQueue and the bus state is WRITE.
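The outstandingEvents rule for entering a power-down state can be sketched as follows. This is a simplified, hypothetical model: the Rank struct and PowerState names are illustrative, and SREF entry from the refreshEvent is left out.

```cpp
#include <cassert>

// Simplified, hypothetical states; names loosely follow the commit message.
enum class PowerState { IDLE, ACT_PDN, PRE_PDN };

// Hypothetical per-rank bookkeeping for the power-down decision.
struct Rank {
    unsigned outstandingEvents = 0;  // scheduled RD/WR burst, PRE, REF events
    unsigned queuedCommands = 0;     // commands for this rank still in queue
    unsigned openBanks = 0;
    PowerState state = PowerState::IDLE;

    void scheduleEvent() { ++outstandingEvents; }

    // Called when an event is serviced. Once nothing is outstanding and
    // nothing is queued, enter ACT power-down if a bank is still open, or
    // PRE power-down if all banks are closed.
    PowerState completeEvent() {
        assert(outstandingEvents > 0);
        --outstandingEvents;
        if (outstandingEvents == 0 && queuedCommands == 0)
            state = openBanks > 0 ? PowerState::ACT_PDN : PowerState::PRE_PDN;
        return state;
    }
};
```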
Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
The per-rank statistics are periodically updated based on
state transition and refresh events.
Add a method to update these when a dump event occurs to
ensure they reflect accurate values.
Specifically, this ensures that the low-power state
durations, power, and energy are logged correctly.
Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
Add a constraint that all ranks have to be in PWR_IDLE
before signaling drain complete.
This will ensure that the banks are all closed and the rank
has exited any low-power states.
On suspend, update the power stats to sync the DRAM power logic.
The patch keeps the existing location of the signalDrainDone
call, which is still triggered from either:
1) Read response event
2) Next request event
This ensures that the drain will complete in the READ bus
state and minimizes the changes required.
Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
Add a local variable to store the commands to be issued.
These commands are in order within a single bank but will be out
of order across banks and ranks.
A new procedure, flushCmdList, sorts the commands across banks and
ranks, and flushes the sorted list, up to curTick(), to DRAMPower.
This is currently called in refresh, once all previous commands are
guaranteed to have completed. It could also be called from other
events, such as the powerEvent.
Because only commands up to curTick() are flushed, the command list
will not get out of sync when flushed at a periodic stats dump (done
in a subsequent patch).
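A sketch of the sort-and-flush step, assuming a hypothetical Command record and a vector standing in for the interface to DRAMPower; the signature does not match gem5's flushCmdList.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Tick = uint64_t;

// Hypothetical command record; DRAMPower's own command type differs.
struct Command {
    int rank;
    int bank;
    Tick when;
};

// Commands are in order per bank but not across banks and ranks, so sort by
// issue time first. Then move every command with when <= now into `flushed`
// and keep later commands for the next flush, so a flush never gets ahead
// of the current tick.
void flushCmdList(std::vector<Command>& cmdList, Tick now,
                  std::vector<Command>& flushed) {
    std::stable_sort(cmdList.begin(), cmdList.end(),
                     [](const Command& a, const Command& b) {
                         return a.when < b.when;
                     });
    auto firstFuture = std::partition_point(
        cmdList.begin(), cmdList.end(),
        [now](const Command& c) { return c.when <= now; });
    flushed.insert(flushed.end(), cmdList.begin(), firstFuture);
    cmdList.erase(cmdList.begin(), firstFuture);
}
```

std::stable_sort keeps the per-bank order of commands that share a tick, and std::partition_point is valid here because the list is sorted by time.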
Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
Change-Id: I8992ddc1664c3ed4b2d36d8a34e4ce8be113b9de
Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
|
|
This removes errors when building gem5.fast
|
|
Revamped version of garnet with more optimized single-cycle routers,
more configurability, and cleaner code.
|
|
Only garnet2.0 will be supported henceforth.
|
|
This patch adds port direction names to the links during topology
creation, which can be used for better printed names for the links
or for users to code up their own adaptive routing algorithms.
It also adds support for every router to have an independent latency
value to support heterogeneous topologies with the subsequent
garnet2.0 patch.
|
|
This patch makes the internal links within the network topology
unidirectional, thus allowing any deadlock-free routing algorithm to
be specified from the topology itself using weights.
This patch also renames Mesh.py and MeshDirCorners.py to
Mesh_XY.py and MeshDirCorners_XY.py (Mesh with XY routing).
It also adds the Mesh_westfirst.py and CrossbarGarnet.py topologies.
|
|
Over the past 6 years, we realized that the protocol is essentially used
to run the garnet network in a standalone manner, and feed standard synthetic
traffic patterns through it.
|
|
Fixed AbstractController::queueMemoryWritePartial to specify the
correct size for partial memory writes.
|
|
print number of bytes written as a decimal number, not hex
|
|
Only map memories into the KVM guest address space that are
marked as usable by KVM. Create a BackingStoreEntry class
containing flags for is_conf_reported, in_addr_map, and
kvm_map.
|
|
Previously printing an mshr would trigger an assertion if the MSHR was
not in service or if the targets list was empty. This patch changes
the print function to bypass the accessor functions for
postInvalidate and postDowngrade and avoid the relevant assertions. It
also checks if the targets list is empty before calling print on it.
Change-Id: Ic18bee6cb088f63976112eba40e89501237cfe62
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Secure and non-secure data can coexist in the cache and therefore the
snoop filter should treat packets with secure and non-secure accesses
differently. This patch uses the lower bits of the line address to
keep track of whether the packet is addressing secure memory or not.
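Line addresses are block-aligned, so their low bits are always zero and can carry the secure flag. A sketch assuming a 64-byte line and a hypothetical snoopFilterKey helper:

```cpp
#include <cassert>
#include <cstdint>

// Assumed line size for the example; must be a power of two.
constexpr uint64_t kLineSize = 64;
static_assert((kLineSize & (kLineSize - 1)) == 0, "line size must be 2^n");

// Aligned line addresses have zero low bits, so bit 0 can record whether
// the access targets secure memory. Secure and non-secure accesses to the
// same line then get different snoop-filter keys.
uint64_t snoopFilterKey(uint64_t addr, bool isSecure) {
    const uint64_t lineAddr = addr & ~(kLineSize - 1);
    return lineAddr | (isSecure ? 1u : 0u);
}
```

With 64-byte lines, an access to 0x1234 maps to key 0x1200 when non-secure and 0x1201 when secure.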
Change-Id: I54a5e614dad566a5083582bede86c86896f2c2c1
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
|
|
This patch changes the default behaviour of the SystemXBar, adding a
snoop filter. With the recent updates to the snoop filter allocation
behaviour this change no longer causes problems for the regressions
without caches.
Change-Id: Ibe0cd437b71b2ede9002384126553679acc69cc1
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
|
|
This patch improves the snoop filter allocation decisions by not only
looking at whether a port is snooping or not, but also if the packet
actually came from a cache. The issue with only looking at isSnooping
is that the CPU ports, for example, are snooping, but not actually
caching. Previously we ended up incorrectly allocating entries in
systems without caches (such as the atomic and timing quick
regressions). Eventually these misguided allocations caused the snoop
filter to panic due to an excessive size.
On the request path we now include the fromCache check on the packet
itself, and for responses we check if we actually have a snoop-filter
entry.
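A sketch of the two allocation checks, assuming a toy filter keyed by line address and a hypothetical Request carrying the FromCache attribute; AllocatingFilter is not gem5's SnoopFilter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical minimal request: the line address and the FromCache attribute.
struct Request {
    uint64_t lineAddr;
    bool fromCache;
};

// Toy filter keyed by line address; the value is a bitmask of ports.
class AllocatingFilter {
  public:
    // Request path: allocate only for requests that come from a cache. A
    // snooping CPU port that does not cache would otherwise fill the filter.
    void lookupRequest(const Request& req, unsigned port) {
        assert(port < 32);
        if (!req.fromCache)
            return;
        entries[req.lineAddr] |= 1u << port;
    }

    // Response path: only update lines that already have an entry.
    void updateResponse(uint64_t lineAddr, unsigned port) {
        assert(port < 32);
        auto it = entries.find(lineAddr);
        if (it != entries.end())
            it->second |= 1u << port;
    }

    std::size_t size() const { return entries.size(); }

  private:
    std::unordered_map<uint64_t, uint32_t> entries;
};
```

In a cacheless system every request has fromCache false, so the filter stays empty instead of growing until it panics.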
Change-Id: Idd2dbc4f00c7e07d331e9a02658aee30d0350d7e
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
|
|
This patch takes yet another step in maintaining the clusivity, in
that it allows a mostly-inclusive cache to hold on to blocks even when
responding to a ReadExReq or UpgradeReq. Previously the cache simply
invalidated these blocks, but there is no strict need to do so.
The most important part of this patch is that we simply mark the block
clean when satisfying the upstream request where the cache is allowed
to keep the block. The only tricky part of the patch is in the memory
management of deferred snoops, where we need to distinguish the cases
where only the packet was copied (we expected to respond), and the
cases where we created an entirely new packet and request (we kept it
only to replay later).
The code in satisfyRequest is definitely ready for some refactoring
after this.
Change-Id: I201ddc7b2582eaa46fb8cff0c7ad09e02d64b0fc
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
|
|
This patch changes how the mostly exclusive policy is enforced to
ensure that we drop blocks when we should. As part of this change, the
actual invalidation due to the clusivity enforcement is moved outside
the hit handling, to a separate method maintainClusivity. For the
timing mode that means we can deal with all MSHR targets before taking
any action and possibly dropping the block. The method
satisfyCpuSideRequest is also renamed satisfyRequest as part of this
change (since we only ever see requests from the cpu-side port).
Change-Id: If6f3d1e0c3e7be9a67b72a55e4fc2ec4a90fd3d2
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
|
|
This patch adds a FromCache attribute to the packet, and updates a
number of the existing request commands to reflect that the request
originates from a cache. The attribute simplifies checking if a
request came from a cache or not, and this is used by both the cache
and snoop filter in follow-on patches.
Change-Id: Ib0a7a080bbe4d6036ddd84b46fd45bc7eb41cd8f
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Steve Reinhardt <stever@gmail.com>
|
|
There are cases where we want to put boot ROMs on the PIO bus. Ruby
currently doesn't support functional accesses to such memories since
functional accesses are always assumed to go to physical memory. Add
the required support for routing functional accesses to the PIO bus.
Change-Id: Ia5b0fcbe87b9642bfd6ff98a55f71909d1a804e3
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Brad Beckmann <brad.beckmann@amd.com>
Reviewed-by: Michael LeBeane <michael.lebeane@amd.com>
|
|
Change-Id: I70dd11c23b45dfc606ef08233d2e50fcc0817505
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Currently garnet will not run due to double statistic registration of new
stats in ClockedObject. This occurs because a temporary array named 'cls'
is being added as a child to garnet internal and external link SimObjects.
This patch simply renames the temporary array which prevents it from
being added as a child object and avoids the assertion that a statistic
was already registered.
Committed by Jason Lowe-Power <jason@lowepower.com>
|
|
Sync DRAMPower to external tool
This patch syncs the DRAMPower library of gem5 to the external
one on github (https://github.com/ravenrd/DRAMPower) of which
I am a maintainer.
The version used is the commit:
902a00a1797c48a9df97ec88868f20e847680ae6
from 7 May 2016.
Committed by Jason Lowe-Power <jason@lowepower.com>
|
|
In this new hmc configuration we have used the existing components in gem5,
mainly SerialLink, NoncoherentXbar, and DRAMCtrl, to define 3 different
architectures for HMC.
Highlights
1- It explores 3 different HMC architectures
2- It creates 4 HMC crossbars and attaches 16 vault controllers to them.
This will connect vaults to serial links
3- Compared with the previous version, the HMCController with round-robin
functionality has been removed, and all the serial links are directly
accessible from user ports
4- The latency incorporated by the HMCController (in the previous version)
has been added to SerialLink
Committed by Jason Lowe-Power <jason@lowepower.com>
|