Age | Commit message (Collapse) | Author |
|
Add some missing initialisation, and fix a handful benign resource
leaks (including some false positives).
|
|
This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.
As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.
--HG--
rename : src/mem/Bus.py => src/mem/XBar.py
rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc
rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh
rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc
rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh
rename : src/mem/bus.cc => src/mem/xbar.cc
rename : src/mem/bus.hh => src/mem/xbar.hh
|
|
Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
|
|
Support full-block writes directly rather than requiring RMW:
* a cache line is allocated in the cache upon receipt of a
WriteInvalidateReq, not the WriteInvalidateResp.
* only top-level caches allocate the line; the others just pass
the request along and invalidate as necessary.
* to close a timing window between the *Req and the *Resp, a new
metadata bit tracks whether another cache has read a copy of
the new line before the writeback to memory.
|
|
Previously, they were treated so much like loads that they could stall
at the head of the ROB. Now they are always treated like L1 hits.
If they actually miss, a new request is created at the L1 and tracked
from the MSHRs there if necessary (i.e. if it didn't coalesce with
an existing outstanding load).
|
|
Put the packet type swizzling (that is currently done in a lot of places)
into a refineCommand() member function.
|
|
This patch squashes prefetch requests from downstream caches,
so that they do not steal cachelines away from caches closer
to the cpu. It was originally coded by Mitch Hayenga and
modified by Aasheesh Kolli.
|
|
This patch adds the basic building blocks required to support e.g. ARM
TrustZone by discerning secure and non-secure memory accesses.
|
|
Add a "const" keywords to the getters in the Packet class so these can be
invoked on const Packet objects.
|
|
This patch provides useful printouts throughut the memory system. This
includes pretty-printed cache tags and function call messages
(call-stack like).
|
|
This patch changes the bus-related time accounting done in the packet
to be relative. Besides making it easier to align the cache timing to
cache clock cycles, it also makes it possible to create a Last-Level
Cache (LLC) directly to a memory controller without a bus inbetween.
The bus is unique in that it does not ever make the packets wait to
reflect the time spent forwarding them. Instead, the cache is
currently responsible for making the packets wait. Thus, the bus
annotates the packets with the time needed for the first word to
appear, and also the last word. The cache then delays the packets in
its queues before passing them on. It is worth noting that every
object attached to a bus (devices, memories, bridges, etc) should be
doing this if we opt for keeping this way of accounting for the bus
timing.
|
|
This patch removes the time field from the packet as it was only used
by the preftecher. Similar to the packet queue, the prefetcher now
wraps the packet in a deferred packet, which also has a tick
representing the absolute time when the packet should be sent.
|
|
This patch fixes a potential deadlock in the caches. This deadlock
could occur when more than one cache is used in a system, and
pkt->senderState is modified in between the two caches. This happened
as the caches relied on the senderState remaining unchanged, and used
it for instantaneous upstream communication with other caches.
This issue has been addressed by iterating over the linked list of
senderStates until we are either able to cast to a MSHR* or
senderState is NULL. If the cast is successful, we know that the
packet has previously passed through another cache, and therefore
update the downstreamPending flag accordingly. Otherwise, we do
nothing.
|
|
This patch adds a predecessor field to the SenderState base class to
make the process of linking them up more uniform, and enable a
traversal of the stack without knowing the specific type of the
subclasses.
There are a number of simplifications done as part of changing the
SenderState, particularly in the RubyTest.
|
|
For example if DRAM is at two locations and mirrored this patch allows the
mirroring to occur.
|
|
This patch removes the NACK frrom the packet as there is no longer any
module in the system that issues them (the bridge was the only one and
the previous patch removes that).
The handling of NACKs was mostly avoided throughout the code base, by
using e.g. panic or assert false, but in a few locations the NACKs
were actually dealt with (although NACKs never occured in any of the
regressions). Most notably, the DMA port will now never receive a NACK
and the backoff time is thus never changed. As a consequence, the
entire backoff mechanism (similar to a PCI bus) is now removed and the
DMA port entirely relies on the bus performing the arbitration and
issuing a retry when appropriate. This is more in line with e.g. PCIe.
Surprisingly, this patch has no impact on any of the regressions. As
mentioned in the patch that removes the NACK from the bridge, a
follow-up patch should change the request and response buffer size for
at least one regression to also verify that the system behaves as
expected when the bridge fills up.
|
|
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
|
|
This patch removes the Packet::NodeID typedef and unifies it with the
Port::PortId. The src and dest fields in the packet are used to hold a
port id (e.g. in the bus), and thus the two should actually be the
same.
The typedef PortID is now global (in base/types.hh) and aligned with
the ThreadID in terms of capitalisation and naming of the
InvalidPortID constant.
Before this patch, two flags were used for valid destination and
source, rather than relying on a named value (InvalidPortID), and
this is now redundant, as the src and dest field themselves are
sufficient to tell whether the current value is a valid port
identifier or not. Consequently, the VALID_SRC and VALID_DST are
removed.
As part of the cleaning up, a number of int parameters and local
variables are updated to use PortID.
Note that Ruby still has its own NodeID typedef. Furthermore, the
MemObject getMaster/SlavePort still has an int idx parameter with a
default value of -1 which should eventually change to PortID idx =
InvalidPortID.
|
|
This patch updates the comments for the src and dest fields to reflect
their actual use. Due to a number of patches (e.g. removing the
Broadcast flag), the old comments are no longer indicative of the
current usage.
|
|
This patch removes unused commands and attributes from the packet to
avoid any confusion. It is part of an effort to clear up how and where
different commands and attributes are used.
|
|
This patch simplifies the packet by removing the broadcast flag and
instead more firmly relying on (and enforcing) the semantics of
transactions in the classic memory system, i.e. request packets are
routed from a master to a slave based on the address, and when they
are created they have neither a valid source, nor destination. On
their way to the slave, the request packet is updated with a source
field for all modules that multiplex packets from multiple master
(e.g. a bus). When a request packet is turned into a response packet
(at the final slave), it moves the potentially populated source field
to the destination field, and the response packet is routed through
any multiplexing components back to the master based on the
destination field.
Modules that connect multiplexing components, such as caches and
bridges store any existing source and destination field in the sender
state as a stack (just as before).
The packet constructor is simplified in that there is no longer a need
to pass the Packet::Broadcast as the destination (this was always the
case for the classic memory system). In the case of Ruby, rather than
using the parameter to the constructor we now rely on setDest, as
there is already another three-argument constructor in the packet
class.
In many places where the packet information was printed as part of
DPRINTFs, request packets would be printed with a numeric "dest" that
would always be -1 (Broadcast) and that field is now removed from the
printing.
|
|
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
|
|
This command will be sent from the memory system (Ruby) to the LSQ of
an O3 CPU so that the LSQ, if it needs to, invalidates the address in
the request packet.
|
|
This adds the derived class FunctionalPacket to fix a long standing
deficiency in the Packet class where it was unable to handle finding data to
partially satisfy a functional access. Made this a derived class as
functional accesses are used only in certain contexts and to not add any
additional overhead to the existing Packet class.
|
|
This patch rpovides functional access support in Ruby. Currently only
the M5Port of RubyPort supports functional accesses. The support for
functional through the PioPort will be added as a separate patch.
|
|
|
|
|
|
The packet now identifies whether static or dynamic data has been allocated and
is used by Ruby to determine whehter to copy the data pointer into the ruby
request. Subsequently, Ruby can be told not to update phys memory when
receiving packets.
|
|
This step makes it easy to replace the accessor functions
(which still access a global variable) with ones that access
per-thread curTick values.
|
|
|
|
If we write back an exclusive copy, we now mark it
as such, so the cache receiving the writeback can
mark its copy as exclusive. This avoids some
unnecessary upgrade requests when a cache later
tries to re-acquire exclusive access to the block.
|
|
Corrects an oversight in cset f97b62be544f. The fix there only
failed queued SCUpgradeReq packets that encountered an
invalidation, which meant that the upgrade had to reach the L2
cache. To handle pending requests in the L1 we must similarly
fail StoreCondReq packets too.
|
|
|
|
Added support so that ruby can determine the outcome of store conditional
operations and reflect that outcome to M5 physical memory and cpus.
|
|
Requires new "SCUpgradeReq" message that marks upgrades
for store conditionals, so downstream caches can fail
these when they run into invalidations.
See http://www.m5sim.org/flyspray/task/197
|
|
Only set the dirty bit when we actually write to a block
(not if we thought we might but didn't, as in a failed
SC or CAS). This requires makeing sure the dirty bit
stays set when we get an exclusive (writable) copy
in a cache-to-cache transfer from another owner, which
n turn requires copying the mem-inhibit flag from
timing-mode requests to their associated responses.
|
|
|
|
|
|
|
|
|
|
|
|
--HG--
rename : src/sim/host.hh => src/base/types.hh
|
|
This frees up needed space for more public flags. Also:
- remove unused Request accessor methods
- make Packet use public Request accessors, so it need not be a friend
|
|
|
|
|
|
|
|
I did some of the flags and assertions wrong. Thanks to Brad Beckmann
for pointing this out. I should have run the opt regressions instead
of the fast. I also screwed up some of the logical functions in the Flags
class.
|
|
|
|
memory range.
|
|
|