Age | Commit message (Collapse) | Author |
|
Without this tweak, a prefetcher will happily prefetch data that will
promptly be invalidated and overwritten by a WriteInvalidate.
|
|
The cache's MemSidePacketQueue schedules a sendEvent based upon
nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever
a future prefetch is ready. However, a prefetch being ready does not guarentee
that it can obtain an MSHR. So, when all MSHRs are full,
the simulation ends up unnecessiciarly scheduling a sendEvent every picosecond
until an MSHR is finally freed and the prefetch can happen.
This patch fixes this by not signaling the prefetch ready time if the prefetch
could not be generated. The event is rescheduled as soon as a MSHR becomes
available.
|
|
Previously the code commented about an unhandled case where it might be
possible for a writeback to arrive after a prefetch was generated but
before it was sent to the memory system. I hit that case. Luckily
the prefetchSquash() logic already in the code handles dropping prefetch
request in certian circumstances.
|
|
Re-organizes the prefetcher class structure. Previously the
BasePrefetcher forced multiple assumptions on the prefetchers that
inherited from it. This patch makes the BasePrefetcher class truly
representative of base functionality. For example, the base class no
longer enforces FIFO order. Instead, prefetchers with FIFO requests
(like the existing stride and tagged prefetchers) now inherit from a
new QueuedPrefetcher base class.
Finally, the stride-based prefetcher now assumes a custimizable lookup table
(sets/ways) rather than the previous fully associative structure.
|
|
Adds a new parameter that reserves some number of MSHR entries for demand
accesses. This helps prevent prefetchers from taking all MSHRs, forcing demand
requests from the CPU to stall.
|
|
This patch gives the user direct influence over the number of DRAM
ranks to make it easier to tune the memory density without affecting
the bandwidth (previously the only means of scaling the device count
was through the number of channels).
The patch also adds some basic sanity checks to ensure that the number
of ranks is a power of two (since we rely on bit slices in the address
decoding).
|
|
This patch addresses an issue seen with the KVM CPU where the refresh
events scheduled by the DRAM controller forces the simulator to switch
out of the KVM mode, thus killing performance.
The current patch works around the fact that we currently have no
proper API to inform a SimObject of the mode switches. Instead we rely
on drainResume being called after any switch, and cache the previous
mode locally to be able to decide on appropriate actions.
The switcheroo regression require a minor stats bump as a result.
|
|
This patch adds rank-wise refresh to the controller, as opposed to the
channel-wide refresh currently in place. In essence each rank can be
refreshed independently, and for this to be possible the controller
is extended with a state machine per rank.
Without this patch the data bus is always idle during a refresh, as
all the ranks are refreshing at the same time. With the rank-wise
refresh it is possible to use one rank while another one is
refreshing, and thus the data bus can be kept busy.
The patch introduces a Rank class to encapsulate the state per rank,
and also shifts all the relevant banks, activation tracking etc to the
rank. The arbitration is also updated to consider the state of the rank.
|
|
Fix a minor issue that affects multi-rank systems.
|
|
This patch adds the stack distance calculator to the CommMonitor. The
stats are disabled by default.
|
|
This patch adds a stand-alone stack distance calculator. The stack
distance calculator is a passive SimObject that observes the addresses
passed to it. It calculates stack distances (LRU Distances) of
incoming addresses based on the partial sum hierarchy tree algorithm
described by Alamasi et al. http://doi.acm.org/10.1145/773039.773043.
For each transaction a hashtable look-up is performed. At every
non-unique transaction the tree is traversed from the leaf at the
returned index to the root, the old node is deleted from the tree, and
the sums (to the right) are collected and decremented. The collected
sum represets the stack distance of the found node. At every unique
transaction the stack distance is returned as
numeric_limits<uint64>::max().
In addition to the basic stack distance calculation, a feature to mark
an old node in the tree is added. This is useful if it is required to
see the reuse pattern. For example, Writebacks to the lower level
(e.g. membus from L2), can be marked instead of being removed from the
stack (isMarked flag of Node set to True). And then later if this same
address is accessed (by L1), the value of the isMarked flag would be
True. This gives some insight on how the Writeback policy of the
lower level affect the read/write accesses in an application.
Debugging is enabled by setting the verify flag to true. Debugging is
implemented using a dummy stack that behaves in a naive way, using STL
vectors. Note that this has a large impact on run time.
|
|
This patch adds the MemChecker and MemCheckerMonitor classes. While
MemChecker can be integrated anywhere in the system and is independent,
the most convenient usage is through the MemCheckerMonitor -- this
however, puts limitations on where the MemChecker is able to observe
read/write transactions.
|
|
This patch takes a clean-slate approach to providing WriteInvalidate
(write streaming, full cache line writes without first reading)
support.
Unlike the prior attempt, which took an aggressive approach of directly
writing into the cache before handling the coherence actions, this
approach follows the existing cache flows as closely as possible.
|
|
Prepare for a different implementation following in the next patch
|
|
This patch allows objects to get the src/dest of a packet even if it
is not set to a valid port id. This simplifies (ab)using the bridge as
a buffer and latency adapter in situations where the neighbouring
MemObjects are not crossbars.
The checks that were done in the packet are now shifted to the
crossbar where the fields are used to index into the port
arrays. Thus, the carrier of the information is not burdened with
checking, and the crossbar can check not only that the destination is
set, but also that the port index is within limits.
|
|
This patch attempts to make the rules for data allocation in the
packet explicit, understandable, and easy to verify. The constructor
that copies a packet is extended with an additional flag "alloc_data"
to enable the call site to explicitly say whether the newly created
packet is short-lived (a zero-time snoop), or has an unknown life-time
and therefore should allocate its own data (or copy a static pointer
in the case of static data).
The tricky case is the static data. In essence this is a
copy-avoidance scheme where the original source of the request (DMA,
CPU etc) does not ask the memory system to return data as part of the
packet, but instead provides a pointer, and then the memory system
carries this pointer around, and copies the appropriate data to the
location itself. Thus any derived packet actually never copies any
data. As the original source does not copy any data from the response
packet when arriving back at the source, we must maintain the copy of
the original pointer to not break the system. We might want to revisit
this one day and pay the price for a few extra memcpy invocations.
All in all this patch should make it easier to grok what is going on
in the memory system and how data is actually copied (or not).
|
|
This patch cleans up the use of hasData and checkFunctional in the
packet. The hasData function is unfortunately suggesting that it
checks if the packet has a valid data pointer, when it does in fact
only check if the specific packet type is specified to have a data
payload. The confusion led to a bug in checkFunctional. The latter
function is also tidied up to avoid name overloading.
|
|
This adds a basic level of sanity checking to the packet by ensuring
that a request is not modified once the packet is created. The only
issue that had to be worked around is the relaying of
software-prefetches in the cache. The specific situation is now solved
by first copying the request, and then creating a new packet
accordingly.
|
|
This patch tidies up the Request class, making all getters const. The
odd one out is incAccessDepth which is called by the memory system as
packets carry the request around. This is also const to enable the
packet to hold on to a const Request.
|
|
|
|
This patch simplifies how we deal with dynamically allocated data in
the packet, always assuming that it is array allocated, and hence
should be array deallocated (delete[] as opposed to delete). The only
uses of dataDynamic was in the Ruby testers.
The ARRAY_DATA flag in the packet is removed accordingly. No
defragmentation of the flags is done at this point, leaving a gap in
the bit masks.
As the last part the patch, it renames dataDynamicArray to dataDynamic.
|
|
This patch cleans up the packet memory allocation confusion. The data
is always allocated at the requesting side, when a packet is created
(or copied), and there is never a need for any device to allocate any
space if it is merely responding to a paket. This behaviour is in line
with how SystemC and TLM works as well, thus increasing
interoperability, and matching established conventions.
The redundant calls to Packet::allocate are removed, and the checks in
the function are tightened up to make sure data is only ever allocated
once. There are still some oddities in the packet copy constructor
where we copy the data pointer if it is static (without ownership),
and allocate new space if the data is dynamic (with ownership). The
latter is being worked on further in a follow-on patch.
|
|
This patch changes the various write functions in the port proxies
to use const pointers for all sources (similar to how memcpy works).
The one unfortunate aspect is the need for a const_cast in the packet,
to avoid having to juggle a const and a non-const data pointer. This
design decision can always be re-evaluated at a later stage.
|
|
This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.
The patch also removes the unused isReadWrite function.
|
|
This patch removes the parameter that enables bypassing the null check
in the Packet::getPtr method. A number of call sites assume the value
to be non-null.
The one odd case is the RubyTester, which issues zero-sized
prefetches(!), and despite being reads they had no valid data
pointer. This is now fixed, but the size oddity remains (unless anyone
object or has any good suggestions).
Finally, in the Ruby Sequencer, appropriate checks are made for flush
packets as they have no valid data pointer.
|
|
This patch adds a first cut GDDR5 config to accommodate the users
combining gem5 and GPUSim. The config is based on a SK Hynix
datasheet, and the Nvidia GTX580 specification. Someone from the
GPUSim user-camp should tweak the default page-policy and static
frontend and backend latencies.
|
|
Mostly addressing uninitialised members.
|
|
This patch adds uncacheable/cacheable and read-only/read-write attributes to
the map method of PageTableBase. It also modifies the constructor of TlbEntry
structs for all architectures to consider the new attributes.
|
|
The multi level page table was giving false positives for already mapped
translations. This patch fixes the bogus behavior.
|
|
Trimmed down all the lines greater than 78 characters.
|
|
|
|
With recent changes OSX clang compilation fails due to an unused variable.
|
|
Ruby's functional accesses are not guaranteed to succeed as of now. While
this is not a problem for the protocols that are currently in the mainline
repo, it seems that coherence protocols for gpus rely on a backing store to
supply the correct data. The aim of this patch is to make this backing store
configurable i.e. it comes into play only when a particular option:
--access-backing-store is invoked.
The backing store has been there since M5 and GEMS were integrated. The only
difference is that earlier the system used to maintain the backing store and
ruby's copy was write-only. Sometime last year, we moved to data being
supplied supplied by ruby in SE mode simulations. And now we have patches on
the reviewboard, which remove ruby's copy of memory altogether and rely
completely on the system's memory to supply data. This patch adds back a
SimpleMemory member to RubySystem. This member is used only if the option:
access-backing-store is set to true. By default, the memory would not be
accessed.
|
|
This patch is the final in the series. The whole series and this patch in
particular were written with the aim of interfacing ruby's directory controller
with the memory controller in the classic memory system. This is being done
since ruby's memory controller has not being kept up to date with the changes
going on in DRAMs. Classic's memory controller is more up to date and
supports multiple different types of DRAM. This also brings classic and
ruby ever more close. The patch also changes ruby's memory controller to
expose the same interface.
|
|
This function was added when I had incorrectly arrived at the conclusion
that such a function can improve the chances of a functional read succeeding.
As was later realized, this is not possible in the current setup. While the
code using this function was dropped long back, this function was not. Hence
the patch.
|
|
This patch removes the data block present in the directory entry structure
of each protocol in gem5's mainline. Firstly, this is required for moving
towards common set of memory controllers for classic and ruby memory systems.
Secondly, the data block was being misused in several places. It was being
used for having free access to the physical memory instead of calling on the
memory controller.
From now on, the directory controller will not have a direct visibility into
the physical memory. The Memory Vector object now resides in the
Memory Controller class. This also means that some significant changes are
being made to the functional accesses in ruby.
|
|
|
|
In my opinion, it creates needless complications in rest of the code.
Also, this structure hinders the move towards common set of code for
physical memory controllers.
|
|
Both ruby and the system used to maintain memory copies. With the changes
carried for programmed io accesses, only one single memory is required for
fs simulations. This patch sets the copy of memory that used to reside
with the system to null, so that no space is allocated, but address checks
can still be carried out. All the memory accesses now source and sink values
to the memory maintained by ruby.
|
|
As of now DMASequencer inherits from the RubyPort class. But the code in
RubyPort class is heavily tailored for the CPU Sequencer. There are parts of
the code that are not required at all for the DMA sequencer. Moreover, the
next patch uses the dma sequencer for carrying out memory accesses for all the
io devices. Hence, it is better to have a leaner dma sequencer.
|
|
|
|
WriteInvalidate semantics depend on the unconditional writeback
or they won't complete. Also, there's no point in deferring snoops
on their MSHRs, as they don't get new data at the end of their life
cycle the way other transactions do.
Add comment in the cache about a minor inefficiency re: WriteInvalidate.
|
|
Since WriteInvalidate directly writes into the cache, it can
create tricky timing interleavings with reads and writes to the
same cache line that haven't yet completed. This patch ensures
that these requests, when completed, don't overwrite the newer
data from the WriteInvalidate.
|
|
Ensure that we do the proper event scheduling also when the activation
limit is disabled.
|
|
This patch adds the size of the DRAM device to the DRAM config. It
also compares the actual DRAM size (calculated using information from
the config) to the size defined in the system. If these two values do
not match gem5 will print a warning. In order to do correct DRAM
research the size of the memory defined in the system should match the
size of the DRAM in the config. The timing and current parameters
found in the DRAM configs are defined for a DRAM device with a
specific size and would differ for another device with a different
size.
|
|
Bring the PhysicalMemory up-to-date by making use of range-based for
loops and vector intialisation where possible.
|
|
The new location seems like a better fit. The iterator typedefs are
removed in favour of using C++11 auto.
|
|
This patch adds two MemoryObject's: ExternalMaster and ExternalSlave.
Each object has a single port which can be bound to an externally-
provided bridge to a port of another simulation system at
initialisation.
|
|
This patch transitions the Ruby Message and its derived classes from
the ad-hoc RefCountingPtr to the c++11 shared_ptr. There are no
changes in behaviour, and the code modifications are mainly replacing
"new" with "make_shared".
The cloning of derived messages is slightly changed as they previously
relied on overriding the base-class through covariant return types.
|
|
This patch makes the memory system ISA-agnostic by enabling the Ruby
Sequencer to dynamically determine if it has to do a store check. To
enable this check, the ISA is encoded as an enum, and the system
is able to provide the ISA to the Sequencer at run time.
--HG--
rename : src/arch/x86/insts/microldstop.hh => src/arch/x86/ldstflags.hh
|