path: root/src/mem/cache/cache_impl.hh
Age  Commit message  Author
2013-07-18  mem: Add cache class destructor to avoid memory leaks  (Xiangyu Dong)
Make valgrind a little bit happier
2013-06-27  mem: Reorganize cache tags and make them a SimObject  (Prakash Ramrakhyani)
This patch reorganizes the cache tags to allow more flexibility in implementing new replacement policies. The base tags class is now a clocked object so that derived classes can use a clock if they need one. Deriving from SimObject also allows specialized Tag classes to be swapped in/out in .py files. The cache set is now templatized to allow it to contain customized cache blocks with additional information. This involved moving code to the .hh file and removing cacheset.cc. The statistics belonging to the cache tags now include ".tags" in their name. Hence, the stats need an update to reflect the change in naming.
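A minimal sketch of the templatized cache set idea, using hypothetical member names rather than the actual gem5 declarations:

    // Illustrative only: a set templatized on the block type, so specialized
    // replacement policies can attach extra per-block information.
    #include <cstdint>

    template <class Blktype>
    class CacheSet
    {
      public:
        Blktype **blks;   // one pointer per way
        int assoc;

        Blktype *findBlk(uint64_t tag) const
        {
            // Linear search across the ways; a real implementation would also
            // update replacement state (e.g. move the hit block to MRU).
            for (int i = 0; i < assoc; ++i)
                if (blks[i]->isValid() && blks[i]->tag == tag)
                    return blks[i];
            return nullptr;
        }
    };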
2013-06-27  mem: Align cache timing to clock edges  (Andreas Hansson)
This patch changes the cache timing calculations such that the results are aligned to clock edges. Plenty of stats change as a result of this patch.
2013-06-27  mem: Cycles converted to Ticks in atomic cache accesses  (Andreas Hansson)
This patch fixes an outstanding issue in the cache timing calculations where an atomic access returned a time in Cycles, but the port forwarded it on as if it was in Ticks. A separate patch will update the regression stats.
2013-06-27  mem: Remove a redundant heap allocation for a snoop packet  (Andreas Hansson)
This patch changes the upwards snoop packet to avoid allocating and later deleting it. As the code executes in 0 time and the lifetime of the packet does not extend beyond the block, there is no reason to heap allocate it.
2013-04-22  mem: Adding verbose debug output in the memory system  (Uri Wiener)
This patch provides useful printouts throughout the memory system. This includes pretty-printed cache tags and function call messages (call-stack like).
2013-03-27  mem: Fix cache latency bug  (Mitch Hayenga)
Fixes a latency calculation bug for accesses during a cache line fill. Under a cache miss, before the line is filled, accesses to the cache are associated with an MSHR and marked as targets. Once the line fill completes, MSHR target packets pay an additional latency of "responseLatency + busSerializationLatency". However, the "whenReady" field of the cache line is only set to an additional delay of "busSerializationLatency". This lacks the responseLatency component of the fill. It is possible for accesses that occur on the cycle of (or briefly after) the line fill to respond without properly paying the responseLatency. This also creates the situation where two accesses to the same address may be serviced in an order opposite to how they were received by the cache. For stores to the same address, this means that although the cache performs the stores in the order they were received, acknowledgements may be sent in a different order. Adding the responseLatency component to the whenReady field preserves the penalty that should be paid and prevents these ordering issues. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
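A rough sketch of the corrected whenReady calculation, with stand-in types and hypothetical parameter names (not the actual gem5 code):

    #include <cstdint>

    typedef uint64_t Tick;

    struct CacheBlkSketch { Tick whenReady; };  // illustrative stand-in

    void
    markFillComplete(CacheBlkSketch &blk, Tick fillCompletionTime,
                     Tick responseLatency, Tick busSerializationLatency)
    {
        // Previously (conceptually): whenReady = fill time + serialization only,
        // letting hits shortly after the fill skip the response latency.
        blk.whenReady = fillCompletionTime + responseLatency +
                        busSerializationLatency;
    }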
2013-02-19  mem: Fix sender state bug and delay popping  (Andreas Hansson)
This patch fixes a newly introduced bug where the sender state was popped before checking that it should be. Amazingly all regressions pass, but Linux fails to boot on the detailed CPU with caches enabled.
2013-02-19  scons: Fix up numerous warnings about name shadowing  (Andreas Hansson)
This patch addresses the most important name shadowing warnings (as produced when using gcc/clang with -Wshadow). There are many locations where constructor parameters and function parameters shadow local variables, but these are left unchanged.
2013-02-19  mem: Enforce strict use of busFirst- and busLastWordTime  (Andreas Hansson)
This patch adds a check to ensure that the delay incurred by the bus is not simply disregarded, but accounted for by someone. At this point, all the modules do is to zero it out, and no additional time is spent. This highlights where the bus timing is simply dropped instead of being paid for. As a follow up, the locations identified in this patch should add this additional time to the packets in one way or another. For now it simply acts as a sanity check and highlights where the delay is simply ignored. Since no time is added, all regressions remain the same.
2013-02-19  mem: Change accessor function names to match the port interface  (Andreas Hansson)
This patch changes the names of the cache accessor functions to be in line with those used by the ports. This is done to avoid confusion and get closer to a one-to-one correspondence between the interface of the memory object (the cache in this case) and the port itself. The member function timingAccess has been split into a snoop/non-snoop part to avoid branching on the isResponse() of the packet.
2013-02-19  mem: Make packet bus-related time accounting relative  (Andreas Hansson)
This patch changes the bus-related time accounting done in the packet to be relative. Besides making it easier to align the cache timing to cache clock cycles, it also makes it possible to connect a Last-Level Cache (LLC) directly to a memory controller without a bus in between. The bus is unique in that it does not ever make the packets wait to reflect the time spent forwarding them. Instead, the cache is currently responsible for making the packets wait. Thus, the bus annotates the packets with the time needed for the first word to appear, and also the last word. The cache then delays the packets in its queues before passing them on. It is worth noting that every object attached to a bus (devices, memories, bridges, etc) should be doing this if we opt for keeping this way of accounting for the bus timing.
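A simplified sketch of the relative accounting, assuming hypothetical field names for the annotated delays:

    #include <cstdint>

    typedef uint64_t Tick;

    struct PacketSketch {
        Tick busFirstWordDelay;  // relative delay until the first word arrives
        Tick busLastWordDelay;   // relative delay until the last word arrives
    };

    // The bus only annotates the packet; the cache pays the time by delaying
    // the packet in its queue and then clearing the fields to mark them paid.
    Tick
    responseReadyTime(PacketSketch &pkt, Tick now)
    {
        Tick when = now + pkt.busLastWordDelay;
        pkt.busFirstWordDelay = pkt.busLastWordDelay = 0;
        return when;
    }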
2013-02-19  sim: Make clock private and access using clockPeriod()  (Andreas Hansson)
This patch makes the clock member private to the ClockedObject and forces all children to access it using clockPeriod(). This makes it impossible to inadvertently change the clock, and also makes it easier to transition to a situation where the clock is derived from e.g. a clock domain, or through a multiplier.
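A minimal sketch of the idea, assuming a simplified stand-in for ClockedObject:

    #include <cstdint>

    typedef uint64_t Tick;

    class ClockedObjectSketch
    {
      private:
        Tick clock;   // period in ticks; children can no longer touch it directly

      public:
        explicit ClockedObjectSketch(Tick period) : clock(period) {}

        // The only way derived classes see the clock is through this accessor,
        // which also makes it easy to later derive the period from a clock
        // domain or a multiplier.
        Tick clockPeriod() const { return clock; }
    };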
2013-02-19  mem: Add predecessor to SenderState base class  (Andreas Hansson)
This patch adds a predecessor field to the SenderState base class to make the process of linking them up more uniform, and enable a traversal of the stack without knowing the specific type of the subclasses. There are a number of simplifications done as part of changing the SenderState, particularly in the RubyTest.
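A sketch of the uniform linking this enables, with simplified stand-ins for Packet and SenderState:

    // Illustrative only: a predecessor pointer in the base class lets the
    // stack be pushed, popped and traversed without knowing subclass types.
    struct SenderStateSketch
    {
        SenderStateSketch *predecessor = nullptr;
        virtual ~SenderStateSketch() {}
    };

    struct PacketSketch
    {
        SenderStateSketch *senderState = nullptr;

        void pushSenderState(SenderStateSketch *s)
        {
            s->predecessor = senderState;
            senderState = s;
        }

        SenderStateSketch *popSenderState()
        {
            SenderStateSketch *s = senderState;
            senderState = s ? s->predecessor : nullptr;
            return s;
        }
    };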
2013-02-15  mem: Tighten up cache constness and scoping  (Andreas Hansson)
This patch merely adopts a more strict use of const for the cache member functions and variables, and also moves a large portion of the member functions from public to protected.
2013-02-15  sim: Add a system-global option to bypass caches  (Andreas Sandberg)
Virtualized CPUs and the fastmem mode of the atomic CPU require direct access to physical memory. We currently require caches to be disabled when using them to prevent chaos. This is not ideal when switching between hardware virtualized CPUs and other CPU models as it would require a configuration change on each switch. This changeset introduces a new version of the atomic memory mode, 'atomic_noncaching', where memory accesses are inserted into the memory system as atomic accesses, but bypass caches. To make memory mode tests cleaner, the following methods are added to the System class:
* isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
* isTimingMode() -- True if the memory mode is 'timing'.
* bypassCaches() -- True if caches should be bypassed.
The old getMemoryMode() and setMemoryMode() methods should never be used from the C++ world anymore.
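A minimal sketch of what such accessors could look like (mode names here are illustrative):

    enum class MemoryMode { Atomic, AtomicNoncaching, Timing };

    class SystemSketch
    {
        MemoryMode mode = MemoryMode::Timing;

      public:
        bool isAtomicMode() const
        {
            return mode == MemoryMode::Atomic ||
                   mode == MemoryMode::AtomicNoncaching;
        }
        bool isTimingMode() const { return mode == MemoryMode::Timing; }
        bool bypassCaches() const { return mode == MemoryMode::AtomicNoncaching; }
    };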
2013-01-07  mem: Fix guest corruption when caches handle uncacheable accesses  (Andreas Sandberg)
When the classic gem5 cache sees an uncacheable memory access, it used to ignore it or silently drop the cache line in case of a write. Normally, there shouldn't be any data in the cache belonging to an uncacheable address range. However, since some architecture models don't implement cache maintenance instructions, there might be some dirty data in the cache that is discarded when this happens. The reason it has mostly worked before is because such cache lines were most likely evicted by normal memory activity before a TLB flush was requested by the OS. Previously, the cache model would invalidate cache lines when they were accessed by an uncacheable write. This changeset alters this behavior so all uncacheable memory accesses cause a cache flush with an associated writeback if necessary. This is implemented by reusing the cache flushing machinery used when draining the cache, which implies that writebacks are performed using functional accesses.
2013-01-07  cache: add note about where conflicts are handled  (Ali Saidi)
2012-11-02  mem: Add support for writing back and flushing caches  (Andreas Sandberg)
This patch adds support for the following optional drain methods in the classical memory system's cache model:
memWriteback() - Write back all dirty cache lines to memory using functional accesses.
memInvalidate() - Invalidate all cache lines. Dirty cache lines are lost unless a writeback is requested.
Since memWriteback() is called when checkpointing systems, this patch adds support for checkpointing systems with caches. The serialization code now checks whether there are any dirty lines in the cache. If there are dirty lines in the cache, the checkpoint is flagged as bad and a warning is printed.
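A rough sketch of how the serialization check might fit together, assuming a hypothetical cache interface (not the actual gem5 signatures):

    #include <cstdio>

    struct CacheSketch
    {
        virtual void memWriteback() = 0;        // write back dirty lines (functional)
        virtual void memInvalidate() = 0;       // drop all lines; dirty data is lost
        virtual bool hasDirtyLines() const = 0; // illustrative helper
        virtual ~CacheSketch() {}
    };

    void
    serializeCache(const CacheSketch &cache, bool &checkpointBad)
    {
        if (cache.hasDirtyLines()) {
            std::fprintf(stderr,
                "warning: dirty cache lines at checkpoint; flagging checkpoint as bad\n");
            checkpointBad = true;
        }
    }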
2012-10-15  Mem: Use cycles to express cache-related latencies  (Andreas Hansson)
This patch changes the cache-related latencies from an absolute time expressed in Ticks, to a number of cycles that can be scaled with the clock period of the caches. Ultimately this patch serves to enable future work that involves dynamic frequency scaling. As an immediate benefit it also makes it more convenient to specify cache performance without implicitly assuming a specific CPU core operating frequency. The stat blocked_cycles, which actually counted in ticks, is now updated to count in cycles. As the timing is now rounded to the clock edges of the cache, there are some regressions that change. Plenty of them have very minor changes, whereas some regressions with a short run-time are perturbed quite significantly. A follow-on patch updates all the statistics for the regressions.
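A sketch of the conversion, assuming gem5's convention of one tick per picosecond (simplified stand-in types):

    #include <cstdint>

    typedef uint64_t Tick;
    typedef uint64_t Cycles;

    // Latencies specified in cycles are scaled by the cache's own clock period,
    // so the same configuration follows the cache frequency.
    Tick
    cyclesToTicks(Cycles c, Tick clockPeriod)
    {
        return c * clockPeriod;
    }

    // Example: a 2-cycle hit latency is 1000 ticks for a 2 GHz cache clock
    // (500-tick period), but 2000 ticks if the cache is clocked at 1 GHz.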
2012-09-25  Cache: add a response latency to the caches  (Mrinmoy Ghosh)
In the current caches the hit latency is paid twice on a miss. This patch lets a configurable response latency be set on the cache for the backward path.
2012-09-19  AddrRange: Transition from Range<T> to AddrRange  (Andreas Hansson)
This patch takes the final plunge and transitions from the templated Range class to the more specific AddrRange. In doing so it changes the obvious Range<Addr> to AddrRange, and also bumps the range_map to be AddrRangeMap. In addition to the obvious changes, including the removal of redundant includes, this patch also does some housekeeping in preparation for the introduction of address interleaving support in the ranges. The Range class is also stripped of all the functionality that is never used.
--HG--
rename : src/base/range.hh => src/base/addr_range.hh
rename : src/base/range_map.hh => src/base/addr_range_map.hh
2012-09-11  Cache: Split invalidateBlk up to separate block vs. tags  (Lena Olson)
This separates the functionality to clear the state in a block into blk.hh and the functionality to update the tag information into the tags. This gets rid of the case where calling invalidateBlk on an already-invalid block does something different than calling it on a valid block, which was confusing.
2012-08-22  Packet: Remove NACKs from packet and its use in endpoints  (Andreas Hansson)
This patch removes the NACK from the packet as there is no longer any module in the system that issues them (the bridge was the only one and the previous patch removes that). The handling of NACKs was mostly avoided throughout the code base, by using e.g. panic or assert false, but in a few locations the NACKs were actually dealt with (although NACKs never occurred in any of the regressions). Most notably, the DMA port will now never receive a NACK and the backoff time is thus never changed. As a consequence, the entire backoff mechanism (similar to a PCI bus) is now removed and the DMA port entirely relies on the bus performing the arbitration and issuing a retry when appropriate. This is more in line with e.g. PCIe. Surprisingly, this patch has no impact on any of the regressions. As mentioned in the patch that removes the NACK from the bridge, a follow-up patch should change the request and response buffer size for at least one regression to also verify that the system behaves as expected when the bridge fills up.
2012-08-22  Port: Extend the QueuedPort interface and use where appropriate  (Andreas Hansson)
This patch extends the queued port interfaces with methods for scheduling the transmission of a timing request/response. The methods are named similar to the corresponding sendTiming(Snoop)Req/Resp, replacing the "send" with "sched". As the queues are currently unbounded, the methods always succeed and hence do not return a value. This functionality was previously provided in the subclasses by calling PacketQueue::schedSendTiming with the appropriate parameters. With this change, there is no need to introduce these extra methods in the subclasses, and the use of the queued interface is more uniform and explicit.
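A simplified sketch of the sched-style interface on a queued port (names are illustrative and the queue is deliberately naive):

    #include <cstdint>
    #include <queue>
    #include <utility>

    typedef uint64_t Tick;
    struct PacketSketch {};

    class QueuedSlavePortSketch
    {
        // (when, packet) pairs waiting to be sent; a real implementation keeps
        // the list ordered and drains it from a send event.
        std::queue<std::pair<Tick, PacketSketch *>> transmitList;

      public:
        // No return value: the queue is unbounded, so scheduling always succeeds.
        void schedTimingResp(PacketSketch *pkt, Tick when)
        {
            transmitList.emplace(when, pkt);
        }
    };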
2012-07-27  cache: don't allow dirty data in the i-cache  (Anthony Gutierrez)
Removes the optimization that forwards an exclusive copy to a requester on a read, only for the i-cache. This optimization isn't necessary because we typically won't be writing to the i-cache.
2012-07-09  Port: Align port names in C++ and Python  (Andreas Hansson)
This patch is a first step to align the port names used in the Python world and the C++ world. Ultimately it serves to make the use of config.json together with output from the simulation easier, including post-processing of statistics. Most notably, the CPU, cache, and bus are addressed in this patch, and there might be other ports that should be updated accordingly. The dash name separator has also been replaced with a "." which is what is used to concatenate the names in python, and a separation is made between the master and slave port in the bus.
2012-07-09  Port: Make getAddrRanges const  (Andreas Hansson)
This patch makes getAddrRanges const throughout the code base. There is no reason why it should not be, and making it const prevents adding any unintentional side-effects.
2012-07-09  Port: Add isSnooping to slave port (asking master port)  (Andreas Hansson)
This patch adds isSnooping to the slave port, and thus avoids going through getMasterPort to be able to ask the master. Over the course of the next few patches, all getMasterPort/getSlavePort in Port and MemObject are to be protocol agnostic, and the snooping is part of the protocol layer. The function is already present on the master port, where it is implemented by the module itself, e.g. a cache. On the slave side, it is merely asking the connected master port. The same name is used by both functions despite their difference in behaviour. The initial design used isMasterSnooping on the slave port side, but the more verbose function name was later changed.
2012-06-29  Mem: fix master id assertion in cache_impl.hh  (Dam Sunwoo)
The assertion was applied to the wrong packet. This patch fixes the issue reported by Xiang Jiang on the gem5-dev mailing list.
2012-06-29  Cache: Only invalidate a line in the cache when an uncacheable write is seen.  (Ali Saidi)
2012-06-07  mem: Delay deleting of incoming packets by one call.  (Ali Saidi)
This patch is a temporary fix until Andreas' four-phase patches get reviewed and committed. Removing FastAlloc seems to have exposed an issue which previously was reasonably rare in which packets are freed before the sending cache is done with them. This change puts incoming packets on a pendingDelete queue which are deleted at the start of the next call and thus breaks the dependency between when the caller returns true and when the packet is actually used by the sending cache. Running valgrind on a multi-core linux boot and the memtester results in no valgrind warnings.
2012-06-05  sim: Remove FastAlloc  (Ali Saidi)
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe. After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc when running twolf for ARM.
2012-05-30  Bus: Turn the PortId into a transport function parameter  (Andreas Hansson)
The main aim of this patch is to arrive at a suitable port interface for vector ports, including both the packet and the port id. This patch changes the bus transport functions (recvFunctional/Atomic/Timing) to require a PortId parameter indicating the source port. Previously this information was passed by setting the source field of the packet, and this is only required in the case of a timing request. With this patch, the use of the source and destination field is also more restrictive, as they are only needed for timing accesses. The modifications to these fields for atomic snoops is now removed entirely, also making minor modifications to the cache.
2012-05-30  Packet: Unify the use of PortID in packet and port  (Andreas Hansson)
This patch removes the Packet::NodeID typedef and unifies it with the Port::PortId. The src and dest fields in the packet are used to hold a port id (e.g. in the bus), and thus the two should actually be the same. The typedef PortID is now global (in base/types.hh) and aligned with the ThreadID in terms of capitalisation and naming of the InvalidPortID constant. Before this patch, two flags were used for valid destination and source, rather than relying on a named value (InvalidPortID), and this is now redundant, as the src and dest field themselves are sufficient to tell whether the current value is a valid port identifier or not. Consequently, the VALID_SRC and VALID_DST are removed. As part of the cleaning up, a number of int parameters and local variables are updated to use PortID. Note that Ruby still has its own NodeID typedef. Furthermore, the MemObject getMaster/SlavePort still has an int idx parameter with a default value of -1 which should eventually change to PortID idx = InvalidPortID.
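A minimal sketch of the unified identifier; the underlying type and constant value are assumptions for illustration:

    #include <cstdint>

    typedef int16_t PortID;
    const PortID InvalidPortID = -1;

    struct PacketSketch
    {
        PortID src = InvalidPortID;   // valid only once a bus has stamped it
        PortID dest = InvalidPortID;

        // No separate VALID_SRC/VALID_DST flags: the fields speak for themselves.
        bool isSrcValid() const { return src != InvalidPortID; }
        bool isDestValid() const { return dest != InvalidPortID; }
    };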
2012-05-10  Cache: restructure code that actually isn't a loop  (Ali Saidi)
2012-05-10  Cache: Panic if you attempt to create a checkpoint with a cache in the system  (Ali Saidi)
2012-05-01  MEM: Separate requests and responses for timing accesses  (Andreas Hansson)
This patch moves send/recvTiming and send/recvTimingSnoop from the Port base class to the MasterPort and SlavePort, and also splits them into separate member functions for requests and responses: send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq, send/recvTimingSnoopResp. A master port sends requests and receives responses, and also receives snoop requests and sends snoop responses. A slave port has the reciprocal behaviour as it receives requests and sends responses, and sends snoop requests and receives snoop responses. For all MemObjects that have only master ports or slave ports (but not both), e.g. a CPU, or a PIO device, this patch merely adds more clarity to what kind of access is taking place. For example, a CPU port used to call sendTiming, and will now call sendTimingReq. Similarly, a response previously came back through recvTiming, which is now recvTimingResp. For the modules that have both master and slave ports, e.g. the bus, the behaviour previously relied on branches based on pkt->isRequest(), and this is now replaced with a direct call to the appropriate member function depending on the type of access. Please note that send/recvRetry is still shared by all the timing accessors and remains in the Port base class for now (to maintain the current bus functionality and avoid changing the statistics of all regressions). The packet queue is split into a MasterPort and SlavePort version to facilitate the use of the new timing accessors. All uses of the PacketQueue are updated accordingly. With this patch, the type of packet (request or response) is now well defined for each type of access, and asserts on pkt->isRequest() and pkt->isResponse() are now moved to the appropriate send member functions. It is also worth noting that sendTimingSnoopReq no longer returns a boolean, as the semantics do not allow snoop requests to be rejected or stalled. All these assumptions are now explicitly part of the port interface itself.
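A sketch of the resulting interface split, with illustrative (not exact) signatures and trivial bodies so the example stands alone:

    struct PacketSketch {};

    // A master sends requests and snoop responses, and receives responses and
    // snoop requests; the slave port is the mirror image.
    class MasterPortSketch
    {
      public:
        bool sendTimingReq(PacketSketch *) { return true; }       // may be retried
        bool sendTimingSnoopResp(PacketSketch *) { return true; }
        virtual bool recvTimingResp(PacketSketch *) = 0;
        virtual void recvTimingSnoopReq(PacketSketch *) = 0;
        virtual ~MasterPortSketch() {}
    };

    class SlavePortSketch
    {
      public:
        bool sendTimingResp(PacketSketch *) { return true; }
        void sendTimingSnoopReq(PacketSketch *) {}  // no bool: snoops cannot stall
        virtual bool recvTimingReq(PacketSketch *) = 0;
        virtual bool recvTimingSnoopResp(PacketSketch *) = 0;
        virtual ~SlavePortSketch() {}
    };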
2012-04-14  MEM: Remove the Broadcast destination from the packet  (Andreas Hansson)
This patch simplifies the packet by removing the broadcast flag and instead more firmly relying on (and enforcing) the semantics of transactions in the classic memory system, i.e. request packets are routed from a master to a slave based on the address, and when they are created they have neither a valid source, nor destination. On their way to the slave, the request packet is updated with a source field for all modules that multiplex packets from multiple masters (e.g. a bus). When a request packet is turned into a response packet (at the final slave), it moves the potentially populated source field to the destination field, and the response packet is routed through any multiplexing components back to the master based on the destination field. Modules that connect multiplexing components, such as caches and bridges, store any existing source and destination field in the sender state as a stack (just as before). The packet constructor is simplified in that there is no longer a need to pass the Packet::Broadcast as the destination (this was always the case for the classic memory system). In the case of Ruby, rather than using the parameter to the constructor we now rely on setDest, as there is already another three-argument constructor in the packet class. In many places where the packet information was printed as part of DPRINTFs, request packets would be printed with a numeric "dest" that would always be -1 (Broadcast) and that field is now removed from the printing.
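A simplified sketch of the routing convention described above (field and method names are stand-ins):

    #include <cstdint>

    typedef int16_t PortID;
    const PortID InvalidPortID = -1;

    struct PacketSketch
    {
        bool request = true;
        PortID src = InvalidPortID;   // stamped by each multiplexing component
        PortID dest = InvalidPortID;

        // At the final slave the request becomes a response: whatever source
        // was recorded on the way down becomes the destination on the way back.
        void makeResponse()
        {
            request = false;
            dest = src;
            src = InvalidPortID;
        }
    };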
2012-04-14  MEM: Separate snoops and normal memory requests/responses  (Andreas Hansson)
This patch introduces port access methods that separate snoop request/responses from normal memory request/responses. The differentiation is made for functional, atomic and timing accesses and builds on the introduction of master and slave ports. Before the introduction of this patch, the packets belonging to the different phases of the protocol (request -> [forwarded snoop request -> snoop response]* -> response) all used the same port access functions, even though the snoop packets flow in the opposite direction to the normal packet. That is, a coherent master sends normal request and receives responses, but receives snoop requests and sends snoop responses (vice versa for the slave). These two distinct phases now use different access functions, as described below. Starting with the functional access, a master sends a request to a slave through sendFunctional, and the request packet is turned into a response before the call returns. In a system without cache coherence, this is all that is needed from the functional interface. For the cache-coherent scenario, a slave also sends snoop requests to coherent masters through sendFunctionalSnoop, with responses returned within the same packet pointer. This is currently used by the bus and caches, and the LSQ of the O3 CPU. The send/recvFunctional and send/recvFunctionalSnoop are moved from the Port super class to the appropriate subclass. Atomic accesses follow the same flow as functional accesses, with requests being sent from master to slave through sendAtomic. In the case of cache-coherent ports, a slave can send snoop requests to a master through sendAtomicSnoop. Just as for the functional access methods, the atomic send and receive member functions are moved to the appropriate subclasses. The timing access methods are different from the functional and atomic in that requests and responses are separated in time and send/recvTiming are used for both directions. Hence, a master uses sendTiming to send a request to a slave, and a slave uses sendTiming to send a response back to a master, at a later point in time. Snoop requests and responses travel in the opposite direction, similar to what happens in functional and atomic accesses. With the introduction of this patch, it is possible to determine the direction of packets in the bus, and no longer necessary to look for both a master and a slave port with the requested port id. In contrast to the normal recvFunctional, recvAtomic and recvTiming that are pure virtual functions, the recvFunctionalSnoop, recvAtomicSnoop and recvTimingSnoop have a default implementation that calls panic. This is to allow non-coherent master and slave ports to not implement these functions.
2012-03-30  MEM: Introduce the master/slave port sub-classes in C++  (William Wang)
This patch introduces the notion of a master and slave port in the C++ code, thus bringing the previous classification from the Python classes into the corresponding simulation objects and memory objects. The patch enables us to classify behaviours into the two bins and add assumptions and enforce compliance, also simplifying the two interfaces. As a starting point, isSnooping is confined to a master port, and getAddrRanges to slave ports. More of these specialisations are to come in later patches. The getPort function is now getMasterPort and getSlavePort, and returns a port reference rather than a pointer as NULL would never be a valid return value. The default implementation of these two functions is placed in MemObject, and calls fatal. The one drawback with this specific patch is that it requires some code duplication, e.g. QueuedPort becomes QueuedMasterPort and QueuedSlavePort, and BusPort becomes BusMasterPort and BusSlavePort (avoiding multiple inheritance). With the later introduction of the port interfaces, moving the functionality outside the port itself, a lot of the duplicated code will disappear again.
2012-03-22  MEM: Split SimpleTimingPort into PacketQueue and ports  (Andreas Hansson)
This patch decouples the queueing and the port interactions to simplify the introduction of the master and slave ports. By separating the queueing functionality from the port itself, it becomes much easier to distinguish between master and slave ports, and still retain the queueing ability for both (without code duplication). As part of the split into a PacketQueue and a port, there is now also a hierarchy of two port classes, QueuedPort and SimpleTimingPort. The QueuedPort is useful for ports that want to leave the packet transmission of outgoing packets to the queue and is used by both master and slave ports. The SimpleTimingPort inherits from the QueuedPort and adds the implementation of recvTiming and recvFunctional through recvAtomic. The PioPort and MessagePort are cleaned up as part of the changes.
--HG--
rename : src/mem/tport.cc => src/mem/packet_queue.cc
rename : src/mem/tport.hh => src/mem/packet_queue.hh
2012-03-09  cache: Allow main memory to be at disjoint address ranges.  (Ali Saidi)
2012-03-01  Cache: Fix an issue with LRU when bonus block is used to complete transaction.  (Ali Saidi)
The block is never inserted because it's the one extra block in the cache, but it can be invalidated twice in a row. In that case the block doesn't have a new master id (because it was never inserted); however, it is valid and the accounting goes wrong at that point.
2012-02-24  MEM: Simplify cache ports preparing for master/slave split  (Andreas Hansson)
This patch splits the two cache ports into a master (memory-side) and slave (cpu-side) subclass of port with slightly different functionality. For example, it is only the CPU-side port that blocks incoming requests, and only the memory-side port that schedules send events outside of what the transmit list dictates. This patch simplifies the two classes by relying further on SimpleTimingPort and also generalises the latter to better accommodate the changes (introducing trySendTiming and scheduleSend). The memory-side cache port overrides sendDeferredPacket to be able to not only send responses from the transmit list, but also send requests based on the MSHRs. A follow on patch further simplifies the SimpleTimingPort and the cache ports.
2012-02-12  mem: fix cache stats to use request ids correctly  (Dam Sunwoo)
This patch fixes the cache stats to use the new request ids. Cache stats also display the requestor names in the vector subnames. Most cache stats now include "nozero" and "nonan" flags to reduce the amount of excessive cache stat dump. Also, simplified incMissCount()/incHitCount() functions.
2012-02-12  mem: Add a master ID to each request object.  (Ali Saidi)
This change adds a master id to each request object which can be used to identify every device in the system that is capable of issuing a request. This is part of the way to removing the numCpus+1 stats in the cache and replacing them with the master ids. This is one of a series of changes that make way for the stats output to be changed to python.
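A sketch of per-requestor master ids under simplified assumptions (registration by name, illustrative signatures):

    #include <cstdint>
    #include <string>
    #include <vector>

    typedef uint16_t MasterID;

    class SystemSketch
    {
        std::vector<std::string> masters;   // index == master id

      public:
        MasterID getMasterId(const std::string &name)
        {
            // Return an existing id for this name, or register a new one.
            for (MasterID i = 0; i < masters.size(); ++i)
                if (masters[i] == name)
                    return i;
            masters.push_back(name);
            return MasterID(masters.size() - 1);
        }
    };

    struct RequestSketch
    {
        MasterID masterId;   // lets e.g. cache stats be broken down per requestor
    };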
2012-02-12  prefetcher: Make prefetcher a sim object instead of it being a parameter on cache  (Mrinmoy Ghosh)
2012-01-31  MEM: Remove the otherPort from the cache ports  (Andreas Hansson)
This patch is a very straight-forward simplification, removing the unnecessary otherPort pointer from the cache port. The pointer was only used to forward range changes, and the address range is fixed for the cache. Removing the pointer simplifies the transition to master/slave ports.
2012-01-17  MEM: Remove the functional ports from the memory system  (William Wang)
The functional ports are no longer used and this patch cleans up the legacy that is still present in buses, memories, CPUs etc. Note that this does not refer to the class FunctionalPort (already removed), but rather ports with the name (and use) functional.