summaryrefslogtreecommitdiff
path: root/src/mem/cache/cache_impl.hh
AgeCommit message (Collapse)Author
2015-03-02mem: Add option to force in-order insertion in PacketQueueStephan Diestelhorst
By default, the packet queue is ordered by the ticks of the to-be-sent packages. With the recent modifications of packages sinking their header time when their resposne leaves the caches, there could be cases of MSHR targets being allocated and ordered A, B, but their responses being sent out in the order B,A. This led to inconsistencies in bus traffic, in particular the snoop filter observing first a ReadExResp and later a ReadRespWithInv. Logically, these were ordered the other way around behind the MSHR, but due to the timing adjustments when inserting into the PacketQueue, they were sent out in the wrong order on the bus, confusing the snoop filter. This patch adds a flag (off by default) such that these special cases can request in-order insertion into the packet queue, which might offset timing slighty. This is expected to occur rarely and not affect timing results.
2015-03-02mem: Downstream components consumes new crossbar delaysMarco Balboni
This patch makes the caches and memory controllers consume the delay that is annotated to a packet by the crossbar. Previously many components simply threw these delays away. Note that the devices still do not pay for these delays.
2015-03-02mem: Tidy up the cache debug messagesAndreas Hansson
Avoid redundant inclusion of the name in the DPRINTF string.
2015-03-02mem: Split port retry for all different packet classesAndreas Hansson
This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.
2015-03-02mem: Fix prefetchSquash + memInhibitAsserted bugAli Jafri
This patch resolves a bug with hardware prefetches. Before a hardware prefetch is sent towards the memory, the system generates a snoop request to check all caches above the prefetch generating cache for the presence of the prefetth target. If the prefetch target is found in the tags or the MSHRs of the upper caches, the cache sets the prefetchSquashed flag in the snoop packet. When the snoop packet returns with the prefetchSquashed flag set, the prefetch generating cache deallocates the MSHR reserved for the prefetch. If the prefetch target is found in the writeback buffer of the upper cache, the cache sets the memInhibit flag, which signals the prefetch generating cache to expect the data from the writeback. When the snoop packet returns with the memInhibitAsserted flag set, it marks the allocated MSHR as inService and waits for the data from the writeback. If the prefetch target is found in multiple upper level caches, specifically in the tags or MSHRs of one upper level cache and the writeback buffer of another, the snoop packet will return with both prefetchSquashed and memInhibitAsserted set, while the current code is not written to handle such an outcome. Current code checks for the prefetchSquashed flag first, if it finds the flag, it deallocates the reserved MSHR. This leads to assert failure when the data from the writeback appears at cache. In this fix, we simply switch the order of checks. We first check for memInhibitAsserted and then for prefetch squashed.
2015-02-11mem: Clarification of packet crossbar timingsMarco Balboni
This patch clarifies the packet timings annotated when going through a crossbar. The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated to the delivery of the header of the packet. The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to processing the payload of the packet. For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay.
2015-02-11mem: Clarify usage of latency in the cacheMarco Balboni
This patch adds some much-needed clarity in the specification of the cache timing. For now, hit_latency and response_latency are kept as top-level parameters, but the cache itself has a number of local variables to better map the individual timing variables to different behaviours (and sub-components). The introduced variables are: - lookupLatency: latency of tag lookup, occuring on any access - forwardLatency: latency that occurs in case of outbound miss - fillLatency: latency to fill a cache block We keep the existing responseLatency The forwardLatency is used by allocateInternalBuffer() for: - MSHR allocateWriteBuffer (unchached write forwarded to WriteBuffer); - MSHR allocateMissBuffer (cacheable miss in MSHR queue); - MSHR allocateUncachedReadBuffer (unchached read allocated in MSHR queue) It is our assumption that the time for the above three buffers is the same. Similarly, for snoop responses passing through the cache we use forwardLatency.
2015-02-03mem: Clarify express snoop behaviourAndreas Hansson
This patch adds a bit of documentation with insights around how express snoops really work.
2015-02-03mem: Clarify cache behaviour for pending dirty responsesAndreas Hansson
This patch adds a bit of clarification around the assumptions made in the cache when packets are sent out, and dirty responses are pending. As part of the change, the marking of an MSHR as in service is simplified slightly, and comments are added to explain what assumptions are made.
2015-01-22mem: Remove Packet source from ForwardResponseRecordAndreas Hansson
This patch removes the source field from the ForwardResponseRecord, but keeps the class as it is part of how the cache identifies responses to hardware prefetches that are snooped upwards.
2015-01-20mem: Fix bug in cache request retry mechanismAndreas Hansson
This patch ensures that inhibited packets that are about to be turned into express snoops do not update the retry flag in the cache.
2014-12-23mem: Fix event scheduling issue for prefetchesMitch Hayenga
The cache's MemSidePacketQueue schedules a sendEvent based upon nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever a future prefetch is ready. However, a prefetch being ready does not guarentee that it can obtain an MSHR. So, when all MSHRs are full, the simulation ends up unnecessiciarly scheduling a sendEvent every picosecond until an MSHR is finally freed and the prefetch can happen. This patch fixes this by not signaling the prefetch ready time if the prefetch could not be generated. The event is rescheduled as soon as a MSHR becomes available.
2014-12-23mem: Fix bug relating to writebacks and prefetchesMitch Hayenga
Previously the code commented about an unhandled case where it might be possible for a writeback to arrive after a prefetch was generated but before it was sent to the memory system. I hit that case. Luckily the prefetchSquash() logic already in the code handles dropping prefetch request in certian circumstances.
2014-12-23mem: Rework the structuring of the prefetchersMitch Hayenga
Re-organizes the prefetcher class structure. Previously the BasePrefetcher forced multiple assumptions on the prefetchers that inherited from it. This patch makes the BasePrefetcher class truly representative of base functionality. For example, the base class no longer enforces FIFO order. Instead, prefetchers with FIFO requests (like the existing stride and tagged prefetchers) now inherit from a new QueuedPrefetcher base class. Finally, the stride-based prefetcher now assumes a custimizable lookup table (sets/ways) rather than the previous fully associative structure.
2014-12-23mem: Add parameter to reserve MSHR entries for demand accessMitch Hayenga
Adds a new parameter that reserves some number of MSHR entries for demand accesses. This helps prevent prefetchers from taking all MSHRs, forcing demand requests from the CPU to stall.
2014-12-02mem: Support WriteInvalidate (again)Curtis Dunham
This patch takes a clean-slate approach to providing WriteInvalidate (write streaming, full cache line writes without first reading) support. Unlike the prior attempt, which took an aggressive approach of directly writing into the cache before handling the coherence actions, this approach follows the existing cache flows as closely as possible.
2014-12-02mem: Remove WriteInvalidate supportCurtis Dunham
Prepare for a different implementation following in the next patch
2014-12-02mem: Clean up packet data allocationAndreas Hansson
This patch attempts to make the rules for data allocation in the packet explicit, understandable, and easy to verify. The constructor that copies a packet is extended with an additional flag "alloc_data" to enable the call site to explicitly say whether the newly created packet is short-lived (a zero-time snoop), or has an unknown life-time and therefore should allocate its own data (or copy a static pointer in the case of static data). The tricky case is the static data. In essence this is a copy-avoidance scheme where the original source of the request (DMA, CPU etc) does not ask the memory system to return data as part of the packet, but instead provides a pointer, and then the memory system carries this pointer around, and copies the appropriate data to the location itself. Thus any derived packet actually never copies any data. As the original source does not copy any data from the response packet when arriving back at the source, we must maintain the copy of the original pointer to not break the system. We might want to revisit this one day and pay the price for a few extra memcpy invocations. All in all this patch should make it easier to grok what is going on in the memory system and how data is actually copied (or not).
2014-12-02mem: Cleanup Packet::checkFunctional and hasData usageAndreas Hansson
This patch cleans up the use of hasData and checkFunctional in the packet. The hasData function is unfortunately suggesting that it checks if the packet has a valid data pointer, when it does in fact only check if the specific packet type is specified to have a data payload. The confusion led to a bug in checkFunctional. The latter function is also tidied up to avoid name overloading.
2014-12-02mem: Make the requests carried by packets constAndreas Hansson
This adds a basic level of sanity checking to the packet by ensuring that a request is not modified once the packet is created. The only issue that had to be worked around is the relaying of software-prefetches in the cache. The specific situation is now solved by first copying the request, and then creating a new packet accordingly.
2014-12-02mem: Add checks and explanation for assertMemInhibit usageAndreas Hansson
2014-12-02mem: Remove redundant Packet::allocate callsAndreas Hansson
This patch cleans up the packet memory allocation confusion. The data is always allocated at the requesting side, when a packet is created (or copied), and there is never a need for any device to allocate any space if it is merely responding to a paket. This behaviour is in line with how SystemC and TLM works as well, thus increasing interoperability, and matching established conventions. The redundant calls to Packet::allocate are removed, and the checks in the function are tightened up to make sure data is only ever allocated once. There are still some oddities in the packet copy constructor where we copy the data pointer if it is static (without ownership), and allocate new space if the data is dynamic (with ownership). The latter is being worked on further in a follow-on patch.
2014-12-02mem: Add const getters for write packet dataAndreas Hansson
This patch takes a first step in tightening up how we use the data pointer in write packets. A const getter is added for the pointer itself (getConstPtr), and a number of member functions are also made const accordingly. In a range of places throughout the memory system the new member is used. The patch also removes the unused isReadWrite function.
2014-10-21mem: don't inhibit WriteInv's or defer snoops on their MSHRsCurtis Dunham
WriteInvalidate semantics depend on the unconditional writeback or they won't complete. Also, there's no point in deferring snoops on their MSHRs, as they don't get new data at the end of their life cycle the way other transactions do. Add comment in the cache about a minor inefficiency re: WriteInvalidate.
2014-10-29mem: have WriteInvalidate obsolete MSHRsCurtis Dunham
Since WriteInvalidate directly writes into the cache, it can create tricky timing interleavings with reads and writes to the same cache line that haven't yet completed. This patch ensures that these requests, when completed, don't overwrite the newer data from the WriteInvalidate.
2014-10-09mem: Add packet sanity checks to cache and MSHRsAndreas Hansson
This patch adds a number of asserts to the cache, checking basic assumptions about packets being requests or responses.
2014-09-27misc: Fix a bunch of minor issues identified by static analysisAndreas Hansson
Add some missing initialisation, and fix a handful benign resource leaks (including some false positives).
2014-09-20mem: Rename Bus to XBar to better reflect its behaviourAndreas Hansson
This patch changes the name of the Bus classes to XBar to better reflect the actual timing behaviour. The actual instances in the config scripts are not renamed, and remain as e.g. iobus or membus. As part of this renaming, the code has also been clean up slightly, making use of range-based for loops and tidying up some comments. The only changes outside the bus/crossbar code is due to the delay variables in the packet. --HG-- rename : src/mem/Bus.py => src/mem/XBar.py rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh rename : src/mem/bus.cc => src/mem/xbar.cc rename : src/mem/bus.hh => src/mem/xbar.hh
2014-09-19mem: Add checks to sendTimingReq in cacheAndreas Hansson
A small fix to ensure the return value is not ignored.
2014-06-27mem: write streaming support via WriteInvalidate promotionCurtis Dunham
Support full-block writes directly rather than requiring RMW: * a cache line is allocated in the cache upon receipt of a WriteInvalidateReq, not the WriteInvalidateResp. * only top-level caches allocate the line; the others just pass the request along and invalidate as necessary. * to close a timing window between the *Req and the *Resp, a new metadata bit tracks whether another cache has read a copy of the new line before the writeback to memory.
2014-09-03mem: Fix a bug in the cache port flow controlAndreas Hansson
This patch fixes a bug in the cache port where the retry flag was reset too early, allowing new requests to arrive before the retry was actually sent, but with the event already scheduled. This caused a deadlock in the interactions with the O3 LSQ. The patche fixes the underlying issue by shifting the resetting of the flag to be done by the event that also calls sendRetry(). The patch also tidies up the flow control in recvTimingReq and ensures that we also check if we already have a retry outstanding.
2014-05-13cpu, mem: Make software prefetches non-blockingCurtis Dunham
Previously, they were treated so much like loads that they could stall at the head of the ROB. Now they are always treated like L1 hits. If they actually miss, a new request is created at the L1 and tracked from the MSHRs there if necessary (i.e. if it didn't coalesce with an existing outstanding load).
2014-09-03cache: Fix handling of LL/SC requests under contentionGeoffrey Blake
If a set of LL/SC requests contend on the same cache block we can get into a situation where CPUs will deadlock if they expect a failed SC to supply them data. This case happens where 3 or more cores are contending for a cache block using LL/SC and the system is configured where 2 cores are connected to a local bus and the third is connected to a remote bus. If a core on the local bus sends an SCUpgrade and the core on the remote bus sends and SCUpgrade they will race to see who will win the SC access. In the meantime if the other core appends a read to one of the SCUpgrades it will expect to be supplied data by that SCUpgrade transaction. If it happens that the SCUpgrade that was picked to supply the data is failed, it will drop the appended request for data and never respond, leaving the requesting core to deadlock. This patch makes all SC's behave as normal stores to prevent this case but still makes sure to check whether it can perform the update.
2014-08-13mem: Properly set cache block status fields on writebacksMitch Hayenga
When a cacheline is written back to a lower-level cache, tags->insertBlock() sets various status parameters. However these status bits were cleared immediately after calling. This patch makes it so that these status fields are not cleared by moving them outside of the tags->insertBlock() call.
2014-05-09mem: Squash prefetch requests from downstream cachesMitch Hayenga
This patch squashes prefetch requests from downstream caches, so that they do not steal cachelines away from caches closer to the cpu. It was originally coded by Mitch Hayenga and modified by Aasheesh Kolli.
2014-03-07mem: Fix incorrect assert failure in the CachePrakash Ramrakhyani
This patch fixes an assert condition that is not true at all times. There are valid situations that arise in dual-core dual-workload runs where the assert condition is false. The function call following the assert however needs to be called only when the condition is true (a block cannot be invalidated in the tags structure if has not been allocated in the structure, and the tempBlock is never allocated). Hence the 'assert' has been replaced with an 'if'.
2014-02-18mem: Filter cache snoops based on address rangesAndreas Hansson
This patch adds a filter to the cache to drop snoop requests that are not for a range covered by the cache. This fixes an issue observed when multiple caches are placed in parallel, covering different address ranges. Without this patch, all the caches will forward the snoop upwards, when only one should do so.
2014-01-29mem: prefetcher: add options, support for unaligned addressesMitch Hayenga ext:(%2C%20Amin%20Farmahini%20%3Caminfar%40gmail.com%3E)
This patch extends the classic prefetcher to work on non-block aligned addresses. Because the existing prefetchers in gem5 mask off the lower address bits of cache accesses, many predictable strides fail to be detected. For example, if a load were to stride by 48 bytes, with 64 byte cachelines, the current stride based prefetcher would see an access pattern of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern. This patch fixes this, by training the prefetcher on access and not masking off the lower address bits. It also adds the following configuration options: 1) Training/prefetching only on cache misses, 2) Training/prefetching only on data acceses, 3) Optionally tagging prefetches with a PC address. #3 allows prefetchers to train off of prefetch requests in systems with multiple cache levels and PC-based prefetchers present at multiple levels. It also effectively allows a pipelining of prefetch requests (like in POWER4) across multiple levels of cache hierarchy. Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7% for SPECFP (geomean).
2014-01-28mem: Remove redundant findVictim() input argumentAmin Farmahini
The patch (1) removes the redundant writeback argument from findVictim() (2) fixes the description of access() function Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-24mem: Add support for a security bit in the memory systemGiacomo Gabrielli
This patch adds the basic building blocks required to support e.g. ARM TrustZone by discerning secure and non-secure memory accesses.
2014-01-24mem: per-thread cache occupancy and per-block agesDam Sunwoo
This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache blocks. Cache occupancy stats are recalculated on each stat dump.
2013-10-17cpu: add consistent guarding to *_impl.hh files.Matt Horsnell
2013-07-18mem: Add cache class destructor to avoid memory leaksXiangyu Dong
Make valgrind a little bit happier
2013-06-27mem: Reorganize cache tags and make them a SimObjectPrakash Ramrakhyani
This patch reorganizes the cache tags to allow more flexibility to implement new replacement policies. The base tags class is now a clocked object so that derived classes can use a clock if they need one. Also having deriving from SimObject allows specialized Tag classes to be swapped in/out in .py files. The cache set is now templatized to allow it to contain customized cache blocks with additional informaiton. This involved moving code to the .hh file and removing cacheset.cc. The statistics belonging to the cache tags are now including ".tags" in their name. Hence, the stats need an update to reflect the change in naming.
2013-06-27mem: Align cache timing to clock edgesAndreas Hansson
This patch changes the cache timing calculations such that the results are aligned to clock edges. Plenty stats change as a results of this patch.
2013-06-27mem: Cycles converted to Ticks in atomic cache accessesAndreas Hansson
This patch fixes an outstanding issue in the cache timing calculations where an atomic access returned a time in Cycles, but the port forwarded it on as if it was in Ticks. A separate patch will update the regression stats.
2013-06-27mem: Remove a redundant heap allocation for a snoop packetAndreas Hansson
This patch changes the updards snoop packet to avoid allocating and later deleting it. As the code executes in 0 time and the lifetime of the packet does not extend beyond the block there is no reason to heap allocate it.
2013-04-22mem: Adding verbose debug output in the memory systemUri Wiener
This patch provides useful printouts throughut the memory system. This includes pretty-printed cache tags and function call messages (call-stack like).
2013-03-27mem: Fix cache latency bugMitch Hayenga
Fixes a latency calculation bug for accesses during a cache line fill. Under a cache miss, before the line is filled, accesses to the cache are associated with a MSHR and marked as targets. Once the line fill completes, MSHR target packets pay an additional latency of "responseLatency + busSerializationLatency". However, the "whenReady" field of the cache line is only set to an additional delay of "busSerializationLatency". This lacks the responseLatency component of the fill. It is possible for accesses that occur on the cycle of (or briefly after) the line fill to respond without properly paying the responseLatency. This also creates the situation where two accesses to the same address may be serviced in an order opposite of how they were received by the cache. For stores to the same address, this means that although the cache performs the stores in the order they were received, acknowledgements may be sent in a different order. Adding the responseLatency component to the whenReady field preserves the penalty that should be paid and prevents these ordering issues. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2013-02-19mem: Fix sender state bug and delay poppingAndreas Hansson
This patch fixes a newly introduced bug where the sender state was popped before checking that it should be. Amazingly all regressions pass, but Linux fails to boot on the detailed CPU with caches enabled.