path: root/src/mem/cache
Age  Commit message  Author
2014-01-29  mem: Add additional tolerance to stride prefetcher  (Mitch Hayenga)
Forces the prefetcher to mispredict twice in a row before resetting the confidence of prefetching. This helps cases where a load PC strides by a constant amount but may operate on different arrays at times, avoiding the cost of retraining. Primarily helps with small iteration loops. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
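To make the tolerance rule concrete, here is a minimal standalone sketch (the StrideEntry structure, field names, and thresholds are illustrative, not the gem5 prefetcher code): confidence is only reset after two consecutive mispredictions, so a PC that briefly switches to a different array with the same stride keeps its training.

```cpp
#include <cstdint>
#include <iostream>

// Illustrative per-PC stride table entry; not the actual gem5 structure.
struct StrideEntry {
    int64_t lastAddr = 0;
    int64_t stride = 0;
    int confidence = 0;      // saturating counter; prefetch when high enough
    bool tolerated = false;  // set after a first, tolerated misprediction
};

// Train the entry on a new demand address from the same load PC.
void train(StrideEntry &e, int64_t addr)
{
    const int64_t newStride = addr - e.lastAddr;
    if (newStride == e.stride) {
        if (e.confidence < 7) ++e.confidence;   // correct prediction
        e.tolerated = false;
    } else if (!e.tolerated) {
        e.tolerated = true;                     // first miss in a row: tolerate
    } else {
        e.confidence = 0;                       // second miss in a row: retrain
        e.stride = newStride;
        e.tolerated = false;
    }
    e.lastAddr = addr;
}

int main()
{
    StrideEntry e;
    // Constant 48-byte stride, with a one-off jump to another array in the
    // middle; the single mismatching delta no longer wipes the confidence.
    const int64_t addrs[] = {0, 48, 96, 144, 4096, 4144, 4192};
    for (int64_t a : addrs) {
        train(e, a);
        std::cout << "addr " << a << " -> confidence " << e.confidence << '\n';
    }
}
```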
2014-01-29  mem: Allowed tagged instruction prefetching in stride prefetcher  (Mitch Hayenga)
For systems with a tightly coupled L2, a stride-based prefetcher may observe access requests from both instruction and data L1 caches. However, the PC address of an instruction miss gives no relevant training information to the stride-based prefetcher (there is no stride to train). In these cases, it is better if the L2 stride prefetcher simply reverts to a simple N-block-ahead prefetcher. This patch enables this option. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
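A rough sketch of the fallback path described above (the function and parameter names are illustrative, not the gem5 API): when the observed request carries no useful PC, e.g. an instruction fetch miss seen at a shared L2, the prefetcher simply queues the next N blocks instead of consulting its stride table.

```cpp
#include <cstdint>
#include <vector>

constexpr uint64_t kBlkSize = 64;  // cache line size assumed for the example

// Illustrative only: compute candidate prefetch addresses for one miss.
std::vector<uint64_t> calculatePrefetch(uint64_t addr, bool hasUsefulPC,
                                        int degree)
{
    std::vector<uint64_t> candidates;
    if (!hasUsefulPC) {
        // Instruction misses at a shared L2 carry no stride information,
        // so fall back to a simple next-N-blocks prefetcher.
        const uint64_t blkAddr = addr & ~(kBlkSize - 1);
        for (int d = 1; d <= degree; ++d)
            candidates.push_back(blkAddr + d * kBlkSize);
    } else {
        // ... normal PC-indexed stride training/prediction would go here ...
    }
    return candidates;
}
```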
2014-01-29  mem: prefetcher: add options, support for unaligned addresses  (Mitch Hayenga, Amin Farmahini <aminfar@gmail.com>)
This patch extends the classic prefetcher to work on non-block-aligned addresses. Because the existing prefetchers in gem5 mask off the lower address bits of cache accesses, many predictable strides fail to be detected. For example, if a load were to stride by 48 bytes, with 64-byte cache lines the current stride-based prefetcher would see an access pattern of 0, 64, 64, 128, 192..., thus failing to detect a constant stride pattern. This patch fixes this by training the prefetcher on accesses without masking off the lower address bits. It also adds the following configuration options: 1) training/prefetching only on cache misses, 2) training/prefetching only on data accesses, 3) optionally tagging prefetches with a PC address. Option 3 allows prefetchers to train off of prefetch requests in systems with multiple cache levels and PC-based prefetchers present at multiple levels. It also effectively allows a pipelining of prefetch requests (like in POWER4) across multiple levels of the cache hierarchy. Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7% for SPECFP (geomean).
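A short worked example of the masking problem, assuming 64-byte cache lines: the raw addresses form a perfect 48-byte stride, but after block alignment the deltas are no longer constant, which is why training on the unmasked address is needed.

```cpp
#include <cstdint>
#include <iostream>

int main()
{
    constexpr uint64_t lineSize = 64;
    uint64_t prevRaw = 0;
    uint64_t prevMasked = 0;
    for (int i = 1; i <= 5; ++i) {
        const uint64_t raw = i * 48;                    // 48-byte load stride
        const uint64_t masked = raw & ~(lineSize - 1);  // block-aligned address
        std::cout << "raw delta " << raw - prevRaw
                  << ", masked delta " << masked - prevMasked << '\n';
        prevRaw = raw;
        prevMasked = masked;
    }
    // The raw deltas are always 48 (trainable), while the block-aligned
    // deltas come out as 0, 64, 64, 64, 0 -- no constant stride to lock on to.
}
```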
2014-01-28  mem: Remove redundant findVictim() input argument  (Amin Farmahini)
The patch (1) removes the redundant writeback argument from findVictim() and (2) fixes the description of the access() function. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-01-24  mem: Add support for a security bit in the memory system  (Giacomo Gabrielli)
This patch adds the basic building blocks required to support e.g. ARM TrustZone by discerning secure and non-secure memory accesses.
2014-01-24  Cache: Collect very basic stats on tag and data accesses  (Timothy M. Jones)
Adds very basic statistics on the number of tag and data accesses within the cache, which is important for power modelling. For the tags, simply count the associativity of the cache each time. For the data, this depends on whether tags and data are accessed sequentially, which is given by a new parameter. In the parallel case, all data blocks are accessed each time, but with sequential accesses, a single data block is accessed only on a hit.
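The bookkeeping amounts to a couple of counters per lookup; a hedged sketch of the rule described above, with illustrative names:

```cpp
#include <cstdint>

// Illustrative counters, not the gem5 Stats objects.
struct AccessStats {
    uint64_t tagAccesses = 0;
    uint64_t dataAccesses = 0;
};

// assoc: cache associativity; sequentialAccess: the new parameter described
// above. With a parallel lookup all candidate data blocks are read; with a
// sequential lookup the data array is only touched on a hit, and only once.
void countAccess(AccessStats &s, unsigned assoc, bool sequentialAccess, bool hit)
{
    s.tagAccesses += assoc;          // every way's tag is compared
    if (sequentialAccess) {
        if (hit)
            s.dataAccesses += 1;     // only the matching block is read
    } else {
        s.dataAccesses += assoc;     // all ways' data read in parallel
    }
}
```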
2014-01-24  mem: per-thread cache occupancy and per-block ages  (Dam Sunwoo)
This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache block. Cache occupancy stats are recalculated on each stat dump.
2014-01-24  mem: track per-request latencies and access depths in the cache hierarchy  (Matt Horsnell)
Add some values and methods to the request object to track the translation and access latency for a request and which level of the cache hierarchy responded to the request.
2013-10-17  cpu: add consistent guarding to *_impl.hh files.  (Matt Horsnell)
2013-09-04  arch: Resurrect the NOISA build target and rename it NULL  (Andreas Hansson)
This patch makes it possible to once again build gem5 without any ISA. The main purpose is to enable work around the interconnect and memory system without having to build any CPU models or device models. The regress script is updated to include the NULL ISA target. Currently no regressions make use of it, but all the testers could (and perhaps should) transition to it. --HG-- rename : build_opts/NOISA => build_opts/NULL rename : src/arch/noisa/SConsopts => src/arch/null/SConsopts rename : src/arch/noisa/cpu_dummy.hh => src/arch/null/cpu_dummy.hh rename : src/cpu/intr_control.cc => src/cpu/intr_control_noisa.cc
2013-07-18  mem: Set the cache line size on a system level  (Andreas Hansson)
This patch removes the notion of a peer block size and instead sets the cache line size on the system level. Previously the size was set per cache, and communicated through the interconnect. There were plenty of checks to ensure that everyone had the same size specified, and these checks are now removed. Another benefit that is not yet harnessed is that the cache line size is now known at construction time, rather than after the port binding. Hence, the block size can be locally stored and does not have to be queried every time it is used. A follow-on patch updates the configuration scripts accordingly.
2013-07-18  mem: Add cache class destructor to avoid memory leaks  (Xiangyu Dong)
Make valgrind a little bit happier
2013-06-27  mem: Reorganize cache tags and make them a SimObject  (Prakash Ramrakhyani)
This patch reorganizes the cache tags to allow more flexibility to implement new replacement policies. The base tags class is now a clocked object so that derived classes can use a clock if they need one. Deriving from SimObject also allows specialized Tag classes to be swapped in/out in .py files. The cache set is now templatized to allow it to contain customized cache blocks with additional information. This involved moving code to the .hh file and removing cacheset.cc. The statistics belonging to the cache tags now include ".tags" in their name, hence the stats need an update to reflect the change in naming.
2013-06-27  mem: Remove the cache builder  (Andreas Hansson)
This patch removes the redundant cache builder class.
2013-06-27  mem: Align cache timing to clock edges  (Andreas Hansson)
This patch changes the cache timing calculations such that the results are aligned to clock edges. Plenty of stats change as a result of this patch.
2013-06-27  mem: Cycles converted to Ticks in atomic cache accesses  (Andreas Hansson)
This patch fixes an outstanding issue in the cache timing calculations where an atomic access returned a time in Cycles, but the port forwarded it on as if it was in Ticks. A separate patch will update the regression stats.
2013-06-27  mem: Remove a redundant heap allocation for a snoop packet  (Andreas Hansson)
This patch changes the upward snoop packet to avoid allocating and later deleting it. As the code executes in 0 time and the lifetime of the packet does not extend beyond the block, there is no reason to heap allocate it.
2013-05-30  mem: Spring cleaning of MSHR and MSHRQueue  (Andreas Hansson)
This patch does some minor tidying up of the MSHR and MSHRQueue. The clean up started as part of some ad-hoc tracing and debugging, but seems worthwhile enough to go in as a separate patch. The highlights of the changes are reduced scoping (making members private) where possible, avoiding redundant new/delete, and constructor initialisation to please static code analyzers.
2013-05-30  mem: Fix MSHR print format  (Andreas Hansson)
This patch fixes an incorrect print format string by adding an additional string element.
2013-04-22  mem: Adding verbose debug output in the memory system  (Uri Wiener)
This patch provides useful printouts throughout the memory system. This includes pretty-printed cache tags and function call messages (call-stack like).
2013-03-27  mem: Fix cache latency bug  (Mitch Hayenga)
Fixes a latency calculation bug for accesses during a cache line fill. Under a cache miss, before the line is filled, accesses to the cache are associated with a MSHR and marked as targets. Once the line fill completes, MSHR target packets pay an additional latency of "responseLatency + busSerializationLatency". However, the "whenReady" field of the cache line is only set to an additional delay of "busSerializationLatency". This lacks the responseLatency component of the fill. It is possible for accesses that occur on the cycle of (or briefly after) the line fill to respond without properly paying the responseLatency. This also creates the situation where two accesses to the same address may be serviced in an order opposite of how they were received by the cache. For stores to the same address, this means that although the cache performs the stores in the order they were received, acknowledgements may be sent in a different order. Adding the responseLatency component to the whenReady field preserves the penalty that should be paid and prevents these ordering issues. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
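In other words (identifiers below are paraphrased from the description, not taken from the source), the block's ready time must include both latency components, not just the serialization delay:

```cpp
#include <cstdint>

using Tick = uint64_t;

// Illustrative recomputation of when a freshly filled line may service new
// accesses. Before the fix only busSerializationDelay was charged here, so
// accesses arriving right at fill time could skip the responseLatency and be
// acknowledged ahead of earlier MSHR targets to the same address.
Tick lineReadyTime(Tick fillStart, Tick responseLatency, Tick busSerializationDelay)
{
    return fillStart + responseLatency + busSerializationDelay;
}
```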
2013-03-26  mem: Cancel cache retry event when blocking port  (Rene de Jong)
This patch solves the corner case scenario where the sendRetryEvent could be scheduled twice, when an io device stresses the IOcache in the system. This should not be possible in the cache system.
2013-02-19  mem: Fix sender state bug and delay popping  (Andreas Hansson)
This patch fixes a newly introduced bug where the sender state was popped before checking that it should be. Amazingly all regressions pass, but Linux fails to boot on the detailed CPU with caches enabled.
2013-02-19  scons: Fix up numerous warnings about name shadowing  (Andreas Hansson)
This patch addresses the most important name shadowing warnings (as produced when using gcc/clang with -Wshadow). There are many locations where constructor parameters and function parameters shadow local variables, but these are left unchanged.
2013-02-19  mem: Enforce strict use of busFirst- and busLastWordTime  (Andreas Hansson)
This patch adds a check to ensure that the delay incurred by the bus is not simply disregarded, but accounted for by someone. At this point, all the modules do is to zero it out, and no additional time is spent. This highlights where the bus timing is simply dropped instead of being paid for. As a follow up, the locations identified in this patch should add this additional time to the packets in one way or another. For now it simply acts as a sanity check and highlights where the delay is simply ignored. Since no time is added, all regressions remain the same.
2013-02-19  mem: Change accessor function names to match the port interface  (Andreas Hansson)
This patch changes the names of the cache accessor functions to be in line with those used by the ports. This is done to avoid confusion and get closer to a one-to-one correspondence between the interface of the memory object (the cache in this case) and the port itself. The member function timingAccess has been split into a snoop/non-snoop part to avoid branching on the isResponse() of the packet.
2013-02-19  mem: Make packet bus-related time accounting relative  (Andreas Hansson)
This patch changes the bus-related time accounting done in the packet to be relative. Besides making it easier to align the cache timing to cache clock cycles, it also makes it possible to connect a Last-Level Cache (LLC) directly to a memory controller without a bus in between. The bus is unique in that it does not ever make the packets wait to reflect the time spent forwarding them. Instead, the cache is currently responsible for making the packets wait. Thus, the bus annotates the packets with the time needed for the first word to appear, and also the last word. The cache then delays the packets in its queues before passing them on. It is worth noting that every object attached to a bus (devices, memories, bridges, etc) should be doing this if we opt for keeping this way of accounting for the bus timing.
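A stripped-down sketch of the hand-off, with firstWordDelay/lastWordDelay standing in for the packet annotations described above: the bus only annotates relative delays, and the receiving cache turns them into actual queueing time and clears them.

```cpp
#include <cstdint>

using Tick = uint64_t;

// Illustrative packet with relative bus delay annotations.
struct Pkt {
    Tick firstWordDelay = 0;  // relative time until the first word arrives
    Tick lastWordDelay = 0;   // relative time until the last word arrives
};

// The bus does not stall packets itself; it only annotates them.
void busForward(Pkt &pkt, Tick firstWordLatency, Tick serializationDelay)
{
    pkt.firstWordDelay += firstWordLatency;
    pkt.lastWordDelay += firstWordLatency + serializationDelay;
}

// The cache is responsible for actually waiting: it folds the annotated
// delay into its own queue scheduling and clears the annotation.
Tick cacheEnqueueTime(Pkt &pkt, Tick now)
{
    const Tick readyAt = now + pkt.lastWordDelay;
    pkt.firstWordDelay = pkt.lastWordDelay = 0;
    return readyAt;
}
```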
2013-02-19  mem: Add deferred packet class to prefetcher  (Andreas Hansson)
This patch removes the time field from the packet as it was only used by the prefetcher. Similar to the packet queue, the prefetcher now wraps the packet in a deferred packet, which also has a tick representing the absolute time when the packet should be sent.
2013-02-19  sim: Make clock private and access using clockPeriod()  (Andreas Hansson)
This patch makes the clock member private to the ClockedObject and forces all children to access it using clockPeriod(). This makes it impossible to inadvertently change the clock, and also makes it easier to transition to a situation where the clock is derived from e.g. a clock domain, or through a multiplier.
2013-02-19  mem: Fix SenderState related cache deadlock  (Sascha Bischoff)
This patch fixes a potential deadlock in the caches. This deadlock could occur when more than one cache is used in a system, and pkt->senderState is modified in between the two caches. This happened as the caches relied on the senderState remaining unchanged, and used it for instantaneous upstream communication with other caches. This issue has been addressed by iterating over the linked list of senderStates until we are either able to cast to a MSHR* or senderState is NULL. If the cast is successful, we know that the packet has previously passed through another cache, and therefore update the downstreamPending flag accordingly. Otherwise, we do nothing.
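The gist of the fix, as a hedged sketch with simplified stand-ins for Packet::SenderState and MSHR (the predecessor pointer comes from the related SenderState patch in this series): walk the chain until an MSHR is found or the list ends, instead of assuming the top-most senderState belongs to a cache.

```cpp
// Simplified stand-ins; illustrative only, not the gem5 classes.
struct SenderState {
    SenderState *predecessor = nullptr;
    virtual ~SenderState() = default;
};

struct MSHR : SenderState {
    bool downstreamPending = false;
};

// Walk the senderState chain until we either find an MSHR pushed by an
// upstream cache or run off the end of the list.
MSHR *findUpstreamMSHR(SenderState *state)
{
    while (state != nullptr) {
        if (auto *mshr = dynamic_cast<MSHR *>(state))
            return mshr;
        state = state->predecessor;
    }
    return nullptr;  // packet has not passed through another cache
}

// Usage in the cache: only touch downstreamPending if an MSHR was found;
// otherwise do nothing, exactly as the description above states.
void markDownstreamPending(SenderState *state)
{
    if (MSHR *mshr = findUpstreamMSHR(state))
        mshr->downstreamPending = true;
}
```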
2013-02-19  mem: Add predecessor to SenderState base class  (Andreas Hansson)
This patch adds a predecessor field to the SenderState base class to make the process of linking them up more uniform, and enable a traversal of the stack without knowing the specific type of the subclasses. There are a number of simplifications done as part of changing the SenderState, particularly in the RubyTest.
2013-02-15  mem: Tighten up cache constness and scoping  (Andreas Hansson)
This patch merely adopts a more strict use of const for the cache member functions and variables, and also moves a large portion of the member functions from public to protected.
2013-02-15  sim: Add a system-global option to bypass caches  (Andreas Sandberg)
Virtualized CPUs and the fastmem mode of the atomic CPU require direct access to physical memory. We currently require caches to be disabled when using them to prevent chaos. This is not ideal when switching between hardware virtualized CPUs and other CPU models as it would require a configuration change on each switch. This changeset introduces a new version of the atomic memory mode, 'atomic_noncaching', where memory accesses are inserted into the memory system as atomic accesses, but bypass caches. To make memory mode tests cleaner, the following methods are added to the System class: * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'. * isTimingMode() -- True if the memory mode is 'timing'. * bypassCaches() -- True if caches should be bypassed. The old getMemoryMode() and setMemoryMode() methods should never be used from the C++ world anymore.
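A minimal sketch of the new predicates under an assumed three-value memory-mode enum (the class and enumerator names are stand-ins; the real System class carries far more state, and the description above names more modes than shown here):

```cpp
// Illustrative memory-mode helpers; not the gem5 System class.
enum class MemoryMode { Atomic, AtomicNoncaching, Timing };

struct System {
    MemoryMode memoryMode = MemoryMode::Timing;

    bool isAtomicMode() const {
        return memoryMode == MemoryMode::Atomic ||
               memoryMode == MemoryMode::AtomicNoncaching;
    }
    bool isTimingMode() const { return memoryMode == MemoryMode::Timing; }
    bool bypassCaches() const {
        return memoryMode == MemoryMode::AtomicNoncaching;
    }
};
```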
2013-01-28  cache: remove drainManager because it's not used  (Anthony Gutierrez)
The cache drainManager is set but never cleared; this is because the cache itself does not need to be drained and thus never triggers a signalDrainDone(). Because the drainManager variable is not used properly and does not appear to be necessary, it has been removed with this patch.
2013-01-08  mem: Make LL/SC locks fine grained  (Mitch Hayenga)
The current implementation in gem5 just keeps a list of locks per cacheline. Due to this, a store to a non-overlapping portion of the cacheline can cause an LL/SC pair to fail. This patch simply adds an address range to the lock structure, so that the lock is only invalidated if the store overlaps the lock range.
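A hedged sketch of the refined lock record (names are illustrative): the lock carries the byte range covered by the load-linked access, and a store only invalidates it when the ranges actually overlap.

```cpp
#include <cstdint>
#include <list>

using Addr = uint64_t;

// Illustrative lock record; the real structure lives with the cache block.
struct Lock {
    int contextId;
    Addr lowAddr;   // inclusive start of the locked bytes
    Addr highAddr;  // exclusive end of the locked bytes

    bool overlaps(Addr start, Addr end) const {
        return start < highAddr && lowAddr < end;
    }
};

// On a store to [start, end), only invalidate locks that actually overlap;
// a store to a disjoint part of the same cache line leaves the LL/SC intact.
void clearLoadLocks(std::list<Lock> &locks, Addr start, Addr end)
{
    locks.remove_if([=](const Lock &l) { return l.overlaps(start, end); });
}
```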
2013-01-07  mem: Fix guest corruption when caches handle uncacheable accesses  (Andreas Sandberg)
When the classic gem5 cache sees an uncacheable memory access, it used to ignore it or silently drop the cache line in case of a write. Normally, there shouldn't be any data in the cache belonging to an uncacheable address range. However, since some architecture models don't implement cache maintenance instructions, there might be some dirty data in the cache that is discarded when this happens. The reason it has mostly worked before is because such cache lines were most likely evicted by normal memory activity before a TLB flush was requested by the OS. Previously, the cache model would invalidate cache lines when they were accessed by an uncacheable write. This changeset alters this behavior so all uncacheable memory accesses cause a cache flush with an associated writeback if necessary. This is implemented by reusing the cache flushing machinery used when draining the cache, which implies that writebacks are performed using functional accesses.
2013-01-07  mem: Remove the IIC replacement policy  (Andreas Sandberg)
The IIC replacement policy seems to be unused and has probably gathered too much bit rot to be useful. This patch removes the IIC and its associated cache parameters.
2013-01-07  sim: Fatal if a clocked object is set to have a clock of 0  (Andreas Hansson)
This patch adds a check to the clocked object constructor to ensure it is not configured to have a clock period of 0.
2013-01-07  cache: add note about where conflicts are handled  (Ali Saidi)
2012-11-02  mem: Add support for writing back and flushing caches  (Andreas Sandberg)
This patch adds support for the following optional drain methods in the classical memory system's cache model: memWriteback() - Write back all dirty cache lines to memory using functional accesses. memInvalidate() - Invalidate all cache lines. Dirty cache lines are lost unless a writeback is requested. Since memWriteback() is called when checkpointing systems, this patch adds support for checkpointing systems with caches. The serialization code now checks whether there are any dirty lines in the cache. If there are dirty lines in the cache, the checkpoint is flagged as bad and a warning is printed.
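In outline, and with stand-in names rather than the actual Cache interface, the checkpointing path described above boils down to:

```cpp
#include <iostream>

// Stand-in cache interface mirroring the drain hooks described above.
struct CacheLike {
    virtual bool isDirty() const = 0;   // any dirty lines left?
    virtual void memWriteback() = 0;    // functional writeback of dirty lines
    virtual void memInvalidate() = 0;   // drop all lines (dirty data is lost)
    virtual ~CacheLike() = default;
};

// Checkpointing: write everything back first; if dirty data somehow remains,
// the checkpoint is flagged as bad and a warning is emitted.
bool serializeCache(CacheLike &cache)
{
    cache.memWriteback();
    if (cache.isDirty()) {
        std::cerr << "warn: cache still dirty, flagging checkpoint as bad\n";
        return false;
    }
    return true;
}
```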
2012-11-02  sim: Move the draining interface into a separate base class  (Andreas Sandberg)
This patch moves the draining interface from SimObject to a separate class that can be used by any object needing draining. However, objects not visible to the Python code (i.e., objects not deriving from SimObject) still depend on their parents informing them when to drain. This patch also gets rid of the CountedDrainEvent (which isn't really an event) and replaces it with a DrainManager.
2012-11-02  sim: Include object header files in SWIG interfaces  (Andreas Sandberg)
When casting objects in the generated SWIG interfaces, SWIG uses classical C-style casts ( (Foo *)bar; ). In some cases, this can degenerate into the equivalent of a reinterpret_cast (mainly if only a forward declaration of the type is available). This usually works for most compilers, but it is known to break if multiple inheritance is used anywhere in the object hierarchy. This patch introduces the cxx_header attribute to Python SimObject definitions, which should be used to specify a header to include in the SWIG interface. The header should include the declaration of the wrapped object. We currently don't enforce the use of the header attribute, but a warning will be generated for objects that do not use it.
2012-10-15  Port: Add protocol-agnostic ports in the port hierarchy  (Andreas Hansson)
This patch adds an additional level of ports in the inheritance hierarchy, separating out the protocol-specific and protocol-agnostic parts. All the functionality related to the binding of ports is now confined to use BaseMaster/BaseSlavePorts, and all the protocol-specific parts stay in the Master/SlavePort. In the future it will be possible to add other protocol-specific implementations. The functions used in the binding of ports, i.e. getMaster/SlavePort, now use the base classes, and the index parameter is updated to use the PortID typedef with the symbolic InvalidPortID as the default.
2012-10-15  Fix: Address a few minor issues identified by cppcheck  (Andreas Hansson)
This patch addresses a number of smaller issues identified by the code inspection utility cppcheck. There are a number of identified leaks in arm/linux/system.cc (although the function only gets called once so it is not a major problem), a few deletes in dev/x86/i8042.cc that were not array deletes, and sprintfs where the character array had one element less than needed. In the IIC tags there was a function allocating an array of longs which is in fact never used.
2012-10-15  Mem: Use cycles to express cache-related latencies  (Andreas Hansson)
This patch changes the cache-related latencies from an absolute time expressed in Ticks, to a number of cycles that can be scaled with the clock period of the caches. Ultimately this patch serves to enable future work that involves dynamic frequency scaling. As an immediate benefit it also makes it more convenient to specify cache performance without implicitly assuming a specific CPU core operating frequency. The stat blocked_cycles, which actually counted in ticks, is now updated to count in cycles. As the timing is now rounded to the clock edges of the cache, there are some regressions that change. Plenty of them have very minor changes, whereas some regressions with a short run-time are perturbed quite significantly. A follow-on patch updates all the statistics for the regressions.
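The conversion itself is trivial; a small sketch (types simplified) of why storing latencies as cycles helps with frequency scaling:

```cpp
#include <cstdint>

using Tick = uint64_t;   // gem5's base time unit
using Cycles = uint64_t; // simplified stand-in for the Cycles type

// Latencies are now stored as cycle counts and scaled by the cache's own
// clock period whenever an absolute time is needed, so a frequency change
// rescales them automatically instead of requiring new Tick values.
Tick cyclesToTicks(Cycles lat, Tick clockPeriod)
{
    return lat * clockPeriod;
}

int main()
{
    // e.g. a 2-cycle hit latency: 2 ns at 1 GHz, 1 ns at 2 GHz
    const Tick at1GHz = cyclesToTicks(2, 1000);  // 1000 ticks per cycle
    const Tick at2GHz = cyclesToTicks(2, 500);
    return (at1GHz == 2000 && at2GHz == 1000) ? 0 : 1;
}
```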
2012-09-25  MEM: Put memory system document into doxygen  (Djordje Kovacevic)
2012-09-25  Cache: add a response latency to the caches  (Mrinmoy Ghosh)
In the current caches the hit latency is paid twice on a miss. This patch lets a configurable response latency be set for the cache for the backward path.
2012-09-19  AddrRange: Transition from Range<T> to AddrRange  (Andreas Hansson)
This patch takes the final plunge and transitions from the templated Range class to the more specific AddrRange. In doing so it changes the obvious Range<Addr> to AddrRange, and also bumps the range_map to be AddrRangeMap. In addition to the obvious changes, including the removal of redundant includes, this patch also does some house keeping in preparing for the introduction of address interleaving support in the ranges. The Range class is also stripped of all the functionality that is never used. --HG-- rename : src/base/range.hh => src/base/addr_range.hh rename : src/base/range_map.hh => src/base/addr_range_map.hh
2012-09-11  clang: Fix issues identified by the clang static analyzer  (Andreas Hansson)
This patch addresses a few minor issues reported by the clang static analyzer. The analysis was run with: scan-build -disable-checker deadcode \ -enable-checker experimental.core \ -disable-checker experimental.core.CastToStruct \ -enable-checker experimental.cpluscplus
2012-09-11  Cache: Split invalidateBlk up to seperate block vs. tags  (Lena Olson)
This separates the functionality to clear the state in a block into blk.hh and the functionality to update the tag information into the tags. This gets rid of the case where calling invalidateBlk on an already-invalid block does something different than calling it on a valid block, which was confusing.